
Progress in data-based bandwidth selection for kernel density estimation. (English) Zbl 0897.62037
Summary: We review the extensive recent literature on automatic, data-based selection of a global smoothing parameter in univariate kernel density estimation. Proposals are presented in a unified framework, making considerable reference to their theoretical properties as we go. The results of a major simulation study of the practical performance of many of these methods are summarized. Also, our remarks are further consolidated by describing a small portion of our practical experience on real datasets. Our comparison of methods’ practical performance demonstrates that improvements to be gained by using the better methods can be, and often are, considerable. It will be seen that achieving optimal theoretical performance [up to bounds derived by P. Hall and J. S. Marron, Probab. Theory Relat. Fields 90, No. 2, 149-173 (1991; Zbl 0742.62041)] and acceptable practical performance is not accomplished by the same techniques. We put much effort into making good practical choices whenever options arise. We emphasize that arguably the two best known bandwidth selection methods cannot be advocated for general practical use; these are “least squares cross-validation” (which suffers from too much variability) and normal-based “rules-of-thumb” (which are too biased towards oversmoothing). A number of methods that do seem to be worthy of further consideration are listed. We show why our overall current preference is for the method of S. J. Sheather and M. C. Jones [J. R. Stat. Soc., Ser. B 53, No. 3, 683-690 (1991)]. It is hoped that the lessons learned in this comparatively simple setting will also prove useful in many other smoothing situations.
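To make the two selectors criticised in the summary concrete, the following is a minimal sketch of a normal-based rule of thumb and of least squares cross-validation. It is not taken from the reviewed paper and is not the Sheather and Jones method. It assumes a Gaussian kernel, the common constant 1.06 for the normal reference rule, a simulated bimodal sample, and a simple grid search; all function names are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(x, s):
    """Density of N(0, s^2) evaluated at x."""
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def rule_of_thumb_bandwidth(data):
    """Normal-reference bandwidth for a Gaussian kernel: 1.06 * sd * n^(-1/5)."""
    n = len(data)
    return 1.06 * statistics.stdev(data) * n ** (-0.2)


def lscv_score(data, h):
    """Least squares cross-validation criterion for a Gaussian kernel.

    LSCV(h) = integral of fhat^2  -  (2/n) * sum_i fhat_{-i}(X_i),
    where the integral has a closed form: the convolution of two
    Gaussian kernels with scale h is a Gaussian with scale sqrt(2)*h.
    """
    n = len(data)
    squared_term = 0.0   # sum over all pairs (i, j), including i == j
    leave_out_term = 0.0  # sum over pairs with i != j
    for i in range(n):
        for j in range(n):
            d = data[i] - data[j]
            squared_term += normal_pdf(d, math.sqrt(2.0) * h)
            if i != j:
                leave_out_term += normal_pdf(d, h)
    return squared_term / n ** 2 - 2.0 * leave_out_term / (n * (n - 1))


def lscv_bandwidth(data, grid):
    """Return the grid value that minimises the LSCV criterion."""
    return min(grid, key=lambda h: lscv_score(data, h))


# Illustrative data (an assumption): equal mixture of N(-2, 1) and N(2, 1).
random.seed(0)
sample = [
    random.gauss(-2.0, 1.0) if random.random() < 0.5 else random.gauss(2.0, 1.0)
    for _ in range(100)
]

h_rot = rule_of_thumb_bandwidth(sample)
# Candidate bandwidths from 0.1 to 1.55 times the rule-of-thumb value.
grid = [h_rot * (0.1 + 0.05 * k) for k in range(30)]
h_lscv = lscv_bandwidth(sample, grid)

print("rule of thumb:", h_rot)
print("LSCV         :", h_lscv)
```

Rerunning the sketch with different seeds gives a rough feel for the variability of the cross-validated bandwidth, while the rule of thumb depends on the data only through the sample standard deviation and so cannot adapt to features such as multimodality.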

62G07 Density estimation
65C99 Probabilistic methods, stochastic differential equations