
Kernel density estimation via diffusion. (English) Zbl 1200.62029

Summary: We present a new adaptive kernel density estimator based on linear diffusion processes. The proposed estimator builds on existing ideas for adaptive smoothing by incorporating information from a pilot density estimate. In addition, we propose a new plug-in bandwidth selection method that is free from the arbitrary normal reference rules used by existing methods. We present simulation examples in which the proposed approach outperforms existing methods in terms of accuracy and reliability.
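The diffusion connection mentioned in the summary can be illustrated in its simplest, non-adaptive form. The classical Gaussian kernel density estimate with bandwidth sqrt(t) solves the heat equation df/dt = (1/2) d^2f/dx^2. The sketch below checks this numerically by finite differences. It covers only this constant-diffusion special case, not the adaptive estimator or the plug-in bandwidth selector proposed in the paper. The function names, the sample, and the step size are illustrative assumptions.

```python
import math


def gaussian_kde(x, data, t):
    """Gaussian kernel density estimate at x with bandwidth sqrt(t)."""
    norm = math.sqrt(2.0 * math.pi * t)
    total = sum(math.exp(-((x - xi) ** 2) / (2.0 * t)) for xi in data)
    return total / (len(data) * norm)


def heat_residual(x, data, t, h=1e-3):
    """Finite-difference estimate of df/dt - 0.5 * d2f/dx2 at (x, t)."""
    # Central difference in the time (squared-bandwidth) variable t.
    df_dt = (gaussian_kde(x, data, t + h) - gaussian_kde(x, data, t - h)) / (2.0 * h)
    # Central second difference in the space variable x.
    d2f_dx2 = (
        gaussian_kde(x + h, data, t)
        - 2.0 * gaussian_kde(x, data, t)
        + gaussian_kde(x - h, data, t)
    ) / h ** 2
    return df_dt - 0.5 * d2f_dx2


# Made-up sample, used only for illustration.
data = [-1.2, -0.4, 0.1, 0.3, 1.5]
residual = heat_residual(0.25, data, t=0.5)
print(abs(residual))  # small; limited only by finite-difference error
```

In this picture the squared bandwidth acts as the diffusion time, so choosing a bandwidth amounts to choosing when to stop the diffusion.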

MSC:

62G07 Density estimation
62G20 Asymptotic properties of nonparametric inference
60J60 Diffusion processes
65C60 Computational problems in statistics (MSC2010)
35K05 Heat equation
35K15 Initial value problems for second-order parabolic equations
60J70 Applications of Brownian motions and diffusion theory (population genetics, absorption problems, etc.)

Software:

KernSmooth
