zbMATH — the first resource for mathematics

Data-driven density derivative estimation, with applications to nonparametric clustering and bump hunting. (English) Zbl 1337.62067
Summary: Important information concerning a multivariate data set, such as clusters and modal regions, is contained in the derivatives of the probability density function. Despite this importance, nonparametric estimation of higher order derivatives of density functions has received relatively scant attention. Kernel estimators of density functions are widely used as they exhibit excellent theoretical and practical properties, though their generalization to density derivatives has progressed more slowly due to the mathematical intractabilities encountered in the crucial problem of bandwidth (or smoothing parameter) selection. This paper presents the first fully automatic, data-based bandwidth selectors for multivariate kernel density derivative estimators. This is achieved by synthesizing recent advances in matrix analytic theory which allow mathematically and computationally tractable representations of higher order derivatives of multivariate vector valued functions. The theoretical asymptotic properties as well as the finite sample behaviour of the proposed selectors are studied. In addition, we explore in detail the applications of the new data-driven methods to two other statistical problems: clustering and bump hunting. The introduced techniques are combined with the mean shift algorithm to develop novel automatic, nonparametric clustering procedures which are shown to outperform mixture-model cluster analysis and other recent nonparametric approaches in practice. Furthermore, the advantage of using smoothing parameters designed for density derivative estimation in feature significance analysis for bump hunting is illustrated with a real data example.
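
The summary mentions combining a density gradient estimate with the mean shift algorithm for clustering. A minimal sketch of that idea in Python follows, assuming a Gaussian kernel and a bandwidth matrix `H` supplied by the user. It does not implement the paper's data-driven bandwidth selectors, and the function names (`mean_shift_step`, `mean_shift`, `label_modes`) and tolerances are hypothetical choices made for illustration.

```python
import numpy as np

def mean_shift_step(x, data, H_inv):
    """One Gaussian mean shift update: move x to a kernel-weighted mean."""
    diff = data - x                                   # shape (n, d)
    # Squared Mahalanobis distances with respect to the bandwidth matrix H.
    m = np.einsum("ij,jk,ik->i", diff, H_inv, diff)
    w = np.exp(-0.5 * (m - m.min()))                  # shifted for stability
    return w @ data / w.sum()

def mean_shift(data, H, tol=1e-6, max_iter=500):
    """Iterate every data point uphill until the update is smaller than tol."""
    H_inv = np.linalg.inv(H)
    modes = np.empty_like(data, dtype=float)
    for i, x in enumerate(data.astype(float)):
        for _ in range(max_iter):
            x_new = mean_shift_step(x, data, H_inv)
            converged = np.linalg.norm(x_new - x) < tol
            x = x_new
            if converged:
                break
        modes[i] = x
    return modes

def label_modes(modes, merge_tol=1e-2):
    """Group end points that lie within merge_tol of one another (assumed rule)."""
    labels = np.full(len(modes), -1)
    centres = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centres):
            if np.linalg.norm(m - c) < merge_tol:
                labels[i] = k
                break
        else:
            centres.append(m)
            labels[i] = len(centres) - 1
    return labels, np.array(centres)
```

For a Gaussian kernel, each update equals the current point plus `H` times the ratio of the estimated density gradient to the estimated density, so the result depends directly on how well `H` suits gradient estimation. Calling `mean_shift(data, H)` and then `label_modes` assigns each observation to the mode it climbs to; here `H` is simply an input rather than a data-driven choice.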

MSC:
62G07 Density estimation
62G05 Nonparametric estimation
62H30 Classification and discrimination; cluster analysis (statistical aspects)