Multiscale geometric feature extraction for high-dimensional and non-Euclidean data with applications. (English) Zbl 1468.62303

Summary: A method for extracting multiscale geometric features from a data cloud is proposed and analyzed. Based on geometric considerations, we map each pair of data points into a real-valued feature function defined on the unit interval. Further statistical analysis is then based on the collection of feature functions. The potential of the method is illustrated by different applications, including classification and anomaly detection. Connections to other concepts, such as random set theory, localized depth measures and nonlinear dimension reduction, are also explored.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G07 Density estimation
62R20 Statistics on metric spaces

Software:

SiZer
