A general trimming approach to robust cluster analysis. (English) Zbl 1360.62328

Summary: We introduce a new clustering method that aims to fit clusters with different scatters and weights. To guarantee robustness, the method is designed to handle a proportion \(\alpha\) of contaminating data. As a characteristic feature, restrictions are placed on the ratio between the maximum and the minimum eigenvalues of the groups' scatter matrices. These restrictions make the problem well defined and guarantee that the sample solutions are consistent for the population ones.
The method covers a wide range of clustering approaches depending on the strength of the chosen restrictions. Our proposal includes an algorithm for approximately solving the sample problem.
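
The two ingredients named in the summary, the eigenvalue-ratio restriction and the trimming of a proportion \(\alpha\) of the data, can be illustrated with a small sketch. This is not the authors' algorithm: it only checks the restriction for given eigenvalues and performs one trimming-and-assignment pass for fixed parameters, in the univariate normal case where each group's single eigenvalue is its variance. The function names, the constant `c`, and the rounding rule `int(alpha * n)` are assumptions made for illustration.

```python
import math


def eigen_ratio_ok(eigenvalues_per_group, c):
    """Check that (largest eigenvalue) / (smallest eigenvalue), taken over
    all groups' scatter matrices, does not exceed the constant c >= 1."""
    values = [v for group in eigenvalues_per_group for v in group]
    smallest, largest = min(values), max(values)
    return smallest > 0 and largest / smallest <= c


def log_weighted_density(x, weight, mean, var):
    """log(weight * N(x; mean, var)) for a univariate normal component."""
    return (math.log(weight)
            - 0.5 * math.log(2 * math.pi * var)
            - 0.5 * (x - mean) ** 2 / var)


def trim_and_assign(xs, weights, means, variances, alpha):
    """Assign each point to its best-fitting component, then mark the
    int(alpha * n) worst-fitting points as trimmed (label None)."""
    best = []
    for x in xs:
        scores = [log_weighted_density(x, w, m, v)
                  for w, m, v in zip(weights, means, variances)]
        j = max(range(len(scores)), key=scores.__getitem__)
        best.append((scores[j], j))
    n_trim = int(alpha * len(xs))  # rounding rule is an assumption
    order = sorted(range(len(xs)), key=lambda i: best[i][0])
    trimmed = set(order[:n_trim])
    return [None if i in trimmed else best[i][1] for i in range(len(xs))]


# Hypothetical data: two tight groups plus one gross outlier.
xs = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9, 100.0]
labels = trim_and_assign(xs, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0], alpha=0.15)
```

In this sketch, `c = 1` forces all eigenvalues to be equal, while larger values of `c` allow increasingly different scatters, which mirrors how the strength of the restriction selects among clustering approaches.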

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F35 Robustness and adaptive procedures (parametric inference)

Software:

Flury

References:

[1] Banfield, J. D. and Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics 49 803-821. · Zbl 0794.62034
[2] Bock, H.-H. (2002). Clustering methods: From classical models to new approaches. Statistics in Transition 5 725-758.
[3] Celeux, G. and Govaert, G. (1992). A classification EM algorithm for clustering and two stochastic versions. Comput. Statist. Data Anal. 14 315-332. · Zbl 0937.62605
[4] Cuesta-Albertos, J. A., Gordaliza, A. and Matrán, C. (1997). Trimmed k-means: An attempt to robustify quantizers. Ann. Statist. 25 553-576. · Zbl 0878.62045
[5] Dempster, A., Laird, N. and Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 1-38. · Zbl 0364.62022
[6] Dykstra, R. L. (1983). An algorithm for restricted least squares regression. J. Amer. Statist. Assoc. 78 837-842. · Zbl 0535.62063
[7] Flury, B. (1997). A First Course in Multivariate Statistics. Springer, New York. · Zbl 0879.62052
[8] Fraley, C. and Raftery, A. E. (1998). How many clusters? Which clustering method? Answers via model-based cluster analysis. Computer J. 41 578-588. · Zbl 0920.68038
[9] Gallegos, M. T. (2001). Robust clustering under general normal assumptions. Preprint. Available at http://www.fmi.uni-passau.de/forschung/mip-berichte/MIP-0103.html.
[10] Gallegos, M. T. (2002). Maximum likelihood clustering with outliers. In Classification, Clustering and Data Analysis: Recent Advances and Applications (K. Jajuga, A. Sokolowski and H.-H. Bock, eds.) 247-255. Springer, New York. · Zbl 1032.62059
[11] Gallegos, M. T. and Ritter, G. (2005). A robust method for cluster analysis. Ann. Statist. 33 347-380. · Zbl 1064.62074
[12] García-Escudero, L. A. and Gordaliza, A. (1999). Robustness properties of k-means and trimmed k-means. J. Amer. Statist. Assoc. 94 956-969. · Zbl 1072.62547
[13] García-Escudero, L. A. and Gordaliza, A. (2007). The importance of the scales in heterogeneous robust clustering. Comput. Statist. Data Anal. 51 4403-4412. · Zbl 1162.62379
[14] García-Escudero, L. A., Gordaliza, A. and Matrán, C. (1999). A central limit theorem for multivariate generalized trimmed k-means. Ann. Statist. 27 1061-1079. · Zbl 0984.62042
[15] García-Escudero, L. A., Gordaliza, A. and Matrán, C. (2003). Trimming tools in exploratory data analysis. J. Comput. Graph. Statist. 12 434-449.
[16] García-Escudero, L. A., Gordaliza, A., Matrán, C. and Mayo-Iscar, A. (2006). The TCLUST approach to robust cluster analysis. Technical report. Available at http://www.eio.uva.es/inves/grupos/representaciones/trTCLUST.pdf. · Zbl 1360.62328
[17] Goldfarb, D. and Idnani, A. (1983). A numerically stable dual method for solving strictly convex quadratic programs. Math. Program. 27 1-33. · Zbl 0537.90081
[18] Hathaway, R. J. (1985). A constrained formulation of maximum likelihood estimation for normal mixture distributions. Ann. Statist. 13 795-800. · Zbl 0576.62039
[19] Hennig, C. (2004). Breakdown points for ML estimators of location-scale mixtures. Ann. Statist. 32 1313-1340. · Zbl 1047.62063
[20] Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. Academic Press, London. · Zbl 0432.62029
[21] Maronna, R. (2005). Principal components and orthogonal regression based on robust scales. Technometrics 47 264-273.
[22] Maronna, R. and Jacovkis, P. M. (1974). Multivariate clustering procedures with variable metrics. Biometrics 30 499-505. · Zbl 0285.62036
[23] McLachlan, G. and Peel, D. (2000). Finite Mixture Models. Wiley, New York. · Zbl 0963.62061
[24] Papadimitriou, C. H. and Steiglitz, K. (1982). Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, NJ. · Zbl 0503.90060
[25] Rousseeuw, P. J. and Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics 41 212-223.
[26] Scott, A. J. and Symons, M. J. (1971). Clustering methods based on likelihood ratio criteria. Biometrics 27 387-397.
[27] Van Aelst, S., Wang, X., Zamar, R. H. and Zhu, R. (2006). Linear grouping using orthogonal regression. Comput. Statist. Data Anal. 50 1287-1312. · Zbl 1431.62273
[28] Van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Wiley, New York. · Zbl 0862.60002