On the breakdown behavior of the TCLUST clustering procedure. (English) Zbl 1273.62146

Summary: Clustering procedures that allow for general covariance structures of the obtained clusters need some constraints on the solutions. With this in mind, several proposals have been introduced in the literature. The TCLUST procedure works with a restriction on the “eigenvalues-ratio” of the clusters' scatter matrices. To achieve robustness with respect to outliers, the procedure allows a proportion \(\alpha\) of the most outlying observations to be trimmed off. The resistance of TCLUST to infinitesimal contamination has already been studied. This paper examines its resistance to a larger amount of contamination by studying its breakdown behavior. Using the rather new concept of restricted breakdown points, it is shown that the TCLUST procedure resists a proportion \(\alpha\) of contamination as soon as the data set is sufficiently “well clustered”.
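
The two ingredients named in the summary, the eigenvalues-ratio restriction and the trimming of a proportion \(\alpha\), can be illustrated with a minimal Python sketch. It is not the TCLUST algorithm itself. It assumes 2x2 symmetric positive definite scatter matrices (so eigenvalues have a closed form) and a precomputed "outlyingness" score per observation; the function names, the score, and the use of floor(n * alpha) as the number of trimmed points are all assumptions made for illustration.

```python
import math

def sym2x2_eigenvalues(m):
    """Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]], smallest first."""
    (a, b), (_, d) = m
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def eigenvalue_ratio(scatter_matrices):
    """Largest eigenvalue over all clusters divided by the smallest one.

    Assumes every matrix is positive definite, so the minimum is > 0.
    """
    eigs = [ev for m in scatter_matrices for ev in sym2x2_eigenvalues(m)]
    return max(eigs) / min(eigs)

def trim_most_outlying(scores, alpha):
    """Indices kept after discarding the floor(n * alpha) largest scores.

    `scores` is a hypothetical outlyingness measure (larger = more outlying).
    """
    n_trim = math.floor(len(scores) * alpha)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:len(scores) - n_trim])

# Two hypothetical cluster scatter matrices with eigenvalues {1, 4} and {0.5, 2}.
scatters = [[[4.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 0.5]]]
ratio = eigenvalue_ratio(scatters)

# Five hypothetical outlyingness scores; alpha = 0.4 trims the two largest.
kept = trim_most_outlying([0.1, 5.0, 0.3, 0.2, 9.0], alpha=0.4)
```

A candidate solution would be admissible under the restriction only if `ratio` does not exceed a chosen constant; the choice of that constant is left open here.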

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F35 Robustness and adaptive procedures (parametric inference)
62G35 Nonparametric robustness

Software:

clusfind