
A simulation study to compare robust clustering methods based on mixtures. (English) Zbl 1284.62366

Summary: The following mixture model-based clustering methods are compared in a simulation study with one-dimensional data, a fixed number of clusters, and a focus on outliers and uniform “noise”: an ML-estimator (MLE) for Gaussian mixtures, an MLE for a mixture of Gaussians and a uniform distribution (interpreted as a “noise component” to catch outliers), an MLE for a mixture of Gaussian distributions to which a uniform distribution over the range of the data is added as a fixed component, a pseudo-MLE for a Gaussian mixture with an improper fixed constant density over the real line to catch “noise” (the RIMLE, robust improper maximum likelihood estimator), and MLEs for mixtures of \(t\)-distributions with and without estimation of the degrees of freedom. The RIMLE is the best method in some simulation setups and acceptable in all of them, and can therefore be recommended.
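The improper-constant idea can be illustrated with a short EM sketch: alongside the Gaussian components, one “noise” column in the E-step uses a fixed constant density \(c\) over the real line, so gross outliers receive high noise responsibility and stop distorting the cluster means and variances. The function name, the value of `c`, and the quantile initialization below are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal 1-D EM sketch for a Gaussian mixture plus an improper fixed
# constant "noise" density (RIMLE-style). Illustrative only; `c`, the
# initialization, and all names are assumptions, not the paper's code.
import numpy as np

def rimle_em(x, k=2, c=0.01, n_iter=200):
    n = len(x)
    # deterministic init: spread-out quantiles as centres, common variance
    mu = np.quantile(x, np.linspace(0.25, 0.75, k))
    var = np.full(k, np.var(x))
    pi = np.full(k + 1, 1.0 / (k + 1))       # pi[0] is the noise proportion

    for _ in range(n_iter):
        # E-step: responsibilities; column 0 is the improper constant density c
        dens = np.empty((n, k + 1))
        dens[:, 0] = pi[0] * c
        for j in range(k):
            dens[:, j + 1] = pi[j + 1] * np.exp(
                -0.5 * (x - mu[j]) ** 2 / var[j]) / np.sqrt(2 * np.pi * var[j])
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted proportions, means, variances (variance floored)
        pi = r.mean(axis=0)
        for j in range(k):
            w = r[:, j + 1]
            mu[j] = np.sum(w * x) / np.sum(w)
            var[j] = max(np.sum(w * (x - mu[j]) ** 2) / np.sum(w), 1e-6)
    return pi, mu, var, r

# usage: two well-separated clusters plus a few gross outliers
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(8, 1, 100),
                    rng.uniform(-30, 30, 10)])
pi, mu, var, r = rimle_em(x, k=2)
```

Points far from both Gaussian components end up with responsibility close to 1 for the noise column, which is what makes the cluster parameter estimates robust; in the paper's setups this pseudo-likelihood is not a proper density, hence “pseudo-MLE”.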

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F35 Robustness and adaptive procedures (parametric inference)
62F10 Point estimation
62F12 Asymptotic properties of parametric estimators

Software:

mclust

References:

[1] Banfield J, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49: 803–821 · Zbl 0794.62034
[2] Coretto P (2008) The noise component in model-based clustering. PhD thesis, Department of Statistical Science, University College London. http://www.ontherubicon.com/pietro/docs/phdthesis.pdf
[3] Cuesta-Albertos JA, Gordaliza A, Matrán C (1997) Trimmed k-means: an attempt to robustify quantizers. Ann Stat 25: 553–576 · Zbl 0878.62045
[4] Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41: 578–588 · Zbl 0920.68038
[5] Fraley C, Raftery AE (2006) Mclust version 3 for R: normal mixture modeling and model-based clustering. Technical report 504, Department of Statistics, University of Washington
[6] Gallegos MT, Ritter G (2005) A robust method for cluster analysis. Ann Stat 33(1): 347–380 · Zbl 1064.62074
[7] García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2008) A general trimming approach to robust cluster analysis. Ann Stat 36(3): 1324–1345 · Zbl 1360.62328
[8] Hathaway RJ (1985) A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann Stat 13: 795–800 · Zbl 0576.62039
[9] Hennig C (2004) Breakdown points for maximum likelihood estimators of location-scale mixtures. Ann Stat 32(4): 1313–1340 · Zbl 1047.62063
[10] Hennig C (2005) Robustness of ML estimators of location-scale mixtures. In: Baier D, Wernecke KD (eds) Innovations in classification, data science, and information systems. Springer, Heidelberg, pp 128–137 · Zbl 05243397
[11] Hennig C, Coretto P (2008) The noise component in model-based cluster analysis. In: Preisach C, Burkhardt H, Schmidt-Thieme L, Decker R (eds) Data analysis, machine learning and applications. Springer, Berlin, pp 127–138
[12] Hosmer DW (1978) Comment on “Estimating mixtures of normal distributions and switching regressions” by R. Quandt and J.B. Ramsey. J Am Stat Assoc 73(364): 730–752 · Zbl 0401.62024
[13] Karlis D, Xekalaki E (2003) Choosing initial values for the EM algorithm for finite mixtures. Comput Stat Data Anal 41(3–4): 577–590 · Zbl 1429.62082
[14] Liu C (1997) ML estimation of the multivariate t distribution and the EM algorithms. J Multivar Anal 63: 296–312 · Zbl 0884.62059
[15] McLachlan G, Krishnan T (1997) The EM algorithm and extensions. Wiley, New York · Zbl 0882.62012
[16] McLachlan G, Peel D (2000) Robust mixture modelling using the t-distribution. Stat Comput 10(4): 339–348
[17] Neykov N, Filzmoser P, Dimova R, Neytchev P (2007) Robust fitting of mixtures using the trimmed likelihood estimator. Comput Stat Data Anal 52(1): 299–308 · Zbl 1328.62033
[18] Redner R, Walker HF (1984) Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev 26: 195–239 · Zbl 0536.62021
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.