Model-based clustering with determinant-and-shape constraint. (English) Zbl 1452.62443

Summary: Model-based approaches to cluster analysis and mixture modeling often involve maximizing classification and mixture likelihoods. Without appropriate constraints on the scatter matrices of the components, these maximizations are ill-posed problems. Moreover, without constraints, the EM and CEM algorithms traditionally used to maximize the likelihood criteria often detect uninteresting or “spurious” clusters. Imposing an upper bound on the maximal ratio between the determinants of the scatter matrices seems a sensible way to overcome these problems with affine equivariant constraints. Unfortunately, problems still arise unless the elements of the “shape” matrices are also controlled. A new methodology is proposed that controls both the determinants of the scatter matrices and the elements of the shape matrices. Some theoretical justification is given. A fast algorithm is proposed for this doubly constrained maximization. The methodology is also extended to robust model-based clustering problems.
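
To illustrate why a determinant bound alone may not be enough, the following minimal sketch checks two ratios for a set of component scatter matrices. It is not the authors' algorithm. It assumes that the eigenvalues of each scatter matrix are already available, and that the "shape" elements are those eigenvalues rescaled to have unit product (a common convention, assumed here rather than taken from the summary). The function names and the bounds `c_det` and `c_shape` are hypothetical.

```python
import math


def determinant_and_shape_ratios(eigenvalues_per_component):
    """Return (determinant ratio, worst within-component shape ratio).

    Each entry of `eigenvalues_per_component` is a list of the positive
    eigenvalues of one component's scatter matrix (assumed precomputed).
    """
    dets = []
    shape_ratios = []
    for eigs in eigenvalues_per_component:
        p = len(eigs)
        det = math.prod(eigs)            # determinant = product of eigenvalues
        scale = det ** (1.0 / p)         # geometric mean of the eigenvalues
        shape = [e / scale for e in eigs]  # shape elements, product equals 1
        dets.append(det)
        shape_ratios.append(max(shape) / min(shape))
    return max(dets) / min(dets), max(shape_ratios)


def satisfies_constraints(eigenvalues_per_component, c_det, c_shape):
    """Check both bounds: determinants across components, shape within each."""
    det_ratio, shape_ratio = determinant_and_shape_ratios(eigenvalues_per_component)
    return det_ratio <= c_det and shape_ratio <= c_shape


# Two bivariate components with equal determinants but very different shapes.
components = [[4.0, 1.0], [64.0, 0.0625]]
det_ratio, shape_ratio = determinant_and_shape_ratios(components)
```

In this toy input both components have the same determinant, so any determinant bound is satisfied, yet the second component is extremely elongated. Only the additional bound on the shape elements rules it out.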

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G35 Nonparametric robustness
