Constrained parsimonious model-based clustering. (English) Zbl 1477.62006

Summary: A new methodology for constrained parsimonious model-based clustering is introduced, in which tuning parameters control the strength of the constraints. As limit cases, the methodology includes the 14 parsimonious models commonly applied in model-based clustering with normal components. It does so in a natural way, filling the gaps between these models and providing a smooth transition among them. The methodology yields mathematically well-defined problems and also helps prevent spurious solutions. Novel information criteria are proposed to help the user choose the parameters. The interest of the proposed methodology is illustrated through simulation studies and a real-data application on COVID data.
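The summary does not state the constraints themselves, so what follows is only a sketch of the general idea that one constant can tune how strongly covariance matrices are constrained. It implements the earlier eigenvalue-ratio constraint from the cited works of Hathaway [14] and Fritz et al. [9], not the constraints proposed in this paper. Given the eigenvalues of the K component scatter matrices and positive cluster weights, every eigenvalue d is clipped to an interval [m, c·m]. The threshold m minimizes the weighted sum of log t + d/t over the clipped values t, which is the eigenvalue-dependent part of the Gaussian negative log-likelihood, so the largest-to-smallest eigenvalue ratio ends up at most c. The function name `truncate_eigenvalues` and its arguments are hypothetical, and NumPy is assumed.

```python
import numpy as np


def truncate_eigenvalues(eigvals, weights, c):
    """Hypothetical helper: eigenvalue-ratio constraint by optimal truncation.

    eigvals: positive array of shape (K, p), eigenvalues of K scatter matrices.
    weights: positive array of shape (K,), cluster weights (sizes or proportions).
    c: constant >= 1; the returned eigenvalues satisfy max/min <= c.
    """
    d = np.asarray(eigvals, dtype=float)
    w = np.broadcast_to(np.asarray(weights, dtype=float)[:, None], d.shape)
    if d.max() <= c * d.min():
        return d.copy()  # constraint already satisfied, nothing to clip

    def objective(m):
        # Weighted sum of log t + d / t with t = clip(d, m, c * m)
        t = np.clip(d, m, c * m)
        return np.sum(w * (np.log(t) + d / t))

    # Breakpoints in m where the set of clipped eigenvalues changes
    e = np.sort(np.concatenate([d.ravel(), d.ravel() / c]))
    candidates = []
    for lo, hi in zip(e[:-1], e[1:]):
        mid = 0.5 * (lo + hi)
        low = d < mid        # eigenvalues that would be raised to m
        high = d > c * mid   # eigenvalues that would be lowered to c * m
        denom = w[low].sum() + w[high].sum()
        if denom > 0:
            # Stationary point of the objective for this clipping pattern
            num = np.sum(w[low] * d[low]) + np.sum(w[high] * d[high]) / c
            candidates.append(num / denom)
    m_best = min(candidates, key=objective)
    return np.clip(d, m_best, c * m_best)


# Two components in dimension 2; raw eigenvalue ratio is 10, target c = 2.
print(truncate_eigenvalues([[1.0, 10.0], [4.0, 2.0]], [1.0, 1.0], c=2.0))
```

With c = 1 every eigenvalue is forced to a common value, while once c exceeds the raw max/min ratio the eigenvalues are returned untouched; intermediate values of c interpolate between these extremes, which loosely parallels, in a much simpler setting, the smooth transition between models described above.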

MSC:

62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

TCLUST; FSDA; mixture; mclust

References:

[1] Banfield, JD; Raftery, AE, Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 803-821 (1993) · Zbl 0794.62034
[2] Biernacki, C.; Celeux, G.; Govaert, G., Assessing a mixture model for clustering with the integrated completed likelihood, IEEE Trans. Pattern Anal. Mach. Intell., 22, 719-725 (2000)
[3] Biernacki, C.; Celeux, G.; Govaert, G., Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models, Comput. Stat. Data Anal., 41, 561-575 (2003) · Zbl 1429.62235
[4] Browne, R.; McNicholas, P., Estimating common principal components in high dimensions, Adv. Data Anal. Classif., 8, 217-226 (2014) · Zbl 1474.62183
[5] Browne, RP; ElSherbiny, A.; McNicholas, PD, mixture: mixture models for clustering and classification, R package version 1.5 (2018)
[6] Celeux, G.; Govaert, G., Gaussian parsimonious clustering models, Pattern Recognit., 28, 781-793 (1995)
[7] Cerioli, A.; García-Escudero, LA; Mayo-Iscar, A.; Riani, M., Finding the number of normal groups in model-based clustering via constrained likelihoods, J. Comput. Graph. Stat., 27, 404-416 (2018)
[8] Day, N., Estimating the components of a mixture of normal distributions, Biometrika, 56, 463-474 (1969) · Zbl 0183.48106
[9] Fritz, H.; García-Escudero, LA; Mayo-Iscar, A., A fast algorithm for robust constrained clustering, Comput. Stat. Data Anal., 61, 124-136 (2013) · Zbl 1349.62264
[10] Gallegos, MT; Ritter, G., Probabilistic clustering via Pareto solutions and significance tests, Adv. Data Anal. Classif., 12, 179-202 (2018) · Zbl 1414.62243
[11] García-Escudero, LA; Gordaliza, A.; Matrán, C.; Mayo-Iscar, A., Avoiding spurious local maximizers in mixture modeling, Stat. Comput., 25, 619-633 (2015) · Zbl 1331.62100
[12] García-Escudero, LA; Gordaliza, A.; Greselin, F.; Ingrassia, S.; Mayo-Iscar, A., Eigenvalues and constraints in mixture modeling: geometric and computational issues, Adv. Data Anal. Classif., 12, 203-233 (2018) · Zbl 1414.62071
[13] García-Escudero, LA; Mayo-Iscar, A.; Riani, M., Model-based clustering with determinant-and-shape constraint, Stat. Comput., 25, 1-18 (2020) · Zbl 1452.62443
[14] Hathaway, R., A constrained formulation of maximum likelihood estimation for normal mixture distributions, Ann. Stat., 13, 795-800 (1985) · Zbl 0576.62039
[15] Hubert, L.; Arabie, P., Comparing partitions, J. Classif., 2, 193-218 (1985)
[16] Kiefer, J.; Wolfowitz, J., Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters, Ann. Math. Stat., 27, 887-906 (1956) · Zbl 0073.14701
[17] Maitra, R.; Melnykov, V., Simulating data to study performance of finite mixture modeling and clustering algorithms, J. Comput. Graph. Stat., 19, 354-376 (2010)
[18] McLachlan, G.; Peel, D., Finite Mixture Models, Wiley Series in Probability and Statistics, New York: Wiley (2000)
[19] Meng, X.; Rubin, D., Maximum likelihood estimation via the ECM algorithm: a general framework, Biometrika, 80, 267-278 (1993) · Zbl 0778.62022
[20] Riani, M.; Perrotta, D.; Torti, F., FSDA: a Matlab toolbox for robust analysis and interactive data exploration, Chemometr. Intell. Lab. Syst., 116, 17-32 (2012)
[21] Riani, M.; Cerioli, A.; Perrotta, D.; Torti, F., Simulating mixtures of multivariate data with fixed cluster overlap in FSDA library, Adv. Data Anal. Classif., 9 (2015) · Zbl 1414.62267
[22] Scrucca, L.; Fop, M.; Murphy, TB; Raftery, AE, mclust 5: clustering, classification and density estimation using Gaussian finite mixture models, R J., 8, 1, 289-317 (2016)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.