Vrbik, Irene; McNicholas, Paul D.
Parsimonious skew mixture models for model-based clustering and classification. (English) Zbl 1471.62202
Comput. Stat. Data Anal. 71, 196-210 (2014).

Summary: Robust mixture modeling approaches using skewed distributions have recently been explored to accommodate asymmetric data. Parsimonious skew-\(t\) and skew-normal analogues of the GPCM family that employ an eigenvalue decomposition of a scale matrix are introduced. The methods are compared to existing models in both unsupervised and semi-supervised classification frameworks. Parameter estimation is carried out using the expectation-maximization algorithm and models are selected using the Bayesian information criterion. The efficacy of these extensions is illustrated on simulated and real data sets.

Cited in 26 Documents

MSC:
62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Keywords: eigenvalue decomposition; EM algorithm; GPCM; MCLUST; mixture models; model-based clustering; skew-normal distribution; skew-\(t\) distribution

Software: teigen; mixture; R; PARVUS; mclust