
Mixtures of factor analyzers with scale mixtures of fundamental skew normal distributions. (English) Zbl 07363882

Summary: Mixtures of factor analyzers (MFA) provide a powerful tool for modelling high-dimensional datasets. In recent years, several generalizations of MFA have been developed in which the normality assumption on the factors and/or the errors is relaxed to allow for skewness in the data. However, due to the form of the adopted component densities, the distribution of the factors/errors in most of these models is typically limited to modelling skewness concentrated in a single direction. Here, we introduce a more flexible finite mixture of factor analyzers based on the class of scale mixtures of canonical fundamental skew normal (SMCFUSN) distributions. This very general class of skew distributions can capture various types of skewness and asymmetry in the data. In particular, the proposed mixtures of SMCFUSN factor analyzers (SMCFUSNFA) can simultaneously accommodate multiple directions of skewness. As such, the model encapsulates many commonly used models as special and/or limiting cases, such as some versions of skew normal and skew \(t\)-factor analyzers, and skew hyperbolic factor analyzers. For illustration, we focus on the \(t\)-distribution member of the class of SMCFUSN distributions, leading to mixtures of canonical fundamental skew \(t\)-factor analyzers (CFUSTFA). Parameter estimation can be carried out by maximum likelihood via an EM-type algorithm. The usefulness and potential of the proposed model are demonstrated using four real datasets.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

R; MixGHD; mixture; UCI-ml; sn
