An overview of skew distributions in model-based clustering. (English) Zbl 07451362

Summary: The literature on non-normal model-based clustering has continued to grow in recent years. The non-normal models often take the form of a mixture of component densities that offer a high degree of flexibility in distributional shapes. They handle skewness in different ways, most typically by introducing latent ‘skewing’ variable(s), while others consider marginal transformations of the original variable(s). We provide a selective overview of the main types of skew distributions used in the area, based on their characterization of skewness, and discuss the different skew shapes they can produce. For brevity, we focus on the more commonly used families of distributions.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F86 Parametric inference and fuzziness
