
Finite mixtures, projection pursuit and tensor rank: a triangulation. (English) Zbl 1466.62355

Summary: Finite mixtures of multivariate distributions play a fundamental role in model-based clustering. However, they pose several problems, especially in the presence of many irrelevant variables. Dimension reduction methods, such as projection pursuit, are commonly used to address these problems. In this paper, we use skewness-maximizing projections to recover the subspace which optimally separates the cluster means. Skewness can then be removed in order to search for other potentially interesting data structures or to perform skewness-sensitive statistical analyses, such as Hotelling's \(T^{2}\) test. Our approach is algebraic in nature and deals with the symmetric tensor rank of the third multivariate cumulant. We also derive closed-form expressions for the symmetric tensor rank of the third cumulants of several multivariate mixture models, including mixtures of skew-normal distributions and mixtures of two symmetric components with proportional covariance matrices. The theoretical results in this paper shed some light on the connection between the estimated number of mixture components and their skewness.
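The link between the third cumulant and the mean-separating direction can be illustrated numerically in the simplest special case. For a mixture of two normal components with a common covariance matrix, weights \(\pi_1, \pi_2\) and mean difference \(\delta = \mu_1 - \mu_2\), a standard calculation gives the third cumulant \(\pi_1 \pi_2 (\pi_2 - \pi_1)\, \delta \otimes \delta \otimes \delta\), which has symmetric tensor rank one whenever \(\pi_1 \neq \pi_2\). The sketch below is not the paper's procedure: instead of a skewness-maximizing projection it takes the leading singular vector of the unfolded empirical third cumulant, and all names, parameter values and the sample size are assumptions chosen only for illustration.

```python
import numpy as np

def third_cumulant(x):
    """Empirical third cumulant (third central moment) as a d x d x d array."""
    xc = x - x.mean(axis=0)
    return np.einsum("ni,nj,nk->ijk", xc, xc, xc) / xc.shape[0]

def leading_direction(k3):
    """Leading left singular vector of the d x d^2 unfolding of k3."""
    d = k3.shape[0]
    u, s, _ = np.linalg.svd(k3.reshape(d, d * d), full_matrices=False)
    return u[:, 0], s

# Hypothetical two-component normal mixture with identity covariance.
rng = np.random.default_rng(0)
n, d = 50_000, 4
p1 = 0.2                                  # weight of the first component
mu1 = np.array([3.0, 1.5, 0.0, 0.0])      # assumed cluster means
mu2 = np.zeros(d)
labels = rng.random(n) < p1
x = rng.standard_normal((n, d)) + np.where(labels[:, None], mu1, mu2)

k3 = third_cumulant(x)
direction, sing_vals = leading_direction(k3)

# Compare the recovered direction with the true mean difference (up to sign).
delta = mu1 - mu2
cosine = abs(direction @ delta) / np.linalg.norm(delta)
print("abs cosine with mu1 - mu2:", round(float(cosine), 3))
print("singular values of the unfolding:", np.round(sing_vals, 3))
```

With unequal weights, one singular value of the unfolding should dominate the others, reflecting the rank-one structure, and the corresponding vector should be close to the direction of \(\delta\). With equal weights the third cumulant vanishes in this model and the direction is no longer identified this way.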

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H12 Estimation in multivariate analysis
46N30 Applications of functional analysis in probability theory and statistics
