General location model with factor analyzer covariance matrix structure and its applications. (English) Zbl 1414.62205

Summary: The general location model (GLOM) is a well-known model for analyzing mixed data. In a GLOM, the joint distribution of the variables is decomposed into the conditional distribution of the continuous variables given the categorical outcomes and the marginal distribution of the categorical variables. The first version of the GLOM assumes that the covariance matrices of the continuous multivariate distributions are equal across cells, where the cells are formed by the different combinations of the categorical variables. In this paper, GLOMs are considered both when these covariance matrices are equal and when they are unequal. Three covariance structures are used across cells: the same factor analyzer, factor analyzers with unequal specific variance matrices (in general and parsimonious forms), and factor analyzers with common factor loadings. These structures serve both to model the covariance structure and to reduce the number of parameters. The maximum likelihood estimates of the parameters are computed via the EM algorithm. As an application of these models, we investigate the classification of continuous variables within cells. Based on these models, classification is carried out for ordinary as well as high-dimensional data sets. Finally, to show the applicability of the proposed models for classification, results from analyzing three real data sets are presented.
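
To illustrate the parameter reduction mentioned in the summary, the following minimal sketch compares the number of free covariance parameters in an unrestricted p x p covariance matrix with the number in a factor-analyzer structure of the form Lambda Lambda^T + Psi with q factors. The count p*q + p - q*(q-1)/2 is the standard one from the factor analysis literature; that the paper uses exactly this convention is an assumption, and the function names and the example values (p = 50, q = 3, 6 cells) are hypothetical and not taken from the paper.

```python
def full_cov_params(p: int) -> int:
    """Free parameters in one unrestricted symmetric p x p covariance matrix."""
    return p * (p + 1) // 2


def factor_cov_params(p: int, q: int) -> int:
    """Free parameters in Lambda Lambda^T + Psi with q latent factors.

    Counts p*q loadings plus p specific variances, minus q*(q-1)/2
    for the rotational indeterminacy of the loadings (standard
    convention; assumed here, not confirmed by the summary).
    """
    return p * q + p - q * (q - 1) // 2


if __name__ == "__main__":
    p, q, cells = 50, 3, 6  # hypothetical dimensions for illustration
    print("unrestricted, one matrix per cell:", cells * full_cov_params(p))
    print("unrestricted, shared across cells:", full_cov_params(p))
    print("one factor analyzer shared across cells:", factor_cov_params(p, q))
```

With these illustrative values, a single shared factor analyzer needs far fewer covariance parameters than either unrestricted alternative, which is why such structures are attractive when the number of continuous variables is large relative to the sample size in each cell.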


62H25 Factor analysis and principal components; correspondence analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)



