Multivariate cluster-weighted models based on seemingly unrelated linear regression. (English) Zbl 07512637

Summary: A class of cluster-weighted models for a vector of continuous random variables is proposed. This class extends cluster-weighted modelling to multivariate, correlated responses and leaves the researcher free to use a different vector of covariates for each response. The class also includes parsimonious models obtained by imposing suitable constraints on the component covariance matrices of either the responses or the covariates. Conditions for model identifiability are illustrated and discussed. Maximum likelihood estimation is carried out by means of an expectation-conditional maximisation (ECM) algorithm. The effectiveness and usefulness of the proposed models are demonstrated through the analysis of simulated and real datasets.
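Within each mixture component, the model described in the summary pairs a Gaussian density for the covariates with a seemingly unrelated regression (SUR) for the responses. The following Python sketch shows one ECM iteration for such a model. It rests on several assumptions that are not taken from the paper: Gaussian components, unconstrained component covariance matrices (none of the parsimonious constraints), and a hypothetical `idx` argument that lists, for each response, which covariates (plus an intercept) enter its equation. The functions `sur_design`, `e_step` and `cm_steps` are illustrative names, not the authors' implementation. In the conditional-maximisation step, the regression coefficients are first updated by weighted generalised least squares with the response covariance held fixed, and that covariance is then updated given the new coefficients.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp


def sur_design(x, idx):
    """Block-diagonal SUR design for one observation (illustrative).

    Row m holds [1, x[idx[m]]] in its own block of columns, so each
    response gets its own intercept and its own subset of covariates.
    """
    blocks = [np.concatenate(([1.0], x[list(cols)])) for cols in idx]
    Xs = np.zeros((len(blocks), sum(len(b) for b in blocks)))
    start = 0
    for m, b in enumerate(blocks):
        Xs[m, start:start + len(b)] = b
        start += len(b)
    return Xs


def e_step(X, Y, idx, p):
    """Posterior component probabilities z (n x G) and observed log-likelihood."""
    designs = [sur_design(x, idx) for x in X]
    G = len(p["pi"])
    logd = np.empty((X.shape[0], G))
    for g in range(G):
        mean_y = np.array([D @ p["beta"][g] for D in designs])
        # log pi_g + log phi(x; mu_g, Gamma_g) + log phi(y; X* beta_g, Sigma_g)
        logd[:, g] = (np.log(p["pi"][g])
                      + multivariate_normal.logpdf(X, p["mu"][g], p["Gamma"][g])
                      + multivariate_normal.logpdf(Y - mean_y, np.zeros(Y.shape[1]),
                                                   p["Sigma"][g]))
    norm = logsumexp(logd, axis=1, keepdims=True)
    return np.exp(logd - norm), norm.sum()


def cm_steps(X, Y, idx, z, p):
    """One pass of conditional maximisations given z (unconstrained covariances)."""
    designs = [sur_design(x, idx) for x in X]
    new = {"pi": z.mean(axis=0), "mu": [], "Gamma": [], "beta": [], "Sigma": []}
    for g in range(z.shape[1]):
        w = z[:, g]
        sw = w.sum()
        # Closed-form weighted mean and covariance of the covariates
        mu = w @ X / sw
        dx = X - mu
        new["mu"].append(mu)
        new["Gamma"].append((w[:, None] * dx).T @ dx / sw)
        # CM-step 1: weighted GLS for beta_g, holding Sigma_g at its current value
        Sinv = np.linalg.inv(p["Sigma"][g])
        A = sum(wi * D.T @ Sinv @ D for wi, D in zip(w, designs))
        b = sum(wi * D.T @ Sinv @ y for wi, D, y in zip(w, designs, Y))
        beta = np.linalg.solve(A, b)
        # CM-step 2: update Sigma_g given the new beta_g
        r = Y - np.array([D @ beta for D in designs])
        new["beta"].append(beta)
        new["Sigma"].append((w[:, None] * r).T @ r / sw)
    return new
```

Each conditional maximisation increases the expected complete-data log-likelihood, so alternating `e_step` and `cm_steps` should produce a non-decreasing log-likelihood, which makes a convenient sanity check. A practical fit would also need several starting values, a convergence criterion and safeguards against degenerate covariance matrices.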


62-XX Statistics
