A multivariate linear regression analysis using finite mixtures of \(t\) distributions. (English) Zbl 1471.62070

Summary: Recently, finite mixture models have been used to model the distribution of the error terms in multivariate linear regression analysis. In particular, Gaussian mixture models have been employed. A novel approach that assumes that the error terms follow a finite mixture of \(t\) distributions is introduced. This assumption allows for an extension of multivariate linear regression models, making these models more versatile and robust against the presence of outliers in the error term distribution. The issues of model identifiability and maximum likelihood estimation are addressed. In particular, identifiability conditions are provided and an Expectation-Maximisation algorithm for estimating the model parameters is developed. Properties of the estimators of the regression coefficients are evaluated through Monte Carlo experiments and compared to the estimators from the Gaussian mixture models. Results from the analysis of two real datasets are presented.
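
For orientation, the model class described in the summary can be written out as follows. This is a sketch in assumed notation, not the notation of the paper: \(y_i \in \mathbb{R}^D\) is the response vector, \(x_i \in \mathbb{R}^P\) the regressor vector, \(\beta_0\) and \(B\) the intercept and coefficient matrix, and \(K\), \(\pi_k\), \(\mu_k\), \(\Sigma_k\), \(\nu_k\) the number of components, mixing weights, locations, scale matrices and degrees of freedom of the error mixture. The identifiability conditions mentioned in the summary are given in the paper and are not reproduced here.

\[
y_i = \beta_0 + B^{\top} x_i + \varepsilon_i, \qquad
\varepsilon_i \sim \sum_{k=1}^{K} \pi_k \, t_D(\mu_k, \Sigma_k, \nu_k), \qquad
\pi_k > 0, \quad \sum_{k=1}^{K} \pi_k = 1,
\]

where the \(D\)-variate \(t\) density with location \(\mu\), scale matrix \(\Sigma\) and \(\nu\) degrees of freedom is

\[
t_D(e; \mu, \Sigma, \nu) =
\frac{\Gamma\!\left(\frac{\nu + D}{2}\right)}
     {\Gamma\!\left(\frac{\nu}{2}\right) (\nu \pi)^{D/2} \, |\Sigma|^{1/2}}
\left(1 + \frac{(e - \mu)^{\top} \Sigma^{-1} (e - \mu)}{\nu}\right)^{-\frac{\nu + D}{2}}.
\]

As \(\nu_k \to \infty\) each component tends to a Gaussian, so under this parametrisation the Gaussian mixture error model arises as a limiting case; small values of \(\nu_k\) give heavier tails, which is the source of the robustness to outliers noted in the summary.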

MSC:

62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62J05 Linear regression; mixed models
