Inverse regression approach to robust nonlinear high-to-low dimensional mapping. (English) Zbl 1408.62119

Summary: This paper addresses nonlinear regression in the presence of outliers, possibly in high dimension, within a parametric framework that does not require specifying the form of the link function. Nonlinearity is handled via an underlying mixture of affine regressions. Each regression is encoded in a joint multivariate Student distribution on the responses and covariates. This joint modeling allows an inverse regression strategy to handle the high dimensionality of the data, while the heavy tails of the Student distribution limit contamination by outlying data. Optionally adding latent variables, similar to factors, further reduces the model's sensitivity to noise and model misspecification. The mixture model setting has the advantage of providing a natural inference procedure based on an EM algorithm. The tractability and flexibility of the algorithm are illustrated on simulated and real high-dimensional data, where its performance compares favorably with existing methods.
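
To illustrate how the heavy tails of a Student distribution limit contamination by outliers, the following is a minimal sketch and not the paper's algorithm: it fits a single multivariate Student distribution by EM with a fixed number of degrees of freedom, whereas the paper uses a mixture of such components with an inverse regression parameterization. The function names, the choice `nu=3.0`, and the synthetic data are all assumptions made for illustration. The weight formula \(u_i = (\nu + d)/(\nu + \delta_i)\), with \(\delta_i\) the squared Mahalanobis distance, is the standard E-step quantity for Student models.

```python
import numpy as np

def student_em_weights(X, mu, Sigma, nu):
    """E-step weights u_i = (nu + d) / (nu + delta_i) for a multivariate Student model."""
    d = X.shape[1]
    diff = X - mu
    # Squared Mahalanobis distance of each row of X to mu under Sigma.
    delta = np.einsum("ij,ij->i", diff @ np.linalg.inv(Sigma), diff)
    return (nu + d) / (nu + delta)

def fit_student(X, nu=3.0, n_iter=50):
    """EM for the location and scale matrix of one Student component (nu held fixed)."""
    n, _ = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        u = student_em_weights(X, mu, Sigma, nu)
        # M-step: weighted mean and weighted scatter; outliers receive small u_i.
        mu = (u[:, None] * X).sum(axis=0) / u.sum()
        diff = X - mu
        Sigma = (u[:, None] * diff).T @ diff / n
    return mu, Sigma, u

# Hypothetical data: 200 standard-normal points, the first 5 shifted far away.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[:5] += 25.0

mu_student, Sigma_student, u = fit_student(X)
mu_gauss = X.mean(axis=0)  # Gaussian maximum-likelihood location, for comparison
```

In this sketch the shifted points receive weights close to zero, so the Student location estimate stays near the bulk of the data, while the plain sample mean is pulled toward the outliers.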

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G35 Nonparametric robustness
62H05 Characterization and structure theory for multivariate probability distributions; copulas
62J02 General nonlinear regression

References:

[1] Adragni, K. P.; Cook, R. D., Sufficient dimension reduction and prediction in regression, Phil. Trans. R. Soc. A, 367, 4385-4405 (2009) · Zbl 1185.62109
[2] Archambeau, C.; Verleysen, M., Robust Bayesian clustering, Neural Netw., 20, 129-138 (2007) · Zbl 1158.68440
[3] Bæk, J.; McLachlan, G. J.; Flack, L. K., Mixtures of factor analyzers with common factor loadings: Applications to the clustering and visualization of high-dimensional data, IEEE Trans. Pattern Anal. Mach. Intell., 32, 1298-1309 (2010)
[4] Bernard-Michel, C.; Douté, S.; Fauvel, M.; Gardes, L.; Girard, S., Retrieval of Mars surface physical properties from OMEGA hyperspectral images using regularized sliced inverse regression, J. Geophys. Res. Planets, 114, E6 (2009)
[5] Bishop, C. M.; Svensen, M., Robust Bayesian mixture modelling, Neurocomputing, 64, 235-252 (2005)
[6] Bouveyron, C.; Girard, S.; Schmid, C., High dimensional data clustering, Comput. Statist. Data Anal., 52, 502-519 (2007) · Zbl 1452.62433
[7] Breiman, L., Random forests, Mach. Learn., 45, 5-32 (2001)
[8] Chamroukhi, F., Non-normal mixtures of experts, ArXiv e-prints
[9] Cook, D., Fisher Lecture: Dimension reduction in regression, Statist. Sci., 22, 1-26 (2007) · Zbl 1246.62148
[10] de Veaux, R. D., Mixtures of linear regressions, Comput. Statist. Data Anal., 8, 227-245 (1989) · Zbl 0726.62109
[11] Deleforge, A.; Forbes, F.; Horaud, R., High-dimensional regression with Gaussian mixtures and partially-latent response variables, Statist. Comput., 25, 893-911 (2015) · Zbl 1332.62192
[12] Devijver, E., Finite mixture regression: A sparse variable selection by model selection for clustering, Electron. J. Stat., 9, 2642-2674 (2015) · Zbl 1329.62279
[13] Ding, P., Bayesian robust inference of sample selection using selection-\(t\) models, J. Multivariate Anal., 124, 451-464 (2014) · Zbl 1359.62079
[14] Ding, P., On the conditional distribution of the multivariate \(t\) distribution, Amer. Statist., 70, 293-295 (2016) · Zbl 07665887
[15] Forbes, F.; Wraith, D., A new family of multivariate heavy-tailed distributions with variable marginal amounts of tailweights: Application to robust clustering, Statist. Comput., 24, 971-984 (2014) · Zbl 1332.62204
[16] Friedman, J., Multivariate adaptive regression splines (with discussion), Ann. Statist., 19, 1-141 (1991) · Zbl 0765.62064
[17] Frühwirth-Schnatter, S., Finite Mixture and Markov Switching Models (2006), Springer: Springer New York · Zbl 1108.62002
[18] García-Escudero, L. A.; Gordaliza, A.; Greselin, F.; Ingrassia, S.; Mayo-Iscar, A., Robust estimation of mixtures of regressions with random covariates, via trimming and constraints, Statist. Comput., 27, 377-402 (2017) · Zbl 1505.62152
[19] Gershenfeld, N., Nonlinear inference and cluster-weighted modeling, Ann. New York Acad. Sci., 808, 18-24 (1997)
[20] Goldfeld, S. M.; Quandt, R. E., A Markov model for switching regressions, J. Econometrics, 1, 3-15 (1973) · Zbl 0294.62087
[21] Hennig, C., Identifiability of models for clusterwise linear regression, J. Classification, 17, 273-296 (2000) · Zbl 1017.62058
[22] Ingrassia, S.; Minotti, S. C.; Vittadini, G., Local statistical modeling via a cluster-weighted approach with elliptical distributions, J. Classification, 29, 363-401 (2012) · Zbl 1360.62335
[23] Jiang, Z.; Ding, P., Robust modeling using non-elliptically contoured multivariate distributions, J. Statist. Plann. Inference, 177, 50-63 (2016) · Zbl 1353.62052
[24] Karatzoglou, A.; Meyer, D.; Hornik, K., Support vector machines in R, J. Stat. Softw., 15, 1-28 (2006)
[25] Kotz, S.; Nadarajah, S., Multivariate \(t\) Distributions and their Applications (2004), Cambridge University Press · Zbl 1100.62059
[26] Lee, S.; McLachlan, G., Finite mixtures of multivariate skew \(t\)-distributions: Some recent and new results, Statist. Comput., 24, 181-202 (2014) · Zbl 1325.62107
[27] Li, K., Sliced inverse regression for dimension reduction, J. Amer. Statist. Assoc., 86, 316-327 (1991) · Zbl 0742.62044
[28] Lin, T., Robust mixture modelling using multivariate skew-\(t\) distribution, Statist. Comput., 20, 343-356 (2010)
[29] Liu, C., Robit regression: A simple robust alternative to logistic and probit regression, (Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives (2004)), 227-238 · Zbl 05274820
[30] Marchenko, Y. V.; Genton, M. G., A Heckman selection \(t\) model, J. Amer. Statist. Assoc., 107, 304-317 (2012) · Zbl 1328.62429
[31] Meng, X.-L.; Van Dyk, D., The EM algorithm: An old folk-song sung to a fast new tune, J. R. Stat. Soc. Ser. B Stat. Methodol., 59, 511-567 (1997) · Zbl 1090.62518
[32] O’Hagan, A.; Murphy, T. B.; Gormley, I. C.; McNicholas, P.; Karlis, D., Clustering with the multivariate Normal Inverse Gaussian distribution, Comput. Statist. Data Anal., 93, 18-30 (2016) · Zbl 1468.62151
[33] Peel, D.; McLachlan, G., Robust mixture modeling using the \(t\) distribution, Statist. Comput., 10, 339-348 (2000)
[34] Pinheiro, J. C.; Liu, C.; Wu, Y. N., Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate \(t\) distribution, J. Comput. Graph. Statist., 10, 249-276 (2001)
[35] Rosipal, R.; Krämer, N., Overview and recent advances in partial least squares, (Saunders, C.; Grobelnik, M.; Gunn, S.; Shawe-Taylor, J., Subspace, Latent Structure and Feature Selection (2006), Springer: Springer New York), 34-51
[36] Städler, N.; Bühlmann, P.; van de Geer, S., \(\ell_1\)-penalization for mixture regression models, TEST, 19, 209-256 (2010) · Zbl 1203.62128
[37] Subedi, S.; Punzo, A.; Ingrassia, S.; McNicholas, P., Clustering and classification via cluster-weighted factor analyzers, Adv. Data Anal. Classif., 7, 5-40 (2013) · Zbl 1271.62137
[38] Subedi, S.; Punzo, A.; Ingrassia, S.; McNicholas, P., Cluster-weighted \(t\)-factor analyzers for robust model-based clustering and dimension reduction, Stat. Methods Appl., 24, 623-649 (2015) · Zbl 1416.62362
[39] Tipping, M. E., Sparse Bayesian learning and the relevance vector machine, J. Mach. Learn. Res., 1, 211-244 (2001) · Zbl 0997.68109
[40] Vapnik, V., Statistical Learning Theory (1998), Wiley: Wiley New York · Zbl 0935.62007
[41] Wraith, D.; Forbes, F., Location and scale mixtures of Gaussians with flexible tail behaviour: Properties, inference and application to multivariate clustering, Comput. Statist. Data Anal., 90, 61-73 (2015) · Zbl 1468.62210
[42] Wu, H., Kernel sliced inverse regression with applications to classification, J. Comput. Graph. Statist., 17, 590-610 (2008)
[43] Xu, L.; Jordan, M.; Hinton, G., An alternative model for mixtures of experts, Adv. Neural Inf. Process. Syst., 633-640 (1995)
[44] Yao, W.; Wei, Y.; Yu, C., Robust mixture regression using the \(t\)-distribution, Comput. Statist. Data Anal., 71, 116-127 (2014) · Zbl 1471.62227