zbMATH — the first resource for mathematics

Dimension reduction based on constrained canonical correlation and variable filtering. (English) Zbl 1142.62045
Summary: The “curse of dimensionality” remains a challenge for high-dimensional data analysis in statistics. The sliced inverse regression (SIR) and canonical correlation (CANCOR) methods aim to reduce the dimensionality of data by replacing the explanatory variables with a small number of composite directions while losing little information. However, the estimated composite directions generally involve all of the variables, which makes them difficult to interpret. To simplify the direction estimates, L. Ni, R. D. Cook and C.-L. Tsai [Biometrika 92, No. 1, 242–247 (2005; Zbl 1068.62080)] proposed shrinkage sliced inverse regression (SSIR), based on SIR. We propose the constrained canonical correlation \((C^{3})\) method, based on CANCOR and followed by a simple variable filtering step. As a result, each composite direction consists of a subset of the variables, which aids interpretability while retaining predictive power. The proposed method aims to identify simple structures without sacrificing the desirable properties of the unconstrained CANCOR estimates. Simulation studies demonstrate the performance advantage of the proposed \(C^{3}\) method over the SSIR method. We also illustrate the proposed method with two examples.
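The contrast drawn in the summary, between unconstrained CANCOR directions that load on every variable and directions restricted to a subset of variables, can be illustrated with a minimal sketch. It is not the authors' \(C^{3}\) estimator: the summary does not specify the constraint or the filtering rule. The sketch assumes a CANCOR-style step in which the leading direction is the canonical correlation vector between the predictors and a spline basis of the response (a truncated-power linear spline stands in here for the B-spline basis of Fung et al. [8]), followed by a hypothetical post-hoc hard-threshold filter. The names `cancor_directions`, `filter_variables`, `spline_basis` and the parameters `n_knots` and `threshold` are illustrative only.

```python
import numpy as np


def inv_sqrt(S, eps=1e-10):
    """Symmetric inverse square root S^{-1/2} via an eigendecomposition."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, eps, None)  # guard against tiny negative round-off
    return (V / np.sqrt(w)) @ V.T


def spline_basis(y, n_knots=4):
    """Truncated-power linear spline basis of the response.

    Assumption: a simple stand-in for the B-spline basis used by CANCOR.
    Knots are placed at interior quantiles of y.
    """
    knots = np.quantile(y, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    cols = [y] + [np.maximum(y - k, 0.0) for k in knots]
    return np.column_stack(cols)


def cancor_directions(X, y, d=1, n_knots=4):
    """Leading d canonical directions between X and a spline basis of y."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    F = spline_basis(y, n_knots)
    Fc = F - F.mean(axis=0)
    Sxx = Xc.T @ Xc / n
    Sff = Fc.T @ Fc / n
    Sxf = Xc.T @ Fc / n
    Wx = inv_sqrt(Sxx)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, _ = np.linalg.svd(Wx @ Sxf @ inv_sqrt(Sff), full_matrices=False)
    B = Wx @ U[:, :d]  # map back to the original predictor scale
    return B / np.linalg.norm(B, axis=0), s[:d]


def filter_variables(B, X, threshold=0.2):
    """Hypothetical filter: zero loadings that are small relative to the largest.

    Loadings are first multiplied by each predictor's standard deviation so
    that the rule does not depend on the units of the predictors.
    """
    scaled = np.abs(B * X.std(axis=0)[:, None])
    rel = scaled / scaled.max(axis=0)
    B_f = np.where(rel >= threshold, B, 0.0)
    return B_f / np.linalg.norm(B_f, axis=0)


# Simulated single-index example: only x1 and x2 are active.
rng = np.random.default_rng(0)
n, p = 1000, 6
X = rng.standard_normal((n, p))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
index = X @ beta
y = index + 0.5 * index**3 + 0.2 * rng.standard_normal(n)

B, rho = cancor_directions(X, y, d=1)
B_sparse = filter_variables(B, X, threshold=0.2)
print(np.round(B[:, 0], 3))         # dense: every variable receives a loading
print(np.round(B_sparse[:, 0], 3))  # filtered: small loadings set to zero
```

Filtering after estimation, as here, is the simplest way to obtain a direction involving only a subset of variables; a constrained estimator instead builds the restriction into the estimation step itself, which is the distinction the summary draws between \(C^{3}\) and plain CANCOR.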

62J07 Ridge regression; shrinkage estimators (Lasso)
62H20 Measures of association (correlation, canonical correlation, etc.)
62G08 Nonparametric regression and quantile regression
65C60 Computational problems in statistics (MSC2010)
[1] Chen, C.-H. and Li, K.-C. (1998). Can SIR be as popular as multiple linear regression? Statist. Sinica 8 289-316. · Zbl 0897.62069
[2] Cook, R. D. (1994). Using dimension-reduction subspaces to identify important inputs in models of physical systems. In 1994 Proceedings of the Section on Physical and Engineering Sciences 18-25. Amer. Statist. Assoc., Alexandria, VA.
[3] Cook, R. D. (2004). Testing predictor contributions in sufficient dimension reduction. Ann. Statist. 32 1062-1092. · Zbl 1092.62046 · doi:10.1214/009053604000000292
[4] Cook, R. D. and Critchley, F. (2000). Identifying outliers and regression mixtures graphically. J. Amer. Statist. Assoc. 95 781-794. · Zbl 0999.62056 · doi:10.2307/2669462
[5] Cook, R. D. and Weisberg, S. (1991). Discussion of “Sliced inverse regression for dimension reduction” by K. C. Li. J. Amer. Statist. Assoc. 86 328-332. · Zbl 0742.62044 · doi:10.2307/2290563
[6] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1360. · Zbl 1073.62547 · doi:10.1198/016214501753382273
[7] Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928-961. · Zbl 1092.62031 · doi:10.1214/009053604000000256
[8] Fung, W. K., He, X., Liu, L. and Shi, P. (2002). Dimension reduction based on canonical correlation. Statist. Sinica 12 1093-1113. · Zbl 1004.62058
[9] Li, B., Zha, H. and Chiaromonte, F. (2005). Contour regression: A general approach to dimension reduction. Ann. Statist. 33 1580-1616. · Zbl 1078.62033 · doi:10.1214/009053605000000192
[10] Li, L. (2007). Sparse sufficient dimension reduction. Biometrika 94 603-613. · Zbl 1135.62062 · doi:10.1093/biomet/asm044
[11] Li, K.-C. (1991). Sliced inverse regression for dimension reduction (with discussion). J. Amer. Statist. Assoc. 86 316-327. · Zbl 0742.62044 · doi:10.2307/2290563
[12] Li, K.-C. (1992). On principal Hessian directions for data visualization and dimension reduction: Another application of Stein’s lemma. J. Amer. Statist. Assoc. 87 1025-1039. · Zbl 0765.62003 · doi:10.2307/2290640
[13] Li, K.-C. (2000). High dimensional data analysis via the SIR/PHD approach. Available at http://www.stat.ucla.edu/~kcli/sir-PHD.pdf.
[14] Li, K.-C. and Duan, N. (1989). Regression analysis under link violation. Ann. Statist. 17 1009-1052. · Zbl 0753.62041 · doi:10.1214/aos/1176347254
[15] Muirhead, R. J. and Waternaux, C. M. (1980). Asymptotic distributions in canonical correlation analysis and other multivariate procedures for nonnormal populations. Biometrika 67 31-43. · Zbl 0448.62037 · doi:10.1093/biomet/67.1.31
[16] Naik, P. A. and Tsai, C.-L. (2001). Single-index model selections. Biometrika 88 821-832. · Zbl 0988.62042 · doi:10.1093/biomet/88.3.821
[17] Ni, L., Cook, R. D. and Tsai, C.-L. (2005). A note on shrinkage sliced inverse regression. Biometrika 92 242-247. · Zbl 1068.62080 · doi:10.1093/biomet/92.1.242
[18] Shi, P. and Tsai, C.-L. (2002). Regression model selection – a residual likelihood approach. J. Roy. Statist. Soc. Ser. B 64 237-252. · Zbl 1059.62074 · doi:10.1111/1467-9868.00335
[19] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[20] Xia, Y., Tong, H., Li, W. K. and Zhu, L.-X. (2002). An adaptive estimation of dimension reduction space. J. Roy. Statist. Soc. Ser. B 64 363-410. · Zbl 1091.62028 · doi:10.1111/1467-9868.03411
[21] Zhou, J. (2008). Robust dimension reduction based on canonical correlation. J. Multivariate Anal. · Zbl 1151.62055 · doi:10.1016/j.jmva.2008.04.003