zbMATH — the first resource for mathematics

Sure independence screening for ultrahigh dimensional feature space. With discussion and authors’ reply. (English) Zbl 1411.62187
Summary: Variable selection plays an important role in high dimensional statistical modelling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality \(p\), accuracy of estimation and computational cost are two top concerns. Recently, E. Candès and T. Tao [Ann. Stat. 35, No. 6, 2313–2404 (2007; Zbl 1139.62019)] have proposed the Dantzig selector using \(L_1\)-regularization and showed that it achieves the ideal risk up to a logarithmic factor \(\log(p)\). Their innovative procedure and remarkable result are challenged when the dimensionality is ultrahigh as the factor \(\log(p)\) can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method that is based on correlation learning, called sure independence screening, to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, correlation learning is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, iterative sure independence screening is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be accomplished by a well-developed method such as smoothly clipped absolute deviation, the Dantzig selector, lasso or adaptive lasso. The connections between these penalized least squares methods are also elucidated.
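The screening step described in the summary can be illustrated with a minimal sketch, assuming that correlation learning means ranking features by the absolute marginal correlation between each standardized predictor and the response, and keeping the top \(d\) with \(d\) below the sample size \(n\). The function name `sis_screen`, the simulated data, and the choice \(d = \lfloor n/\log n \rfloor\) are illustrative assumptions, not details taken from this record.

```python
import numpy as np

def sis_screen(X, y, d):
    """Return the (sorted) indices of the d features whose marginal
    correlation with y is largest in absolute value.

    Hypothetical helper for illustration; not the authors' code.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    if not 1 <= d <= p:
        raise ValueError("d must satisfy 1 <= d <= p")

    # Standardize each column so the componentwise scores are comparable.
    Xc = X - X.mean(axis=0)
    scale = Xc.std(axis=0)
    scale[scale == 0] = 1.0          # guard against constant columns
    Xs = Xc / scale

    # Marginal (componentwise) association of each feature with y.
    yc = y - y.mean()
    omega = Xs.T @ yc / n

    # Keep the d largest scores in absolute value.
    order = np.argsort(-np.abs(omega))
    return np.sort(order[:d])

# Simulated example (assumed setup): p much larger than n, three active features.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 5.0
y = X @ beta + rng.standard_normal(n)

d = int(n / np.log(n))               # illustrative choice of d, below n
selected = sis_screen(X, y, d)
X_reduced = X[:, selected]           # pass this to SCAD, lasso, etc.
```

After screening, `X_reduced` has fewer columns than rows, so a penalized least squares method such as those named in the summary could be applied to it at much lower cost than on the full design matrix.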

MSC:
62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)
62F10 Point estimation
62-02 Research exposition (monographs, survey articles) pertaining to statistics
References:
[1] Antoniadis, A. and Fan, J. (2001) Regularization of wavelets approximations (with discussion). J. Am. Statist. Ass., 96, 939–967. · Zbl 1072.62561
[2] Bai, Z. D. (1999) Methodologies in spectral analysis of large dimensional random matrices, a review. Statist. Sin., 9, 611–677. · Zbl 0949.60077
[3] Bai, Z. D. and Yin, Y. Q. (1993) Limit of smallest eigenvalue of a large dimensional sample covariance matrix. Ann. Probab., 21, 1275–1294. · Zbl 0779.60026
[4] Baron, D., Wakin, M. B., Duarte, M. F., Sarvotham, S. and Baraniuk, R. G. (2005) Distributed compressed sensing. Manuscript.
[5] Barron, A., Cohen, A., Dahmen, W. and DeVore, R. (2008) Approximation and learning by greedy algorithms. Ann. Statist., 36, 64–94. · Zbl 1138.62019
[6] Bickel, P. J. and Levina, E. (2004) Some theory for Fisher’s linear discriminant function, “naive Bayes”, and some alternatives when there are many more variables than observations. Bernoulli, 10, 989–1010. · Zbl 1064.62073
[7] Bickel, P. J. and Levina, E. (2008) Regularized estimation of large covariance matrices. Ann. Statist., 36, 199–227. · Zbl 1132.62040
[8] Bickel, P. J., Ritov, Y. and Tsybakov, A. (2008) Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist., 36, in the press. · Zbl 1173.62022
[9] Breiman, L. (1995) Better subset regression using the nonnegative garrote. Technometrics, 37, 373–384. · Zbl 0862.62059
[10] Breiman, L. (1996) Heuristics of instability and stabilization in model selection. Ann. Statist., 24, 2350–2383. · Zbl 0867.62055
[11] Candes, E. and Tao, T. (2007) The Dantzig selector: statistical estimation when p is much larger than n (with discussion). Ann. Statist., 35, 2313–2404. · Zbl 1139.62019
[12] Chikuse, Y. (2003) Statistics on special manifolds. Lect. Notes Statist, 174.
[13] Donoho, D. L. (2000) High‐dimensional data analysis: the curses and blessings of dimensionality. American Mathematical Society Conf. Math Challenges of the 21st Century.
[14] Donoho, D. L. and Elad, M. (2003) Maximal sparsity representation via \(l_1\) minimization. Proc. Natn. Acad. Sci. USA, 100, 2197–2202. · Zbl 1064.94011
[15] Donoho, D. L. and Huo, X. (2001) Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inform. Theory, 47, 2845–2862. · Zbl 1019.94503
[16] Donoho, D. L. and Johnstone, I. M. (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 425–455. · Zbl 0815.62019
[17] Eaton, M. L. (1989) Group Invariance Applications in Statistics. Hayward: Institute of Mathematical Statistics. · Zbl 0749.62005
[18] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004) Least angle regression (with discussion). Ann. Statist., 32, 407–499. · Zbl 1091.62054
[19] Fan, J. (1997) Comments on “Wavelets in statistics: a review,” by A. Antoniadis. J. Ital. Statist. Ass., 6, 131–138.
[20] Fan, J. and Fan, Y. (2008) High dimensional classification using features annealed independence rules. Ann. Statist., to be published. · Zbl 1360.62327
[21] Fan, J. and Li, R. (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Statist. Ass., 96, 1348–1360. · Zbl 1073.62547
[22] Fan, J. and Li, R. (2002) Variable selection for Cox’s proportional hazards model and frailty model. Ann. Statist., 30, 74–99. · Zbl 1012.62106
[23] Fan, J. and Li, R. (2006) Statistical challenges with high dimensionality: feature selection in knowledge discovery. In Proc. Int. Congr. Mathematicians (eds M. Sanz‐Sole, J. Soria, J. L. Varona and J. Verdera), vol. III, pp. 595–622. Freiburg: European Mathematical Society.
[24] Fan, J. and Peng, H. (2004) Nonconcave penalized likelihood with diverging number of parameters. Ann. Statist., 32, 928–961. · Zbl 1092.62031
[25] Fan, J. and Ren, Y. (2006) Statistical analysis of DNA microarray data. Clin. Cancer Res., 12, 4469–4473.
[26] Frank, I. E. and Friedman, J. H. (1993) A statistical view of some chemometrics regression tools (with discussion). Technometrics, 35, 109–148. · Zbl 0775.62288
[27] Freund, Y. and Schapire, R. E. (1997) A decision‐theoretic generalization of on‐line learning and an application to boosting. J. Comput. Syst. Sci., 55, 119–139. · Zbl 0880.68103
[28] Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007) Pathwise coordinate optimization. Ann. Appl. Statist., 1, 302–332. · Zbl 1378.90064
[29] Geman, S. (1980) A limit theorem for the norm of random matrices. Ann. Probab., 8, 252–261. · Zbl 0428.60039
[30] George, E. I. and McCulloch, R. E. (1997) Approaches for Bayesian variable selection. Statist. Sin., 7, 339–373. · Zbl 0884.62031
[31] Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D. and Lander, E. S. (1999) Molecular classification of cancer: class discovery and class prediction by expression monitoring. Science, 286, 531–537.
[32] Greenshtein, E. (2006) Best subset selection, persistence in high dimensional statistical learning and optimization under \(l_1\) constraint. Ann. Statist., 34, 2367–2386. · Zbl 1106.62022
[33] Greenshtein, E. and Ritov, Y. (2004) Persistence in high‐dimensional linear predictor selection and the virtue of overparametrization. Bernoulli, 10, 971–988. · Zbl 1055.62078
[34] Grenander, U. and Szegö, G. (1984) Toeplitz Forms and Their Applications. New York: Chelsea.
[35] Gribonval, R., Mailhe, B., Rauhut, H., Schnass, K. and Vandergheynst, P. (2007) Average case analysis of multichannel thresholding. In Proc. Int. Conf. Acoustic and Speech Signal Processing. New York: Institute of Electrical and Electronics Engineers.
[36] Hall, P., Marron, J. S. and Neeman, A. (2005) Geometric representation of high dimension, low sample size data. J. R. Statist. Soc. B, 67, 427–444. · Zbl 1069.62097
[37] Huang, J., Horowitz, J. and Ma, S. (2008) Asymptotic properties of bridge estimators in sparse high‐dimensional regression models. Ann. Statist., 36, 587–613. · Zbl 1133.62048
[38] Hunter, D. and Li, R. (2005) Variable selection using MM algorithms. Ann. Statist., 33, 1617–1642. · Zbl 1078.62028
[39] Johnstone, I. M. (2001) On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist., 29, 295–327. · Zbl 1016.62078
[40] Knight, K. and Fu, W. (2000) Asymptotics for Lasso‐type estimators. Ann. Statist., 28, 1356–1378. · Zbl 1105.62357
[41] Lam, C. and Fan, J. (2007) Sparsistency and rates of convergence in large covariance matrices estimation. Manuscript. · Zbl 1191.62101
[42] Ledoux, M. (2001) The Concentration of Measure Phenomenon. Cambridge: American Mathematical Society. · Zbl 0995.60002
[43] Ledoux, M. (2005) Deviation inequalities on largest eigenvalues. Manuscript. · Zbl 1130.15012
[44] Meier, L., Van De Geer, S. and Bühlmann, P. (2008) The group lasso for logistic regression. J. R. Statist. Soc. B, 70, 53–71. · Zbl 1400.62276
[45] Meinshausen, N. (2007) Relaxed Lasso. Computnl Statist. Data Anal., 52, 374–393. · Zbl 1452.62522
[46] Meinshausen, N. and Bühlmann, P. (2006) High dimensional graphs and variable selection with the Lasso. Ann. Statist., 34, 1436–1462. · Zbl 1113.62082
[47] Meinshausen, N., Rocha, G. and Yu, B. (2007) Discussion of “The Dantzig selector: statistical estimation when p is much larger than n”. Ann. Statist., 35, 2373–2384.
[48] Nikolova, M. (2000) Local strong homogeneity of a regularized estimator. SIAM J. Appl. Math., 61, 633–658. · Zbl 0991.94015
[49] Paul, D., Bair, E., Hastie, T. and Tibshirani, R. (2008) “Pre‐conditioning” for feature selection and regression in high‐dimensional problems. Ann. Statist., to be published. · Zbl 1142.62022
[50] Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2007) Sparse additive models. Manuscript.
[51] Silverstein, J. W. (1985) The smallest eigenvalue of a large dimensional Wishart matrix. Ann. Probab., 13, 1364–1368. · Zbl 0591.60025
[52] Storey, J. D. and Tibshirani, R. (2003) Statistical significance for genome‐wide studies. Proc. Natn. Acad. Sci. USA, 100, 9440–9445. · Zbl 1130.62385
[53] Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B, 58, 267–288. · Zbl 0850.62538
[54] Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natn. Acad. Sci. USA, 99, 6567–6572.
[55] Van Der Vaart, A. W. and Wellner, J. A. (1996) Weak Convergence and Empirical Processes. New York: Springer. · Zbl 0862.60002
[56] Zhang, C.‐H. (2007) Penalized linear unbiased selection. Technical Report 2007-003. Department of Statistics, Rutgers University, Piscataway.
[57] Zhang, C.‐H. and Huang, J. (2008) The sparsity and bias of the LASSO selection in high‐dimensional linear regression. Ann. Statist., 36, 1567–1594. · Zbl 1142.62044
[58] Zhao, P. and Yu, B. (2006) On model selection consistency of Lasso. J. Mach. Learn. Res., 7, 2541–2567. · Zbl 1222.62008
[59] Zou, H. (2006) The adaptive Lasso and its oracle properties. J. Am. Statist. Ass., 101, 1418–1429. · Zbl 1171.62326
[60] Zou, H. and Li, R. (2008) One‐step sparse estimates in nonconcave penalized likelihood models. · Zbl 1282.62112
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.