Regularized estimation in sparse high-dimensional multivariate regression, with application to a DNA methylation study. (English) Zbl 1371.92014

Summary: In this article, we consider variable selection for correlated high-dimensional DNA methylation markers as multivariate outcomes. A novel weighted square-root LASSO procedure is proposed to estimate the regression coefficient matrix. A key feature of this method is tuning-insensitivity, which greatly simplifies the computation by obviating cross-validation for penalty parameter selection. A precision matrix obtained via the constrained \(\ell_1\) minimization method is used to account for the within-subject correlation among multivariate outcomes. Oracle inequalities for the regularized estimators are derived. The performance of our proposed method is illustrated via extensive simulation studies. We apply our method to study the relation between smoking and high-dimensional DNA methylation markers in the Normative Aging Study (NAS).
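
For orientation, the tuning-insensitivity mentioned in the summary can be illustrated with the standard univariate square-root LASSO of Belloni, Chernozhukov and Wang (2011). This is only a sketch of that well-known special case; the notation (\(y \in \mathbb{R}^n\), \(X \in \mathbb{R}^{n \times p}\), \(\beta\), \(\lambda\), \(\sigma\)) is assumed here, and the weighted multivariate estimator proposed in the article is not reproduced.

\[
\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \left\{ \frac{1}{\sqrt{n}} \, \| y - X\beta \|_2 + \lambda \, \| \beta \|_1 \right\}
\]

Because the loss is the square root of the mean squared residual rather than the mean squared residual itself, a penalty level of order \(\sqrt{\log p / n}\) can be chosen without knowing the noise standard deviation \(\sigma\). The ordinary LASSO, by contrast, needs \(\lambda\) proportional to \(\sigma \sqrt{\log p / n}\), which is why it usually relies on cross-validation.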

MSC:

92B15 General biostatistics
62H12 Estimation in multivariate analysis
62J12 Generalized linear models (logistic models)
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

glmnet; HIMA; VIBES; flare; camel
