
Variable selection via penalized credible regions with Dirichlet-Laplace global-local shrinkage priors. (English) Zbl 1407.62272

Summary: The method of Bayesian variable selection via penalized credible regions separates model fitting from variable selection: the idea is to search for the sparsest solution within the joint posterior credible regions. Although the approach has been successful, it relied on conjugate normal priors. More recently, global-local shrinkage priors have brought improvements to high-dimensional Bayesian variable selection. In this paper, we incorporate global-local priors into the credible region selection framework; in particular, the Dirichlet-Laplace (DL) prior is adapted to linear regression. Posterior consistency is shown for both the normal and DL priors, along with variable selection consistency. We further introduce a new method to tune the hyperparameters of prior distributions for linear regression: we propose choosing the hyperparameters to minimize a discrepancy between the induced distribution on R-squared and a prespecified target distribution. Prior elicitation on the R-squared scale is more natural, particularly when there is a large number of predictor variables, for which elicitation on the coefficient scale is not feasible. For a normal prior, the hyperparameters minimizing the Kullback-Leibler divergence between the induced and target distributions are available in closed form.
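For the normal case, the credible-region search has a convenient computational form: the joint credible region is an ellipsoid in the posterior Mahalanobis metric, and the sparsest point per region can be traced with an adaptive-lasso-type path. Below is a minimal sketch, assuming a posterior summarized by its mean mu and covariance Sigma (which could, e.g., be estimated from MCMC draws under a non-conjugate prior); the function name and the illustrative weights 1/|mu_j| are our choices, not necessarily the paper's.

```python
import numpy as np
from sklearn.linear_model import lasso_path

def credible_region_path(mu, Sigma, n_alphas=50):
    """Trace sparse solutions across nested joint posterior credible regions.

    The region {b : (b - mu)' Sigma^{-1} (b - mu) <= c} is an ellipsoid, so
    the search for its sparsest point can be relaxed to a weighted-L1 problem
    whose quadratic term is the posterior Mahalanobis distance.
    """
    # Factor Sigma^{-1} = R'R (R upper triangular), so the Mahalanobis
    # distance equals ||R b - R mu||^2: an ordinary least-squares loss.
    R = np.linalg.cholesky(np.linalg.inv(Sigma)).T
    y_star = R @ mu
    # Adaptive-lasso-style weights; guard against zero posterior means.
    w = 1.0 / np.maximum(np.abs(mu), 1e-12)
    # Absorb the weights by rescaling columns: the weighted-L1 problem in b
    # is an ordinary lasso in g = w * b with design matrix R / w.
    alphas, gammas, _ = lasso_path(R / w, y_star, n_alphas=n_alphas)
    betas = gammas / w[:, None]  # map back to the original scale
    # The nonzero support of each column of betas is the model selected at
    # the corresponding credible level.
    return alphas, betas
```

A model along the path can then be picked by a standard criterion such as BIC, which keeps the selection step decoupled from the model-fitting step.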

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62F15 Bayesian inference
62J05 Linear regression; mixed models

Software:

OSCAR
