
Bayesian model selection with graph structured sparsity. (English) Zbl 1508.62010

Summary: We propose a general algorithmic framework for Bayesian model selection. A spike-and-slab Laplacian prior is introduced to model the underlying structural assumption. Using the notion of effective resistance, we derive an EM-type algorithm with closed-form iterations to efficiently explore possible candidates for Bayesian model selection. The deterministic nature of the proposed algorithm makes it more scalable to large-scale and high-dimensional data sets than existing stochastic search algorithms. When applied to sparse linear regression, our framework recovers the EMVS algorithm of V. Ročková and E. I. George [J. Am. Stat. Assoc. 109, No. 506, 828–846 (2014; Zbl 1367.62049)] as a special case. We also discuss extensions of our framework using tools from graph algebra to incorporate complex Bayesian models such as biclustering and submatrix localization. Extensive simulation studies and real data applications are conducted to demonstrate the superior performance of our methods over frequentist competitors such as \(\ell_0\) or \(\ell_1\) penalization.
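
To make the closed-form EM iterations concrete, the listing below sketches the sparse linear regression special case, i.e., an EMVS-style update with a spike-and-slab Gaussian prior (entry [43] in the references). This is a minimal illustration under simplifying assumptions, not the authors' implementation: the spike/slab variances v0 and v1, the inclusion probability theta, and the noise variance sigma2 are held fixed for brevity, whereas the full EMVS algorithm also updates them in the M-step.

    import numpy as np
    from scipy.stats import norm

    # Illustrative sketch (theta and sigma2 fixed). Prior on each coefficient:
    #   beta_j ~ (1 - gamma_j) * N(0, sigma2*v0) + gamma_j * N(0, sigma2*v1)
    def emvs_sketch(X, y, v0=0.01, v1=10.0, theta=0.5, sigma2=1.0, n_iter=100):
        beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares start
        for _ in range(n_iter):
            # E-step: posterior probability that each coefficient is in the slab
            slab = theta * norm.pdf(beta, scale=np.sqrt(sigma2 * v1))
            spike = (1.0 - theta) * norm.pdf(beta, scale=np.sqrt(sigma2 * v0))
            p_star = slab / (slab + spike)
            # M-step: closed-form ridge update with coefficient-wise penalties d_j
            d = (1.0 - p_star) / v0 + p_star / v1
            beta = np.linalg.solve(X.T @ X + sigma2 * np.diag(d), X.T @ y)
        return beta, p_star  # point estimate and slab inclusion probabilities

Because each M-step reduces to a weighted ridge regression, every iteration is deterministic and costs a single linear solve; this is the scalability advantage the summary contrasts with stochastic search.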

MSC:

62C10 Bayesian problems; characterization of Bayes procedures
68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62J05 Linear regression; mixed models

Citations:

Zbl 1367.62049
Full Text: arXiv Link

References:

[1] Taylor B Arnold and Ryan J Tibshirani. genlasso: Path algorithm for generalized lasso problems. R package version 1, 2014.
[2] Andrew Barron, Lucien Birgé, and Pascal Massart. Risk bounds for model selection via penalization. Probability Theory and Related Fields, 113(3):301-413, 1999. · Zbl 0946.62036
[3] Daniel Barry and John A Hartigan. A Bayesian analysis for change point problems.Journal of the American Statistical Association, 88(421):309-319, 1993. · Zbl 0775.62065
[4] Arindam Bhattacharjee, William G Richards, Jane Staunton, Cheng Li, Stefano Monti, Priya Vasa, Christine Ladd, Javad Beheshti, Raphael Bueno, and Michael Gillette. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proceedings of the National Academy of Sciences, 98(24):13790-13795, 2001.
[5] Howard D Bondell and Brian J Reich. Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics, 64(1):115-123, 2008. · Zbl 1146.62051
[6] Leonard Bottolo and Sylvia Richardson. Evolutionary stochastic search for Bayesian model exploration. Bayesian Analysis, 5(3):583-618, 2010. · Zbl 1330.90042
[7] Oleg Burdakov and Oleg Sysoev. A dual active-set algorithm for regularized monotonic regression. Journal of Optimization Theory and Applications, 172(3):929-949, 2017. · Zbl 1362.90307
[8] Y. Cheng and G.M. Church. Biclustering of expression data. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, pages 93-103, 2000.
[9] Eric C Chi, Genevera I Allen, and Richard G Baraniuk. Convex biclustering. Biometrics, 73(1):10-19, 2017. · Zbl 1366.62208
[10] Patrick L Combettes and Jean-Christophe Pesquet. Proximal splitting methods in signal processing. In Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pages 185-212. Springer, 2011. · Zbl 1242.90160
[11] Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1-38, 1977.
[12] Jean Diebolt and Christian P Robert. Estimation of finite mixture distributions through Bayesian sampling. Journal of the Royal Statistical Society, Series B (Methodological), pages 363-375, 1994. · Zbl 0796.62028
[13] Richard L Dykstra. An algorithm for restricted least squares regression. Journal of the American Statistical Association, 78(384):837-842, 1983. · Zbl 0535.62063
[14] Zhou Fan and Leying Guan. Approximate \(\ell_0\)-penalized estimation of piecewise-constant signals on graphs. The Annals of Statistics, 46(6B):3217-3245, 2018. · Zbl 1456.62052
[15] Uriel Feige and Robert Krauthgamer. Finding and certifying a large hidden clique in a semirandom graph. Random Structures & Algorithms, 16(2):195-208, 2000. · Zbl 0940.05050
[16] Jerome Friedman, Trevor Hastie, and Rob Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1-22, 2010.
[17] Felix Friedrich, Angela Kempe, Volkmar Liebscher, and Gerhard Winkler. Complexity penalized M-estimation: Fast computation. Journal of Computational and Graphical Statistics, 17(1):201-224, 2008.
[18] Chao Gao, Aad W van der Vaart, and Harrison H Zhou. A general framework for Bayes structured linear models. arXiv preprint arXiv:1506.02174, 2015.
[19] Chao Gao, Yu Lu, Zongming Ma, and Harrison H Zhou. Optimal estimation and completion of matrices with biclustering structures. The Journal of Machine Learning Research, 17, 2016. · Zbl 1392.62151
[20] Chao Gao, Fang Han, and Cun-Hui Zhang. On estimation of isotonic piecewise constant signals. The Annals of Statistics, to appear, 2018.
[21] Edward I George and Robert E McCulloch. Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423):881-889, 1993.
[22] Edward I George and Robert E McCulloch. Approaches for Bayesian variable selection. Statistica Sinica, pages 339-373, 1997. · Zbl 0884.62031
[23] Arpita Ghosh, Stephen Boyd, and Amin Saberi. Minimizing effective resistance of a graph. SIAM Review, 50(1):37-66, 2008. · Zbl 1134.05057
[24] Joyee Ghosh and Merlise A Clyde. Rao-Blackwellization for Bayesian variable selection and model averaging in linear and binary regression: a novel data augmentation approach. · Zbl 1229.62029
[26] Bruce Hajek, Yihong Wu, and Jiaming Xu. Submatrix localization via message passing. The Journal of Machine Learning Research, 18(1):6817-6868, 2017. · Zbl 1468.68155
[27] David Hallac, Jure Leskovec, and Stephen Boyd. Network lasso: clustering and optimization in large graphs. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 387-396. ACM, 2015.
[28] Chris Hans, Adrian Dobra, and Mike West. Shotgun stochastic search for large p regression. Journal of the American Statistical Association, 102(478):507-516, 2007. · Zbl 1134.62398
[29] John A Hartigan. Direct clustering of a data matrix. Journal of the American Statistical Association, 67(337):123-129, 1972.
[30] Wilfried Imrich and Sandi Klavžar. Product Graphs: Structure and Recognition. Wiley, 2000. · Zbl 0963.05002
[31] Chinubhai G Khatri. Some results for the singular normal multivariate regression models. Sankhyā: The Indian Journal of Statistics, Series A, pages 267-280, 1968. · Zbl 0177.22802
[32] Mihee Lee, Haipeng Shen, Jianhua Z Huang, and JS Marron. Biclustering via sparse singular value decomposition. Biometrics, 66(4):1087-1095, 2010. · Zbl 1233.62182
[33] Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. Kronecker graphs: an approach to modeling networks. Journal of Machine Learning Research, 11:985-1042, 2010. · Zbl 1242.05256
[34] Fan Li and Nancy R Zhang. Bayesian variable selection in structured high-dimensional covariate spaces with applications in genomics. Journal of the American Statistical Association, 105(491):1202-1214, 2010. · Zbl 1390.62027
[35] Oren E Livne and Achi Brandt. Lean algebraic multigrid (LAMG): fast graph Laplacian linear solver. SIAM Journal on Scientific Computing, 34(4):B499-B522, 2012. · Zbl 1253.65045
[36] László Lovász. Random walks on graphs: a survey. Combinatorics, Paul Erdős is Eighty, 2(1):1-46, 1993.
[37] Zongming Ma and Yihong Wu. Volume ratio, sparsity, and minimaxity under unitarily invariant norms. IEEE Transactions on Information Theory, 61(12):6939-6956, 2015. · Zbl 1359.94135
[38] Patrick Mair, Kurt Hornik, and Jan de Leeuw. Isotone optimization in R: pool-adjacent-violators algorithm (PAVA) and active set methods. Journal of Statistical Software, 32(5):1-24, 2009.
[39] Toby J Mitchell and John J Beauchamp. Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83(404):1023-1032, 1988. · Zbl 0673.62051
[40] Radford M Neal and Geoffrey E Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models, pages 355-368. Springer, 1998. · Zbl 0916.62019
[41] Paulino Pérez and Gustavo de los Campos. Genome-wide regression and prediction with the BGLR statistical package. Genetics, 198(2):483-495, 2014.
[42] Sylvia Richardson and Peter J Green. On Bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society: Series B (Methodological), 59(4):731-792, 1997. · Zbl 0891.62020
[43] Veronika Ročková and Edward I George. EMVS: the EM approach to Bayesian variable selection. Journal of the American Statistical Association, 109(506):828-846, 2014. · Zbl 1367.62049
[44] Veronika Ročková and Edward I George. The spike-and-slab lasso. Journal of the American Statistical Association, 113(521):431-444, 2018. · Zbl 1398.62186
[45] Michael J Schell and Bahadur Singh. The reduced monotonic regression method. Journal of the American Statistical Association, 92(437):128-135, 1997. · Zbl 0890.62035
[46] Yiyuan She. Sparse regression with exact clustering. Electronic Journal of Statistics, 4:1055-1096, 2010. · Zbl 1329.62327
[47] Martin Sill, Sebastian Kaiser, Axel Benner, and Annette Kopp-Schneider. Robust biclustering by sparse singular value decomposition incorporating stability selection. Bioinformatics, 27(15):2089-2097, 2011.
[48] Qifan Song and Guang Cheng. Optimal false discovery control of minimax estimator. arXiv preprint arXiv:1812.10013, 2018.
[49] Songgui Wang and Shein-Chung Chow. Advanced Linear Models: Theory and Applications, 1994.
[50] Daniel A Spielman. Spectral graph theory and its applications. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07), pages 29-38. IEEE, 2007.
[51] Daniel A Spielman and Shang-Hua Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, pages 81-90. ACM, 2004. · Zbl 1192.65048
[52] Matthew Stephens. False discovery rates: a new deal. Biostatistics, 18(2):275-294, 2016.
[53] Kean Ming Tan and Daniela M Witten. Sparse biclustering of transposable data. Journal of Computational and Graphical Statistics, 23(4):985-1008, 2014.
[54] Terence Tao. Topics in Random Matrix Theory, volume 132. American Mathematical Soc., 2012. · Zbl 1256.15020
[55] Robert Tibshirani, Michael Saunders, Saharon Rosset, Ji Zhu, and Keith Knight. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(1):91-108, 2005. · Zbl 1060.62049
[56] Ryan J Tibshirani and Jonathan Taylor. The solution path of the generalized lasso. The Annals of Statistics, 39(3):1335-1371, 2011. · Zbl 1234.62107
[57] Ryan J Tibshirani, Holger Hoefling, and Robert Tibshirani. Nearly-isotonic regression. Technometrics, 53(1):54-61, 2011.
[58] Martin J Wainwright and Michael I Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305, 2008. · Zbl 1193.62107
[59] Gao Wang, Abhishek K Sarkar, Peter Carbonetto, and Matthew Stephens. A simple new approach to variable selection in regression, with application to genetic fine-mapping. Journal of the Royal Statistical Society: Series B, 82(5):1273-1300, 2020. · Zbl 07554792
[60] Daniela M Witten and Robert Tibshirani. A framework for feature selection in clustering. Journal of the American Statistical Association, 105(490):713-726, 2010. · Zbl 1392.62194
[61] Wei Biao Wu, Michael Woodroofe, and Graciela Mentz. Isotonic regression: another look at the changepoint problem. Biometrika, 88(3):793-804, 2001. · Zbl 0985.62076
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.