Posterior contraction rates of the phylogenetic Indian buffet processes. (English) Zbl 1357.62150

Summary: By expressing prior distributions as general stochastic processes, nonparametric Bayesian methods provide a flexible way to incorporate prior knowledge and constrain the latent structure in statistical inference. The Indian buffet process (IBP) is one such example: it defines a prior distribution on infinitely many binary features under the assumption that subjects are exchangeable. The phylogenetic Indian buffet process (pIBP), a derivative of the IBP, models non-exchangeability among subjects through a stochastic process on a rooted tree, similar to those used in phylogenetics, that describes the relationships among the subjects. In this paper, we study the theoretical properties of the IBP and pIBP under a binary factor model. We establish posterior contraction rates for both the IBP and pIBP and substantiate the theoretical results through simulation studies. This is the first work addressing the frequentist properties of the posterior behavior of the IBP and pIBP. We also demonstrate the practical usefulness of the pIBP prior by applying it to a real data example from cancer genomics in which exchangeability among subjects is violated.
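For readers unfamiliar with the IBP prior mentioned in the summary, the following is a minimal sketch of its standard sequential ("restaurant") construction for the one-parameter IBP. It is not taken from the paper: the function name `sample_ibp`, the concentration value `alpha=2.0`, and the number of subjects are illustrative assumptions. It covers only the exchangeable IBP and does not attempt the tree-structured pIBP.

```python
import numpy as np

def sample_ibp(num_subjects, alpha, rng):
    """Draw a binary feature matrix Z from the one-parameter IBP.

    Hypothetical helper for illustration; rows are subjects, columns are features.
    """
    counts = []  # counts[k] = number of subjects so far that hold feature k
    rows = []    # rows[i] = feature indices held by subject i
    for i in range(1, num_subjects + 1):
        # Subject i takes each existing feature k with probability counts[k] / i.
        owned = [k for k, m in enumerate(counts) if rng.random() < m / i]
        # Subject i then introduces Poisson(alpha / i) new features.
        num_new = int(rng.poisson(alpha / i))
        owned.extend(range(len(counts), len(counts) + num_new))
        counts.extend([0] * num_new)
        for k in owned:
            counts[k] += 1
        rows.append(owned)
    # Assemble the binary matrix once the total number of features is known.
    Z = np.zeros((num_subjects, len(counts)), dtype=int)
    for i, owned in enumerate(rows):
        for k in owned:
            Z[i, k] = 1
    return Z

rng = np.random.default_rng(0)
Z = sample_ibp(num_subjects=10, alpha=2.0, rng=rng)
print(Z.shape)  # (number of subjects, random number of features)
```

The number of columns is random, which is what makes the prior nonparametric. The distribution of Z is exchangeable in the subjects only up to a reordering of the columns, and it is this exchangeability that the pIBP relaxes.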

MSC:

62G05 Nonparametric estimation
62F15 Bayesian inference
62H25 Factor analysis and principal components; correspondence analysis
62P10 Applications of statistics to biology and medical sciences; meta analysis
