Nonparametric Bayesian sparse factor models with application to gene expression modeling. (English) Zbl 1223.62013

Summary: A nonparametric Bayesian extension of Factor Analysis (FA) is proposed in which the observed data \(\mathbf Y\) are modeled as a linear superposition, \(\mathbf G\), of a potentially infinite number of hidden factors, \(\mathbf X\). The Indian Buffet Process (IBP) is used as a prior on \(\mathbf G\) to incorporate sparsity and to allow the number of latent features to be inferred from the data. The model’s utility for modeling gene expression data is investigated using randomly generated data sets based on a known sparse connectivity matrix for E. coli, and on three biological data sets of increasing complexity.
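The generative structure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it draws a binary sparsity pattern \(\mathbf Z\) from the IBP (so the number of latent factors \(K\) is itself random), fills in Gaussian loadings and factor activations, and forms the noisy superposition. All variable names and the specific hyperparameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ibp(num_rows, alpha, rng):
    """Draw a binary matrix Z from the Indian Buffet Process.

    Rows are 'customers' (observed dimensions), columns are 'dishes'
    (latent factors); the number of columns is not fixed in advance.
    """
    rows = []    # per-customer 0/1 assignments to the dishes seen so far
    counts = []  # how many earlier customers chose each existing dish
    for n in range(num_rows):
        row = []
        # revisit each existing dish k with probability m_k / (n + 1)
        for k in range(len(counts)):
            take = int(rng.random() < counts[k] / (n + 1))
            row.append(take)
            counts[k] += take
        # then sample Poisson(alpha / (n + 1)) brand-new dishes
        for _ in range(rng.poisson(alpha / (n + 1))):
            row.append(1)
            counts.append(1)
        rows.append(row)
    # pad earlier rows with zeros for dishes introduced later
    K = len(counts)
    Z = np.zeros((num_rows, K), dtype=int)
    for n, row in enumerate(rows):
        Z[n, :len(row)] = row
    return Z

# Generative sketch of the sparse factor model: Y = (Z * G) X + E
D, N, alpha = 8, 50, 2.0           # observed genes, samples, IBP concentration
Z = sample_ibp(D, alpha, rng)      # D x K binary sparsity pattern
K = Z.shape[1]                     # number of active factors, inferred via the IBP
G = rng.normal(size=(D, K))        # factor loadings
X = rng.normal(size=(K, N))        # latent factor activations
E = 0.1 * rng.normal(size=(D, N))  # Gaussian observation noise
Y = (Z * G) @ X + E                # observed expression matrix
```

Inference in the paper inverts this process (recovering \(\mathbf Z\), \(\mathbf G\), and \(\mathbf X\) from \(\mathbf Y\) by MCMC), but the forward sketch shows why the number of columns of \(\mathbf G\) need not be fixed in advance.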


62F15 Bayesian inference
62H25 Factor analysis and principal components; correspondence analysis
62G99 Nonparametric inference
65C40 Numerical analysis or methods applied to Markov chains


Full Text: DOI arXiv

