×

Bayesian mean-parameterized nonnegative binary matrix factorization. (English) Zbl 1460.62037

Summary: Binary data matrices can represent many types of data such as social networks, votes, or gene expression. In some cases, the analysis of binary matrices can be tackled with nonnegative matrix factorization (NMF), where the observed data matrix is approximated by the product of two smaller nonnegative matrices. In this context, probabilistic NMF assumes a generative model where the data is usually Bernoulli-distributed. Often, a link function is used to map the factorization to the \([0, 1]\) range, ensuring a valid Bernoulli mean parameter. However, link functions have the potential disadvantage to lead to uninterpretable models. Mean-parameterized NMF, on the contrary, overcomes this problem. We propose a unified framework for Bayesian mean-parameterized nonnegative binary matrix factorization models (NBMF). We analyze three models which correspond to three possible constraints that respect the mean-parameterization without the need for link functions. Furthermore, we derive a novel collapsed Gibbs sampler and a collapsed variational algorithm to infer the posterior distribution of the factors. Next, we extend the proposed models to a nonparametric setting where the number of used latent dimensions is automatically driven by the observed data. We analyze the performance of our NBMF methods in multiple datasets for different tasks such as dictionary learning and prediction of missing data. Experiments show that our methods provide similar or superior results than the state of the art, while automatically detecting the number of relevant components.

MSC:

62F15 Bayesian inference
15A23 Factorization of matrices
60B20 Random matrices (probabilistic aspects)

Software:

Pyglrm; LowRankModels; R
PDFBibTeX XMLCite
Full Text: DOI arXiv HAL

References:

[1] Aldous DJ (1985) Exchangeability and related topics. École d’été de probabilités de Saint-Flour, XIII-1983, vol 1117. Lecture Notes in Mathematics. Springer, Berlin, pp 1-198 · Zbl 0562.60042
[2] Alquier, P.; Guedj, B., An oracle inequality for quasi-Bayesian nonnegative matrix factorization, Math Methods Stat, 26, 1, 55-67 (2017) · Zbl 1381.62222
[3] Anderson, JR, The adaptive nature of human categorization, Psychol Rev, 98, 409-429 (1991)
[4] Asuncion A, Welling M, Smyth P, Teh YW (2009) On smoothing and inference for topic models. In: Proceedings of the 25th conference on uncertainty in artificial intelligence (UAI ’09), pp 27-34
[5] Bingham, E.; Kabán, A.; Fortelius, M., The aspect Bernoulli model: multiple causes of presences and absences, Pattern Anal Appl, 12, 1, 55-78 (2009) · Zbl 1422.68204
[6] Blei, DM; Ng, AY; Jordan, MI, Latent Dirichlet allocation, J Mach Learn Res, 3, Jan, 993-1022 (2003) · Zbl 1112.68379
[7] Buntine, WL; Jakulin, A., Discrete component analysis, Lect Notes Comput Sci Springer, 3940, 1-33 (2006)
[8] Canny J (2004) GaP: a factor model for discrete data. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR ’04), pp 122-129
[9] Çapan G, Akbayrak S, Ceritli TY, Cemgil AT (2018) Sum conditioned Poisson factorization. In: Proceedings of the 14th international conference on latent variable analysis and signal separation (LVA/ICA ’18), pp 24-35
[10] Celma, O., Music recommendation and discovery in the long tail (2010), Berlin: Springer, Berlin
[11] Cemgil, AT, Bayesian inference for nonnegative matrix factorisation models, Comput Intell Neurosci, 2009, 1-17 (2009)
[12] Cichocki, A.; Lee, H.; Kim, YD; Choi, S., Non-negative matrix factorization with \(\alpha \)-divergence, Pattern Recognit Lett, 29, 9, 1433-1440 (2008)
[13] Collins M, Dasgupta S, Schapire RE (2002) A generalization of principal components analysis to the exponential family. In: Advances in neural information processing systems 14, MIT Press, pp 617-624
[14] Févotte, C.; Idier, J., Algorithms for nonnegative matrix factorization with the \(\beta \)-divergence, Neural Comput, 23, 9, 2421-2456 (2011) · Zbl 1231.65072
[15] Gopalan, P.; Ruiz, FJR; Ranganath, R.; Blei, DM, Bayesian nonparametric Poisson factorization for recommendation systems, J Mach Learn Res, 33, 275-283 (2014)
[16] Gopalan P, Hofman JM, Blei DM (2015) Scalable recommendation with hierarchical Poisson factorization. In: Proceedings of the 31st conference on uncertainty in artificial intelligence (UAI ’15), pp 326-335
[17] He X, Liao L, Zhang H, Nie L, Hu X, Chua TS (2017) Neural collaborative filtering. In: Proceedings of the 26th international conference on world wide web (WWW ’17), International World Wide Web Conferences Steering Committee, pp 173-182
[18] Hernandez-Lobato JM, Houlsby N, Ghahramani Z (2014) Stochastic inference for scalable probabilistic modeling of binary matrices. In: Proceedings of the 31st international conference on machine learning (ICML ’14) 32:1-6
[19] Hoffman MD, Blei DM, Cook PR (2010) Bayesian nonparametric matrix factorization for recorded music. In: Proceedings of the 27th international conference on international conference on machine learning (ICML ’10), pp 439-446
[20] Hoffman, MD; Blei, DM; Wang, C.; Paisley, J., Stochastic variational inference, J Mach Learn Res, 14, 1, 1303-1347 (2013) · Zbl 1317.68163
[21] Hofmann T (1999) Probabilistic latent semantic analysis. In: Proceedings of the 15th conference on uncertainty in artificial intelligence (UAI ’99), pp 289-296
[22] Ishiguro, K.; Sato, I.; Ueda, N., Averaged collapsed variational Bayes inference, J Mach Learn Res, 18, 1, 1-29 (2017) · Zbl 1437.62125
[23] Kabán, A.; Bingham, E., Factorisation and denoising of 0-1 data: a variational approach, Neurocomputing, 71, 10-12, 2291-2308 (2008)
[24] Kemp C, Tenenbaum JB, Griffiths TL, Yamada T, Ueda N (2006) Learning systems of concepts with an infinite relational model. In: Proceedings of the 21st national conference on artificial intelligence (AAAI ’06), pp 381-388
[25] Landgraf AJ, Lee Y (2015) Dimensionality reduction for binary data through the projection of natural parameters. Tech. Rep. 890, Department of Statistics, The Ohio State University · Zbl 1450.62069
[26] Larsen JS, Clemmensen LKH (2015) Non-negative matrix factorization for binary data. In: Proceedings of the 7th international joint conference on knowledge discovery, knowledge engineering and knowledge management (IC3K ’15), pp 555-563
[27] Lee, DD; Seung, HS, Learning the parts of objects with nonnegative matrix factorization, Nature, 401, 788-791 (1999) · Zbl 1369.68285
[28] Lee, DD; Seung, HS, Algorithms for non-negative matrix factorization, Adv Neural Inf Process Syst, 13, 556-562 (2001)
[29] Liu, JS, The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem, J Am Stat Assoc, 89, 427, 958-966 (1994) · Zbl 0804.62033
[30] Meeds E, Ghahramani Z, Neal RM, Roweis ST (2007) Modeling dyadic data with binary latent factors. In: Advances in neural information processing systems 19, MIT Press, pp 977-984
[31] Miettinen, P.; Mielikäinen, T.; Gionis, A.; Das, G.; Mannila, H., The discrete basis problem, IEEE Trans Knowl Data Eng, 20, 10, 1348-1362 (2008)
[32] NOW (2018) The NOW community. new and old worlds database of fossil mammals (NOW). http://www.helsinki.fi/science/now/, release 030717, retrieved May 2018
[33] Pitman J (2002) Combinatorial stochastic processes. Lecture notes in mathematics, Springer, Berlin, lectures from the 32nd Summer School on Probability Theory held in Saint-Flour · Zbl 1103.60004
[34] Polson, NG; Scott, JG; Windle, J., Bayesian inference for logistic models using Polya-Gamma latent variables, J Am Stat Assoc, 108, 1339-1349 (2013) · Zbl 1283.62055
[35] R Core Team (2017) R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
[36] Rukat T, Holmes CC, Titsias MK, Yau C (2017) Bayesian Boolean matrix factorisation. In: Proceedings of the 34th international conference on machine learning (ICML ’17), pp 2969-2978
[37] Sammel, MD; Ryan, LM; Legler, JM, Latent variable models for mixed discrete and continuous outcomes, J R Stat Soc Ser B (Methodol), 59, 3, 667-678 (1997) · Zbl 0889.62043
[38] Schein AI, Saul LK, Ungar LH (2003) A generalized linear model for principal component analysis of binary data. In: Proceedings of the 9th workshop on artificial intelligence and statistics (AISTATS ’03), pp 14-21
[39] Schmidt MN, Winther O, Hansen LK (2009) Bayesian non-negative matrix factorization. In: Proceedings of the 8th international conference on independent component analysis and signal separation (ICA ’09), pp 540-547
[40] Singh AP, Gordon GJ (2008) A unified view of matrix factorization models. In: Proceedings of the European conference on machine learning and knowledge discovery in databases (ECML-PKDD ’08), pp 358-373
[41] Slawski M, Hein M, Lutsik P (2013) Matrix factorization with binary components. In: Advances in neural information processing systems 26, Curran Associates, Inc., pp 3210-3218
[42] Sørensen, T., A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons, Biologiske Skrifter, 5, 1-34 (1948)
[43] Steck H, Jaakkola TS (2003) On the Dirichlet prior and Bayesian regularization. In: Advances in neural information processing systems 15, MIT Press, pp 713-720
[44] Teh, YW; Jordan, MI; Beal, MJ; Blei, DM, Hierarchical Dirichlet processes, J Am Stat Assoc, 101, 476, 1566-1581 (2006) · Zbl 1171.62349
[45] Teh YW, Newman D, Welling M (2007) A collapsed variational Bayesian inference alzgorithm for latent Dirichlet allocation. In: Advances in neural information processing systems 19, MIT Press, pp 1353-1360
[46] Tipping ME (1999) Probabilistic visualisation of high-dimensional binary data. In: Advances in neural information processing systems 11, MIT Press, pp 592-598
[47] Tipping, ME; Bishop, C., Probabilistic principal component analysis, J R Stat Soc Ser B, 21, 3, 611-622 (1999) · Zbl 0924.62068
[48] Tomé, AM; Schachtner, R.; Vigneron, V.; Puntonet, CG; Lang, EW, A logistic non-negative matrix factorization approach to binary data sets, Multidimens Syst Signal Process, 26, 1, 125-143 (2013)
[49] Udell, M.; Horn, C.; Zadeh, R.; Boyd, S., Generalized low rank models, Found Trends Mach Learn, 9, 1, 1-118 (2016) · Zbl 1350.68221
[50] Voeten, E.; Reinal, B., Data and analyses of voting in the UN General Assembly, Routledge handbook of international organization (2013), Abingdon: Routledge, Abingdon
[51] Xue HJ, Dai XY, Zhang J, Huang S, Chen J (2017) Deep matrix factorization models for recommender systems. In: Proceedings of the 26th international joint conference on artificial intelligence (IJCAI—17), AAAI Press, pp 3203-3209
[52] Zhang, ZY; Li, T.; Ding, C.; Ren, XW; Zhang, XS, Binary matrix factorization for analyzing gene expression data, Data Min Knowl Discov, 20, 1, 28-52 (2009)
[53] Zhou M (2015) Infinite edge partition models for overlapping community detection and link prediction. In: Proceedings of the 18th international conference on artificial intelligence and statistics (AISTATS ’15), pp 1135-1143
[54] Zhou M, Hannah L, Dunson D, Carin L (2012) Beta-negative binomial process and Poisson factor analysis. In: Proceedings of the 15th international conference on artificial intelligence and statistics (AISTATS ’12), pp 1462-1471
[55] Zhou, M.; Cong, Y.; Chen, B., Augmentable Gamma belief networks, J Mach Learn Res, 17, 163, 1-44 (2016) · Zbl 1392.68373
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.