
1-bit matrix completion: PAC-Bayesian analysis of a variational approximation. (English) Zbl 1461.15032

Summary: We focus on the completion of a (possibly) low-rank matrix \(M\) with binary entries, the so-called 1-bit matrix completion problem. Our approach relies on tools from machine learning theory: empirical risk minimization and its convex relaxations. We propose an algorithm to compute a variational approximation of the pseudo-posterior. Thanks to the convex relaxation, the corresponding minimization problem is bi-convex, and the method therefore works well in practice. We study the performance of this variational approximation through PAC-Bayesian learning bounds. Unlike previous works, which focused on upper bounds for the estimation error of \(M\) in various matrix norms, our analysis yields a PAC bound on the prediction error of the algorithm. We focus mainly on the convex relaxation through the hinge loss, for which we present a complete analysis, a simulation study, and a test on the MovieLens data set. We also discuss a variational approximation for dealing with the logistic loss.
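The bi-convex structure mentioned in the summary can be illustrated with a minimal sketch: factorize \(M = UV^\top\) and alternate subgradient steps of a hinge loss over the observed binary entries, with a small Frobenius penalty. This is a hypothetical illustration of the general technique, not the authors' variational PAC-Bayesian algorithm; the function name and all parameters are assumptions for the example.

```python
import numpy as np

def complete_1bit(Y, mask, rank=1, lam=0.01, lr=0.1, n_iter=500, seed=0):
    """Sketch of hinge-loss 1-bit matrix completion (illustrative only).

    Y    : (m, n) array with entries in {-1, +1}, read only where mask is True
    mask : (m, n) boolean array flagging the observed entries

    Minimizes sum_{(i,j) observed} max(0, 1 - Y_ij * (U V^T)_ij)
    plus (lam/2)(||U||_F^2 + ||V||_F^2). The objective is bi-convex:
    convex in U for fixed V and in V for fixed U, so we alternate
    subgradient steps in each factor.
    """
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    U = 0.1 * rng.standard_normal((m, rank))
    V = 0.1 * rng.standard_normal((n, rank))
    for _ in range(n_iter):
        # Subgradient of the hinge loss is -Y on observed entries with margin < 1.
        G = np.where(mask & (Y * (U @ V.T) < 1.0), -Y, 0.0)
        U = U - lr * (G @ V + lam * U)          # step in U with V fixed
        G = np.where(mask & (Y * (U @ V.T) < 1.0), -Y, 0.0)
        V = V - lr * (G.T @ U + lam * V)        # step in V with U fixed
    return np.sign(U @ V.T)                     # predicted signs for all entries
```

On a fully observed rank-1 sign pattern, a few hundred iterations recover the observed signs; more refined variants (and the variational approximation studied in the paper) handle partial observation and noise.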

MSC:

15A83 Matrix completion problems
65F55 Numerical methods for low-rank matrix approximation; matrix compression
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)
Full Text: DOI arXiv
