
Constrained low-rank matrix estimation: phase transitions, approximate message passing and applications. (English) Zbl 1462.62324

Summary: This article is an extended version of previous work by the authors [“Phase transitions in sparse PCA”, Preprint, arXiv:1503.00338; “MMSE of probabilistic low-rank matrix estimation: universality with respect to the output channel”, Preprint, arXiv:1507.03857] on low-rank matrix estimation in the presence of constraints on the factors into which the matrix is factorized. Low-rank matrix factorization is one of the basic methods used in data analysis for unsupervised learning of relevant features and other types of dimensionality reduction. We present a framework to study constrained low-rank matrix estimation for a general prior on the factors and a general output channel through which the matrix is observed. We draw a parallel with the study of vector-spin glass models, presenting a unifying way to study a number of problems previously considered in separate statistical physics works. We present a number of applications of the problem in data analysis. We derive in detail a general form of the low-rank approximate message passing (Low-RAMP) algorithm, known in statistical physics as the TAP equations. We thus unify the derivation of the TAP equations for models as different as the Sherrington-Kirkpatrick model, the restricted Boltzmann machine, the Hopfield model, and vector (XY, Heisenberg and other) spin glasses. The state evolution of the Low-RAMP algorithm is also derived, and is equivalent to the replica symmetric solution for the large class of vector-spin glass models. In the section devoted to results we study in detail phase diagrams and phase transitions for Bayes-optimal inference in low-rank matrix estimation. We present a typology of phase transitions and their relation to the performance of algorithms such as Low-RAMP or commonly used spectral methods.
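For the simplest instance covered by the summary, rank-one estimation in the spiked Wigner model with a ±1 prior, the Low-RAMP iteration reduces to a tanh denoiser with a TAP (Onsager) correction applied to the previous estimate. The following is a minimal sketch under the standard scaling Y = sqrt(λ/N)·x·xᵀ + W with symmetric Gaussian noise; the variable names and parameter values are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam, T = 2000, 2.0, 30            # size, signal-to-noise ratio, iterations

# Spiked Wigner model: Y = sqrt(lam/n) * x x^T + W, W symmetric with unit-variance entries
x = rng.choice([-1.0, 1.0], size=n)  # Rademacher (+-1) spike
W = rng.normal(size=(n, n))
W = (W + W.T) / np.sqrt(2)           # symmetrize, keeping off-diagonal variance 1
Y = np.sqrt(lam / n) * np.outer(x, x) + W

S = np.sqrt(lam / n) * Y             # rescaled data matrix (Fisher-score scaling)
xhat = rng.normal(scale=0.1, size=n) # small random initialization
xold = np.zeros(n)
sigma = 1.0                          # average posterior variance estimate
for t in range(T):
    # Linear step with TAP/Onsager correction on the previous iterate
    B = S @ xhat - lam * sigma * xold
    xold = xhat
    xhat = np.tanh(B)                # Bayes-optimal denoiser for the +-1 prior
    sigma = np.mean(1.0 - xhat**2)   # update the average posterior variance

overlap = abs(xhat @ x) / n          # well above zero when lam > lambda_c = 1
print(f"overlap with spike: {overlap:.2f}")
```

Above the transition (λ > 1) the iteration escapes the uninformative fixed point and the overlap approaches the state-evolution prediction; below it the overlap stays near zero, which is the kind of phase-transition behaviour the summary describes.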

MSC:

62H12 Estimation in multivariate analysis
62H25 Factor analysis and principal components; correspondence analysis
62F15 Bayesian inference
62P35 Applications of statistics to physics
82C26 Dynamic and nonequilibrium phase transitions (general) in statistical mechanics
82D30 Statistical mechanics of random media, disordered materials (including liquid crystals and spin glasses)

Software:

darch; ElemStatLearn

References:

[1] Lesieur T, Krzakala F and Zdeborová L 2015 Phase transitions in sparse PCA IEEE Int. Symp. on Information Theory Proc. pp 1635-9 · doi:10.1109/ISIT.2015.7282733
[2] Lesieur T, Krzakala F and Zdeborová L 2015 MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel 53rd Annual Allerton Conf. on Communication, Control and Computing pp 680-7 · doi:10.1109/ALLERTON.2015.7447070
[3] Gabrié M, Tramel E W and Krzakala F 2015 Training restricted Boltzmann machine via the Thouless-Anderson-Palmer free energy Advances in Neural Information Processing Systems pp 640-8
[4] Tramel E W, Manoel A, Caltagirone F, Gabrié M and Krzakala F 2016 Inferring sparsity: Compressed sensing using generalized restricted Boltzmann machines IEEE Information Theory Workshop pp 265-9 · doi:10.1109/ITW.2016.7606837
[5] Nishimori H 2001 Statistical Physics of Spin Glasses and Information Processing: an Introduction (Oxford: Oxford University Press) · Zbl 1103.82002 · doi:10.1093/acprof:oso/9780198509417.001.0001
[6] Zdeborová L and Krzakala F 2016 Statistical physics of inference: thresholds and algorithms Adv. Phys.65 453-552 · doi:10.1080/00018732.2016.1211393
[7] Nishimori H and Sherrington D 2001 Absence of Replica Symmetry Breaking in a Region of the Phase Diagram of the Ising Spin Glass (American Institute of Physics Conf. Series vol 553) (New York: American Institute of Physics) pp 67-72 · Zbl 1046.82035 · doi:10.1063/1.1358165
[8] Eckart C and Young G 1936 The approximation of one matrix by another of lower rank Psychometrika1 211-8 · JFM 62.1075.02 · doi:10.1007/BF02288367
[9] Baik J, Ben Arous G and Péché S 2005 Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices Ann. Probab.33 1643-97 · Zbl 1086.15022 · doi:10.1214/009117905000000233
[10] Sherrington D and Kirkpatrick S 1975 Solvable model of a spin-glass Phys. Rev. Lett.35 1792-6 · doi:10.1103/PhysRevLett.35.1792
[11] Thouless D J, Anderson P W and Palmer R G 1977 Solution of ’solvable model of a spin glass’ Phil. Mag.35 593-601 · doi:10.1080/14786437708235992
[12] Kosterlitz J, Thouless D and Jones R C 1976 Spherical model of a spin-glass Phys. Rev. Lett.36 1217 · doi:10.1103/PhysRevLett.36.1217
[13] Sommers H-J 1981 Theory of a Heisenberg spin glass J. Magn. Magn. Mater.22 267-70 · doi:10.1016/0304-8853(81)90032-9
[14] Gabay M and Toulouse G 1981 Coexistence of spin-glass and ferromagnetic orderings Phys. Rev. Lett.47 201 · doi:10.1103/PhysRevLett.47.201
[15] Hinton G 2010 A practical guide to training restricted Boltzmann machines Momentum9 926
[16] Hinton G E, Osindero S and Teh Y-W 2006 A fast learning algorithm for deep belief nets Neural Comput.18 1527-54 · Zbl 1106.68094 · doi:10.1162/neco.2006.18.7.1527
[17] LeCun Y, Bengio Y and Hinton G 2015 Deep learning Nature521 436-44 · doi:10.1038/nature14539
[18] Kappen H J and Rodriguez F 1998 Boltzmann machine learning using mean field theory and linear response correction Adv. Neural Inf. Process. Syst. 280-6
[19] Hopfield J J 1982 Neural networks and physical systems with emergent collective computational abilities Proc. Natl Acad. Sci.79 2554-8 · Zbl 1369.92007
[20] Mézard M 2017 Mean-field message-passing equations in the Hopfield model and its generalizations Phys. Rev. E 95 022117 · doi:10.1103/PhysRevE.95.022117
[21] Hastie T, Tibshirani R, Friedman J and Franklin J 2005 The elements of statistical learning: data mining, inference and prediction Math. Intell.27 83-5 · doi:10.1007/BF02985802
[22] Wasserman L 2013 All of Statistics: a Concise Course in Statistical Inference (Berlin: Springer)
[23] Lloyd S 1982 Least squares quantization in PCM IEEE Trans. Inf. Theory28 129-37 · Zbl 0504.94015 · doi:10.1109/TIT.1982.1056489
[24] Watkin T and Nadal J-P 1994 Optimal unsupervised learning J. Phys. A: Math. Gen.27 1899 · Zbl 0840.68101 · doi:10.1088/0305-4470/27/6/016
[25] Barkai N and Sompolinsky H 1994 Statistical mechanics of the maximum-likelihood density estimation Phys. Rev. E 50 1766 · doi:10.1103/PhysRevE.50.1766
[26] Biehl M and Mietzner A 1994 Statistical mechanics of unsupervised structure recognition J. Phys. A: Math. Gen.27 1885 · Zbl 0840.68098 · doi:10.1088/0305-4470/27/6/015
[27] Matsushita R and Tanaka T 2013 Low-rank matrix reconstruction and clustering via approximate message passing (Advances in Neural Information Processing Systems vol 26) ed C Burges et al (Cambridge, MA: Curran Associates, Inc.) pp 917-25 http://papers.nips.cc/paper/5074-low-rank-matrix-reconstruction-and-clustering-via-approximate-message-passing.pdf
[28] Parsons L, Haque E and Liu H 2004 Subspace clustering for high dimensional data: a review ACM SIGKDD Explorations Newsl.6 90-105 · doi:10.1145/1007730.1007731
[29] Lesieur T, De Bacco C, Banks J, Krzakala F, Moore C and Zdeborová L 2016 Phase transitions and optimal algorithms in high-dimensional Gaussian mixture clustering 54th Annual Allerton Conf. on Communication, Control and Computing pp 601-8 · doi:10.1109/ALLERTON.2016.7852287
[30] Johnstone I M and Lu A Y Sparse principal components analysis unpublished manuscript (arXiv:0901.4392)
[31] Zou H, Hastie T and Tibshirani R 2006 Sparse principal component analysis J. Comput. Graph. Stat.15 265-86 · doi:10.1198/106186006X113430
[32] Amini A A and Wainwright M J 2009 High-dimensional analysis of semidefinite relaxations for sparse principal components Ann. Stat.37 2877-2921 · Zbl 1173.62049 · doi:10.1214/08-AOS664
[33] Deshpande Y and Montanari A 2014 Sparse PCA via covariance thresholding Advances in Neural Information Processing Systems pp 334-42
[34] Krauthgamer R et al 2015 Do semidefinite relaxations solve sparse PCA up to the information limit? Ann. Stat.43 1300-22 · Zbl 1320.62138 · doi:10.1214/15-AOS1310
[35] Berthet Q and Rigollet P 2013 Complexity theoretic lower bounds for sparse principal component detection COLT pp 1046-66
[36] Deshpande Y and Montanari A 2014 Information-theoretically optimal sparse PCA IEEE Int. Symp. on Information Theory pp 2197-201
[37] Monasson R and Villamaina D 2015 Estimating the principal components of correlation matrices from all their empirical eigenvectors Europhys. Lett.112 50001 · doi:10.1209/0295-5075/112/50001
[38] Madeira S C and Oliveira A L 2004 Biclustering algorithms for biological data analysis: a survey IEEE/ACM Trans. Comput. Biol. Bioinf.1 24-45 · doi:10.1109/TCBB.2004.2
[39] Cheng Y and Church G M 2000 Biclustering of expression data Ismbvol 8 pp 93-103
[40] Fortunato S 2010 Community detection in graphs Phys. Rep.486 75-174 · doi:10.1016/j.physrep.2009.11.002
[41] Decelle A, Krzakala F, Moore C and Zdeborová L 2011 Inference and phase transitions in the detection of modules in sparse networks Phys. Rev. Lett.107 065701 · doi:10.1103/PhysRevLett.107.065701
[42] Decelle A, Krzakala F, Moore C and Zdeborová L 2011 Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications Phys. Rev. E 84 066106 · doi:10.1103/PhysRevE.84.066106
[43] Gross D J, Kanter I and Sompolinsky H 1985 Mean-field theory of the Potts glass Phys. Rev. Lett.55 304-7 · doi:10.1103/PhysRevLett.55.304
[44] Deshpande Y, Abbe E and Montanari A 2016 Asymptotic mutual information for the binary stochastic block model IEEE Int. Symp. on Information Theory pp 185-9
[45] Barbier J, Dia M, Macris N, Krzakala F, Lesieur T and Zdeborová L 2016 Mutual information for symmetric rank-one matrix estimation: a proof of the replica formula Advances in Neural Information Processing Systems pp 424-32
[46] Caltagirone F, Lelarge M and Miolane L 2016 Recovering asymmetric communities in the stochastic block model (arXiv:1610.03680)
[47] Montanari A 2015 Finding one community in a sparse graph J. Stat. Phys.161 273-99 · Zbl 1327.82091 · doi:10.1007/s10955-015-1338-2
[48] Tubiana J and Monasson R 2017 Emergence of compositional representations in restricted Boltzmann machines Phys. Rev. Lett.118 138301 · doi:10.1103/PhysRevLett.118.138301
[49] Krzakala F, Mézard M and Zdeborová L 2013 Phase diagram and approximate message passing for blind calibration and dictionary learning IEEE Int. Symp. on Information Theory Proceedings pp 659-63
[50] Parker J T, Schniter P and Cevher V 2014 Bilinear generalized approximate message passing—part I: derivation IEEE Trans. Signal Process.62 5839-53 · Zbl 1394.94447 · doi:10.1109/TSP.2014.2357776
[51] Kabashima Y, Krzakala F, Mézard M, Sakata A and Zdeborová L 2016 Phase transitions and sample complexity in Bayes-optimal matrix factorization IEEE Trans. Inf. Theory62 4228-65 · Zbl 1359.94109 · doi:10.1109/TIT.2016.2556702
[52] Rangan S and Fletcher A K 2012 Iterative estimation of constrained rank-one matrices in noise IEEE Int. Symp. on Information Theory Proceedings pp 1246-50
[53] Javanmard A and Montanari A 2013 State evolution for general approximate message passing algorithms, with applications to spatial coupling Inf. Inference2 115 · Zbl 1335.94015 · doi:10.1093/imaiai/iat004
[54] Krzakala F, Xu J and Zdeborová L 2016 Mutual information in rank-one matrix estimation Information Theory Workshop pp 71-5
[55] Lelarge M and Miolane L 2016 Fundamental limits of symmetric low-rank matrix estimation (arXiv:1611.03888 [math.PR])
[56] Miolane L 2017 Fundamental limits of low-rank matrix estimation (arXiv:1702.00473)
[57] Mézard M, Parisi G and Virasoro M A 1987 Spin-Glass Theory and Beyond (Lecture Notes in Physics vol 9) (Singapore: World Scientific) · Zbl 0992.82500
[58] Perry A, Wein A S, Bandeira A S and Moitra A 2016 Message-passing algorithms for synchronization problems over compact groups (arXiv:1610.04583)
[59] Krzakala F, Manoel A, Tramel E W and Zdeborová L 2014 Variational free energies for compressed sensing IEEE Int. Symp. on Information Theory pp 1499-503
[60] Rangan S, Schniter P, Riegler E, Fletcher A and Cevher V 2013 Fixed points of generalized approximate message passing with arbitrary matrices IEEE Int. Symp. on Information Theory Proc. pp 664-8
[61] Vila J, Schniter P, Rangan S, Krzakala F and Zdeborová L 2015 Adaptive damping and mean removal for the generalized approximate message passing algorithm IEEE Int. Conf. on Acoustics, Speech and Signal Processing pp 2021-5
[62] Perry A, Wein A S, Bandeira A S and Moitra A 2016 Optimality and sub-optimality of PCA for spiked random matrices and synchronization (arXiv:1609.05573)
[63] Banks J, Moore C, Vershynin R and Xu J 2016 Information-theoretic bounds and phase transitions in clustering, sparse PCA and submatrix localization (arXiv:1607.05222)
[64] Yedidia J S, Freeman W T and Weiss Y 2003 Understanding belief propagation and its generalizations Exploring Artificial Intelligence in the New Millennium (San Francisco, CA: Morgan Kaufmann Publishers Inc.) pp 239-69 http://dl.acm.org/citation.cfm?id=779343.779352
[65] Montanari A and Semerjian G 2006 Rigorous inequalities between length and time scales in glassy systems J. Stat. Phys.125 23-54 · Zbl 1112.82051 · doi:10.1007/s10955-006-9175-y
[66] Mézard M, Parisi G and Virasoro M 1986 SK model: the replica solution without replicas Europhys. Lett.1 77 · doi:10.1209/0295-5075/1/2/006
[67] Plefka T 1982 Convergence condition of the TAP equation for the infinite-ranged Ising spin glass model J. Phys. A: Math. Gen.15 1971 · doi:10.1088/0305-4470/15/6/035
[68] Rangan S 2010 Estimation with random linear mixing, belief propagation and compressed sensing 44th Annual Conf. on Information Sciences and Systems pp 1-6
[69] Donoho D L, Maleki A and Montanari A 2009 Message-passing algorithms for compressed sensing Proc. Natl Acad. Sci.106 18914-9
[70] Bayati M and Montanari A 2011 The dynamics of message passing on dense graphs, with applications to compressed sensing IEEE Trans. Inf. Theory57 764-85 · Zbl 1366.94079 · doi:10.1109/TIT.2010.2094817
[71] Caltagirone F, Zdeborová L and Krzakala F 2014 On convergence of approximate message passing IEEE Int. Symp. on Information Theory pp 1812-6
[72] Georges A and Yedidia J S 1991 How to expand around mean-field theory using high-temperature expansions J. Phys. A: Math. Gen.24 2173 · doi:10.1088/0305-4470/24/9/024
[73] Deshpande Y and Montanari A 2015 Finding hidden cliques of size N/e in nearly linear time Found. Comput. Math.15 1-60 · Zbl 1347.05227 · doi:10.1007/s10208-014-9215-y
[74] Krzakala F, Moore C, Mossel E, Neeman J, Sly A, Zdeborová L and Zhang P 2013 Spectral redemption in clustering sparse networks Proc. Natl Acad. Sci.110 20935-40 · Zbl 1359.62252
[75] Hoyle D C and Rattray M 2004 Principal-component-analysis eigenvalue spectra from data with symmetry-breaking structure Phys. Rev. E 69 026124 · doi:10.1103/PhysRevE.69.026124
[76] Richard E and Montanari A 2014 A statistical model for tensor PCA Advances in Neural Information Processing Systems pp 2897-905
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.