Exact matrix completion via convex optimization.

*(English)*Zbl 1219.90124The authors are concerned with recovering the data matrix from a sampling of its elements by solving a convex optimization problem. They first consider the case of low-rank matrices and prove that if the number \(m\) of sampled entries obeys \(m \geq Cn^{1.2}r \log n\) for some positive numerical constant \(C\), then with very high probability, most \(n\times n\) matrices of rank \(r\) can be perfectly recovered by solving a simple convex optimization problem. The authors show that the semidefinite program finds the matrix with minimum nuclear norm that fits the data.

Replacing the exponent \(1.2\) by \(1.25\) gives that the result holds for all values of the rank.

This interesting survey includes recent results on coherence and incoherence of linear subspaces of \(n\)-dimensional real space and on incoherence of unitary transformations by E. J. Candes, J. Romberg [Inverse Probl. 23, No. 3, 969–985 (2007; Zbl 1120.94005)], on properties of sampling operator and on random orthogonal model. For a general discussion of approaches to convex optimization as well as specific technical results in the field, see the book by N. Shor [Nondifferentiable Optimization and Polynomial Problems. Nonconvex Optimization and Its Applications. 24. Dordrecht: Kluwer Academic Publishers. (1998; Zbl 0901.49015)].

For a subspace \(U\) of dimension \(r\) of the \(n\)-dimensional real space and the orthogonal projection \({\mathcal P}_U\) onto \(U\), the authors of the paper define the coherence of \(U\) (vis-à-vis the standard basis \(({\mathbf e}_i)\)) as \[ \mu(U) \equiv \frac{n}{r} \max \| {\mathcal P}_U {\mathbf e}_i \|^2 . \] To state their main result, they introduce two assumptions about an \(n_1 \times n_2\) matrix \({\mathcal M}\) whose singular value decomposition is given by \({\mathcal M} = \sum_{1\leq k \leq r}\sigma_{k}{\mathbf u}_{k}{\mathbf v}^{*}_{k} \) and with column and row spaces denoted by \(U\) and \(V,\) respectively:

(A0) The coherence obey \(\max(\mu(U), \mu(V)) \leq \mu_0\) for some positive \(\mu_0\).

(A1) The \(n_1 \times n_2\) matrix \(\sum_{1\leq k \leq r}{\mathbf u}_{k}{\mathbf v}^{*}_{k} \) has a maximal entry bounded by \(\mu_1 \sqrt{r/(n_1 n_2)}\) in absolute value for some positive \(\mu_1\).

Under these assumptions the main result of the paper asserts that if a matrix has row and column spaces that are incoherent with the standard basis, then the nuclear norm minimization can recover this matrix from a random sampling of a small number of entries \({\mathcal M}_{i j} \; (i,j) \in \Omega.\)

The paper has eight sections: 1 Introduction. 2 Which Matrices Are Incoherent? 3 Duality. 4 Architecture of the Proof. 5 Connections with Random Graph Theory. 6 Proofs of the Critical Lemmas. 7 Numerical Experiments. 8 Discussion.

The first one treats the problem of matrix completion via constraint minimization and justifies the author’s choice of considering an alternative which minimizes the sum of nonvanishing singular values of the matrix \(X\) (the matrix nuclear norm \(\| X \|_{*}\)) over the constraint set (problem (1.5)): \[ \text{minimize } \| X \|_{*}\text{ subject to } X_{i j} = {\mathcal M}_{i j} \; (i,j) \in \Omega. \] The random orthogonal model and, more generally, matrices with incoherent column and row spaces are presented in Section 2. In Section 3 the authors establish sufficient conditions which guarantee that the true low-rank matrix \({\mathcal M}\) is the unique solution to problem (1.5). One of these conditions is the existence of a dual vector obeying two crucial properties.

“Section 4 constructs such a dual vector and provides the overall architecture of the proof which shows that, indeed, this vector obeys the desired properties provided that the number of measurements is sufficiently large. Surprisingly, as explored in Sect. 5, the existence of a dual vector certifying that \({\mathcal M}\) is unique is related to some problems in random graph theory including ‘the coupon collector’s problem’. Following this discussion, we prove our main result via several intermediate results which are all proven in Sect. 6. Section 7 introduces numerical experiments showing that matrix completion based on nuclear norm minimization works well in practice. Section 8 closes the paper with a short summary of our findings, a discussion of important extensions and improvements.”

The appendix contains proofs of the fact (Theorem 4.2 from Section 4) that the operator \({\mathcal P}_{T} {\mathcal P}_{\Omega} {\mathcal P}_{T}\) (the product of projectors \({\mathcal P}_{T}\) onto the linear space \(T\) and the orthogonal projector \({\mathcal P}_{\Omega}\) onto the indices in the set \(\Omega\) of locations corresponding to the observed entries \({\mathcal M}_{i j}\)) does not deviate from its expected value in the part that follows from the concentration inequality of M. Talagrand [Invent. Math. 126(3) 505–563 (1996; Zbl 0893.60001)], and the result on bounds of \(q\)th moment that is connected with the Schatten \(q\)-norm of a Rademacher series (Lemma 6.2 from Section 6). The proof of Lemma 6.2 is based on the noncommutative Khintchine inequality (Lemma 6.1 from Section 6) of F. Lust-Piquard [C. R. Acad. Sci., Paris, Sér. I 303, 289–292 (1986; Zbl 0592.47038)] and A. Buchholz [Math. Ann. 319, 1–16 (2001; Zbl 0991.46035)].

Some of the results and estimates used in proving the main theorem are of independent interest.

There is a bibliography of about forty items.

Replacing the exponent \(1.2\) by \(1.25\) gives that the result holds for all values of the rank.

This interesting survey includes recent results on coherence and incoherence of linear subspaces of \(n\)-dimensional real space and on incoherence of unitary transformations by E. J. Candes, J. Romberg [Inverse Probl. 23, No. 3, 969–985 (2007; Zbl 1120.94005)], on properties of sampling operator and on random orthogonal model. For a general discussion of approaches to convex optimization as well as specific technical results in the field, see the book by N. Shor [Nondifferentiable Optimization and Polynomial Problems. Nonconvex Optimization and Its Applications. 24. Dordrecht: Kluwer Academic Publishers. (1998; Zbl 0901.49015)].

For a subspace \(U\) of dimension \(r\) of the \(n\)-dimensional real space and the orthogonal projection \({\mathcal P}_U\) onto \(U\), the authors of the paper define the coherence of \(U\) (vis-à-vis the standard basis \(({\mathbf e}_i)\)) as \[ \mu(U) \equiv \frac{n}{r} \max \| {\mathcal P}_U {\mathbf e}_i \|^2 . \] To state their main result, they introduce two assumptions about an \(n_1 \times n_2\) matrix \({\mathcal M}\) whose singular value decomposition is given by \({\mathcal M} = \sum_{1\leq k \leq r}\sigma_{k}{\mathbf u}_{k}{\mathbf v}^{*}_{k} \) and with column and row spaces denoted by \(U\) and \(V,\) respectively:

(A0) The coherence obey \(\max(\mu(U), \mu(V)) \leq \mu_0\) for some positive \(\mu_0\).

(A1) The \(n_1 \times n_2\) matrix \(\sum_{1\leq k \leq r}{\mathbf u}_{k}{\mathbf v}^{*}_{k} \) has a maximal entry bounded by \(\mu_1 \sqrt{r/(n_1 n_2)}\) in absolute value for some positive \(\mu_1\).

Under these assumptions the main result of the paper asserts that if a matrix has row and column spaces that are incoherent with the standard basis, then the nuclear norm minimization can recover this matrix from a random sampling of a small number of entries \({\mathcal M}_{i j} \; (i,j) \in \Omega.\)

The paper has eight sections: 1 Introduction. 2 Which Matrices Are Incoherent? 3 Duality. 4 Architecture of the Proof. 5 Connections with Random Graph Theory. 6 Proofs of the Critical Lemmas. 7 Numerical Experiments. 8 Discussion.

The first one treats the problem of matrix completion via constraint minimization and justifies the author’s choice of considering an alternative which minimizes the sum of nonvanishing singular values of the matrix \(X\) (the matrix nuclear norm \(\| X \|_{*}\)) over the constraint set (problem (1.5)): \[ \text{minimize } \| X \|_{*}\text{ subject to } X_{i j} = {\mathcal M}_{i j} \; (i,j) \in \Omega. \] The random orthogonal model and, more generally, matrices with incoherent column and row spaces are presented in Section 2. In Section 3 the authors establish sufficient conditions which guarantee that the true low-rank matrix \({\mathcal M}\) is the unique solution to problem (1.5). One of these conditions is the existence of a dual vector obeying two crucial properties.

“Section 4 constructs such a dual vector and provides the overall architecture of the proof which shows that, indeed, this vector obeys the desired properties provided that the number of measurements is sufficiently large. Surprisingly, as explored in Sect. 5, the existence of a dual vector certifying that \({\mathcal M}\) is unique is related to some problems in random graph theory including ‘the coupon collector’s problem’. Following this discussion, we prove our main result via several intermediate results which are all proven in Sect. 6. Section 7 introduces numerical experiments showing that matrix completion based on nuclear norm minimization works well in practice. Section 8 closes the paper with a short summary of our findings, a discussion of important extensions and improvements.”

The appendix contains proofs of the fact (Theorem 4.2 from Section 4) that the operator \({\mathcal P}_{T} {\mathcal P}_{\Omega} {\mathcal P}_{T}\) (the product of projectors \({\mathcal P}_{T}\) onto the linear space \(T\) and the orthogonal projector \({\mathcal P}_{\Omega}\) onto the indices in the set \(\Omega\) of locations corresponding to the observed entries \({\mathcal M}_{i j}\)) does not deviate from its expected value in the part that follows from the concentration inequality of M. Talagrand [Invent. Math. 126(3) 505–563 (1996; Zbl 0893.60001)], and the result on bounds of \(q\)th moment that is connected with the Schatten \(q\)-norm of a Rademacher series (Lemma 6.2 from Section 6). The proof of Lemma 6.2 is based on the noncommutative Khintchine inequality (Lemma 6.1 from Section 6) of F. Lust-Piquard [C. R. Acad. Sci., Paris, Sér. I 303, 289–292 (1986; Zbl 0592.47038)] and A. Buchholz [Math. Ann. 319, 1–16 (2001; Zbl 0991.46035)].

Some of the results and estimates used in proving the main theorem are of independent interest.

There is a bibliography of about forty items.

Reviewer: Nikolaj M. Glazunov (Kyïv)

##### MSC:

90C25 | Convex programming |

90C59 | Approximation methods and heuristics in mathematical programming |

15B52 | Random matrices (algebraic aspects) |

90-02 | Research exposition (monographs, survey articles) pertaining to operations research and mathematical programming |

##### Keywords:

matrix completion; low-rank matrices; singular value decomposition; convex optimization; duality in optimization; nuclear norm minimization; random matrices; noncommutative Khintchine inequality; decoupling; compressed sensing
PDF
BibTeX
Cite

\textit{E. J. Candès} and \textit{B. Recht}, Found. Comput. Math. 9, No. 6, 717--772 (2009; Zbl 1219.90124)

Full Text:
DOI

##### References:

[1] | ACM SIGKDD, Netflix, Proceedings of KDD Cup and Workshop (2007). Proceedings available online at http://www.cs.uic.edu/\(\sim\)liub/KDD-cup-2007/proceedings.html . |

[2] | T. Ando, R.A. Horn, C.R. Johnson, The singular values of a Hadamard product: A basic inequality, Linear Multilinear Algebra 21, 345–365 (1987). · Zbl 0633.15011 |

[3] | Y. Azar, A. Fiat, A. Karlin, F. McSherry, J. Saia, Spectral analysis of data, in Proceedings of the Thirty-third Annual ACM Symposium on Theory of Computing (2001). · Zbl 1323.68426 |

[4] | C. Beck, R. D’Andrea, Computational study and comparisons of LFT reducibility methods, in Proceedings of the American Control Conference (1998). |

[5] | D.P. Bertsekas, A. Nedic, A.E. Ozdaglar, Convex Analysis and Optimization (Athena Scientific, Belmont, 2003). · Zbl 1140.90001 |

[6] | B. Bollobás, Random Graphs, 2nd edn. (Cambridge University Press, Cambridge, 2001). |

[7] | A. Buchholz, Operator Khintchine inequality in non-commutative probability, Math. Ann. 319, 1–16 (2001). · Zbl 0991.46035 |

[8] | J.-F. Cai, E.J. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion, Technical report (2008). Preprint available at http://arxiv.org/abs/0810.3286 . · Zbl 1201.90155 |

[9] | E.J. Candès, J. Romberg, Sparsity and incoherence in compressive sampling, Inverse Probl. 23(3), 969–985 (2007). · Zbl 1120.94005 |

[10] | E.J. Candès, J. Romberg, T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inf. Theory 52(2), 489–509 (2006). · Zbl 1231.94017 |

[11] | E.J. Candès, T. Tao, Decoding by linear programming, IEEE Trans. Inf. Theory 51(12), 4203–4215 (2005). · Zbl 1264.94121 |

[12] | E.J. Candès, T. Tao, Near optimal signal recovery from random projections: Universal encoding strategies?, IEEE Trans. Inf. Theory 52(12), 5406–5425 (2006). · Zbl 1309.94033 |

[13] | A.L. Chistov, D.Yu. Grigoriev, Complexity of quantifier elimination in the theory of algebraically closed fields, in Proceedings of the 11th Symposium on Mathematical Foundations of Computer Science. Lecture Notes in Computer Science, vol. 176 (Springer, Berlin, 1984), pp. 17–31. |

[14] | V.H. de la Peña, Decoupling and Khintchine’s inequalities for U-statistics, Ann. Probab. 20(4), 1877–1892 (1992). · Zbl 0761.60014 |

[15] | V.H. de la Peña, S.J. Montgomery-Smith, Decoupling inequalities for the tail probabilities of multivariate U-statistics, Ann. Probab. 23(2), 806–816 (1995). · Zbl 0827.60014 |

[16] | D.L. Donoho, Compressed sensing, IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006). · Zbl 1288.94016 |

[17] | P. Drineas, M.W. Mahoney, S. Muthukrishnan, Subspace sampling and relative-error matrix approximation: Column-based methods, in Proceedings of the Tenth Annual RANDOM (2006). · Zbl 1155.68569 |

[18] | P. Drineas, M.W. Mahoney, S. Muthukrishnan, Subspace sampling and relative-error matrix approximation: Column-row-based methods, in Proceedings of the Fourteenth Annual ESA (2006). · Zbl 1131.68589 |

[19] | M. Fazel, Matrix rank minimization with applications, Ph.D. thesis, Stanford University (2002). |

[20] | R.A. Horn, C.R. Johnson, Topics in Matrix Analysis (Cambridge University Press, Cambridge, 1994). Corrected reprint of the 1991 original. · Zbl 0801.15001 |

[21] | T. Klein, E. Rio, Concentration around the mean for maxima of empirical processes, Ann. Probab. 33(3), 1060–1077 (2005). · Zbl 1066.60023 |

[22] | B. Laurent, P. Massart, Adaptive estimation of a quadratic functional by model selection, Ann. Stat. 28(5), 1302–1338 (2000). · Zbl 1105.62328 |

[23] | M. Ledoux, The Concentration of Measure Phenomenon (AMS, Providence, 2001). · Zbl 0995.60002 |

[24] | A.S. Lewis, The mathematics of eigenvalue optimization, Math. Programm. 97(1–2), 155–176 (2003). · Zbl 1035.90085 |

[25] | N. Linial, E. London, Y. Rabinovich, The geometry of graphs and some of its algorithmic applications, Combinatorica 15, 215–245 (1995). · Zbl 0827.05021 |

[26] | F. Lust-Picquard, Inégalités de Khintchine dans C p (1<p<, C. R. Acad. Sci. Paris, Sér. I 303(7), 289–292 (1986). |

[27] | S. Ma, D. Goldfarb, L. Chen, Fixed point and Bregman iterative methods for matrix rank minimization, Technical report (2008). · Zbl 1221.65146 |

[28] | M. Mesbahi, G.P. Papavassilopoulos, On the rank minimization problem over a positive semidefinite linear matrix inequality, IEEE Trans. Automat. Control 42(2), 239–243 (1997). · Zbl 0872.93029 |

[29] | B. Recht, M. Fazel, P. Parrilo, Guaranteed minimum rank solutions of matrix equations via nuclear norm minimization, SIAM Rev. (2007, submitted). Preprint available at http://arxiv.org/abs/0706.4138 . · Zbl 1198.90321 |

[30] | J.D.M. Rennie, N. Srebro, Fast maximum margin matrix factorization for collaborative prediction, in Proceedings of the International Conference of Machine Learning (2005). |

[31] | M. Rudelson, Random vectors in the isotropic position, J. Funct. Anal. 164(1), 60–72 (1999). · Zbl 0929.46021 |

[32] | M. Rudelson, R. Vershynin, Sampling from large matrices: an approach through geometric functional analysis, J. ACM, 54(4), Art. 21, 19 pp. (electronic) (2007). · Zbl 1326.68333 |

[33] | A.M.-C. So, Y. Ye, Theory of semidefinite programming for sensor network localization, Math. Program., Ser. B, 109, 2007. · Zbl 1278.90482 |

[34] | N. Srebro, Learning with matrix factorizations, Ph.D. thesis, Massachusetts Institute of Technology, (2004). |

[35] | M. Talagrand, New concentration inequalities in product spaces, Invent. Math. 126(3), 505–563 (1996). · Zbl 0893.60001 |

[36] | K.C. Toh, M.J. Todd, R.H. Tütüncü, SDPT3–a MATLAB software package for semidefinite-quadratic-linear programming. Available from http://www.math.nus.edu.sg/\(\sim\)mattohkc/sdpt3.html . |

[37] | L. Vandenberghe, S.P. Boyd, Semidefinite programming, SIAM Rev. 38(1), 49–95 (1996). · Zbl 0845.65023 |

[38] | G.A. Watson, Characterization of the subdifferential of some matrix norms, Linear Algebra Appl. 170, 33–45 (1992). · Zbl 0751.15011 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.