
A determinantal point process for column subset selection. (English) Zbl 07307477

Summary: Two popular approaches to dimensionality reduction are principal component analysis, which projects onto a small number of well-chosen but non-interpretable directions, and feature selection, which selects a small number of the original features. Feature selection can be abstracted as selecting the subset of columns of a matrix \(\mathbf{X} \in \mathbb{R}^{N \times d}\) that minimizes the approximation error, i.e., the norm of the residual after projecting \(\mathbf{X}\) onto the space spanned by the selected columns. Such a combinatorial optimization is usually impractical, and there has been interest in polynomial-cost, random subset selection algorithms that favour small values of this approximation error. We propose sampling from a projection determinantal point process, a repulsive distribution over column indices that favours diversity among the selected columns. We bound the ratio of the expected approximation error to the optimal error of PCA. These bounds improve on the state-of-the-art bounds of volume sampling when some realistic structural assumptions on \(\mathbf{X}\) are satisfied. Numerical experiments suggest that our bounds are tight, and that our algorithms perform comparably to the double phase algorithm, often considered the practical state of the art.
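As a minimal NumPy sketch of the pipeline the summary describes (illustrative only, not the authors' DPPy-based implementation; all names below are hypothetical): build the projection DPP kernel from the top-\(k\) right singular vectors of \(\mathbf{X}\), sample \(k\) column indices with the standard sequential chain-rule sampler for projection kernels, and compare the resulting projection residual to the optimal rank-\(k\) PCA error.

```python
import numpy as np

def sample_projection_dpp(V, rng):
    """Sample k indices from the projection DPP with kernel K = V V^T.

    V: (d, k) matrix with orthonormal columns (here, the top-k right
    singular vectors of X; rows are indexed by the d columns of X).
    Uses the sequential sampler for projection kernels: draw an index
    with probability proportional to the current diagonal of K, then
    condition the kernel on that index by projecting out its row.
    """
    V = V.copy()
    d, k = V.shape
    selected = []
    for _ in range(k):
        probs = np.clip(np.sum(V**2, axis=1), 0.0, None)  # diag of V V^T
        probs /= probs.sum()
        i = int(rng.choice(d, p=probs))
        selected.append(i)
        # Condition on index i: project rows of V onto the orthogonal
        # complement of V[i]; row i (and repeats) then has probability ~0.
        u = V[i] / np.linalg.norm(V[i])
        V = V - np.outer(V @ u, u)
    return selected

rng = np.random.default_rng(0)
N, d, k = 100, 20, 5
X = rng.standard_normal((N, d))

# Projection DPP kernel built from the top-k right singular subspace of X.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Vk = Vt[:k].T                       # (d, k), orthonormal columns

S = sample_projection_dpp(Vk, rng)  # k selected column indices
XS = X[:, S]

# Approximation error: norm of the residual after projecting X onto
# the span of the selected columns, vs. the optimal PCA (rank-k) error.
P = XS @ np.linalg.pinv(XS)         # orthogonal projector onto span(XS)
err_dpp = np.linalg.norm(X - P @ X, "fro")
err_pca = np.linalg.norm(s[k:])     # optimal rank-k error
```

The ratio `err_dpp / err_pca` is the quantity bounded in expectation in the paper; by optimality of the truncated SVD it is always at least 1 for any choice of \(k\) columns.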

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

DPPy; spatstat
Full Text: arXiv Link
