×

Mean estimation with sub-Gaussian rates in polynomial time. (English) Zbl 1454.62162

The author studies polynomial time algorithms for estimating the mean of a heavy-tailed multivariate random vector. The only assumption is that the random vector \(X\) has finite mean and covariance. In this setting, the radius of confidence intervals achieved by the empirical mean are large compared to the case that \(X\) is Gaussian or sub-Gaussian. A polynomial time algorithm is proposed to estimate the mean with sub-Gaussian-size confidence intervals under these assumptions. The algorithm is based on a new semidefinite programming relaxation of a high-dimensional median. Previous estimators which assumed only existence of finitely many moments of \(X\) either sacrifice sub-Gaussian performance or are only known to be computable via brute-force search procedures requiring time exponential in the dimension. The algorithm runs in polynomial time, but it is not close to practical for any substantially high-dimensional data set.

MSC:

62H12 Estimation in multivariate analysis
62G32 Statistics of extreme values; tail inference
68W01 General topics in the theory of algorithms
90C22 Semidefinite programming
PDFBibTeX XMLCite
Full Text: DOI arXiv Euclid

References:

[1] Abbe, E., Bandeira, A. S. and Hall, G. (2016). Exact recovery in the stochastic block model. IEEE Trans. Inform. Theory 62 471-487. · Zbl 1359.94047 · doi:10.1109/TIT.2015.2490670
[2] Alon, N., Matias, Y. and Szegedy, M. (1999). The space complexity of approximating the frequency moments. J. Comput. System Sci. 58 137-147. · Zbl 0938.68153 · doi:10.1006/jcss.1997.1545
[3] Alon, N. and Naor, A. (2006). Approximating the cut-norm via Grothendieck’s inequality. SIAM J. Comput. 35 787-803. · Zbl 1096.68163 · doi:10.1137/S0097539704441629
[4] Amini, A. A. and Wainwright, M. J. (2008). High-dimensional analysis of semidefinite relaxations for sparse principal components. In ISIT 2454-2458. IEEE.
[5] Banks, J., Kleinberg, R. and Moore, C. (2017). The Lovász theta function for random regular graphs and community detection in the hard regime. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. LIPIcs. Leibniz Int. Proc. Inform. 81 Art. No. 28, 22. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern. · Zbl 1420.05160
[6] Barak, B. and Moitra, A. (2016). Noisy tensor completion via the sum-of-squares hierarchy, COLT. J. Mach. Learn. Res. Workshop Conf. Proc. 49 417-445.
[7] Barak, B. and Steurer, D. (2017). The sos algorithm over general domains. In Lecture Notes: Proofs, Beliefs and Algorithms Through the Lens of Sum of Squares. https://www.sumofsquares.org/public/lec-definitions-general.html.
[8] Bernholt, T. (2006). Robust estimators are hard to compute. Technical Report, Univ. Dortmund.
[9] Berthet, Q. and Rigollet, P. (2013). Complexity theoretic lower bounds for sparse principal component detection, COLT. J. Mach. Learn. Res. Workshop Conf. Proc. 30 1046-1066.
[10] Bhattiprolu, V., Ghosh, M., Guruswami, V., Lee, E. and Tulsiani, M. (2018). Inapproximability of matrix \(p\rightarrow q\) norms. In Electronic Colloquium on Computational Complexity (ECCC) 25 37. · Zbl 1431.68041
[11] Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press, Cambridge. · Zbl 1058.90049
[12] Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Found. Comput. Math. 9 717-772. · Zbl 1219.90124 · doi:10.1007/s10208-009-9045-5
[13] Candès, E. J. and Tao, T. (2010). The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory 56 2053-2080. · Zbl 1366.15021 · doi:10.1109/TIT.2010.2044061
[14] Catoni, O. (2012). Challenging the empirical mean and empirical variance: A deviation study. Ann. Inst. Henri Poincaré Probab. Stat. 48 1148-1185. · Zbl 1282.62070 · doi:10.1214/11-AIHP454
[15] Cherapanamjeri, Y., Flammarion, N. and Bartlett, P. L. (2019). Fast mean estimation with sub-Gaussian rates. Available at arXiv:1902.01998.
[16] Cohen, M. B., Lee, Y. T., Miller, G., Pachocki, J. and Sidford, A. (2016). Geometric median in nearly linear time. In STOC’16—Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing 9-21. ACM, New York. · Zbl 1377.68267
[17] d’Aspremont, A., El Ghaoui, L., Jordan, M. I. and Lanckriet, G. R. G. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49 434-448. · Zbl 1128.90050 · doi:10.1137/050645506
[18] Decelle, A., Krzakala, F., Moore, C. and Zdeborová, L. (2011). Inference and phase transitions in the detection of modules in sparse networks. Phys. Rev. Lett. 107 065701.
[19] Devroye, L., Lerasle, M., Lugosi, G. and Oliveira, R. I. (2016). Sub-Gaussian mean estimators. Ann. Statist. 44 2695-2725. · Zbl 1360.62115 · doi:10.1214/16-AOS1440
[20] Diakonikolas, I., Kamath, G., Kane, D. M., Li, J., Moitra, A. and Stewart, A. (2016). Robust estimators in high dimensions without the computational intractability. In 57th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2016 655-664. IEEE Computer Soc., Los Alamitos, CA. · Zbl 1421.68149 · doi:10.1137/17M1126680
[21] Faloutsos, M., Faloutsos, P. and Faloutsos, C. (1999). On power-law relationships of the Internet topology. In ACM SIGCOMM Computer Communication Review 29 251-262. ACM, New York. · Zbl 0889.68050 · doi:10.1006/jcss.1997.1522
[22] Goemans, M. X. and Williamson, D. P. (1995). Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. Assoc. Comput. Mach. 42 1115-1145. · Zbl 0885.68088 · doi:10.1145/227683.227684
[23] Guédon, O. and Vershynin, R. (2016). Community detection in sparse networks via Grothendieck’s inequality. Probab. Theory Related Fields 165 1025-1049. · Zbl 1357.90111 · doi:10.1007/s00440-015-0659-z
[24] Hopkins, S. B. (2020). Supplement to “Mean estimation with sub-Gaussian rates in polynomial time.” https://doi.org/10.1214/19-AOS1843SUPP.
[25] Hopkins, S. B., Kothari, P. K., Potechin, A., Raghavendra, P., Schramm, T. and Steurer, D. (2017). The power of sum-of-squares for detecting hidden structures. In 58th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2017 720-731. IEEE Computer Soc., Los Alamitos, CA.
[26] Hopkins, S. B. and Li, J. (2018). Mixture models, robustness, and sum of squares proofs. In STOC’18—Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing 1021-1034. ACM, New York. · Zbl 1428.68240
[27] Hopkins, S. B. and Steurer, D. (2017). Efficient Bayesian estimation from few samples: Community detection and related problems. In 58th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2017 379-390. IEEE Computer Soc., Los Alamitos, CA.
[28] Hsu, D. and Sabato, S. (2016). Loss minimization and parameter estimation with heavy tails. J. Mach. Learn. Res. 17 Paper No. 18, 40. · Zbl 1360.62380
[29] Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Stat. 35 73-101. · Zbl 0136.39805 · doi:10.1214/aoms/1177703732
[30] Jerrum, M. R., Valiant, L. G. and Vazirani, V. V. (1986). Random generation of combinatorial structures from a uniform distribution. Theoret. Comput. Sci. 43 169-188. · Zbl 0597.68056 · doi:10.1016/0304-3975(86)90174-X
[31] Klivans, A., Kothari, P. K. and Meka, R. (2018). Efficient algorithms for outlier-robust regression. arXiv preprint, arXiv:1803.03241.
[32] Kothari, P. K., Steinhardt, J. and Steurer, D. (2018). Robust moment estimation and improved clustering via sum of squares. In STOC’18—Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing 1035-1046. ACM, New York. · Zbl 1434.62125
[33] Krauthgamer, R., Nadler, B. and Vilenchik, D. (2015). Do semidefinite relaxations solve sparse PCA up to the information limit? Ann. Statist. 43 1300-1322. · Zbl 1320.62138 · doi:10.1214/15-AOS1310
[34] Lai, K. A., Rao, A. B. and Vempala, S. (2016). Agnostic estimation of mean and covariance. In 57th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2016 665-674. IEEE Computer Soc., Los Alamitos, CA.
[35] Lasserre, J. B. (2000/01). Global optimization with polynomials and the problem of moments. SIAM J. Optim. 11 796-817. · Zbl 1010.90061 · doi:10.1137/S1052623400366802
[36] Lerasle, M. and Oliveira, R. I. (2011). Robust empirical mean estimators. arXiv preprint, arXiv:1112.3914.
[37] Leskovec, J., Kleinberg, J. and Faloutsos, C. (2005). Graphs over time: Densification laws, shrinking diameters and possible explanations. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining 177-187. ACM New York.
[38] Lugosi, G. and Mendelson, S. (2019). Sub-Gaussian estimators of the mean of a random vector. Ann. Statist. 47 783-794. · Zbl 1417.62192 · doi:10.1214/17-AOS1639
[39] Ma, T., Shi, J. and Steurer, D. (2016). Polynomial-time tensor decompositions with sum-of-squares. In 57th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2016 438-446. IEEE Computer Soc., Los Alamitos, CA.
[40] Ma, T. and Wigderson, A. (2015). Sum-of-squares lower bounds for sparse PCA. NIPS 1612-1620.
[41] Minsker, S. (2015). Geometric median and robust estimation in Banach spaces. Bernoulli 21 2308-2335. · Zbl 1348.60041 · doi:10.3150/14-BEJ645
[42] Minsker, S. (2018). Uniform bounds for robust mean estimators. arXiv preprint, arXiv:1812.03523. · Zbl 1418.62235 · doi:10.1214/17-AOS1642
[43] Montanari, A. and Sen, S. (2016). Semidefinite programs on sparse random graphs and their application to community detection. In STOC’16—Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing 814-827. ACM, New York. · Zbl 1376.90043
[44] Nemirovsky, A. S. and Yudin, D. B. (1983). Problem Complexity and Method Efficiency in Optimization. A Wiley-Interscience Publication. Wiley, New York. Translated from the Russian and with a preface by E. R. Dawson. · Zbl 0501.90062
[45] Nesterov, Y. (2000). Squared functional systems and optimization problems. In High Performance Optimization. Appl. Optim. 33 405-440. Kluwer Academic, Dordrecht. · Zbl 0958.90090
[46] Nesterov, Yu. (1998). Semidefinite relaxation and nonconvex quadratic optimization. Optim. Methods Softw. 9 141-160. · Zbl 0904.90136 · doi:10.1080/10556789808805690
[47] Parrilo, P. A. (2000). Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. Ph.D. thesis, California Institute of Technology, Pasadena, CA.
[48] Potechin, A. and Steurer, D. (2017). Exact tensor completion with sum-of-squares. Proceedings of Machine Learning Research 65 1-54.
[49] Raghavendra, P., Schramm, T. and Steurer, D. (2018). High-dimensional estimation via sum-of-squares proofs. Available at arXiv:1807.11419.
[50] Raghavendra, P. and Weitz, B. (2017). On the bit complexity of sum-of-squares proofs. In 44th International Colloquium on Automata, Languages, and Programming. LIPIcs. Leibniz Int. Proc. Inform. 80 Art. No. 80, 13. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern. · Zbl 1441.68100
[51] Rahm, E. and Do Hai, H. (2000). Data cleaning: Problems and current approaches. IEEE Data Eng. Bull. 23 3-13.
[52] Shor, N. Z. (1987). An approach to obtaining global extremums in polynomial mathematical programming problems. Cybernetics 23 695-700. · Zbl 0654.90076 · doi:10.1007/BF01074929
[53] Tukey, J. W. (1960). A survey of sampling from contaminated distributions. In Contributions to Probability and Statistics 448-485. Stanford Univ. Press, Stanford, CA. · Zbl 0201.52803
[54] Vandenberghe, L., Boyd, S. and Wu, S.-P. (1998). Determinant maximization with linear matrix inequality constraints. SIAM J. Matrix Anal. Appl. 19 499-533. · Zbl 0959.90039 · doi:10.1137/S0895479896303430
[55] Wang, T. and Samworth, R. J. (2018). High dimensional change point estimation via sparse projection. J. R. Stat. Soc. Ser. B. Stat. Methodol. 80 57-83. · Zbl 1439.62199 · doi:10.1111/rssb.12243
[56] Williamson, D. · Zbl 1219.90004
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.