
A new perspective on least squares under convex constraint. (English) Zbl 1302.62053

Summary: Consider the problem of estimating the mean of a Gaussian random vector when the mean vector is assumed to be in a given convex set. The most natural solution is to take the Euclidean projection of the data vector onto this convex set; in other words, performing “least squares under a convex constraint”. Many problems in modern statistics and statistical signal processing theory are special cases of this general situation. Examples include the lasso and other high-dimensional regression techniques, function estimation problems, matrix estimation and completion, shape-restricted regression, constrained denoising, linear inverse problems, etc. This paper presents three general results about this problem, namely, (a) an exact computation of the main term in the estimation error by relating it to expected maxima of Gaussian processes (existing results only give upper bounds), (b) a theorem showing that the least squares estimator is always admissible up to a universal constant in any problem of the above kind, and (c) a counterexample showing that the least squares estimator may not always be minimax rate-optimal. The result from part (a) is then used to compute the error of the least squares estimator in two examples of contemporary interest.
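In symbols, the setup is: one observes \(Y = \theta + Z\) with \(Z \sim N(0, \sigma^2 I_n)\) and \(\theta\) known to lie in a closed convex set \(C \subseteq \mathbb{R}^n\), and the least squares estimator is the projection \(\hat{\theta} = \operatorname{argmin}_{t \in C} \|Y - t\|^2\).

As a concrete illustration of this projection in one of the special cases listed above (shape-restricted regression), the following is a minimal sketch, not taken from the paper: it computes the least squares estimator over the monotone cone \(\{t : t_1 \le \dots \le t_n\}\) using the pool adjacent violators algorithm. The function name `project_isotonic` and the simulation parameters are illustrative choices.

```python
import numpy as np

def project_isotonic(y):
    """Euclidean projection of y onto the monotone cone
    {t : t_1 <= ... <= t_n} via the pool adjacent violators algorithm."""
    means, counts = [], []  # block means and block sizes
    for v in y:
        means.append(float(v))
        counts.append(1)
        # Merge adjacent blocks while their means violate monotonicity;
        # each merged block gets the weighted average of its parts.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, c2 = means.pop(), counts.pop()
            m1, c1 = means.pop(), counts.pop()
            means.append((c1 * m1 + c2 * m2) / (c1 + c2))
            counts.append(c1 + c2)
    return np.repeat(means, counts)

# Simulated example: a nondecreasing mean observed in Gaussian noise.
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 2.0, 200)
y = theta + rng.normal(size=theta.size)
theta_hat = project_isotonic(y)  # least squares under the monotone constraint
print("mean squared error:", np.mean((theta_hat - theta) ** 2))
```

Because the monotone cone is closed and convex, this projection exists and is unique; the paper's general results then describe the size of \(\|\hat{\theta} - \theta\|\) through expected maxima of Gaussian processes indexed by the constraint set.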

MSC:

62F10 Point estimation
62F12 Asymptotic properties of parametric estimators
62F30 Parametric inference under constraints
62G08 Nonparametric regression and quantile regression

Software:

ElemStatLearn
