Sparse learning via Boolean relaxations. (English) Zbl 1328.90106
Summary: We introduce novel relaxations for cardinality-constrained learning problems, including least-squares regression as a special but important case. Our approach is based on reformulating a cardinality-constrained problem exactly as a Boolean program, to which standard convex relaxations such as the Lasserre and Sherali-Adams hierarchies can be applied. We analyze the first-order relaxation in detail, deriving necessary and sufficient conditions for exactness in a unified manner. In the special case of least-squares regression, we show that these conditions are satisfied with high probability for random ensembles satisfying suitable incoherence conditions, similar to results on \(\ell_1\)-relaxations. In contrast to known methods, our relaxations yield lower bounds on the objective, and it can be verified whether or not the relaxation is exact. If it is not, we show that randomization based on the relaxed solution offers a principled way to generate provably good feasible solutions. This property enables us to obtain high-quality estimates even if incoherence conditions are not met, as might be expected in real datasets. We numerically illustrate the performance of the relaxation-randomization strategy on both synthetic and real high-dimensional datasets, revealing substantial improvements relative to \(\ell_1\)-based methods and greedy selection heuristics.
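To make the summary concrete: for ridge-regularized least squares, minimizing out the regression weights over a support encoded by a Boolean vector \(u\) yields the exact Boolean program \(\min\{\tfrac{1}{2}\,y^T(I_n + \tfrac{1}{\rho}\,X\,\mathrm{diag}(u)\,X^T)^{-1}y : u\in\{0,1\}^d,\ \mathbf{1}^Tu\le k\}\), and the first-order (interval) relaxation replaces \(\{0,1\}^d\) with \([0,1]^d\) while remaining convex. The sketch below, in Python with CVXPY as a stand-in for the CVX (Matlab) package listed above, illustrates this relaxation together with a Bernoulli randomized-rounding step; it is an illustration under these assumptions, not the authors' implementation, and all function and variable names are ours.

```python
# Minimal sketch (not the authors' code) of the first-order Boolean
# relaxation for ridge-regularized sparse least squares, plus the
# randomized rounding described in the summary.
import numpy as np
import cvxpy as cp

def boolean_relaxation(X, y, k, rho=1.0):
    """Interval relaxation u in [0,1]^d, 1'u <= k of the Boolean program
    min_u 0.5 * y' (I + (1/rho) X diag(u) X')^{-1} y."""
    n, d = X.shape
    u = cp.Variable(d)
    # P(u) is affine in u, so matrix_frac(y, P(u)) is convex (DCP-valid).
    P = np.eye(n) + (1.0 / rho) * (X @ cp.diag(u) @ X.T)
    prob = cp.Problem(cp.Minimize(0.5 * cp.matrix_frac(y, P)),
                      [u >= 0, u <= 1, cp.sum(u) <= k])
    prob.solve()  # prob.value lower-bounds the Boolean optimum
    return u.value, prob.value

def randomized_round(X, y, u_rel, k, rho=1.0, trials=50, seed=0):
    """Sample supports coordinate-wise from the relaxed solution, refit
    the ridge objective on each support, and keep the best feasible one."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    u_rel = np.clip(u_rel, 0.0, 1.0)      # guard against solver round-off
    best_err, best_S = np.inf, None
    for _ in range(trials):
        S = np.flatnonzero(rng.random(d) < u_rel)  # S_i ~ Bernoulli(u_rel[i])
        if S.size > k:                             # enforce |S| <= k
            S = S[np.argsort(u_rel[S])[-k:]]
        if S.size == 0:
            continue
        # Ridge refit on the sampled support, matching the relaxed objective.
        A = X[:, S]
        w_S = np.linalg.solve(A.T @ A + rho * np.eye(S.size), A.T @ y)
        err = 0.5 * np.sum((y - A @ w_S) ** 2) + 0.5 * rho * np.sum(w_S ** 2)
        if err < best_err:
            best_err, best_S = err, S
    return best_S, best_err
```

Because the relaxed optimum lower-bounds the Boolean optimum, comparing the best rounded objective against it yields a per-instance optimality gap, which is the verifiability property the summary emphasizes.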

MSC:
90C25 Convex programming
90C06 Large-scale problems in mathematical programming
90C09 Boolean programming
68T05 Learning and adaptive systems in artificial intelligence
Software:
CVX
References:
[1] Ahlswede, R., Winter, A.: Strong converse for identification via quantum channels. IEEE Trans. Inf. Theory 48, 569-579 (2002) · Zbl 1071.94530
[2] Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Proceedings of the 2nd International Symposium on Information Theory, Tsahkadsor, Armenia, USSR (September 1971) · Zbl 0283.62006
[3] Bickel, P.J., Ritov, Y., Tsybakov, A.: Simultaneous analysis of lasso and Dantzig selector. Ann. Stat. 37, 1705-1732 (2009) · Zbl 1173.62022
[4] Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge, UK (2004) · Zbl 1058.90049
[5] Bühlmann, P., van de Geer, S.: Statistics for High-Dimensional Data. Springer Series in Statistics. Springer, Berlin (2011) · Zbl 1273.62015
[6] Candès, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51, 4203-4215 (2005) · Zbl 1264.94121
[7] Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines (and Other Kernel Based Learning Methods). Cambridge University Press, Cambridge (2000) · Zbl 0994.68074
[8] CVX Research, Inc.: CVX: Matlab software for disciplined convex programming, version 2.0 (August 2012)
[9] d'Aspremont, A., El Ghaoui, L.: Testing the nullspace property using semidefinite programming. Technical report, Princeton (2009) · Zbl 1367.62220
[10] Davidson, K.R., Szarek, S.J.: Local operator theory, random matrices and Banach spaces. In: Handbook of the Geometry of Banach Spaces, Vol. 1, pp. 317-336. North-Holland, Amsterdam (2001)
[11] Dekel, O., Singer, Y.: Support vector machines on a budget. Adv. Neural Inf. Process. Syst. 19, 345 (2007)
[12] Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289-1306 (2006) · Zbl 1288.94016
[13] Donoho, D.L., Elad, M., Temlyakov, V.M.: Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inf. Theory 52, 6-18 (2006) · Zbl 1288.94017
[14] Fuchs, J.J.: Recovery of exact sparse representations in the presence of noise. In: ICASSP, vol. 2, pp. 533-536 (2004)
[15] Grant, M., Boyd, S.: Graph implementations for nonsmooth convex programs. In: Blondel, V., Boyd, S., Kimura, H. (eds.) Recent Advances in Learning and Control, pp. 95-110. Springer, Berlin (2008) · Zbl 1205.90223
[16] Lasserre, J.B.: An explicit exact SDP relaxation for nonlinear 0-1 programs. In: Aardal, K., Gerards, A.M.H. (eds.) Lecture Notes in Computer Science 2081, pp. 293-303 (2001) · Zbl 1010.90515
[17] Laurent, M.: A comparison of the Sherali-Adams, Lovász-Schrijver and Lasserre relaxations for 0-1 programming. Math. Oper. Res. 28, 470-496 (2003) · Zbl 1082.90084
[18] Ledoux, M.: The Concentration of Measure Phenomenon. Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI (2001) · Zbl 0995.60002
[19] Lovász, L., Schrijver, A.: Cones of matrices and set-functions and 0-1 optimization. SIAM J. Optim. 1, 166-190 (1991) · Zbl 0754.90039
[20] Markowitz, H.M.: Portfolio Selection. Wiley, New York (1959)
[21] McCullagh, P., Nelder, J.A.: Generalized Linear Models. Monographs on Statistics and Applied Probability 37. Chapman and Hall/CRC, New York (1989) · Zbl 0744.62098
[22] Meinshausen, N., Bühlmann, P.: High-dimensional graphs and variable selection with the lasso. Ann. Stat. 34, 1436-1462 (2006) · Zbl 1113.62082
[23] Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, Cambridge, UK (1995) · Zbl 0849.68039
[24] Negahban, S., Ravikumar, P., Wainwright, M.J., Yu, B.: Restricted strong convexity and generalized linear models. Technical report, UC Berkeley, Department of Statistics (August 2011)
[25] Nesterov, Y.: Primal-dual subgradient methods for convex problems. Technical report, Center for Operations Research and Econometrics (CORE), Catholic University of Louvain (UCL) (2005) · Zbl 1191.90038
[26] Oliveira, R.I.: Sums of random Hermitian matrices and an inequality by Rudelson. Electron. Commun. Probab. 15, 203-212 (2010) · Zbl 1228.60017
[27] Pilanci, M., El Ghaoui, L., Chandrasekaran, V.: Recovery of sparse probability measures via convex programming. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25, pp. 2420-2428. Curran Associates, Inc. (2012)
[28] Schmidt, M., van den Berg, E., Friedlander, M., Murphy, K.: Optimizing costly functions with simple constraints: a limited-memory projected quasi-Newton algorithm. In: AISTATS 2009, vol. 5 (2009)
[29] Sherali, H.D., Adams, W.P.: A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM J. Discrete Math. 3, 411-430 (1990) · Zbl 0712.90050
[30] Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267-288 (1996) · Zbl 0850.62538
[31] Tropp, J.A.: Just relax: Convex programming methods for subset selection and sparse approximation. ICES Report 04-04, UT-Austin, February (2004)
[32] Wainwright, M.J.: Information-theoretic bounds on sparsity recovery in the high-dimensional and noisy setting. IEEE Trans. Inf. Theory 55, 5728-5741 (2009) · Zbl 1367.94106
[33] Wainwright, M.J.: Sharp thresholds for high-dimensional and noisy sparsity recovery using \(\ell_1\)-constrained quadratic programming (lasso). IEEE Trans. Inf. Theory 55, 2183-2202 (2009) · Zbl 1367.62220
[34] Wainwright, M.J.: Structured regularizers: statistical and computational issues. Annu. Rev. Stat. Appl. 1, 233-253 (2014)
[35] Wainwright, M.J., Jordan, M.I.: Treewidth-based conditions for exactness of the Sherali-Adams and Lasserre relaxations. Technical report, UC Berkeley, Department of Statistics, No. 671 (September 2004)
[36] Wasserman, L.: Bayesian model selection and model averaging. J. Math. Psychol. 44, 92-107 (2000) · Zbl 0946.62032
[37] Zhang, Y., Wainwright, M.J., Jordan, M.I.: Lower bounds on the performance of polynomial-time algorithms for sparse linear regression. In: COLT, Barcelona, Spain (June 2014). Full-length version at http://arxiv.org/abs/1402.1918
[38] Zhao, P., Yu, B.: On model selection consistency of lasso. J. Mach. Learn. Res. 7, 2541-2567 (2006) · Zbl 1222.62008
[39] Zou, H., Hastie, T.J.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67, 301-320 (2005) · Zbl 1069.62054