
A multilevel framework for sparse optimization with application to inverse covariance estimation and logistic regression. (English) Zbl 1348.90466

Summary: Solving \(l_1\)-regularized optimization problems is common in the fields of computational biology, signal processing, and machine learning. Such \(l_1\) regularization is used to find sparse minimizers of convex functions. A well-known example is the least absolute shrinkage and selection operator (LASSO) problem, where the \(l_1\) norm regularizes a quadratic function. A multilevel framework is presented for solving such \(l_1\)-regularized sparse optimization problems efficiently. We take advantage of the expected sparsity of the solution and create a hierarchy of problems of similar type, which is traversed in order to accelerate the optimization process. This framework is applied to two problems: (1) the sparse inverse covariance estimation problem, and (2) \(l_1\)-regularized logistic regression. In the first problem, the inverse of an unknown covariance matrix of a multivariate normal distribution is estimated, under the assumption that it is sparse. To this end, an \(l_1\)-regularized log-determinant optimization problem needs to be solved. This task is especially challenging for large-scale datasets, due to time and memory limitations. In the second problem, the \(l_1\) regularization is added to the logistic regression classification objective to reduce overfitting to the data and obtain a sparse model. Numerical experiments demonstrate the efficiency of the multilevel framework in accelerating existing iterative solvers for both of these problems.
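For concreteness, the three problems mentioned in the summary have the following standard formulations (in generic notation, which may differ from the paper's): the LASSO problem
\[\min_{x}\ \tfrac12\|Ax-b\|_2^2+\lambda\|x\|_1,\]
the sparse inverse covariance estimation problem
\[\min_{X\succ 0}\ -\log\det X+\operatorname{tr}(SX)+\lambda\|X\|_1,\]
where \(S\) is the empirical covariance matrix, and \(l_1\)-regularized logistic regression
\[\min_{w}\ \sum_{i=1}^{n}\log\bigl(1+\exp(-y_i\,w^{\top}x_i)\bigr)+\lambda\|w\|_1.\]

The sketch below illustrates the multilevel idea on the LASSO problem only; it is a minimal illustration, not the authors' algorithm. The support-scoring heuristic, the level sizes, and the choice of ISTA as the underlying relaxation are assumptions made for this example. The point it demonstrates is the one described in the summary: a hierarchy of restricted problems over nested candidate supports is traversed from coarse (smallest support) to fine (all variables), with a few iterations of an existing solver applied at each level.

```python
import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista(A, b, lam, x0, step, iters):
    # A few ISTA sweeps on min_x 0.5*||Ax - b||_2^2 + lam*||x||_1.
    x = x0.copy()
    for _ in range(iters):
        x = soft_threshold(x - step * (A.T @ (A @ x - b)), step * lam)
    return x

def multilevel_lasso(A, b, lam, cycles=10, levels=4, sweeps=3):
    # Hypothetical multilevel cycle: the scoring rule and level sizes
    # are illustrative assumptions, not taken from the paper.
    n = A.shape[1]
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # step 1/L with L = ||A||_2^2
    x = np.zeros(n)
    for _ in range(cycles):
        # Rank variables by how likely they are to be in the support:
        # current magnitude plus gradient magnitude (for zero entries).
        score = np.abs(x) + np.abs(A.T @ (A @ x - b))
        order = np.argsort(-score)
        # Nested supports, coarsest (smallest) level first; the finest
        # level contains all variables, i.e., the original problem.
        for m in [max(8, n // 2 ** l) for l in range(levels - 1, -1, -1)]:
            S = order[:m]
            z = np.zeros(n)
            # Solve the problem restricted to S (other variables fixed
            # at zero), warm-started from the current iterate.
            z[S] = ista(A[:, S], b, lam, x[S], step, sweeps)
            x = z
    return x
```

A call such as `x_hat = multilevel_lasso(A, b, lam=0.1)` then recovers a sparse vector from \(b \approx Ax\). Because each coarse level works only with the column subset `A[:, S]`, its per-iteration cost is much lower than on the full problem, which is the source of the acceleration the framework aims for; the step size computed for the full matrix remains valid on every level since \(\|A_{:,S}\|_2 \le \|A\|_2\).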

MSC:

90C06 Large-scale problems in mathematical programming
90C25 Convex programming
90C22 Semidefinite programming
