zbMATH — the first resource for mathematics

Solving the OSCAR and SLOPE models using a semismooth Newton-based augmented Lagrangian method. (English) Zbl 1434.68430
Summary: The octagonal shrinkage and clustering algorithm for regression (OSCAR), equipped with the \(\ell_1\)-norm and a pairwise \(\ell_{\infty}\)-norm regularizer, is a useful tool for feature selection and grouping in high-dimensional data analysis. The computational challenge posed by OSCAR for high-dimensional and/or large-sample-size data has not yet been well resolved, due to the non-smoothness and non-separability of the regularizer involved. In this paper, we successfully resolve this numerical challenge by proposing a sparse semismooth Newton-based augmented Lagrangian method to solve the more general SLOPE (sorted L-one penalized estimation) model. By appropriately exploiting the inherent sparse and low-rank property of the generalized Jacobian of the semismooth Newton system in the augmented Lagrangian subproblem, we show how the computational complexity can be substantially reduced. Our algorithm offers a notable computational advantage in high-dimensional statistical regression settings. Numerical experiments are conducted on real data sets, and the results demonstrate that our algorithm is far superior, in both speed and robustness, to the existing state-of-the-art algorithms based on first-order iterative schemes, including the widely used accelerated proximal gradient (APG) method and the alternating direction method of multipliers (ADMM).
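To make the relationship between the two models concrete, the following minimal sketch (not from the paper; variable names `lam1`, `lam2` and the helper functions are illustrative) evaluates the SLOPE penalty \(\sum_i \lambda_i |\beta|_{(i)}\), where \(|\beta|_{(1)} \ge \dots \ge |\beta|_{(p)}\) are the sorted absolute coefficients and \(\lambda_1 \ge \dots \ge \lambda_p \ge 0\), and constructs the non-increasing weight sequence \(\lambda_i = \lambda_1 + \lambda_2 (p - i)\) under which SLOPE recovers the OSCAR regularizer \(\lambda_1 \|\beta\|_1 + \lambda_2 \sum_{i<j} \max\{|\beta_i|, |\beta_j|\}\):

```python
import numpy as np

def slope_penalty(beta, lam):
    """Sorted L-one penalty: sum_i lam[i] * |beta|_(i), where
    |beta|_(1) >= ... >= |beta|_(p) are the absolute coefficients
    sorted in decreasing order and lam is non-increasing."""
    abs_sorted = np.sort(np.abs(beta))[::-1]  # descending |beta|
    return float(np.dot(lam, abs_sorted))

def oscar_weights(p, lam1, lam2):
    """SLOPE weights lam_i = lam1 + lam2 * (p - i), i = 1..p,
    under which the SLOPE penalty coincides with OSCAR."""
    return lam1 + lam2 * (p - np.arange(1, p + 1))

def oscar_penalty(beta, lam1, lam2):
    """OSCAR penalty in its original pairwise-max form:
    lam1 * ||beta||_1 + lam2 * sum_{i<j} max(|beta_i|, |beta_j|)."""
    a = np.abs(beta)
    pair_max = sum(max(a[i], a[j])
                   for i in range(len(a)) for j in range(i + 1, len(a)))
    return float(lam1 * a.sum() + lam2 * pair_max)
```

For any coefficient vector, `slope_penalty(beta, oscar_weights(len(beta), lam1, lam2))` agrees with `oscar_penalty(beta, lam1, lam2)`, which is the equivalence the summary invokes when treating OSCAR as a special case of SLOPE.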

68T05 Learning and adaptive systems in artificial intelligence
62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)