
Inexact variable metric stochastic block-coordinate descent for regularized optimization. (English) Zbl 1441.90158

Summary: Block-coordinate descent is a popular framework for large-scale regularized optimization problems with block-separable structure. Existing methods have several limitations. They often assume that subproblems can be solved exactly at each iteration, which in practical terms usually restricts the quadratic term in the subproblem to be diagonal, thus losing most of the benefits of higher-order derivative information. Moreover, in contrast to the smooth case, non-uniform sampling of the blocks has not yet been shown to improve the convergence rate bounds for regularized problems. This work proposes an inexact randomized block-coordinate descent method based on a regularized quadratic subproblem, in which the quadratic term can vary from iteration to iteration: a “variable metric.” We provide a detailed convergence analysis for both convex and non-convex problems. Our analysis generalizes, to the regularized case, Nesterov’s proposal for improving convergence of block-coordinate descent by sampling proportional to the blockwise Lipschitz constants. We improve the convergence rate in the convex case by weakening the dependency on the initial objective value. Empirical results also show that significant benefits accrue from the use of a variable metric.
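To make the ingredients described in the summary concrete, the following is a minimal illustrative sketch (not the paper's exact algorithm) applied to a LASSO objective 0.5*||Ax - b||^2 + λ*||x||_1: blocks are sampled with probability proportional to their blockwise Lipschitz constants, each step builds a blockwise quadratic model whose curvature term (here the exact block Hessian, an assumed choice) plays the role of the variable metric, and the resulting regularized subproblem is solved only inexactly, by a fixed number of proximal-gradient (soft-thresholding) iterations. The function name bcd_lasso, the block partition, the fixed inner iteration count, and the damping term are illustrative assumptions, not specifications from the paper.

import numpy as np

def bcd_lasso(A, b, lam, blocks, iters=300, inner=5, seed=0):
    """Inexact randomized block-coordinate descent sketch for
    F(x) = 0.5*||Ax - b||^2 + lam*||x||_1 with blockwise quadratic models."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.zeros(n)
    r = A @ x - b                                   # residual, updated after each block step
    # Blockwise Lipschitz constants L_i = ||A_i||_2^2; sample block i with probability ~ L_i.
    L = np.array([np.linalg.norm(A[:, blk], 2) ** 2 for blk in blocks])
    probs = L / L.sum()
    for _ in range(iters):
        i = rng.choice(len(blocks), p=probs)
        blk = blocks[i]
        Ai = A[:, blk]
        g = Ai.T @ r                                # gradient of the smooth part w.r.t. block i
        H = Ai.T @ Ai + 1e-8 * np.eye(len(blk))     # "variable metric": exact block Hessian here
        step = 1.0 / np.linalg.norm(H, 2)           # 1/L step for the inner proximal-gradient loop
        d = np.zeros(len(blk))
        # Solve  min_d  g^T d + 0.5 d^T H d + lam*||x_blk + d||_1  only approximately,
        # by a few proximal-gradient (soft-thresholding) steps.
        for _ in range(inner):
            z = x[blk] + d - step * (g + H @ d)
            u = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
            d = u - x[blk]
        x[blk] += d
        r += Ai @ d
    return x

# Toy usage: random data, 10 equal-size coordinate blocks.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
b = rng.standard_normal(200)
blocks = np.array_split(np.arange(50), 10)
x_hat = bcd_lasso(A, b, lam=0.1, blocks=blocks, iters=500)

Replacing the exact block Hessian with a cheaper or diagonal approximation, and replacing the fixed inner iteration count with an accuracy-based stopping rule, would stay within the same inexact variable-metric framework.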

MSC:

90C30 Nonlinear programming
90C06 Large-scale problems in mathematical programming

References:

[1] Yuan, M.; Lin, Y., Model selection and estimation in regression with grouped variables, J. R. Stat. Soc. Ser. B (Stat. Methodol.), 68, 1, 49-67 (2006) · Zbl 1141.62030 · doi:10.1111/j.1467-9868.2005.00532.x
[2] Meier, L.; Van De Geer, S.; Bühlmann, P., The group LASSO for logistic regression, J. R. Stat. Soc. Ser. B (Stat. Methodol.), 70, 1, 53-71 (2008) · Zbl 1400.62276 · doi:10.1111/j.1467-9868.2007.00627.x
[3] Crammer, K.; Singer, Y., On the learnability and design of output codes for multiclass problems, Mach. Learn., 47, 2-3, 201-233 (2002) · Zbl 1012.68155 · doi:10.1023/A:1013637720281
[4] Lebanon, G., Lafferty, J.D.: Boosting and maximum likelihood for exponential models. In: Advances in Neural Information Processing Systems, pp. 447-454 (2002)
[5] Tsochantaridis, I.; Joachims, T.; Hofmann, T.; Altun, Y., Large margin methods for structured and interdependent output variables, J. Mach. Learn. Res., 6, 1453-1484 (2005) · Zbl 1222.68321
[6] Lee, C.; Lin, CJ, A study on L2-loss (squared hinge-loss) multi-class SVM, Neural Comput., 25, 5, 1302-1323 (2013) · Zbl 1414.68065 · doi:10.1162/NECO_a_00434
[7] Nesterov, Y., Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM J. Optim., 22, 2, 341-362 (2012) · Zbl 1257.90073 · doi:10.1137/100802001
[8] Tappenden, R.; Richtárik, P.; Gondzio, J., Inexact coordinate descent: complexity and preconditioning, J. Optim. Theory Appl., 170, 1, 144-176 (2016) · Zbl 1350.65062 · doi:10.1007/s10957-016-0867-4
[9] Fountoulakis, K.; Tappenden, R., A flexible coordinate descent method, Comput. Optim. Appl., 70, 2, 351-394 (2018) · Zbl 1391.90410 · doi:10.1007/s10589-018-9984-3
[10] Chouzenoux, E.; Pesquet, JC; Repetti, A., A block coordinate variable metric forward-backward algorithm, J. Global Optim., 66, 3, 457-485 (2016) · Zbl 1351.90128 · doi:10.1007/s10898-016-0405-9
[11] Sun, R., Hong, M.: Improved iteration complexity bounds of cyclic block coordinate descent for convex problems. In: Advances in Neural Information Processing Systems, pp. 1306-1314 (2015)
[12] Tseng, P.; Yun, S., A coordinate gradient descent method for nonsmooth separable minimization, Math. Program., 117, 1, 387-423 (2009) · Zbl 1166.90016 · doi:10.1007/s10107-007-0170-0
[13] Yun, S., On the iteration complexity of cyclic coordinate gradient descent methods, SIAM J. Optim., 24, 3, 1567-1580 (2014) · Zbl 1305.49037 · doi:10.1137/130937755
[14] Sun, R.; Ye, Y., Worst-case complexity of cyclic coordinate descent: \({O}(n^2)\) gap with randomized version, Math. Program. (2019) · Zbl 1459.65044 · doi:10.1007/s10107-019-01437-5
[15] Bonettini, S.; Loris, I.; Porta, F.; Prato, M., Variable metric inexact line-search-based methods for nonsmooth optimization, SIAM J. Optim., 26, 2, 891-921 (2016) · Zbl 1338.65157 · doi:10.1137/15M1019325
[16] Lee, C.; Wright, SJ, Inexact successive quadratic approximation for regularized optimization, Comput. Optim. Appl., 72, 641-674 (2019) · Zbl 1420.90045 · doi:10.1007/s10589-019-00059-z
[17] Hiriart-Urruty, JB; Strodiot, JJ; Nguyen, VH, Generalized Hessian matrix and second-order optimality conditions for problems with \({C}^{1,1}\) data, Appl. Math. Optim., 11, 1, 43-56 (1984) · Zbl 0542.49011 · doi:10.1007/BF01442169
[18] Lee, C.; Wright, SJ, Random permutations fix a worst case for cyclic coordinate descent, IMA J. Numer. Anal., 39, 3, 1246-1275 (2019) · Zbl 1464.90058 · doi:10.1093/imanum/dry040
[19] Wright, S.J., Lee, C.: Analyzing random permutations for cyclic coordinate descent. Math. Comput. (2020). (To appear) · Zbl 1442.65049
[20] Patrascu, A.; Necoara, I., Efficient random coordinate descent algorithms for large-scale structured nonconvex optimization, J. Global Optim., 61, 1, 19-46 (2015) · Zbl 1335.90074 · doi:10.1007/s10898-014-0151-9
[21] Wright, SJ, Accelerated block-coordinate relaxation for regularized optimization, SIAM J. Optim., 22, 1, 159-186 (2012) · Zbl 1357.49105 · doi:10.1137/100808563
[22] Li, X.; Zhao, T.; Arora, R.; Liu, H.; Hong, M., On faster convergence of cyclic block coordinate descent-type methods for strongly convex minimization, J. Mach. Learn. Res., 18, 1, 6741-6764 (2017) · Zbl 1404.68112
[23] Scheinberg, K.; Tang, X., Practical inexact proximal quasi-Newton method with global complexity analysis, Math. Program., 160, 1-2, 495-529 (2016) · Zbl 1366.90166 · doi:10.1007/s10107-016-0997-3
[24] Ghanbari, H.; Scheinberg, K., Proximal quasi-Newton methods for regularized convex optimization with linear and accelerated sublinear convergence rates, Comput. Optim. Appl., 69, 3, 597-627 (2018) · Zbl 1397.90301 · doi:10.1007/s10589-017-9964-z
[25] Peng, W., Zhang, H., Zhang, X.: Global complexity analysis of inexact successive quadratic approximation methods for regularized optimization under mild assumptions. Tech. rep. (2018) · Zbl 1450.90058
[26] Nutini, J., Schmidt, M., Laradji, I., Friedlander, M., Koepke, H.: Coordinate descent converges faster with the Gauss-Southwell rule than random selection. In: International Conference on Machine Learning, pp. 1632-1641 (2015)
[27] Nutini, J., Laradji, I., Schmidt, M.: Let’s make block coordinate descent go fast: Faster greedy rules, message-passing, active-set complexity, and superlinear convergence. Tech. rep. (2017). arXiv:1712.08859
[28] Lu, Z.; Xiao, L., On the complexity analysis of randomized block-coordinate descent methods, Math. Program., 152, 1-2, 615-642 (2015) · Zbl 1321.65100 · doi:10.1007/s10107-014-0800-2
[29] Zhao, P., Zhang, T.: Stochastic optimization with importance sampling for regularized loss minimization. In: Proceedings of the 32nd International Conference on Machine Learning (2015)
[30] Lee, C., Wright, S.J.: First-order algorithms converge faster than \({O}(1/k)\) on convex problems. In: Proceedings of the 36th International Conference on Machine Learning (2019)
[31] He, X.; Tappenden, R.; Takac, M., Dual-free adaptive minibatch SDCA for empirical risk minimization, Front. Appl. Math. Stat., 4, 33 (2018) · doi:10.3389/fams.2018.00033
[32] Walker, AJ, An efficient method for generating discrete random variables with general distributions, ACM Trans. Math. Softw., 3, 3, 253-256 (1977) · Zbl 1148.65301 · doi:10.1145/355744.355749
[33] Tibshirani, R., Regression shrinkage and selection via the LASSO, J. R. Stat. Soc. Ser. B, 58, 267-288 (1996) · Zbl 0850.62538
[34] Wright, SJ; Nowak, RD; Figueiredo, MAT, Sparse reconstruction by separable approximation, IEEE Trans. Signal Process., 57, 7, 2479-2493 (2009) · Zbl 1391.94442 · doi:10.1109/TSP.2009.2016892
[35] Beck, A.; Teboulle, M., A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2, 1, 183-202 (2009) · Zbl 1175.94009 · doi:10.1137/080716542
[36] Carmon, Y., Duchi, J.C., Hinder, O., Sidford, A.: Convex until proven guilty: dimension-free acceleration of gradient descent on non-convex functions. In: International Conference on Machine Learning, pp. 654-663. JMLR.org (2017)
[37] Beaton, AE; Tukey, JW, The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data, Technometrics, 16, 2, 147-185 (1974) · Zbl 0282.62057 · doi:10.1080/00401706.1974.10489171