
Accelerating block coordinate descent methods with identification strategies. (English) Zbl 1414.90327

Summary: This work concerns active set identification strategies for accelerating block-coordinate descent methods (BCDM) applied to large-scale problems. We first devise an identification function tailored to bound-constrained composite minimization, together with an associated, globally convergent version of the BCDM called Active BCDM. The identification function yields an efficient practical strategy for Lasso and \(\ell _1\)-regularized logistic regression. The computational performance of Active BCDM is assessed through comparative experiments on deterministic instances from the literature, against well-established and state-of-the-art methods that are particularly suited to the classes of applications under consideration. Active BCDM achieves fast results thanks to its identification strategy; in addition, an extra second-order step is used, with a favorable cost-benefit ratio.
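To make the identification idea concrete, here is a minimal Python sketch (an illustration added by this entry, not code from the paper): cyclic coordinate descent for the Lasso problem \(\min_x \tfrac{1}{2}\|Ax-b\|_2^2 + \lambda \|x\|_1\), in which a generic gradient-based screening test skips coordinates predicted to be zero at the solution. The function names and the margin tol are hypothetical, and the test only mimics the role of the paper's identification function, without its convergence guarantees.

import numpy as np

def soft_threshold(z, t):
    # proximal operator of t*|.|: shrinks z toward zero by t
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd_identify(A, b, lam, n_passes=50, tol=1e-3):
    # Cyclic coordinate descent for 0.5*||Ax - b||^2 + lam*||x||_1.
    # A zero coordinate whose partial gradient already satisfies
    # |a_j' r| <= lam - tol is predicted inactive and skipped; the test
    # is refreshed every pass, so coordinates can re-enter the working set.
    m, n = A.shape
    x = np.zeros(n)
    col_sq = (A ** 2).sum(axis=0)      # column norms ||a_j||^2, assumed nonzero
    r = b.astype(float)                # residual b - A x (x starts at zero)
    active = np.ones(n, dtype=bool)
    for _ in range(n_passes):
        for j in np.flatnonzero(active):
            rho = A[:, j] @ r + col_sq[j] * x[j]          # coordinate-wise fit
            x_new = soft_threshold(rho, lam) / col_sq[j]  # exact 1-D minimizer
            if x_new != x[j]:
                r += A[:, j] * (x[j] - x_new)             # keep residual in sync
                x[j] = x_new
        grad = A.T @ r                 # at a solution, |grad_j| <= lam wherever x_j = 0
        active = (x != 0.0) | (np.abs(grad) > lam - tol)
    return x

On sparse instances most coordinates are flagged inactive after a few passes, so later sweeps touch only a small working set; this is the generic mechanism behind the speedups that the summary attributes to identification. The paper's Active BCDM combines a provably convergent identification function with this kind of reduced sweep and an extra second-order step.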

MSC:

90C30 Nonlinear programming
65K05 Numerical mathematical programming methods
49M37 Numerical methods based on nonlinear programming
90C06 Large-scale problems in mathematical programming
90C25 Convex programming

References:

[1] Andrew, G., Gao, J.: Scalable training of L1-regularized log-linear models. In: Proceedings of the 24th international conference on machine learning, ICML ’07, pp. 33-40. ACM, New York, NY, USA (2007). https://doi.org/10.1145/1273496.1273501
[2] Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037-2060 (2013). https://doi.org/10.1137/120887679 · Zbl 1297.90113 · doi:10.1137/120887679
[3] Berg, E.V., Friedlander, M.P., Hennenfent, G., Herrmann, F., Saab, R., Yılmaz, Ö.: SPARCO: a testing framework for sparse reconstruction. Technical Report TR-2007-20, Department of Computer Science, University of British Columbia, Vancouver (2007)
[4] Boisvert, R.F., Pozo, R., Remington, K., Barrett, R.F., Dongarra, J.J.: Matrix market: a web resource for test matrix collections, pp. 125-137. Springer US, Boston, MA (1997). https://doi.org/10.1007/978-1-5041-2940-4_9
[5] Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1-122 (2010) · Zbl 1229.90122 · doi:10.1561/2200000016
[6] Bradley, J.K., Kyrola, A., Bickson, D., Guestrin, C.: Parallel coordinate descent for \(\ell_1\)-regularized loss minimization. In: Proceedings of the 28th international conference on machine learning, pp. 1-8. The International Machine Learning Society, Bellevue, Washington, USA (2011). http://www.icml-2011.org/papers/231_icmlpaper.pdf
[7] Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9(6), 717-772 (2009). https://doi.org/10.1007/s10208-009-9045-5 · Zbl 1219.90124 · doi:10.1007/s10208-009-9045-5
[8] Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 27 (2011)
[9] Chen, T., Curtis, F., Robinson, D.: A reduced-space algorithm for minimizing \(\ell_1\)-regularized convex functions. SIAM J. Optim. 27(3), 1583-1610 (2017). https://doi.org/10.1137/16M1062259 · Zbl 1369.90103 · doi:10.1137/16M1062259
[10] Csiba, D., Qu, Z., Richtárik, P.: Stochastic dual coordinate ascent with adaptive probabilities. In: Proceedings of the 32nd international conference on international conference on machine learning, ICML’15, vol. 37, pp. 674-683. JMLR.org, Lille, France (2015). http://dl.acm.org/citation.cfm?id=3045118.3045191
[11] Davis, T.A., Hu, Y.: The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 38(1), 1:1-1:25 (2011). https://doi.org/10.1145/2049662.2049663 · Zbl 1365.65123 · doi:10.1145/2049662.2049663
[12] De Santis, M., Lucidi, S., Rinaldi, F.: A fast active set block coordinate descent algorithm for \(\ell_1\)-regularized least squares. SIAM J. Optim. 26(1), 781-809 (2016). https://doi.org/10.1137/141000737 · Zbl 1333.65059 · doi:10.1137/141000737
[13] Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201-213 (2002). https://doi.org/10.1007/s101070100263 · Zbl 1049.90004 · doi:10.1007/s101070100263
[14] Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289-1306 (2006). https://doi.org/10.1109/TIT.2006.871582 · Zbl 1288.94016 · doi:10.1109/TIT.2006.871582
[15] Facchinei, F., Fischer, A., Kanzow, C.: On the accurate identification of active constraints. SIAM J. Optim. 9(1), 14-32 (1998). https://doi.org/10.1137/S1052623496305882 · Zbl 0960.90080 · doi:10.1137/S1052623496305882
[16] Fercoq, O., Richtárik, P.: Accelerated, parallel and proximal coordinate descent. SIAM J. Optim. 25(4), 1997-2013 (2015). https://doi.org/10.1137/130949993 · Zbl 1327.65108 · doi:10.1137/130949993
[17] Fountoulakis, K., Tappenden, R.: A flexible coordinate descent method. Comput. Optim. Appl. 70(2), 351-394 (2018). https://doi.org/10.1007/s10589-018-9984-3 · Zbl 1391.90410 · doi:10.1007/s10589-018-9984-3
[18] Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1-22 (2010). https://doi.org/10.18637/jss.v033.i01 · doi:10.18637/jss.v033.i01
[19] Glasmachers, T., Dogan, U.: Accelerated coordinate descent with adaptive coordinate frequencies. In: Proceedings of the 5th Asian conference on machine learning (ACML), Proc. Mach. Learn. Res., vol. 29, pp. 72-86. PMLR, Australian National University, Canberra, Australia (2013). http://proceedings.mlr.press/v29/Glasmachers13.html
[20] Kim, D., Sra, S., Dhillon, I.S.: A non-monotonic method for large-scale non-negative least squares. Optim. Methods Softw. 28(5), 1012-1039 (2013). https://doi.org/10.1080/10556788.2012.656368 · Zbl 1310.93089 · doi:10.1080/10556788.2012.656368
[21] Kim, S.J., Koh, K., Lustig, M., Boyd, S., Gorinevsky, D.: An interior-point method for large-scale \(\ell_1\)-regularized least squares. IEEE J. Sel. Top. Signal Process. 1(4), 606-617 (2007) · doi:10.1109/JSTSP.2007.910971
[22] Komarek, P.: Paul Komarek’s webpage. http://komarix.org/ac/ds/. Accessed 29 January 2017
[23] Lichman, M.: UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences. http://archive.ics.uci.edu/ml. Last updated 23 July 2017. Accessed 01 September 2017
[24] Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learning. The MIT Press, Cambridge (2012) · Zbl 1318.68003
[25] Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341-362 (2012). https://doi.org/10.1137/100802001 · Zbl 1257.90073 · doi:10.1137/100802001
[26] Ng, A.Y.: Feature selection, \(L_1\) vs. \(L_2\) regularization, and rotational invariance. In: Proceedings of the 21st international conference on machine learning, p. 354 (2004). http://www.machinelearning.org/proceedings/icml2004/papers/354.pdf
[27] Patrascu, A., Necoara, I.: Efficient random coordinate descent algorithms for large-scale structured nonconvex optimization. J. Glob. Optim. 61(1), 19-46 (2015). https://doi.org/10.1007/s10898-014-0151-9 · Zbl 1335.90074 · doi:10.1007/s10898-014-0151-9
[28] Qu, Z., Richtárik, P.: Coordinate descent with arbitrary sampling I: algorithms and complexity. Optim. Methods Softw. 31(5), 829-857 (2016). https://doi.org/10.1080/10556788.2016.1190360 · Zbl 1365.90205 · doi:10.1080/10556788.2016.1190360
[29] Richtárik, P., Takáč, M.: Efficient serial and parallel coordinate descent methods for huge-scale truss topology design, pp. 27-32. Springer, Berlin, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29210-1_5 · Zbl 1306.74043
[30] Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1), 1-38 (2014). https://doi.org/10.1007/s10107-012-0614-z · Zbl 1301.65051 · doi:10.1007/s10107-012-0614-z
[31] Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. Math. Program. 156(1), 433-484 (2016). https://doi.org/10.1007/s10107-015-0901-6 · Zbl 1342.90102 · doi:10.1007/s10107-015-0901-6
[32] Schmidt, M.: Graphical model structure learning with l1-regularization. Ph.D. thesis, University of British Columbia, Vancouver (2010)
[33] Slawski, M.: Problem-specific analysis of non-negative least squares solvers with a focus on instances with sparse solutions (working paper) (2013). https://sites.google.com/site/slawskimartin/publications. Accessed 01 September 2017
[34] Tappenden, R., Richtárik, P., Gondzio, J.: Inexact coordinate descent: complexity and preconditioning. J. Optim. Theory Appl. 170(1), 144-176 (2016). https://doi.org/10.1007/s10957-016-0867-4 · Zbl 1350.65062 · doi:10.1007/s10957-016-0867-4
[35] Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 58(1), 267-288 (1996) · Zbl 0850.62538
[36] Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1), 387-423 (2009). https://doi.org/10.1007/s10107-007-0170-0 · Zbl 1166.90016 · doi:10.1007/s10107-007-0170-0
[37] Wen, Z., Yin, W., Zhang, H., Goldfarb, D.: On the convergence of an active-set method for \(\ell_1\) minimization. Optim. Methods Softw. 27(6), 1127-1146 (2012). https://doi.org/10.1080/10556788.2011.591398 · Zbl 1244.49055 · doi:10.1080/10556788.2011.591398
[38] Wright, S.J.: Coordinate descent algorithms. Math. Program. 151(1), 3-34 (2015). https://doi.org/10.1007/s10107-015-0892-3 · Zbl 1317.49038 · doi:10.1007/s10107-015-0892-3
[39] Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479-2493 (2009). https://doi.org/10.1109/TSP.2009.2016892 · Zbl 1391.94442 · doi:10.1109/TSP.2009.2016892
[40] Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 68(1), 49-67 (2006). https://doi.org/10.1111/j.1467-9868.2005.00532.x · Zbl 1141.62030 · doi:10.1111/j.1467-9868.2005.00532.x