zbMATH — the first resource for mathematics

Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. (English) Zbl 1342.90103
A minimization problem arising in machine learning is considered. The objective is a regularized loss function, obtained by adding a regularizer to a convex loss function. A new version of the stochastic dual coordinate ascent (SDCA) method is proposed and an accelerated convergence rate is proven. This result improves the complexity of key machine learning algorithms, including support vector machines (SVM), ridge regression, Lasso, and multiclass SVM. Experimental results corroborating the theoretical findings are included.
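To illustrate the kind of update the reviewed method builds on: the following is a minimal sketch of plain (non-accelerated, non-proximal) stochastic dual coordinate ascent for ridge regression, where the per-coordinate dual maximizer has a closed form. This is not the paper's accelerated algorithm, which wraps such inner updates in an outer momentum loop; all names and data below are illustrative.

```python
import numpy as np

def sdca_ridge(X, y, lam=0.1, epochs=50, seed=0):
    """Plain stochastic dual coordinate ascent for ridge regression.

    Minimizes (1/n) * sum_i 0.5*(x_i @ w - y_i)**2 + (lam/2)*||w||^2.
    The primal iterate is maintained via w = X.T @ alpha / (lam * n).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)              # dual variables, one per example
    w = np.zeros(d)                  # primal iterate
    sq_norms = (X ** 2).sum(axis=1)  # precomputed ||x_i||^2
    for _ in range(epochs):
        for i in rng.permutation(n):
            # closed-form coordinate maximizer for the squared loss
            delta = (y[i] - X[i] @ w - alpha[i]) / (1.0 + sq_norms[i] / (lam * n))
            alpha[i] += delta
            w += (delta / (lam * n)) * X[i]
    return w

# tiny synthetic example: recover a known weight vector
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true + 0.01 * rng.standard_normal(200)
w_hat = sdca_ridge(X, y, lam=1e-2, epochs=100)
```

For smooth losses such as the squared loss, each coordinate step here is exact, which is what makes SDCA attractive compared with tuning a step size in stochastic gradient descent.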

90C06 Large-scale problems in mathematical programming
90C15 Stochastic programming
90C25 Convex programming
Full Text: DOI
[1] Baes, M.: Estimate Sequence Methods: Extensions and Approximations. Institute for Operations Research, ETH, Zürich (2009)
[2] Beck, A; Teboulle, M, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2, 183-202, (2009) · Zbl 1175.94009
[3] Collins, M; Globerson, A; Koo, T; Carreras, X; Bartlett, P, Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks, J. Mach. Learn. Res., 9, 1775-1822, (2008) · Zbl 1225.68167
[4] Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better mini-batch algorithms via accelerated gradient methods. arXiv preprint arXiv:1106.4574 (2011) · Zbl 1235.62151
[5] Crammer, K; Singer, Y, On the algorithmic implementation of multiclass kernel-based vector machines, J. Mach. Learn. Res., 2, 265-292, (2001) · Zbl 1037.68110
[6] d’Aspremont, A, Smooth optimization with approximate gradient, SIAM J. Optim., 19, 1171-1183, (2008) · Zbl 1180.90378
[7] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1-2), 37-75 (2014) · Zbl 1317.90196
[8] Duchi, J; Singer, Y, Efficient online and batch learning using forward backward splitting, J. Mach. Learn. Res., 10, 2899-2934, (2009) · Zbl 1235.62151
[9] Duchi, J., Shalev-Shwartz, S., Singer, Y., Chandra, T.: Efficient projections onto the ℓ1-ball for learning in high dimensions. In: Proceedings of the 25th International Conference on Machine Learning, pp. 272-279. ACM (2008)
[10] Duchi, J., Shalev-Shwartz, S., Singer, Y., Tewari, A.: Composite objective mirror descent. In: Proceedings of the 23rd Annual Conference on Learning Theory, pp. 14-26 (2010)
[11] Fercoq, O., Richtárik, P.: Accelerated, parallel and proximal coordinate descent. Technical report. arXiv:1312.5799 (2013) · Zbl 1280.62081
[12] Ghadimi, S; Lan, G, Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, I: a generic algorithmic framework, SIAM J. Optim., 22, 1469-1492, (2012) · Zbl 1301.62077
[13] Hu, C., Pan, W., Kwok, J.T.: Accelerated gradient methods for stochastic optimization and online learning. In: Advances in Neural Information Processing Systems, pp. 781-789 (2009) · Zbl 1301.62077
[14] Lacoste-Julien, S., Jaggi, M., Schmidt, M., Pletscher, P.: Stochastic block-coordinate Frank-Wolfe optimization for structural SVMs. arXiv preprint arXiv:1207.4747 (2012) · Zbl 1235.62151
[15] Langford, J., Li, L., Zhang, T.: Sparse online learning via truncated gradient. In: NIPS, pp. 905-912 (2009) · Zbl 1235.68167
[16] Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for strongly-convex optimization with finite training sets. arXiv preprint arXiv:1202.6258 (2012) · Zbl 1257.90073
[17] Nesterov, Y, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM J. Optim., 22, 341-362, (2012) · Zbl 1257.90073
[18] Nesterov, Y, Smooth minimization of non-smooth functions, Math. Program., 103, 127-152, (2005) · Zbl 1079.90102
[19] Nesterov, Y, Gradient methods for minimizing composite objective function, Math. Program., 140, 125-161, (2013) · Zbl 1287.90067
[20] Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1-2), 1-38 (2014)
[21] Schmidt, M., Roux, N.L., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. Technical report. arXiv:1109.2415 (2011) · Zbl 1079.90102
[22] Shalev-Shwartz, S; Tewari, A, Stochastic methods for ℓ1-regularized loss minimization, J. Mach. Learn. Res., 12, 1865-1892, (2011) · Zbl 1280.62081
[23] Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge (2014) · Zbl 1305.68005
[24] Shalev-Shwartz, S., Tewari, A.: Stochastic methods for ℓ1-regularized loss minimization. In: ICML, p. 117 (2009) · Zbl 1280.62081
[25] Shalev-Shwartz, S., Zhang, T.: Proximal stochastic dual coordinate ascent. arXiv:1211.2717 (2012) · Zbl 1342.90103
[26] Shalev-Shwartz, S; Zhang, T, Stochastic dual coordinate ascent methods for regularized loss minimization, J. Mach. Learn. Res., 14, 567-599, (2013) · Zbl 1307.68073
[27] Shalev-Shwartz, S., Singer, Y., Srebro, N.: Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. In: ICML, pp. 807-814 (2007) · Zbl 1211.90239
[28] Shalev-Shwartz, S; Srebro, N; Zhang, T, Trading accuracy for sparsity in optimization problems with sparsity constraints, SIAM J. Optim., 20, 2807-2832, (2010) · Zbl 1208.68226
[29] Takáč, M., Bijral, A., Richtárik, P., Srebro, N.: Mini-batch primal and dual methods for SVMs. In: ICML (2013)
[30] Xiao, L, Dual averaging method for regularized stochastic learning and online optimization, J. Mach. Learn. Res., 11, 2543-2596, (2010) · Zbl 1242.62011
[31] Zhang, T, On the dual formulation of regularized linear systems, Mach. Learn., 46, 91-129, (2002) · Zbl 0998.68100
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.