zbMATH — the first resource for mathematics

Block stochastic gradient iteration for convex and nonconvex optimization. (English) Zbl 1342.93125

93E25 Computational methods in stochastic control (MSC2010)
93E20 Optimal stochastic control
49M05 Numerical methods based on necessary conditions
90C15 Stochastic programming
90C25 Convex programming
90C26 Nonconvex programming, global optimization
90C30 Nonlinear programming
65K05 Numerical mathematical programming methods
65K10 Numerical optimization and variational techniques
65B99 Acceleration of convergence in numerical analysis
Full Text: DOI arXiv
[1] A. Auslender, Asymptotic properties of the Fenchel dual functional and applications to decomposition problems, J. Optim. Theory Appl., 73 (1992), pp. 427–449. · Zbl 0794.49026
[2] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2 (2009), pp. 183–202. · Zbl 1175.94009
[3] A. Beck and L. Tetruashvili, On the convergence of block coordinate descent type methods, SIAM J. Optim., 23 (2013), pp. 2037–2060. · Zbl 1297.90113
[4] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, MA, 1999.
[5] S. Boyd, L. Xiao, and A. Mutapcic, Subgradient Methods, Lecture Notes of EE392o, Autumn Quarter, 2004, Stanford University, Palo Alto, CA, 2003.
[6] E. J. Candes and B. Recht, Exact matrix completion via convex optimization, Found. Comput. Math., 9 (2009), pp. 717–772. · Zbl 1219.90124
[7] K.-W. Chang, C.-J. Hsieh, and C.-J. Lin, Coordinate descent method for large-scale \({L}2\)-loss linear support vector machines, J. Mach. Learn. Res., 9 (2008), pp. 1369–1398. · Zbl 1225.68157
[8] K.-L. Chung, On a stochastic approximation method, Ann. Math. Statist., 25 (1954), pp. 463–483. · Zbl 0059.13203
[9] C. D. Dang and G. Lan, Stochastic block mirror descent methods for nonsmooth and stochastic optimization, SIAM J. Optim., 25 (2015), pp. 856–881. · Zbl 1353.90095
[10] M. Dyrholm, C. Christoforou, and L. C. Parra, Bilinear discriminant component analysis, J. Mach. Learn. Res., 8 (2007), pp. 1097–1111.
[11] J. Eckstein and D. P. Bertsekas, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Math. Program., 55 (1992), pp. 293–318. · Zbl 0765.90073
[12] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, Liblinear: A library for large linear classification, J. Mach. Learn. Res., 9 (2008), pp. 1871–1874. · Zbl 1225.68175
[13] M. P. Friedlander and M. Schmidt, Hybrid deterministic-stochastic methods for data fitting, SIAM J. Sci. Comput., 34 (2012), pp. A1380–A1405. · Zbl 1262.90090
[14] R. Gemulla, E. Nijkamp, P. J. Haas, and Y. Sismanis, Large-scale matrix factorization with distributed stochastic gradient descent, in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, 2011, pp. 69–77.
[15] S. Ghadimi and G. Lan, Accelerated gradient methods for nonconvex nonlinear and stochastic programming, Math. Program. (2015), DOI:10.1007/s10107-015-0871-8. · Zbl 1335.62121
[16] S. Ghadimi and G. Lan, Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: Shrinking procedures and optimal algorithms, SIAM J. Optim., 23 (2013), pp. 2061–2089. · Zbl 1293.62167
[17] S. Ghadimi and G. Lan, Stochastic first and zeroth-order methods for nonconvex stochastic programming, SIAM J. Optim., 23 (2013), pp. 2341–2368. · Zbl 1295.90026
[18] S. Ghadimi, G. Lan, and H. Zhang, Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization, Math. Program. (2015), DOI:10.1007/s10107-014-0846-1. · Zbl 1332.90196
[19] J. L. Goffin, On convergence rates of subgradient optimization methods, Math. Program., 13 (1977), pp. 329–347. · Zbl 0368.90119
[20] L. Grippo and M. Sciandrone, On the convergence of the block nonlinear Gauss-Seidel method under convex constraints, Oper. Res. Lett., 26 (2000), pp. 127–136. · Zbl 0955.90128
[21] L. Grippo and M. Sciandrone, On the convergence of the block nonlinear Gauss-Seidel method under convex constraints, Oper. Res. Lett., 26 (2000), pp. 127–136. · Zbl 0955.90128
[22] C. Hildreth, A quadratic programming procedure, Naval Res. Logist. Q., 4 (1957), pp. 79–85.
[23] M. Hong, X. Wang, M. Razaviyayn, and Z.-Q. Luo, Iteration Complexity Analysis of Block Coordinate Descent Methods, preprint, arXiv:1310.6957, 2013.
[24] A. J. Kleywegt, A. Shapiro, and T. Homem-de-Mello, The sample average approximation method for stochastic discrete optimization, SIAM J. Optim., 12 (2002), pp. 479–502. · Zbl 0991.90090
[25] G. Lan, An optimal method for stochastic composite optimization, Math. Program., 133 (2012), pp. 365–397. · Zbl 1273.90136
[26] J. Liu, S. J. Wright, and S. Sridhar, An Asynchronous Parallel Randomized Kaczmarz Algorithm, preprint, arXiv:1401.4780, 2014.
[27] Z. Lu and L. Xiao, On the complexity analysis of randomized block-coordinate descent methods, Math. Program., 152 (2015), pp. 615–642. · Zbl 1321.65100
[28] Z. Lu and L. Xiao, Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming, preprint, arXiv:1306.5918, 2013.
[29] Z.-Q. Luo and P. Tseng, On the convergence of the coordinate descent method for convex differentiable minimization, J. Optim. Theory Appl., 72 (1992), pp. 7–35. · Zbl 0795.90069
[30] J. Mairal, Stochastic majorization-minimization algorithms for large-scale optimization, in Advances in Neural Information Processing Systems 26, 2013.
[31] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online dictionary learning for sparse coding, in Proceedings of the 26th Annual International Conference on Machine Learning, ACM, New York, 2009, pp. 689–696. · Zbl 1242.62087
[32] C. Navasca, L. De Lathauwer, and S. Kindermann, Swamp reducing technique for tensor decomposition, in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 2008), IEEE, Piscataway, NJ, 2008.
[33] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM J. Optim., 19 (2009), pp. 1574–1609. · Zbl 1189.90109
[34] A. Nemirovski and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, Wiley, Chichester, 1983.
[35] Y. Nesterov, Introductory Lectures on Convex Optimization, Appl. Optim. 87, 2004. · Zbl 1086.90045
[36] Y. Nesterov, Primal-dual subgradient methods for convex problems, Math. Program., 120 (2009), pp. 221–259. · Zbl 1191.90038
[37] Y. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM J. Optim., 22 (2012), pp. 341–362. · Zbl 1257.90073
[38] Y. Nesterov and V. Shikhman, Convergent Subgradient Methods for Nonsmooth Convex Minimization, Technical report, Université Catholique de Louvain, Center for Operations Research and Econometrics (CORE), Louvain-la-neuve, Belgium, 2014. · Zbl 1330.90078
[39] Z. Peng, M. Yan, and W. Yin, Parallel and distributed sparse optimization, in Proceedings of the 2013 Asilomar Conference on Signals, Systems, and Computers, IEEE, Piscataway, NJ, 2013, pp. 659–646.
[40] B. T. Polyak, New stochastic approximation type procedures, Avtomat. i Telemekh., 51 (1990), pp. 98–107.
[41] M. Razaviyayn, M. Hong, and Z.-Q. Luo, A unified convergence analysis of block successive minimization methods for nonsmooth optimization, SIAM J. Optim., 23 (2013), pp. 1126–1153. · Zbl 1273.90123
[42] B. Recht and C. Ré, Parallel stochastic gradient algorithms for large-scale matrix completion, Math. Program. Comput., 5 (2013), pp. 201–226. · Zbl 1275.90039
[43] P. Richtárik and M. Takáč, Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, Math. Program., 144 (2014), pp. 1–38. · Zbl 1301.65051
[44] H. Robbins and S. Monro, A stochastic approximation method, Ann. Math. Statist., 22 (1951), pp. 400–407. · Zbl 0054.05901
[45] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optim., 14 (1976), pp. 877–898. · Zbl 0358.90053
[46] R. T. Rockafellar and R. J. B. Wets, Variational Analysis, Grundkhren Math. Wiss. 317, Springer-Verlag, Berlin, 1998.
[47] J. Sacks, Asymptotic distribution of stochastic approximation procedures, Ann. Math. Statist., 29 (1958), pp. 373–405. · Zbl 0229.62010
[48] A. Saha and A. Tewari, On the nonasymptotic convergence of cyclic coordinate descent methods, SIAM J. Optim., 23 (2013), pp. 576–601. · Zbl 1270.90032
[49] T. Schaul, S. Zhang, and Y. Lecun, No more pesky learning rates, in Proceedings of the 30th International Conference on Machine Learning (ICML-13), ACM, New York, 2013, pp. 343–351.
[50] S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter, Pegasos: Primal estimated sub-gradient solver for SVM, Math. Program., 127 (2011), pp. 3–30. · Zbl 1211.90239
[51] S. Shalev-Shwartz and A. Tewari, Stochastic methods for l1 regularized loss minimization, in ICML ’09: Proceedings of the 26th Annual International Conference on Machine Learning, ACM, New York, 2009, pp. 929–936.
[52] S. K. Shevade and S. S. Keerthi, A simple and efficient algorithm for gene selection using sparse logistic regression, Bioinformatics, 19 (2003), pp. 2246–2253.
[53] J. V. Shi, Y. Xu, and R. G. Baraniuk, Sparse Bilinear Logistic Regression, preprint, arXiv:1404.4104, 2014.
[54] R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B Methodol., 58 (1996), pp. 267–288. · Zbl 0850.62538
[55] P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl., 109 (2001), pp. 475–494. · Zbl 1006.65062
[56] P. Tseng and S. Yun, A coordinate gradient descent method for nonsmooth separable minimization, Math. Program., 117 (2009), pp. 387–423. · Zbl 1166.90016
[57] Z. Wen, D. Goldfarb, and K. Scheinberg, Block coordinate descent methods for semidefinite programming, in Handbook on Semidefinite, Conic and Polynomial Optimization, Springer, New York, 2012, pp. 533–564.f̌ill · Zbl 1334.90118
[58] Y. Xu and W. Yin, A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion, SIAM J. Imaging Sci., 6 (2013), pp. 1758–1789. · Zbl 1280.49042
[59] Y. Xu and W. Yin, A Globally Convergent Algorithm for Nonconvex Optimization Based on Block Coordinate Update, preprint, arXiv:1410.1386, 2014.
[60] T. Zhang, Solving large scale linear prediction problems using stochastic gradient descent algorithms, in Proceedings of the Twenty-first International Conference on Machine Learning, ACM, New York, 2004, 116.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.