Incremental majorization-minimization optimization with application to large-scale machine learning. (English) Zbl 1320.90047

MSC:
90C06 Large-scale problems in mathematical programming
68T05 Learning and adaptive systems in artificial intelligence
90C26 Nonconvex programming, global optimization
90C25 Convex programming
References:
[1] S. Ahn, J. A. Fessler, D. Blatt, and A. O. Hero, Convergent incremental optimization transfer algorithms: Application to tomography, IEEE Trans. Med. Imaging, 25 (2006), pp. 283–296.
[2] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with sparsity-inducing penalties, Found. Trends Mach. Learn., 4 (2012), pp. 1–106. · Zbl 06064248
[3] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2 (2009), pp. 183–202. · Zbl 1175.94009
[4] A. Beck and L. Tetruashvili, On the convergence of block coordinate descent type methods, SIAM J. Optim., 23 (2013), pp. 2037–2060. · Zbl 1297.90113
[5] D. P. Bertsekas, Nonlinear Programming, 2nd ed., Athena Scientific, Belmont, MA, 1999.
[6] D. Blatt, A. O. Hero, and H. Gauchman, A convergent incremental gradient method with a constant step size, SIAM J. Optim., 18 (2007), pp. 29–51. · Zbl 1154.90015
[7] D. Böhning and B. G. Lindsay, Monotonicity of quadratic-approximation algorithms, Ann. Inst. Statist. Math., 40 (1988), pp. 641–663. · Zbl 0723.65150
[8] J. M. Borwein and A. S. Lewis, Convex Analysis and Nonlinear Optimization: Theory and Examples, Springer, New York, 2006. · Zbl 1116.90001
[9] L. Bottou, Online algorithms and stochastic approximations, in Online Learning and Neural Networks, D. Saad, ed., Cambridge University Press, Cambridge, UK, 1998. · Zbl 0968.68127
[10] S. P. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2004. · Zbl 1058.90049
[11] E. J. Candès, M. Wakin, and S. P. Boyd, Enhancing sparsity by reweighted \(\ell_1\) minimization, J. Fourier Anal. Appl., 14 (2008), pp. 877–905. · Zbl 1176.94014
[12] A. Choromanska and T. Jebara, Stochastic Bound Majorization, arXiv:1309.5605, 2013.
[13] M. Collins, R. E. Schapire, and Y. Singer, Logistic regression, AdaBoost and Bregman distances, Mach. Learn., 48 (2002), pp. 253–285. · Zbl 0998.68123
[14] P. L. Combettes and J.-C. Pesquet, Proximal splitting methods in signal processing, in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Springer, New York, 2010.
[15] P. L. Combettes and V. R. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Model. Simul., 4 (2005), pp. 1168–1200. · Zbl 1179.94031
[16] I. Daubechies, M. Defrise, and C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm. Pure Appl. Math., 57 (2004), pp. 1413–1457. · Zbl 1077.65055
[17] A. J. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, in Proceedings of Advances in Neural Information Processing Systems, 2014.
[18] A. J. Defazio, T. S. Caetano, and J. Domke, Finito: A faster, permutable incremental gradient method for big data problems, in Proceedings of ICML, 2014.
[19] S. Della Pietra, V. Della Pietra, and J. Lafferty, Duality and Auxiliary Functions for Bregman Distances, Tech. Report CMU-CS-01-109, Carnegie Mellon University, Pittsburgh, 2001.
[20] A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Statist. Soc. Ser. B, 39 (1977), pp. 1–38. · Zbl 0364.62022
[21] J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res., 12 (2011), pp. 2121–2159. · Zbl 1280.68164
[22] J. Duchi and Y. Singer, Efficient online and batch learning using forward backward splitting, J. Mach. Learn. Res., 10 (2009), pp. 2899–2934. · Zbl 1235.62151
[23] H. Erdogan and J. A. Fessler, Ordered subsets algorithms for transmission tomography, Phys. Med. Biol., 44 (1999), pp. 2835–2851.
[24] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, LIBLINEAR: A library for large linear classification, J. Mach. Learn. Res., 9 (2008), pp. 1871–1874. · Zbl 1225.68175
[25] M. Fashing and C. Tomasi, Mean shift is a bound optimization, IEEE Trans. Pattern Anal. Mach. Intell., 27 (2005), pp. 471–474.
[26] G. Gasso, A. Rakotomamonjy, and S. Canu, Recovering sparse signals with non-convex penalties and DC programming, IEEE Trans. Signal Process., 57 (2009), pp. 4686–4698. · Zbl 1391.90489
[27] S. Ghadimi and G. Lan, Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: A generic algorithmic framework, SIAM J. Optim., 22 (2012), pp. 1469–1492. · Zbl 1301.62077
[28] E. T. Hale, W. Yin, and Y. Zhang, Fixed-point continuation for \(\ell_1\)-minimization: Methodology and convergence, SIAM J. Optim., 19 (2008), pp. 1107–1130. · Zbl 1180.65076
[29] E. Hazan and S. Kale, Beyond the regret minimization barrier: An optimal algorithm for stochastic strongly-convex optimization, in Proceedings of COLT, 2011. · Zbl 1319.90050
[30] R. Horst and N. V. Thoai, DC programming: Overview, J. Optim. Theory Appl., 103 (1999), pp. 1–43.
[31] T. Jebara and A. Choromanska, Majorization for CRFs and latent likelihoods, in Proceedings of Advances in Neural Information Processing Systems, 2012.
[32] A. Juditsky and A. Nemirovski, First order methods for nonsmooth convex large-scale optimization, in Optimization for Machine Learning, MIT Press, Cambridge, MA, 2011. · Zbl 1365.94077
[33] E. Khan, B. Marlin, G. Bouchard, and K. Murphy, Variational bounds for mixed-data factor analysis, in Proceedings of Advances in Neural Information Processing Systems, 2010.
[34] G. Lan, An optimal method for stochastic composite optimization, Math. Program., 133 (2012), pp. 365–397. · Zbl 1273.90136
[35] K. Lange, D. R. Hunter, and I. Yang, Optimization transfer using surrogate objective functions, J. Comput. Graph. Statist., 9 (2000), pp. 1–20.
[36] N. Le Roux, M. Schmidt, and F. Bach, A stochastic gradient method with an exponential convergence rate for finite training sets, in Proceedings of Advances in Neural Information Processing Systems, 2012.
[37] D. D. Lee and H. S. Seung, Algorithms for non-negative matrix factorization, in Proceedings of Advances in Neural Information Processing Systems, 2001.
[38] J. Mairal, Optimization with first-order surrogate functions, in Proceedings of ICML, 2013.
[39] J. Mairal, Stochastic majorization-minimization algorithms for large-scale optimization, in Proceedings of Advances in Neural Information Processing Systems, 2013.
[40] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online learning for matrix factorization and sparse coding, J. Mach. Learn. Res., 11 (2010), pp. 19–60. · Zbl 1242.62087
[41] J. J. Moreau, Fonctions convexes duales et points proximaux dans un espace hilbertien, C. R. Acad. Sci. Paris Sér. A Math., 255 (1962), pp. 2897–2899. · Zbl 0118.10502
[42] R. M. Neal and G. E. Hinton, A view of the EM algorithm that justifies incremental, sparse, and other variants, in Learning in Graphical Models, Kluwer, Dordrecht, the Netherlands, 1998, pp. 355–368. · Zbl 0916.62019
[43] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM J. Optim., 19 (2009), pp. 1574–1609. · Zbl 1189.90109
[44] Y. Nesterov, Introductory Lectures on Convex Optimization, Kluwer, Dordrecht, the Netherlands, 2004. · Zbl 1086.90045
[45] Y. Nesterov, Gradient methods for minimizing composite functions, Math. Program., 140 (2013), pp. 125–161. · Zbl 1287.90067
[46] J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed., Springer, New York, 2006. · Zbl 1104.65059
[47] M. Razaviyayn, M. Hong, and Z.-Q. Luo, A unified convergence analysis of block successive minimization methods for nonsmooth optimization, SIAM J. Optim., 23 (2013), pp. 1126–1153. · Zbl 1273.90123
[48] M. Razaviyayn, M. Sanjabi, and Z.-Q. Luo, A Stochastic Successive Minimization Method for Nonsmooth Nonconvex Optimization, arXiv:1307.4457v2, 2013. · Zbl 1357.90101
[49] M. Schmidt, N. Le Roux, and F. Bach, Convergence rates of inexact proximal-gradient methods for convex optimization, in Proceedings of Advances in Neural Information Processing Systems, 2011.
[50] M. Schmidt, N. Le Roux, and F. Bach, Minimizing Finite Sums with the Stochastic Average Gradient, arXiv:1309.2388, 2013. · Zbl 1358.90073
[51] S. Shalev-Shwartz and T. Zhang, Proximal Stochastic Dual Coordinate Ascent, arXiv:1211.2717, 2012.
[52] B. A. Turlach, W. N. Venables, and S. J. Wright, Simultaneous variable selection, Technometrics, 47 (2005), pp. 349–363.
[53] M. J. Wainwright and M. I. Jordan, Graphical models, exponential families, and variational inference, Found. Trends Mach. Learn., 1 (2008), pp. 1–305. · Zbl 1193.62107
[54] S. J. Wright, R. D. Nowak, and M. A. Figueiredo, Sparse reconstruction by separable approximation, IEEE Trans. Signal Process., 57 (2009), pp. 2479–2493. · Zbl 1391.94442
[55] L. Xiao, Dual averaging methods for regularized stochastic learning and online optimization, J. Mach. Learn. Res., 11 (2010), pp. 2543–2596. · Zbl 1242.62011
[56] M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, J. Roy. Statist. Soc. Ser. B, 68 (2006), pp. 49–67. · Zbl 1141.62030
[57] L. W. Zhong and J. T. Kwok, Fast stochastic alternating direction method of multipliers, in Proceedings of ICML, 2014.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It aims to reflect the references listed in the original paper as accurately as possible, without claiming completeness or perfect precision of the matching.