
An investigation of Newton-sketch and subsampled Newton methods. (English) Zbl 1454.90112

Summary: Sketching, a dimensionality reduction technique, has received much attention in the statistics community. In this paper, we study sketching in the context of Newton’s method for solving finite-sum optimization problems in which both the number of variables and the number of data points are large. We study two forms of sketching that perform dimensionality reduction in data space: Hessian subsampling and randomized Hadamard transformations. Each has its own advantages, and their relative tradeoffs have not been investigated in the optimization literature. Our study focuses on practical versions of the two methods in which the resulting linear systems of equations are solved approximately, at every iteration, using an iterative solver. The advantages of using the conjugate gradient method versus a stochastic gradient iteration are revealed through a set of numerical experiments, and a complexity analysis of the Hessian subsampling method is presented.
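
To make the idea concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of one subsampled Newton step in which the Newton system is solved inexactly by the conjugate gradient method, using l2-regularized logistic regression as the finite-sum problem. The subsample size, CG tolerance, unit step length, and all function names are assumptions chosen for illustration.

```python
# A minimal sketch (not the paper's code) of one subsampled Newton-CG step for
# a finite-sum problem, here l2-regularized logistic regression with labels in {-1,+1}.
import numpy as np


def logistic_grad(w, X, y, lam):
    """Full gradient of (1/n) sum_i log(1 + exp(-y_i x_i^T w)) + (lam/2)||w||^2."""
    z = y * (X @ w)
    g = -(X.T @ (y * (1.0 / (1.0 + np.exp(z))))) / X.shape[0]
    return g + lam * w


def subsampled_hess_vec(w, X_S, lam, v):
    """Hessian-vector product using only the rows in the subsample X_S."""
    s = 1.0 / (1.0 + np.exp(-(X_S @ w)))
    d = s * (1.0 - s)                       # diagonal weights of the logistic Hessian
    return X_S.T @ (d * (X_S @ v)) / X_S.shape[0] + lam * v


def cg(hess_vec, b, tol=1e-2, max_iter=50):
    """Conjugate gradient for H p = b, with H available only through hess_vec."""
    p, r = np.zeros_like(b), b.copy()
    d, rs = r.copy(), r @ r
    for _ in range(max_iter):
        Hd = hess_vec(d)
        alpha = rs / (d @ Hd)
        p += alpha * d
        r -= alpha * Hd
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * np.linalg.norm(b):
            break
        d = r + (rs_new / rs) * d
        rs = rs_new
    return p


def subsampled_newton_cg_step(w, X, y, lam, sample_size, rng):
    """One iteration: full gradient, subsampled Hessian, inexact CG solve, unit step."""
    g = logistic_grad(w, X, y, lam)
    idx = rng.choice(X.shape[0], size=sample_size, replace=False)
    step = cg(lambda v: subsampled_hess_vec(w, X[idx], lam, v), -g)
    return w + step                         # a line search could replace the unit step
```

A call such as `subsampled_newton_cg_step(w, X, y, 1e-3, 256, np.random.default_rng(0))` would perform a single iteration; a randomized-Hadamard variant would differ only in how the reduced Hessian is formed from the data.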

MSC:

90C53 Methods of quasi-Newton type
90C30 Nonlinear programming
65K05 Numerical mathematical programming methods
