
PNKH-B: a projected Newton-Krylov method for large-scale bound-constrained optimization. (English) Zbl 07418126

Summary: We present PNKH-B, a projected Newton-Krylov method for iteratively solving large-scale optimization problems with bound constraints. PNKH-B is geared toward situations in which function and gradient evaluations are expensive, and the (approximate) Hessian is only available through matrix-vector products. This is commonly the case in large-scale parameter estimation, machine learning, and image processing. In each iteration, PNKH-B uses a low-rank approximation of the (approximate) Hessian to determine the search direction and construct the metric used in a projected line search. The key feature of the metric is its consistency with the low-rank approximation of the Hessian on the Krylov subspace. This renders PNKH-B similar to a projected variable metric method. We present an interior point method to solve the quadratic projection problem efficiently. Since the interior point method effectively exploits the low-rank structure, its computational cost scales only linearly in the number of variables, and it adds only negligible computational time. We also experiment with variants of PNKH-B that incorporate estimates of the active set into the Hessian approximation. We prove global convergence to a stationary point under standard assumptions. Using three numerical experiments motivated by parameter estimation, machine learning, and image reconstruction, we show that the consistent use of the Hessian metric in PNKH-B leads to fast convergence, particularly in the first few iterations. We provide our MATLAB implementation at https://github.com/EmoryMLIP/PNKH-B.
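The core iteration described in the summary, building a low-rank Hessian approximation on a Krylov subspace and combining it with a projected line search, can be sketched in a few lines. The following is a minimal illustrative sketch, not the authors' MATLAB implementation: it uses a k-step Lanczos process for the low-rank approximation and, for simplicity, projects onto the box in the Euclidean metric (componentwise clipping), whereas PNKH-B projects in the metric induced by the low-rank Hessian via an interior point method. All function names (`lanczos`, `projected_newton_krylov_step`) and the fixed Krylov dimension are assumptions made for illustration.

```python
import numpy as np

def lanczos(matvec, b, k):
    """k-step Lanczos tridiagonalization of a symmetric operator.

    Returns V (n-by-m, orthonormal columns) and tridiagonal T (m-by-m)
    with T = V.T @ matvec(V), so V @ T @ V.T is a rank-m approximation
    of the operator on the Krylov subspace spanned by b, A b, ..., A^{m-1} b.
    """
    n = b.size
    V = np.zeros((n, k))
    alphas, betas = [], []
    v, v_prev, beta = b / np.linalg.norm(b), np.zeros(n), 0.0
    for j in range(k):
        V[:, j] = v
        w = matvec(v) - beta * v_prev
        alpha = v @ w
        w = w - alpha * v
        w = w - V[:, :j + 1] @ (V[:, :j + 1].T @ w)  # full reorthogonalization
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if beta < 1e-12:               # Krylov subspace is exhausted
            V = V[:, :j + 1]
            break
        betas.append(beta)
        v_prev, v = v, w / beta
    m = V.shape[1]
    T = (np.diag(alphas[:m])
         + np.diag(betas[:m - 1], 1)
         + np.diag(betas[:m - 1], -1))
    return V, T

def projected_newton_krylov_step(f, grad, hess_matvec, x, low, high, k=10):
    """One illustrative projected Newton-Krylov step for
    min f(x) subject to low <= x <= high."""
    g = grad(x)
    V, T = lanczos(hess_matvec, g, min(k, x.size))
    # Newton direction restricted to the Krylov subspace: p = -V T^{-1} V^T g.
    p = -V @ np.linalg.solve(T, V.T @ g)
    fx, mu = f(x), 1.0
    for _ in range(30):                # projected Armijo backtracking
        x_new = np.clip(x + mu * p, low, high)  # Euclidean projection (simplified)
        dx = x_new - x
        # Accept only descent-compatible steps with sufficient decrease.
        if g @ dx <= 0 and f(x_new) <= fx + 1e-4 * (g @ dx):
            return x_new
        mu *= 0.5
    return x                           # no acceptable step found; stay put
```

In the paper's method the clipping step above is replaced by a projection in the metric defined by the low-rank Hessian approximation, computed with an interior point solver whose cost is linear in the number of variables; the Euclidean projection here is merely the simplest stand-in.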

MSC:

65K10 Numerical optimization and variational techniques

References:

[1] A. Beck, Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB, MOS-SIAM Ser. Optim. 19, SIAM, Philadelphia, 2014. · Zbl 1320.90001
[2] S. Becker and J. Fadili, A quasi-Newton proximal splitting method, in Advances in Neural Information Processing Systems, 2012, pp. 2618-2626.
[3] S. Becker, J. Fadili, and P. Ochs, On Quasi-Newton Forward-Backward Splitting: Proximal Calculus and Convergence, preprint, arXiv:1801.08691, 2018. · Zbl 1461.65128
[4] D. P. Bertsekas, Projected Newton methods for optimization problems with simple constraints, SIAM J. Control Optim., 20 (1982), pp. 221-246. · Zbl 0507.49018
[5] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, A limited memory algorithm for bound constrained optimization, SIAM J. Sci. Comput., 16 (1995), pp. 1190-1208. · Zbl 0836.65080
[6] R. H. Chan, K. K. Kan, M. Nikolova, and R. J. Plemmons, A two-stage method for spectral-spatial classification of hyperspectral images, J. Math. Imaging Vision, (2020), pp. 1-18. · Zbl 1483.68282
[7] B. Chang, L. Meng, E. Haber, L. Ruthotto, D. Begert, and E. Holtham, Reversible architectures for arbitrarily deep residual neural networks, in Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018.
[8] T. F. Coleman and Y. Li, On the convergence of interior-reflective Newton methods for nonlinear minimization subject to bounds, Math. Program., 67 (1994), pp. 189-224. · Zbl 0842.90106
[9] T. F. Coleman and Y. Li, An interior trust region approach for nonlinear minimization subject to bounds, SIAM J. Optim., 6 (1996), pp. 418-445. · Zbl 0855.65063
[10] F. Curtis and J. Nocedal, Steplength selection in interior-point methods for quadratic programming, Appl. Math. Lett., 20 (2007), pp. 516-523. · Zbl 1161.90452
[11] A. Dener and T. Munson, Accelerating limited-memory quasi-Newton convergence for large-scale optimization, in International Conference on Computational Science, Springer, New York, 2019, pp. 495-507.
[12] A. Dey and H. F. Morrison, Resistivity modeling for arbitrarily shaped three-dimensional structures, Geophysics, 44 (1979), pp. 753-780.
[13] J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res., 12 (2011), pp. 2121-2159. · Zbl 1280.68164
[14] J. C. Dunn, Newton's method and the Goldstein step-length rule for constrained minimization problems, SIAM J. Control Optim., 18 (1980), pp. 659-674. · Zbl 0445.90098
[15] M. C. Ferris and T. S. Munson, Interior-point methods for massive support vector machines, SIAM J. Optim., 13 (2002), pp. 783-804. · Zbl 1039.90092
[16] S. Fine and K. Scheinberg, Efficient SVM training using low-rank kernel representations, J. Mach. Learn. Res., 2 (2001), pp. 243-264. · Zbl 1037.68112
[17] H. P. Flath, L. C. Wilcox, V. Ak\ccelik, J. Hill, B. van Bloemen Waanders, and O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM J. Sci. Comput., 33 (2011), pp. 407-432. · Zbl 1229.65174
[18] E. M. Gafni and D. P. Bertsekas, Two-metric projection methods for constrained optimization, SIAM J. Control Optim., 22 (1984), pp. 936-964. · Zbl 0555.90086
[19] J. C. Gilbert and C. Lemaréchal, The Module M1QN3 version 3, INRIA, 2006.
[20] E. Haber, Computational Methods in Geophysical Electromagnetics, Math. Ind. (Phila.) 1, SIAM, Philadelphia, PA, 2015. · Zbl 1304.86001
[21] E. Haber, K. Lensink, E. Treister, and L. Ruthotto, IMEXnet: A Forward Stable Deep Neural Network, preprint, arXiv:1903.02639, 2019.
[22] E. Haber and L. Ruthotto, Stable architectures for deep neural networks, Inverse Problems, 34 (2017), 014004. · Zbl 1426.68236
[23] E. Haber, L. Ruthotto, E. Holtham, and S.-H. Jun, Learning across scales: Multiscale methods for convolution neural networks, in Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018.
[24] W. W. Hager and H. Zhang, A new active set algorithm for box constrained optimization, SIAM J. Optim., 17 (2006), pp. 526-557. · Zbl 1165.90570
[25] J. L. Herring, J. Nagy, and L. Ruthotto, Gauss-Newton optimization for phase recovery from the bispectrum, IEEE Trans. Comput. Imaging, 6 (2020), pp. 235-297.
[26] Y. Hu, Matrix Computations and Optimization for Spectral Computed Tomography, Ph.D. thesis, Emory University, 2019.
[27] Y. Hu, M. S. Andersen, and J. G. Nagy, Spectral computed tomography with linearization and preconditioning, SIAM J. Sci. Comput., 41 (2019), pp. S370-S389. · Zbl 07124597
[28] Y. Hu, J. G. Nagy, J. Zhang, and M. S. Andersen, Nonlinear optimization for mixed attenuation polyenergetic image reconstruction, Inverse Problems, 35 (2019), 064004. · Zbl 1416.92101
[29] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, Extreme learning machine: Theory and Applications, Neurocomputing, 70 (2006), pp. 489-501.
[30] C. T. Kelley, Iterative Methods for Optimization, SIAM, Philadelphia, 1999. · Zbl 0934.90082
[31] D. Kim, S. Sra, and I. S. Dhillon, Tackling box-constrained optimization via a new projected quasi-Newton approach, SIAM J. Sci. Comput., 32 (2010), pp. 3548-3563. · Zbl 1220.93085
[32] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, J. Res. Natl. Bur. Stand., 45 (1950), pp. 255-282.
[33] G. Landi and E. L. Piccolomini, A projected Newton-CG method for nonnegative astronomical image deblurring, Numer. Algorithms, 48 (2008), pp. 279-300. · Zbl 1151.65053
[34] G. Landi and E. L. Piccolomini, An improved Newton projection method for nonnegative deblurring of Poisson-corrupted images with Tikhonov regularization, Numer. Algorithms, 60 (2012), pp. 169-188. · Zbl 1241.65059
[35] G. Landi and E. L. Piccolomini, NPtool: A MATLAB software for nonnegative image restoration with Newton projection methods, Numer. Algorithms, 62 (2013), pp. 487-504. · Zbl 1262.65033
[36] Y. LeCun, The MNIST Database of Handwritten Digits, http://yann.lecun.com/exdb/mnist/, 1998.
[37] J. D. Lee, Y. Sun, and M. A. Saunders, Proximal Newton-type methods for minimizing composite functions, SIAM J. Optim., 24 (2014), pp. 1420-1443. · Zbl 1306.65213
[38] P. R. McGillivray, Forward Modeling and Inversion of DC Resistivity and MMR Data, Ph.D. thesis, University of British Columbia, 1992.
[39] J. L. Morales and J. Nocedal, Remark on "Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound constrained optimization," ACM Trans. Math. Software, 38 (2011), pp. 1-4. · Zbl 1365.65164
[40] J. J. Moré and G. Toraldo, On the solution of large quadratic programming problems with bound constraints, SIAM J. Optim., 1 (1991), pp. 93-113. · Zbl 0752.90053
[41] J. Nocedal and S. Wright, Numerical Optimization, Springer, New York, 2006. · Zbl 1104.65059
[42] L. Ruthotto and E. Haber, Deep neural networks motivated by partial differential equations, J. Math. Imaging Vision, 62 (2020), pp. 352-364. · Zbl 1434.68522
[43] L. Ruthotto, E. Treister, and E. Haber, jInv, a flexible Julia package for PDE parameter estimation, SIAM J. Sci. Comput., 39 (2017), pp. S702-S722. · Zbl 1373.86013
[44] M. Schmidt, E. Berg, M. Friedlander, and K. Murphy, Optimizing costly functions with simple constraints: A limited-memory projected quasi-Newton algorithm, in Proceedings of Machine Learning Research (PMLR), 2009, pp. 456-463.
[45] M. Schmidt, D. Kim, and S. Sra, Projected Newton-type methods in machine learning, in Optimization for Machine Learning, MIT Press, Cambridge, MA, 2012, pp. 305-329.
[46] C. Wang, G. Ballard, R. Plemmons, and S. Prasad, Joint 3D localization and classification of space debris using a multispectral rotating point spread function, Appl. Optics, 58 (2019), pp. 8598-8611.
[47] C. Wang, R. Chan, M. Nikolova, R. Plemmons, and S. Prasad, Nonconvex optimization for 3-dimensional point source localization using a rotating point spread function, SIAM J. Imaging Sci., 12 (2019), pp. 259-286. · Zbl 1423.94010
[48] S. Wu Fung, Large-Scale Parameter Estimation in Geophysics and Machine Learning, Ph.D. thesis, Emory University, 2019, https://search.proquest.com/docview/2374443204?accountid=14512.
[49] S. Wu Fung and Z. W. Di, Multigrid optimization for large-scale ptychographic phase retrieval, SIAM J. Imaging Sci., 13 (2020), pp. 214-233. · Zbl 07196102
[50] S. Wu Fung and L. Ruthotto, A multiscale method for model order reduction in PDE parameter estimation, J. Comput. Appl. Math., 350 (2019), pp. 19-34. · Zbl 1524.76241
[51] S. Wu Fung and L. Ruthotto, An uncertainty-weighted asynchronous ADMM method for parallel PDE parameter estimation, SIAM J. Sci. Comput., 41 (2019), pp. S129-S148. · Zbl 1428.65067
[52] S. Wu Fung, S. Tyrväinen, L. Ruthotto, and E. Haber, ADMM-softmax: An ADMM approach for multinomial logistic regression, Electron. Trans. Numer. Anal., 52 (2020), pp. 214-229. · Zbl 07192774
[53] E. K. Yang and J. W. Tolle, A class of methods for solving large, convex quadratic programs subject to box constraints, Math. Program., 51 (1991), pp. 223-228. · Zbl 0737.90046
[54] R. Y. Zhang and J. Lavaei, Modified interior-point method for large-and-sparse low-rank semidefinite programs, in Proceedings of the 56th IEEE Conference on Decision and Control, 2017, pp. 5640-5647.