
Generalized gradients in dynamic optimization, optimal control, and machine learning problems. (English. Russian original) Zbl 1464.49016

Cybern. Syst. Anal. 56, No. 2, 243-258 (2020); translation from Kibern. Sist. Anal. 2020, No. 2, 89-107 (2020).
Summary: Problems of nonsmooth nonconvex dynamic optimization, optimal control in discrete time (including feedback control), and machine learning are considered from a common point of view. An analogy is traced between the control of discrete dynamical systems and the training of multilayer neural networks with nonsmooth objective functionals and connections. Methods for computing generalized gradients for such systems, based on the Hamilton-Pontryagin functions, are developed. Gradient (stochastic) algorithms for optimal control and learning are extended to nonconvex nonsmooth dynamic systems.
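The correspondence the summary draws between discrete-time optimal control and network training can be illustrated with a minimal sketch. The dynamics, cost, and all function names below are illustrative assumptions, not taken from the paper: for a smooth scalar system x_{t+1} = f(x_t, u_t) with terminal cost J = phi(x_T), the adjoint recursion derived from the Hamilton-Pontryagin function H_t = psi_{t+1} f(x_t, u_t) produces the gradients dJ/du_t by exactly the same backward sweep as backpropagation.

```python
import math

# Toy smooth dynamics and cost (assumptions for illustration only).
def f(x, u):
    return math.tanh(x + u)

def df_dx(x, u):
    return 1.0 - math.tanh(x + u) ** 2

def df_du(x, u):
    return 1.0 - math.tanh(x + u) ** 2

def phi(x):          # terminal cost
    return 0.5 * x * x

def dphi(x):
    return x

def cost_and_gradient(x0, controls):
    """Forward simulation plus adjoint (Hamilton-Pontryagin) backward sweep."""
    # Forward pass: simulate the trajectory x_0, x_1, ..., x_T.
    xs = [x0]
    for u in controls:
        xs.append(f(xs[-1], u))
    J = phi(xs[-1])
    # Backward pass: psi_T = dphi(x_T); psi_t = (df/dx)^T psi_{t+1};
    # the control gradient is dH_t/du_t = (df/du)^T psi_{t+1}.
    psi = dphi(xs[-1])
    grads = [0.0] * len(controls)
    for t in reversed(range(len(controls))):
        grads[t] = df_du(xs[t], controls[t]) * psi
        psi = df_dx(xs[t], controls[t]) * psi
    return J, grads
```

In the smooth case this backward sweep reproduces the reverse-mode (backpropagation) gradient; the paper's contribution is extending such adjoint constructions to generalized gradients when f and phi are nonsmooth and nonconvex.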

MSC:

49K21 Optimality conditions for problems involving relations other than differential equations
49J52 Nonsmooth analysis
68T05 Learning and adaptive systems in artificial intelligence
90C26 Nonconvex programming, global optimization
93C55 Discrete-time control/observation systems

Software:

GradSamp

References:

[1] A. E. Bryson and Yu-Chi Ho, Applied Optimal Control. Optimization, Estimation, and Control, CRC Press (2017).
[2] Ermoliev, YM, Methods of Stochastic Programming [in Russian] (1976), Moscow: Nauka, Moscow
[3] Ermoliev, YM; Gulenko, VP; Tsarenko, TI, Finite-Difference Method in Optimal Control Problems [in Russian] (1978), Kyiv: Naukova Dumka, Kyiv · Zbl 0408.49037
[4] Vasil’ev, FP, Methods to Solve Extremum Problems [in Russian] (1981), Moscow: Nauka, Moscow
[5] Yu. G. Evtushenko, Optimization and Fast Automatic Differentiation [in Russian], VTs im. A. A. Dorodnitsyna RAN, Moscow (2013).
[6] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, Vol. 61, 85-117 (2015). http://dx.doi.org/10.1016/j.neunet.2014.09.003.
[7] Nurminskii, EA, Numerical Methods to Solve Stochastic Minimax Problems [in Russian] (1979), Kyiv: Naukova Dumka, Kyiv · Zbl 0452.65041
[8] Pontryagin, LS; Boltyanskii, VG; Gamkrelidze, RV; Mishchenko, EF, Mathematical Theory of Optimal Processes [in Russian] (1961), Moscow: Fizmatgiz, Moscow · Zbl 0102.31901
[9] Boltyanskii, VG, Optimal Control of Discrete Systems [in Russian] (1973), Moscow: Nauka, Moscow
[10] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, Vol. 323, 533-536 (1986). https://doi.org/10.1038/323533a0. · Zbl 1369.68284
[11] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, Vol. 521, 436-444 (2015). https://doi.org/10.1038/nature14539.
[12] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, Adaptive Computation and Machine Learning Series, MIT Press (2016). · Zbl 1373.68009
[13] Norkin, VI, Generalized-differentiable functions, Cybernetics, 16, 1, 10-12 (1980) · doi:10.1007/BF01099354
[14] Demyanov, VF, Conditions of Extremum and Variational Calculus [in Russian] (2005), Moscow: Vysshaya Shkola, Moscow
[15] Daduna, H.; Knopov, PS; Tur, LP, Optimal strategies for an inventory system with cost functions of general form, Cybern. Syst. Analysis, 35, 4, 602-618 (1999) · Zbl 0972.90002 · doi:10.1007/BF02835856
[16] Yu. M. Ermoliev and V. I. Norkin, “Stochastic generalized gradient method with application to insurance risk management,” Interim Report IR-97-021, Int. Inst. for Appl. Syst. Anal., Laxenburg, Austria (1997). URL: http://pure.iiasa.ac.at/5270.
[17] Pshenichnyi, BN, Necessary Extremum Conditions [in Russian] (1982), Moscow: Nauka, Moscow
[18] Demyanov, VF, Nonsmooth Problems of the Theory of Optimization and Control [in Russian] (1982), Leningrad: Izd. Leningrad. Univer, Leningrad · Zbl 0547.49001
[19] Clarke, FH, Optimization and Nonsmooth Analysis (1983), New York: Wiley, New York · Zbl 0582.49001
[20] Mordukhovich, BS, Methods of Approximations in Optimization and Control Problems [in Russian] (1988), Moscow: Nauka, Moscow · Zbl 0643.49001
[21] Rockafellar, RT; Wets, RJ-B, Variational Analysis (1998), Berlin-Heidelberg: Springer, Berlin-Heidelberg · Zbl 0888.49001
[22] H-S. Ahn, K. L. Moore, and Y. Q. Chen, Iterative Learning Control. Robustness and Monotonic Convergence for Interval Systems, Springer-Verlag, London (2007). · Zbl 1162.93025
[23] Aubin, J-P, Neural Networks and Qualitative Physics (1996), Cambridge: Cambridge University Press, Cambridge · Zbl 0920.68100
[24] Nikolenko, S.; Kadurin, A.; Arkhangel’skaya, E., Deep Learning (2018), St. Petersburg: Piter, St. Petersburg
[25] A. Griewank and A. Walther, Evaluating Derivatives. Principles and Techniques of Algorithmic Differentiation, Soc. for Industr. and Appl. Mathematics, Sec. Ed. Philadelphia (2008). · Zbl 1159.65026
[26] Hardt, M.; Recht, B.; Singer, Y., Train faster, generalize better: Stability of stochastic gradient descent, Proc. Machine Learning Research, 48, 1225-1234 (2016)
[27] C. Zhang, Q. Liao, A. Rakhlin, B. Miranda, N. Golowich, and T. Poggio, Theory of Deep Learning IIb: Optimization Properties of SGD, CBMM Memo No. 072, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge (2018). arXiv:1801.02254v1 [cs.LG] 7 Jan 2018.
[28] D. Soudry, E. Hoffer, M. S. Nacson, S. Gunasekar, and N. Srebro, The Implicit Bias of Gradient Descent on Separable Data. arXiv:1710.10345v3 [stat.ML] 21 Mar 2018. · Zbl 1477.62192
[29] L. Bottou, F. E. Curtis, and J. Nocedal, “Optimization methods for large-scale machine learning,” SIAM Review, Vol. 60, No. 2, 223-311 (2018). https://doi.org/10.1137/16M1080173. · Zbl 1397.65085
[30] Robbins, H.; Monro, S., A stochastic approximation method, The Annals of Mathematical Statistics, 22, 3, 400-407 (1951) · Zbl 0054.05901 · doi:10.1214/aoms/1177729586
[31] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, “Robust stochastic approximation approach to stochastic programming,” SIAM J. on Optimization, Vol. 19, No. 4, 1574-1609 (2009). https://doi.org/10.1137/070704277. · Zbl 1189.90109
[32] A. Shapiro, D. Dentcheva, and A. Ruszczynski, Lectures on Stochastic Programming: Modeling and Theory, SIAM, Philadelphia (2009). · Zbl 1183.90005
[33] Ermol’ev, YM; Norkin, VI, Stochastic generalized gradient method to solve nonconvex nonsmooth stochastic optimization, Cybern. Syst. Analysis, 34, 2, 196-215 (1998) · Zbl 0930.90074 · doi:10.1007/BF02742069
[34] D. Davis, D. Drusvyatskiy, S. Kakade, and J. D. Lee, “Stochastic subgradient method converges on tame functions,” Found. Comput. Math., 1-36 (2019). https://doi.org/10.1007/s10208-018-09409-5. · Zbl 1433.65141
[35] A. Ruszczynski, Convergence of a Stochastic Subgradient Method with Averaging for Nonsmooth Nonconvex Constrained Optimization, arXiv Preprint (2019). arXiv:1912.07580v1 [math.OC] 16 Dec 2019. https://arxiv.org/abs/1912.07580.
[36] R. Mifflin, “An algorithm for constrained optimization with semi-smooth functions,” Math. Oper. Res., Vol. 2, No. 2, 191-207 (1977). URL: www.jstor.org/stable/3689654. · Zbl 0395.90069
[37] R. Mifflin, “Semismooth and semiconvex functions in constrained optimization,” SIAM J. Contr. Opt., Vol. 15, No. 6, 959-972 (1977). https://doi.org/10.1137/0315061. · Zbl 0376.90081
[38] Gupal, AM, Stochastic Methods to Solve Nonsmooth Extremum Problems [in Russian] (1979), Kyiv: Naukova Dumka, Kyiv
[39] R. Mifflin, “A modification and an extension of Lemarechal’s algorithm for nonsmooth minimization,” in: D. C. Sorensen and R. J.-B. Wets (eds.), Nondifferential and Variational Techniques in Optimization, Pt. 2, Math. Prog. Study, Vol. 17, 77-90 (1982). · Zbl 0476.65047
[40] Shor, NZ, Minimization Methods for Nondifferentiable Functions (1985), Berlin-Heidelberg: Springer, Berlin-Heidelberg · Zbl 0561.90058
[41] Dorofeyev, PA, About some properties of the generalized gradient method, Zh. Vych. Mat. Mat. Fiz., 25, 2, 181-189 (1985) · Zbl 0567.90082
[42] Mikhalevich, VS; Gupal, AM; Norkin, VI, Methods of Nonconvex Optimization [in Russian] (1987), Moscow: Nauka, Moscow · Zbl 0635.90054
[43] Zavriev, SK; Perevozchikov, AG, Stochastic method of generalized gradient descent to solve minimax problems with bound variables, Vych. Mat. Mat. Fiz., 30, 4, 491-500 (1990) · Zbl 0708.90061
[44] Uryas’ev, SP, Adaptive Algorithms of Stochastic Optimization and Game Theory [in Russian] (1990), Moscow: Nauka, Moscow · Zbl 0709.90073
[45] Hiriart-Urruty, J-B; Lemarechal, C., Convex Analysis and Minimization Algorithms, Vol. II (1993), Berlin-Heidelberg: Springer-Verlag, Berlin-Heidelberg · Zbl 0795.49001
[46] M. Fukushima and L. Qi (eds.), Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods, Kluwer Acad. Publ., Dordrecht-Boston-London (1999).
[47] Stetsyuk, PI, The theory and program implementations of Shor’s r-algorithms, Cybern. Syst. Analysis, 53, 5, 43-57 (2017) · Zbl 1382.65181 · doi:10.1007/s10559-017-9971-1
[48] V. I. Norkin, “Stochastic methods to solve nonconvex stochastic programming problems and their applications,” Author’s Abstracts of Ph.D. Theses, V. M. Glushkov Institute of Cybernetics, NAS Ukr., Kyiv (1998). URL: http://library.nuft.edu.ua/ebook/file/01.05.01
[49] Ermoliev, YM; Norkin, VI, Solution of nonconvex nonsmooth stochastic optimization problems, Cybern. Syst. Analysis, 39, 5, 701-715 (2003) · Zbl 1066.90071 · doi:10.1023/B:CASA.0000012091.84864.65
[50] Norkin, VI, Nonlocal minimization algorithms of nondifferentiable functions, Cybernetics, 14, 5, 704-707 (1978) · Zbl 0433.65030
[51] J. Bolte, A. Daniilidis, and A. Lewis, “Tame functions are semismooth,” Math. Program., Ser. B, Vol. 117, 5-19 (2009). https://doi.org/10.1007/s10107-007-0166-9. · Zbl 1158.49030
[52] Norkin, VI, Stochastic generalized-differentiable functions in the problem of nonconvex nonsmooth stochastic optimization, Cybernetics, 22, 6, 804-809 (1986) · Zbl 0693.90074 · doi:10.1007/BF01068698
[53] S. Mirică, “A note on the generalized differentiability of mappings,” Nonl. Anal. Theory, Methods, Applications, Vol. 4, No. 3, 567-575 (1980). https://doi.org/10.1016/0362-546x(80)90092-9. · Zbl 0437.26004
[54] Lyashko, II; Emel’yanov, VF; Boyarchuk, OK, Mathematical Analysis [in Ukrainian], Pt. 1 (1992), Kyiv: Vyshcha Shkola, Kyiv · Zbl 1103.26300
[55] L. Qi and J. Sun, “A nonsmooth version of Newton’s method,” Mathematical Programming, Vol. 58, 353-368 (1993). https://doi.org/10.1007/bf01581275. · Zbl 0780.90090
[56] L. Qi, “Convergence analysis of some algorithms for solving nonsmooth equations,” Mathematics of Operations Research, Vol. 18, 227-244 (1993). https://doi.org/10.1287/moor.18.1.227. · Zbl 0776.65037
[57] Polyak, BT, An Introduction to Optimization [in Russian] (1983), Moscow: Nauka, Moscow · Zbl 0652.49002
[58] S. Sternberg, Lectures on Differential Geometry, AMS Chelsea Publishing (1999). · Zbl 0129.13102
[59] J. Burke, A. Lewis, and M. Overton, “A robust gradient sampling algorithm for nonsmooth nonconvex optimization,” SIAM J. on Opt., Vol. 15, No. 3, 751-779 (2005). https://doi.org/10.1137/030601296. · Zbl 1078.65048