Optimization and nonsmooth analysis.

*(English)*Zbl 0582.49001
Canadian Mathematical Society Series of Monographs and Advanced Texts. A Wiley-Interscience Publication. New York: John Wiley & Sons, Inc. xiii, 308 pp. (1983).

This work of some 300 pages is devoted to the study of optimization problems (in the broad sense of the term) in which the data are not necessarily differentiable. One of the fundamental tools of the approach taken here is the notion of a generalized gradient of a function, a concept introduced by the author in 1973 that has led to numerous works, developments and applications. We present a chapter-by-chapter description of what potential readers may find in this book.

In chapter 1 the author explains and gives a general view of what he intends to develop in the body of the text. He gives, for instance, some typical examples of situations that lead to minimization problems with nondifferentiable data. The “distance to a set” function is a particularly important example of a Lipschitz function that cannot be everywhere differentiable. This function plays a special role in the entire theory; it serves especially as a bridge between the analytic aspect of the material and related geometric concepts. Another example of nondifferentiable functions is that of functions of discrete maximum type, i.e. \(f=\max_{i=1,...,k}f_i\), an example that historically has played a very important role in the development of nondifferentiable optimization. In this chapter the author also presents the first definitions and properties of generalized gradients (of locally Lipschitz functions only) and of the cones tangent and normal to a set. This anticipates the material treated in detail in chapter 2. This chapter continues with an exposĂ© of three paradigms involving the dynamical aspect of optimization (calculus of variations and optimal control); they appear repeatedly in what follows, specifically in chapters 3, 4 and 5. The chapter ends with a review of the central themes of the calculus of variations and optimal control: necessary conditions and sufficient conditions of optimality.

Chapter 2, entitled “Generalized gradients”, is the longest of the book (85 pages). There one finds definitions and a quite complete panorama of the rules governing the calculation of generalized gradients. The study begins, as it should, with locally Lipschitz functions. Given a locally Lipschitz function \(f: X\to\mathbb R\), the generalized directional derivative \(d\mapsto f^0(x_0;d)\) and the generalized gradient of \(f\) at \(x_ 0\) are defined respectively by \(f^0(x_0;d)= \limsup_{x\to x_0}(f(x+\lambda d)-f(x))/\lambda\) and \(\partial f(x_0)=\{x^*\in X:\) \(\langle x^*,d\rangle\leq f(x_0;d)\}\). \(\partial f(x_0)\) is the subdifferential of \(f\) at \(x_0\) when \(f\) is convex, and contains only \(Df(x_0)\) when \(f\) is continuously differentiable at \(x_0\). The geometric concepts associated with a set \(S\) (cones tangent and normal to \(S\)) are introduced via the generalized gradient of the distance function \(d_ S\). The generalized directional derivative and the generalized gradient may also be defined for an arbitrary function (not necessarily locally Lipschitz, and able to take the value \(+\infty)\); the key to this approach consists of considering the geometric concepts introduced earlier for the epigraph of the function. This aspect of the method is certainly much more technical and lacks the character of simplicity associated with locally Lipschitz functions. There also arises a phenomenon that is quite typical of books based on works of research: nonspecialist readers do not get a clear image of the historical evolution of the material. Thus, the general formula of \(f^ 0(x_ 0;d)\) (p. 96) may seem abstruse to readers not trained in this kind of study. However that may be, one cannot undertake the calculation of generalized gradients of arbitrary functions without first thoroughly assimilating the locally Lipschitz case.

Defining a generalized derivative (point-by-point) for a vector-valued function \(f: X\to Y\) is not an easy thing to do. It turns out that in the case in which \(f=(f_0,...,f_m)^{\top}: {\mathbb{R}}^n\to {\mathbb{R}}^m\), one of the first definitions of \(\partial f(x_0)\) when \(f\) has real values can be extended to the case in which \(f\) takes values in \({\mathbb{R}}^m\). This leads to the definition of what is commonly called the generalized Jacobian matrix of \(f\) at \(x_0\), which is denoted by \(\partial f(x_0)\) or \({\mathcal J}f(x_0)\) and is a nonempty compact convex set of \({\mathcal L}({\mathbb{R}}^n,{\mathbb{R}}^m)\). This is a more precise concept than the simple consideration of \([\partial f_1(x_0),...,\partial f_m(x_0)]^{\top}\). Note that here the spaces under consideration are finite-dimensional (notably the target space of \(f\)); the definition of \(\partial f\) when \(f: X\to Y\), with \(X\) and \(Y\) arbitrary Banach spaces, continues to pose problems that have still not been completely solved. The results presented here are not always the best results known (cf., for example, the mean value theorem on p. 72, which can be improved, or the rule of the maximum on p. 47, which can be made more precise). Moreover, the problem evoked on p. 71, concerning the fact that \({\mathcal J}F(x_0)\) is “insensitive” to the values of \(DF\) on sets of measure zero, has been solved, as far as we know.

In short, whoever is interested in generalized gradients only as analytical tools will be happy with this chapter.

The subsequent three chapters in fact reflect what the author’s chief interests were in his presentation: differential inclusions, calculus of variations and optimal control. Chapter 3 is devoted to differential inclusions, i.e. equations of the type \(\dot x(t)\in F(t,x(t))\). After recalling some notions concerning multivalued mappings (measurability, measurable selection, integration), the author studies the existence of trajectories (i.e. solutions of the differential inclusion), the relations between the set of trajectories of the original problem and those of the relaxed problem \(\dot x(t)\in co F(t,x(t))\), as well as the compactness properties of the trajectories. The chapter continues with an optimal control problem in which the objective is to minimize a criterion of type \(f[x(b)]\) (the terminal state) among the trajectories of the differential inclusion satisfying a constraint on the initial state \(x(a)\in C_ 0\) and a constraint on the state: \(g(t,x(t))\leq 0\) for \(t\in [a,b]\). The existence theorem and the form of the multipliers lead to and make use of \(\partial^ >_ xg(t,x(t))\), a refinement of \(\partial_ xg(t,x(t))\). \(\partial^ >_ xg(t,x(t))\) takes into account only the limits of elements of \(\partial_ xg(t_ i,x_ i)\) for \(t_ i\to t\), \(x_ i\to x\) and especially \(g(t_ i,g_ i)>0\). This notion should be compared with that of a relative generalized gradient, used in mathematical programming (cf. Section 6.2) and introduced for the first time by the reviewer [Appl. Math. Optim. 5, 63–82 (1979; Zbl 0389.90088)]. An application to an economic problem is treated in great detail. The author then considers problems with constraints on \(x(a)\) and \(x(b)\), in fact under the parametrized form \(x(b)\in C_ 1+u\). Problems with arbitrary terminal time \(T\) and sufficient conditions of Hamilton-Jacobi-Bellman type conclude the chapter.

Chapter 4, entitled “Calculus of variations”, is developed in a more traditional context. The general model considered is the (generalized) Bolza model: \[ \text{Minimize}\;\ell(x(a),x(b))+ \int^{b}_{a}L(t,x(t),\dot x(t))\,dt \] among all absolutely continuous functions (arcs) \(y:[a,b]\to {\mathbb{R}}^ n\). Here \(\ell\) and \(L\) can take the value \(+\infty\), which means that many of the problems of the calculus of variations can be embraced by this model. Necessary conditions for optimality are established under the form of the existence of an arc \(p\) such that \((-\dot p(t),\dot x(t))\in \partial H(t,x,p(t))\) and \((p(a),-p(b))\in \partial \ell (x(a),x(b))\), where \(H\) is the Hamiltonian, i.e. such that \(H(t,x,q)\) is the value taken by the conjugate function of \(L(t,s,\cdot)\) at the point \(q\). Sufficient conditions for optimality are elucidated via the notion of conjugate points. In a more precisely formulated problem, that of minimizing \(f[x(1)]\) among the arcs \(x:[0,1]\to {\mathbb{R}}^ n\) satisfying \(x(0)\in C_ 0\), \(x(1)\in C_ 1\) and \((x(t),\dot x(t))\leq 0\) on \([0,1]\), the author in this instance describes the necessary conditions for optimality by using multipliers. He ends the chapter with a short paragraph on necessary conditions for optimality for a calculus of variations problem in which the functions \(x\) to be found are functions of several variables.

Chapter 5, like the preceding chapter, is relatively short and condensed. It treats questions commonly posed in optimal control: existence of an optimal control, necessary conditions for optimality (the maximum principle), controllability and sensitivity. A very interesting example in which the differential equation connecting the control and the state is written \(\dot x=\alpha (u-x)^+-\beta (x-u)^+\) \((\alpha >\beta >0)\) is developed in detail.

Chapter 6 concerns mathematical programming with locally Lipschitz data. This topic alone deserves to have an entire book devoted to it. Here the author presents necessary conditions for optimality, studies the sensitivity of the program to constraint perturbations and gives regularity conditions for the value function associated with the perturbed program. Consider for instance the following problem: Minimize \(f(x)\) under the constraints \(g_ i(x)\leq 0\), \(i=1,...,n\), \(h_ j(x)=0\), \(j=1,...,m\), \(x\in C\), where \(f\) and the \(g_ i\) and \(h_ j\) are locally Lipschitz. An example of a necessary condition satisfied by a solution of the program is the following: There exist \((\lambda_ 0,\lambda_ 1,...,\lambda_ n,\mu_ 1,...,\mu_ m)\neq 0\), \(\rho >0\), such that \(\lambda_ i\geq 0\) for all \(i=0,1,...,n\), \(\lambda_ ig_ i(x)=0\) for all \(i=1,...,n\), \(0\in \partial_ x(\lambda_ 0f+\sum_{i}\lambda_ ig_ i+\sum_{j}\mu_ jh_ j+\rho d_ C)(x_ 0)\). Note that the last condition is expressed in a “condensed” form, and is in that sense more precise than one written merely as \(0\in \lambda_ 0\partial f(x_ 0)+\sum_{i}\lambda_ i\partial g_ i(x_ 0)+...\). Another type of necessary condition for optimality, using relative generalized gradients, is also described. The reader interested in deepening and developing the results of this section can consult the references.

Chapter 7, which is quite different from the preceding chapters, is a potpourri of results and applications of nondifferentiable analysis. This series of examples, too long to report here, reflects the author’s interest in various realms of applied analysis.

A selection of 230 references concludes the book.

During the past ten years there have been numerous approaches to nondifferentiable problems. The author has limited himself in this work to the analysis and implementation of tools of which he has been the prime initiator. A real effort has been made to make the definitions and results accessible to those who are not experts in the field; the potential user of these concepts will have no problems using this book. Researchers wishing to invest time in nondifferentiable analysis and optimization will no doubt be better guided by the source articles; this is often the case when books present an elaborate finished product.

Finally, there is not doubt that this book is destined to become a classical reference in the fields that it covers.

In chapter 1 the author explains and gives a general view of what he intends to develop in the body of the text. He gives, for instance, some typical examples of situations that lead to minimization problems with nondifferentiable data. The “distance to a set” function is a particularly important example of a Lipschitz function that cannot be everywhere differentiable. This function plays a special role in the entire theory; it serves especially as a bridge between the analytic aspect of the material and related geometric concepts. Another example of nondifferentiable functions is that of functions of discrete maximum type, i.e. \(f=\max_{i=1,...,k}f_i\), an example that historically has played a very important role in the development of nondifferentiable optimization. In this chapter the author also presents the first definitions and properties of generalized gradients (of locally Lipschitz functions only) and of the cones tangent and normal to a set. This anticipates the material treated in detail in chapter 2. This chapter continues with an exposĂ© of three paradigms involving the dynamical aspect of optimization (calculus of variations and optimal control); they appear repeatedly in what follows, specifically in chapters 3, 4 and 5. The chapter ends with a review of the central themes of the calculus of variations and optimal control: necessary conditions and sufficient conditions of optimality.

Chapter 2, entitled “Generalized gradients”, is the longest of the book (85 pages). There one finds definitions and a quite complete panorama of the rules governing the calculation of generalized gradients. The study begins, as it should, with locally Lipschitz functions. Given a locally Lipschitz function \(f: X\to\mathbb R\), the generalized directional derivative \(d\mapsto f^0(x_0;d)\) and the generalized gradient of \(f\) at \(x_ 0\) are defined respectively by \(f^0(x_0;d)= \limsup_{x\to x_0}(f(x+\lambda d)-f(x))/\lambda\) and \(\partial f(x_0)=\{x^*\in X:\) \(\langle x^*,d\rangle\leq f(x_0;d)\}\). \(\partial f(x_0)\) is the subdifferential of \(f\) at \(x_0\) when \(f\) is convex, and contains only \(Df(x_0)\) when \(f\) is continuously differentiable at \(x_0\). The geometric concepts associated with a set \(S\) (cones tangent and normal to \(S\)) are introduced via the generalized gradient of the distance function \(d_ S\). The generalized directional derivative and the generalized gradient may also be defined for an arbitrary function (not necessarily locally Lipschitz, and able to take the value \(+\infty)\); the key to this approach consists of considering the geometric concepts introduced earlier for the epigraph of the function. This aspect of the method is certainly much more technical and lacks the character of simplicity associated with locally Lipschitz functions. There also arises a phenomenon that is quite typical of books based on works of research: nonspecialist readers do not get a clear image of the historical evolution of the material. Thus, the general formula of \(f^ 0(x_ 0;d)\) (p. 96) may seem abstruse to readers not trained in this kind of study. However that may be, one cannot undertake the calculation of generalized gradients of arbitrary functions without first thoroughly assimilating the locally Lipschitz case.

Defining a generalized derivative (point-by-point) for a vector-valued function \(f: X\to Y\) is not an easy thing to do. It turns out that in the case in which \(f=(f_0,...,f_m)^{\top}: {\mathbb{R}}^n\to {\mathbb{R}}^m\), one of the first definitions of \(\partial f(x_0)\) when \(f\) has real values can be extended to the case in which \(f\) takes values in \({\mathbb{R}}^m\). This leads to the definition of what is commonly called the generalized Jacobian matrix of \(f\) at \(x_0\), which is denoted by \(\partial f(x_0)\) or \({\mathcal J}f(x_0)\) and is a nonempty compact convex set of \({\mathcal L}({\mathbb{R}}^n,{\mathbb{R}}^m)\). This is a more precise concept than the simple consideration of \([\partial f_1(x_0),...,\partial f_m(x_0)]^{\top}\). Note that here the spaces under consideration are finite-dimensional (notably the target space of \(f\)); the definition of \(\partial f\) when \(f: X\to Y\), with \(X\) and \(Y\) arbitrary Banach spaces, continues to pose problems that have still not been completely solved. The results presented here are not always the best results known (cf., for example, the mean value theorem on p. 72, which can be improved, or the rule of the maximum on p. 47, which can be made more precise). Moreover, the problem evoked on p. 71, concerning the fact that \({\mathcal J}F(x_0)\) is “insensitive” to the values of \(DF\) on sets of measure zero, has been solved, as far as we know.

In short, whoever is interested in generalized gradients only as analytical tools will be happy with this chapter.

The subsequent three chapters in fact reflect what the author’s chief interests were in his presentation: differential inclusions, calculus of variations and optimal control. Chapter 3 is devoted to differential inclusions, i.e. equations of the type \(\dot x(t)\in F(t,x(t))\). After recalling some notions concerning multivalued mappings (measurability, measurable selection, integration), the author studies the existence of trajectories (i.e. solutions of the differential inclusion), the relations between the set of trajectories of the original problem and those of the relaxed problem \(\dot x(t)\in co F(t,x(t))\), as well as the compactness properties of the trajectories. The chapter continues with an optimal control problem in which the objective is to minimize a criterion of type \(f[x(b)]\) (the terminal state) among the trajectories of the differential inclusion satisfying a constraint on the initial state \(x(a)\in C_ 0\) and a constraint on the state: \(g(t,x(t))\leq 0\) for \(t\in [a,b]\). The existence theorem and the form of the multipliers lead to and make use of \(\partial^ >_ xg(t,x(t))\), a refinement of \(\partial_ xg(t,x(t))\). \(\partial^ >_ xg(t,x(t))\) takes into account only the limits of elements of \(\partial_ xg(t_ i,x_ i)\) for \(t_ i\to t\), \(x_ i\to x\) and especially \(g(t_ i,g_ i)>0\). This notion should be compared with that of a relative generalized gradient, used in mathematical programming (cf. Section 6.2) and introduced for the first time by the reviewer [Appl. Math. Optim. 5, 63–82 (1979; Zbl 0389.90088)]. An application to an economic problem is treated in great detail. The author then considers problems with constraints on \(x(a)\) and \(x(b)\), in fact under the parametrized form \(x(b)\in C_ 1+u\). Problems with arbitrary terminal time \(T\) and sufficient conditions of Hamilton-Jacobi-Bellman type conclude the chapter.

Chapter 4, entitled “Calculus of variations”, is developed in a more traditional context. The general model considered is the (generalized) Bolza model: \[ \text{Minimize}\;\ell(x(a),x(b))+ \int^{b}_{a}L(t,x(t),\dot x(t))\,dt \] among all absolutely continuous functions (arcs) \(y:[a,b]\to {\mathbb{R}}^ n\). Here \(\ell\) and \(L\) can take the value \(+\infty\), which means that many of the problems of the calculus of variations can be embraced by this model. Necessary conditions for optimality are established under the form of the existence of an arc \(p\) such that \((-\dot p(t),\dot x(t))\in \partial H(t,x,p(t))\) and \((p(a),-p(b))\in \partial \ell (x(a),x(b))\), where \(H\) is the Hamiltonian, i.e. such that \(H(t,x,q)\) is the value taken by the conjugate function of \(L(t,s,\cdot)\) at the point \(q\). Sufficient conditions for optimality are elucidated via the notion of conjugate points. In a more precisely formulated problem, that of minimizing \(f[x(1)]\) among the arcs \(x:[0,1]\to {\mathbb{R}}^ n\) satisfying \(x(0)\in C_ 0\), \(x(1)\in C_ 1\) and \((x(t),\dot x(t))\leq 0\) on \([0,1]\), the author in this instance describes the necessary conditions for optimality by using multipliers. He ends the chapter with a short paragraph on necessary conditions for optimality for a calculus of variations problem in which the functions \(x\) to be found are functions of several variables.

Chapter 5, like the preceding chapter, is relatively short and condensed. It treats questions commonly posed in optimal control: existence of an optimal control, necessary conditions for optimality (the maximum principle), controllability and sensitivity. A very interesting example in which the differential equation connecting the control and the state is written \(\dot x=\alpha (u-x)^+-\beta (x-u)^+\) \((\alpha >\beta >0)\) is developed in detail.

Chapter 6 concerns mathematical programming with locally Lipschitz data. This topic alone deserves to have an entire book devoted to it. Here the author presents necessary conditions for optimality, studies the sensitivity of the program to constraint perturbations and gives regularity conditions for the value function associated with the perturbed program. Consider for instance the following problem: Minimize \(f(x)\) under the constraints \(g_ i(x)\leq 0\), \(i=1,...,n\), \(h_ j(x)=0\), \(j=1,...,m\), \(x\in C\), where \(f\) and the \(g_ i\) and \(h_ j\) are locally Lipschitz. An example of a necessary condition satisfied by a solution of the program is the following: There exist \((\lambda_ 0,\lambda_ 1,...,\lambda_ n,\mu_ 1,...,\mu_ m)\neq 0\), \(\rho >0\), such that \(\lambda_ i\geq 0\) for all \(i=0,1,...,n\), \(\lambda_ ig_ i(x)=0\) for all \(i=1,...,n\), \(0\in \partial_ x(\lambda_ 0f+\sum_{i}\lambda_ ig_ i+\sum_{j}\mu_ jh_ j+\rho d_ C)(x_ 0)\). Note that the last condition is expressed in a “condensed” form, and is in that sense more precise than one written merely as \(0\in \lambda_ 0\partial f(x_ 0)+\sum_{i}\lambda_ i\partial g_ i(x_ 0)+...\). Another type of necessary condition for optimality, using relative generalized gradients, is also described. The reader interested in deepening and developing the results of this section can consult the references.

Chapter 7, which is quite different from the preceding chapters, is a potpourri of results and applications of nondifferentiable analysis. This series of examples, too long to report here, reflects the author’s interest in various realms of applied analysis.

A selection of 230 references concludes the book.

During the past ten years there have been numerous approaches to nondifferentiable problems. The author has limited himself in this work to the analysis and implementation of tools of which he has been the prime initiator. A real effort has been made to make the definitions and results accessible to those who are not experts in the field; the potential user of these concepts will have no problems using this book. Researchers wishing to invest time in nondifferentiable analysis and optimization will no doubt be better guided by the source articles; this is often the case when books present an elaborate finished product.

Finally, there is not doubt that this book is destined to become a classical reference in the fields that it covers.

Reviewer: J. B. Hiriart-Urruty (MR 85m:49002)

##### MSC:

49-01 | Introductory exposition (textbooks, tutorial papers, etc.) pertaining to calculus of variations and optimal control |

49J15 | Existence theories for optimal control problems involving ordinary differential equations |

49J52 | Nonsmooth analysis |

49K15 | Optimality conditions for problems involving ordinary differential equations |

26A24 | Differentiation (real functions of one variable): general theory, generalized derivatives, mean value theorems |

26B05 | Continuity and differentiation questions |

49L99 | Hamilton-Jacobi theories |

49L20 | Dynamic programming in optimal control and differential games |

93B05 | Controllability |

93B03 | Attainable sets, reachability |

90C25 | Convex programming |

90C30 | Nonlinear programming |