Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems.

*(English)* Zbl 1335.93068

Summary: In this paper we present, in a continuous-time framework, an online approach to direct adaptive optimal control with infinite-horizon cost for nonlinear systems. The algorithm converges online to the optimal control solution without knowledge of the internal system dynamics. Closed-loop dynamic stability is guaranteed throughout. The algorithm is based on a reinforcement learning scheme, namely policy iteration, and uses neural networks in an actor/critic structure to parametrically represent the control policy and the performance of the control system. The two neural networks are trained to express the optimal controller and the optimal cost function, which describes the infinite-horizon control performance. Convergence of the algorithm is proven under the realistic assumption that the two neural networks do not provide perfect representations of the nonlinear control and cost functions. The result is a hybrid control structure that involves a continuous-time controller and a supervisory adaptation structure, which operates on data sampled from the plant and from the continuous-time performance dynamics. Such a control structure is unlike any standard form of controller previously seen in the literature. Simulation results, obtained for two second-order nonlinear systems, are provided.
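The policy-iteration loop described in the summary can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation. It assumes a scalar linear plant \(\dot x = a x + b u\) with cost \(\int (q x^2 + r u^2)\,dt\), and it replaces the critic and actor neural networks with a single quadratic weight `p` and a linear gain `k`. All names (`A_TRUE`, `simulate`, `policy_iteration`) and parameter values are hypothetical. The learner uses only sampled states, the measured integral cost, and the input gain `b`; the drift term `a` appears only inside the simulator.

```python
import numpy as np

# Hypothetical scalar plant x' = a*x + b*u with cost integrand q*x^2 + r*u^2.
# A_TRUE plays the role of the unknown internal dynamics: only the simulator
# reads it, never the learning loop.
A_TRUE, B, Q, R = -1.0, 1.0, 1.0, 1.0


def simulate(x0, k, T, steps=200):
    """RK4-integrate the state and the running cost under u = -k*x over time T."""
    def f(z):
        x = z[0]
        u = -k * x
        return np.array([A_TRUE * x + B * u, Q * x**2 + R * u**2])

    z = np.array([x0, 0.0])
    h = T / steps
    for _ in range(steps):
        s1 = f(z)
        s2 = f(z + 0.5 * h * s1)
        s3 = f(z + 0.5 * h * s2)
        s4 = f(z + h * s3)
        z = z + (h / 6.0) * (s1 + 2 * s2 + 2 * s3 + s4)
    return z[0], z[1]  # x(t+T) and the cost accumulated on [t, t+T]


def policy_iteration(k0=0.0, T=0.1, n_samples=10, n_iter=8):
    """Alternate critic fitting and actor update from sampled trajectory data."""
    k, p = k0, 0.0  # k0 must be stabilizing; k0 = 0 works because A_TRUE < 0
    for _ in range(n_iter):
        phi, cost = [], []
        x = 1.0
        for _ in range(n_samples):
            x_next, c = simulate(x, k, T)
            phi.append(x**2 - x_next**2)  # V(x(t)) - V(x(t+T)) with V = p*x^2
            cost.append(c)                # measured integral cost on the interval
            x = x_next
        # Critic: least-squares fit of p in p*(x(t)^2 - x(t+T)^2) = integral cost.
        p = np.linalg.lstsq(np.array(phi)[:, None], np.array(cost), rcond=None)[0][0]
        # Actor: improved policy u = -(b/r)*p*x; needs b but not a.
        k = B * p / R
    return p, k


p, k = policy_iteration()
print(p, k)
```

For these assumed parameters the scalar Riccati equation \(2ap - b^2p^2/r + q = 0\) has the positive root \(p = \sqrt{2} - 1\), which gives a reference value to compare the learned `p` against.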

##### MSC:

93C40 | Adaptive control/observation systems |

92B20 | Neural networks for/in biological studies, artificial life and related topics |

93C10 | Nonlinear systems in control theory |

*D. Vrabie* and *F. Lewis*, Neural Netw. 22, No. 3, 237-246 (2009; Zbl 1335.93068)

##### References:

[1] | Abu-Khalaf, M.; Lewis, F.L., Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach, Automatica, 41, 5, 779-791, (2005) · Zbl 1087.49022 |

[2] | Abu-Khalaf, M.; Lewis, F.L.; Huang, J., Policy iterations and the Hamilton-Jacobi-Isaacs equation for H-infinity state-feedback control with input saturation, IEEE transactions on automatic control, 51, 1989-1995, (2006) · Zbl 1366.93147 |

[3] | Beard, R.; Saridis, G.; Wen, J., Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation, Automatica, 33, 11, 2159-2177, (1997) · Zbl 0949.93022 |

[4] | Baird, L.C., Reinforcement learning in continuous time: advantage updating, Proceedings of ICNN, 4, 2448-2453, (1994) |

[5] | Bertsekas, D.P.; Tsitsiklis, J.N., Neuro-dynamic programming, (1996), Athena Scientific MA · Zbl 0924.68163 |

[6] | Brannon, N., Seiffertt, J., Draelos, T., & Wunsch, D. (2009). Coordinated machine learning and decision support for situation awareness. Neural Networks, in this issue (doi:10.1016/j.neunet.2009.03.013) |

[7] | Doya, K., Reinforcement learning in continuous time and space, Neural computation, 12, 1, 219-245, (2000) |

[8] | Doya, K.; Kimura, H.; Kawato, M., Neural mechanisms of learning and control, IEEE control systems magazine, 21, 4, 42-54, (2001) |

[9] | Hanselmann, T.; Noakes, L.; Zaknich, A., Continuous-time adaptive critics, IEEE transactions on neural networks, 18, 3, 631-647, (2007) |

[10] | Hornik, K.; Stinchcombe, M.; White, H., Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks, Neural networks, 3, 551-560, (1990) |

[11] | Howard, R.A., Dynamic programming and Markov processes, (1960), MIT Press Cambridge, MA · Zbl 0091.16001 |

[12] | Huang, J.; Lin, C.F., Numerical approach to computing nonlinear \(H_\infty\) control laws, Journal of guidance, control and dynamics, 18, 5, 989-994, (1995) · Zbl 0841.93018 |

[13] | Kirk, D.E., Optimal control theory: an introduction, (2004), Dover Pub. Inc., Mineola, New York |

[14] | Kleinman, D., On an iterative technique for Riccati equation computations, IEEE transactions on automatic control, 13, 114-115, (1968) |

[15] | Kolmogorov, A.N.; Fomin, S.V., Elements of the theory of functions and functional analysis, (1999), Dover Pub Inc., Mineola New York · Zbl 0235.46001 |

[16] | Leake, R.J.; Liu, R.-W., Construction of suboptimal control sequences, Journal SIAM control, 5, 1, 54-63, (1967) · Zbl 0153.13103 |

[17] | () |

[18] | Lewis, F.L.; Syrmos, V.L., Optimal control, (1995), John Wiley |

[19] | Murray, J.J.; Cox, C.J.; Lendaris, G.G.; Saeks, R., Adaptive dynamic programming, IEEE transactions on systems, man and cybernetics, 32, 2, 140-153, (2002) |

[20] | Nevistic, V., & Primbs, J. (1996). Constrained nonlinear optimal control: A converse HJB approach. Technical report 96-021. California Institute of Technology |

[21] | Perlovsky, L. (2009). Language and cognition. Neural Networks, in this issue (doi:10.1016/j.neunet.2009.03.007) |

[22] | Prokhorov, D.; Wunsch, D., Adaptive critic designs, IEEE transactions on neural networks, 8, 5, 997-1007, (1997) |

[23] | Schultz, W., Neural coding of basic reward terms of animal learning theory, game theory, microeconomics and behavioral ecology, Current opinion in neurobiology, 14, 139-147, (2004) |

[24] | Schultz, W.; Dayan, P.; Montague, P.R., A neural substrate of prediction and reward, Science, 275, 1593-1599, (1997) |

[25] | Schultz, W.; Tremblay, L.; Hollerman, J.R., Reward processing in primate orbitofrontal cortex and basal ganglia, Cerebral cortex, 10, 272-283, (2000) |

[26] | Sutton, R.S.; Barto, A.G., Reinforcement learning: an introduction, (1998), MIT Press Cambridge, MA |

[27] | Sutton, R.S.; Barto, A.G.; Williams, R.J., Reinforcement learning is direct adaptive optimal control, IEEE control systems magazine, April, 19-22, (1992) |

[28] | Van Der Schaft, A.J., \(L_2\)-gain analysis of nonlinear systems and nonlinear state feedback \(H_\infty\) control, IEEE transactions on automatic control, 37, 6, 770-784, (1992) · Zbl 0755.93037 |

[29] | Vrabie, D., Pastravanu, O., & Lewis, F. L. (2007). Policy iteration for continuous-time systems with unknown internal dynamics. In IEEE Proceedings of MED’07 (pp. 1-6) |

[30] | Vrabie, D., & Lewis, F. (2008). Adaptive optimal control algorithm for continuous-time nonlinear systems based on policy iteration. In IEEE Proc. CDC’08. (pp. 73-79) |

[31] | Watkins, C. J. C. H. (1989). Learning from delayed rewards. Ph.D. thesis. University of Cambridge, England |

[32] | Werbos, P. (1989). Neural networks for control and system identification. In IEEE Proc. CDC’89. 1 (pp. 260-265) |

[33] | Werbos, P.J., Approximate dynamic programming for real-time control and neural modeling, (), 493-525 |

[34] | Werbos, P. (2009). Intelligence in the brain: A theory of how it works and how to build it. Neural Networks, in this issue (doi:10.1016/j.neunet.2009.03.012) |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.