A learning algorithm for discrete-time stochastic control. (English) Zbl 1029.93065

The paper treats a simulation-based algorithm for learning “good” policies for a discrete-time stochastic control process with unknown transition law, where the state and action spaces are both compact subsets of Euclidean spaces. Almost sure convergence is proved under suitable conditions. The paper is in the spirit of W. L. Baker (Ph.D. thesis, Harvard University, 1997), but it analyzes the full nonlinear case and follows the tradition of the ordinary differential equation approach.

MSC:

93E35 Stochastic learning and adaptive control
93C10 Nonlinear systems in control theory
93C55 Discrete-time control/observation systems