Borkar, V. S.
A learning algorithm for discrete-time stochastic control. (English) Zbl 1029.93065
Probab. Eng. Inf. Sci. 14, No. 2, 243-258 (2000).

A simulation-based algorithm for learning “good” policies for a discrete-time stochastic control process with unknown transition law is treated, with the state and action spaces both being compact subsets of Euclidean spaces. Under suitable conditions, almost sure convergence is proved. The paper is in the spirit of W. L. Baker (PhD thesis, Harvard University, 1997), but it analyzes the full nonlinear case and is in the tradition of the ordinary differential equation approach.

Reviewer: H. Hering (Göttingen)

MSC:
93E35 Stochastic learning and adaptive control
93C10 Nonlinear systems in control theory
93C55 Discrete-time control/observation systems

Keywords: nonlinear control; \(Q\)-learning algorithm; compact state space; compact action space; simulation-based algorithm; learning; discrete-time stochastic control; almost sure convergence