Recurrence conditions for Markov decision processes with Borel state space: A survey. (English) Zbl 0717.90087

Summary: This paper describes virtually all the recurrence conditions used heretofore for Markov decision processes with Borel state and action spaces, which include some forms of mixing and contraction properties, Doeblin’s condition, Harris recurrence, strong ergodicity, and the existence of bounded solutions to the optimality equation for average reward processes. The aim is to establish (when possible) implications and equivalences between these conditions.
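
For orientation, two of the conditions named in the summary admit standard formulations; the notation below ($X$ for the Borel state space, $A(x)$ for the set of admissible actions at $x$, $Q$ for the transition kernel, $r$ for the one-stage reward) is a common convention and not necessarily the paper's own. The average reward optimality equation asks for a constant $\rho$ and a bounded measurable function $h$ satisfying

\[ \rho + h(x) = \sup_{a\in A(x)} \Big[\, r(x,a) + \int_X h(y)\, Q(dy \mid x,a) \Big], \qquad x \in X, \]

in which case $\rho$ is the optimal average reward and any measurable selector attaining the supremum defines an optimal stationary policy. Doeblin's condition, in the form going back to Doob, requires a finite measure $\varphi$ with $\varphi(X) > 0$, an integer $n \ge 1$, and $\varepsilon > 0$ such that

\[ \varphi(B) \le \varepsilon \;\Longrightarrow\; Q^n(B \mid x) \le 1 - \varepsilon \quad \text{for all } x \in X. \]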

MSC:

90C40 Markov and semi-Markov decision processes