Maximum-stopping-value policies in finite Markov population decision chains. (English) Zbl 1325.60060

Summary: This paper considers infinite-horizon finite state-and-action Markov population decision chains (MPDCs) in which the goal is to find a stationary stopping policy with maximum stopping value, that is, with maximum value over all deterministic Markov stopping policies. A policy is stopping if the expected population size it induces in a period diminishes to zero as the period tends to infinity. The paper shows that the following are equivalent: (a) there is a stationary maximum-stopping-value policy; (b) the maximum stopping value is finite; (c) there is a stopping policy and an excessive point of the optimal return operator; and (d) the maximum stopping value is the least excessive (resp., fixed) point of the optimal return operator. The paper shows how to use linear programming, policy improvement, and successive approximations to solve the problem. The problem arises in stopping a Markov chain, as in optimally eradicating a pest or disease. It is one of two key subproblems used repeatedly to find Blackwell and Cesàro-overtaking optimal policies in finite Markov decision chains (MDCs), both of which, with their generalizations, have a vast number of applications. The problem for MDCs and/or MPDCs has been studied under various conditions, and over the years, by many investigators, including Bellman, Bertsekas, Blackwell, Dantzig, Denardo, d’Epenoux, Derman, Dynkin, Eaton, Erickson, Ford, Hordijk, Howard, Kallenberg, Manne, O’Sullivan, Rothblum, Shoemaker, Strauch, Tsitsiklis, Veinott, Wagner, and Zadeh.
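The successive-approximations method mentioned in the summary can be sketched for an ordinary (substochastic) stopping chain: iterate the optimal return operator T, where (TV)(s) = max over actions a of r(s,a) + Σ_{s'} p(s,a,s') V(s'), starting from V = 0. The three-state instance below, its rewards, and its transition rows are invented purely for illustration and are not from the paper; under the finiteness condition in (b), the iterates approach the maximum stopping value, the least fixed point of T named in (d).

```python
import numpy as np

# Hypothetical 3-state stopping chain (all numbers are invented for
# illustration).  actions[s] is a list of (reward, transition_row) pairs;
# a row summing to less than 1 leaks "population mass", i.e. stops.
actions = [
    [(-1.0, np.array([0.0, 0.9, 0.0])),   # state 0: treat, mostly move to 1
     (-3.0, np.array([0.0, 0.0, 0.0]))],  # state 0: stop outright
    [(-2.0, np.array([0.5, 0.0, 0.4])),   # state 1: full treatment
     (-1.0, np.array([0.0, 0.3, 0.0]))],  # state 1: cheaper partial treatment
    [( 0.0, np.array([0.0, 0.0, 0.0]))],  # state 2: free absorbing stop
]

def optimal_return_operator(actions, V):
    """Apply T: (TV)(s) = max over actions of r(s,a) + p(s,a) . V."""
    return np.array([max(r + row @ V for r, row in acts) for acts in actions])

def successive_approximations(actions, iters=500):
    """Iterate T starting from V = 0."""
    V = np.zeros(len(actions))
    for _ in range(iters):
        V = optimal_return_operator(actions, V)
    return V

def greedy_policy(actions, V):
    """Stationary policy choosing a V-maximizing action in each state."""
    return [max(range(len(acts)),
                key=lambda a: acts[a][0] + acts[a][1] @ V)
            for acts in actions]

V = successive_approximations(actions)
print(V, greedy_policy(actions, V))
```

Linear programming offers an alternative route to the same value: minimize Σ_s v_s subject to v_s ≥ r(s,a) + Σ_{s'} p(s,a,s') v_{s'} for every state-action pair, whose optimal solution is the least excessive point of the optimal return operator described in the summary.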

MSC:

60G40 Stopping times; optimal stopping problems; gambling theory
60J10 Markov chains (discrete-time Markov processes on discrete state spaces)
60J22 Computational methods in Markov chains
90C40 Markov and semi-Markov decision processes
90C05 Linear programming

References:

[1] Andersen ED, Ye Y (1996) Combining interior-point and pivoting algorithms for linear programming. Management Sci. 42(12):1719-1731. · Zbl 0893.90130
[2] Balinski ML (1961) On solving discrete stochastic decision problems. Technical report, Study 2. Mathematica, Princeton, NJ.
[3] Bather JA (1973) Optimal decision procedures for finite Markov chains. Part III: General convex systems. Adv. Appl. Prob. 5:541-553. · Zbl 0275.90050
[4] Bellman R (1958) On a routing problem. Quart. Appl. Math. 16:87-90. · Zbl 0081.14403
[5] Bertsekas DP (2007) Stochastic shortest path problems. Dynamic Programming and Optimal Control II, 3rd ed., Chap. 2 (Athena Scientific, Belmont, MA), 94-115.
[6] Bertsekas DP, Tsitsiklis JN (1991) An analysis of stochastic shortest path problems. Math. Oper. Res. 16(3):580-595. · Zbl 0751.90077
[7] Blackwell D (1962) Discrete dynamic programming. Ann. Math. Statist. 33(2):719-726. · Zbl 0133.12906
[8] Blackwell D (1965) Positive dynamic programming. Neyman J, ed. Proc. 5th Berkeley Sympos. Math. Statist. Prob., Vol. I: Theory Statist. (University of California Press, Berkeley, CA), 415-418.
[9] Dantzig GB (1955) Optimal solution of a dynamic Leontief model with substitution. Econometrica 23:295-302. · Zbl 0068.33404
[10] Denardo EV (1970) On linear programming in a Markov decision problem. Management Sci. 16(5):281-288. · Zbl 0191.48602
[11] Denardo EV, Miller BL (1968) An optimality condition for discrete dynamic programming with no discounting. Ann. Math. Statist. 39(4):1220-1227. · Zbl 0167.18402
[12] Denardo EV, Rothblum UG (1979) Optimal stopping, exponential utility, and linear programming. Math. Programming 16:228-244.
[13] d’Epenoux F (1960) Sur un problème de production et de stockage dans l’aléatoire [A problem of production and storage under uncertainty]. Revue Française de Recherche Opérationnelle 4(14):3-16. An English translation appears as: d’Epenoux F (1963) A probabilistic production and inventory problem. Management Sci. 10(1):98-108.
[14] Derman C (1962) On sequential decisions and Markov chains. Management Sci. 9(1):16-24. · Zbl 0995.90621
[15] Derman C (1970) Finite State Markovian Decision Processes (Academic Press, New York).
[16] Dynkin EB (1963) The optimum choice of the instant for stopping a Markov process. Soviet Math. 4:627-629. Translated from the Russian of: Dokl. Akad. Nauk SSSR 150:238-240.
[17] Dynkin EB, Yushkevich AA (1969) Markov Processes: Theory and Problems (Plenum Press, New York). Translated from the Russian text published by Nauka Press, Moscow, 1967.
[18] Eaton JH, Zadeh LA (1962) Optimal pursuit strategies in discrete-state probabilistic systems. Trans. ASME Ser. D, J. Basic Eng. 84:23-29.
[19] Eaves BC, Veinott AF Jr (1974) Policy improvement in positive, negative and stopping Markov decision chains. Department of Operations Research, Stanford University, Stanford, CA. Unpublished manuscript. Presented by the second author at the ORSA/TIMS Meeting in San Juan, Puerto Rico.
[20] Erickson RE (1978) Minimum-concave-cost single-source network flows. Ph.D. thesis, Department of Operations Research, Stanford University, Stanford, CA.
[21] Erickson RE (1988) Optimality of stationary halting policies and finite termination of successive approximations. Math. Oper. Res. 13(1):90-98. · Zbl 0646.90089
[22] Erickson RE, Monma CL, Veinott AF Jr (1987) Send-and-split method for minimum-concave-cost network flows. Math. Oper. Res. 12(4):634-664. · Zbl 0667.90036
[23] Ford LR (1956) Network flow theory. Technical Report P-923, RAND Corporation, Santa Monica, CA.
[24] Hordijk A, Kallenberg LCM (1984) Transient policies in discrete dynamic programming: Linear programming including suboptimality tests and additional constraints. Math. Programming 30:46-70. · Zbl 0546.90101
[25] Hordijk A, Kallenberg LCM (1979) Linear programming and Markov decision chains. Management Sci. 25(4):352-362. · Zbl 0421.90076
[26] Howard RA (1960) Dynamic Programming and Markov Processes (Technology Press of MIT, Cambridge, MA).
[27] Kallenberg LCM (1980) Linear programming and finite Markovian control problems. Ph.D. thesis, Mathematisch Centrum, Amsterdam.
[28] Manne AS (1960) Linear programming and sequential decisions. Management Sci. 6(3):259-267. · Zbl 0995.90599
[29] Miller BL, Veinott AF Jr (1969) Discrete dynamic programming with a small interest rate. Ann. Math. Statist. 40(2):366-370. · Zbl 0175.47302
[30] O’Sullivan M, Veinott AF Jr (2007) Polynomial-time computation of n-optimal policies in Markov decision chains. Technical report, Department of Management Science and Engineering, Stanford University, Stanford, CA.
[31] Rothblum UG (1974) Multiplicative Markov decision processes. Ph.D. thesis, Department of Operations Research, Stanford University, Stanford, CA.
[32] Shoemaker CA (1979) Optimal timing of multiple applications of pesticides with residual toxicity. Biometrics 35:803-812. · Zbl 0426.92021
[33] Shoemaker CA (1981) The application of dynamic programming and other optimization methods to pest management. IEEE Trans. Automatic Control 26:1125-1132. · Zbl 0475.92015
[34] Shoemaker CA (1982) Optimal integrated control of univoltine pest populations with age structure. Oper. Res. 30(1):40-61. · Zbl 0494.92019
[35] Strauch RE (1966) Negative dynamic programming. Ann. Math. Statist. 37(4):871-890. · Zbl 0144.43201
[36] Veinott AF Jr (1966) On finding optimal policies in discrete dynamic programming with no discounting. Ann. Math. Statist. 37(5):1284-1294. · Zbl 0149.16301
[37] Veinott AF Jr (1968) Discrete dynamic programming with sensitive optimality criteria. Ann. Math. Statist. 39:1372. Preliminary report.
[38] Veinott AF Jr (1969) Discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist. 40(5):1635-1660. · Zbl 0183.49102
[39] Veinott AF Jr (1969) Dynamic programming and stochastic control. Course Materials, Operations Research 351, Department of Operations Research, Stanford University, Stanford, CA.
[40] Veinott AF Jr (1972) Dynamic programming and stochastic control. Course Materials, Operations Research 351, Department of Operations Research, Stanford University, Stanford, CA.
[41] Veinott AF Jr (1974) Markov decision chains. Eaves BC, Dantzig GB, eds. Studies in Optimization, MAA Studies in Mathematics, Vol. 10 (Mathematical Association of America, Washington, DC), 124-159.
[42] Wagner HM (1960) On the optimality of pure strategies. Management Sci. 5(3):268-269. · Zbl 0995.90564
[43] Ye Y, Todd MJ, Mizuno S (1994) An \mathcal{O}(\sqrt{nL})-iteration homogeneous and self-dual linear programming algorithm. Math. Oper. Res. 19(1):53-67. · Zbl 0799.90087