Structure of optimal policies for discounted semi-Markov decision programming with unbounded rewards. (English) Zbl 0563.90099

We discuss the structure of optimal policies for discounted semi-Markov decision programming with unbounded rewards (URSMDP), as treated by S. A. Lippman [Manage. Sci., Theory 19, 717-731 (1973; Zbl 0259.60044)]. We prove three results. (i) If a policy \(\pi^*=(\pi^*_0,\pi^*_1,\pi^*_2,\dots)\) is \(\alpha\)-optimal, then the stochastic stationary policy \((\pi_0^*)^{\infty}=(\pi_0^*,\pi_0^*,\dots)\) is also \(\alpha\)-optimal (for the same \(\alpha\)). (ii) For any given integer \(n\geq 1\), we give sufficient conditions under which \((\pi_n^*)^{\infty}=(\pi_n^*,\pi_n^*,\pi_n^*,\dots)\) (under a suitable history) is \(\alpha\)-optimal. (iii) Any stochastic stationary \(\alpha\)-optimal policy \(\pi_0^{\infty}\) can be decomposed into optimal deterministic stationary policies (possibly infinitely many), and it must be a convex combination of these optimal deterministic stationary policies (for the same \(\alpha\)).
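
As an illustration of the convex-combination statement only, and in notation that is assumed here rather than taken from the paper: write \(f_k\) for deterministic decision rules, \(\lambda_k\) for mixing weights, \(i\) for a state and \(a\) for an action. For a countable index set, such a decomposition of the decision rule \(\pi_0\) would take the form

\[
\pi_0(a \mid i) \;=\; \sum_{k} \lambda_k \, \mathbf{1}\{f_k(i)=a\}, \qquad \lambda_k \geq 0, \quad \sum_{k} \lambda_k = 1,
\]

where each deterministic stationary policy \(f_k^{\infty}\) is \(\alpha\)-optimal for the same \(\alpha\). The precise formulation, including how an infinite family of policies is handled, is the one given in the paper itself.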

MSC:

90C40 Markov and semi-Markov decision processes

Citations:

Zbl 0259.60044