Dong, Zeqing; Liu, Ke
Structure of optimal policies for discounted semi-Markov decision programming with unbounded rewards. (English) Zbl 0563.90099
Adv. Math., Beijing 14, No. 1, 68-69 (1985).

We discuss the structure of optimal policies for discounted semi-Markov decision programming with unbounded rewards (URSMDP), as studied by S. A. Lippman [Manage. Sci., Theory 19, 717-731 (1973; Zbl 0259.60044)]. We prove that if a policy \(\pi^*=(\pi^*_0,\pi^*_1,\pi^*_2,\dots)\) is \(\alpha\)-optimal, then the stochastic stationary policy \((\pi^*_0)^\infty=(\pi^*_0,\pi^*_0,\dots)\) is \(\alpha\)-optimal as well (for the same \(\alpha\)). For any given integer \(n\geq 1\), we give sufficient conditions under which \((\pi^*_n)^\infty=(\pi^*_n,\pi^*_n,\pi^*_n,\dots)\) (under a suitable history) is \(\alpha\)-optimal. Finally, any stochastic stationary \(\alpha\)-optimal policy \(\pi_0^\infty\) can be decomposed into optimal deterministic stationary policies (possibly infinitely many), and it must be a convex combination of these optimal deterministic stationary policies (for the same \(\alpha\)).

Cited in 2 Documents
MSC: 90C40 Markov and semi-Markov decision processes
Keywords: alpha optimality; structure of optimal policies; discounted semi-Markov decision programming; unbounded rewards; stochastic stationary policy
Citations: Zbl 0259.60044
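The convex-combination decomposition can be illustrated on a toy example. The sketch below is our own construction, not from the paper (which treats semi-Markov processes with unbounded rewards): a two-state discounted MDP with discount factor \(\alpha\) in which two deterministic stationary policies are both \(\alpha\)-optimal, so any convex combination of them is a stochastic stationary \(\alpha\)-optimal policy with exactly this decomposition.

```python
# Toy illustration (our assumption, not the paper's model): a 2-state,
# 2-action discounted MDP where two deterministic stationary policies are
# both alpha-optimal, hence every convex mixture of them is alpha-optimal.
import numpy as np

alpha = 0.9  # discount factor

# rewards[s, a]: one-step reward; P[s, a, t]: transition probabilities.
# In state 0 both actions have identical reward and transition law, so
# either deterministic choice there is optimal; state 1 is absorbing.
rewards = np.array([[1.0, 1.0],
                    [0.0, 0.0]])
P = np.array([[[0.5, 0.5], [0.5, 0.5]],
              [[0.0, 1.0], [0.0, 1.0]]])

def policy_value(pi):
    """Value vector of a stationary (possibly randomized) policy pi[s, a]."""
    r = (pi * rewards).sum(axis=1)        # expected one-step reward per state
    Ppi = np.einsum('sa,sat->st', pi, P)  # induced state transition matrix
    # Solve (I - alpha * P_pi) v = r for the discounted value v.
    return np.linalg.solve(np.eye(len(r)) - alpha * Ppi, r)

det0 = np.array([[1.0, 0.0], [1.0, 0.0]])  # always take action 0
det1 = np.array([[0.0, 1.0], [1.0, 0.0]])  # take action 1 in state 0
mix  = 0.3 * det0 + 0.7 * det1             # convex combination of the two

v0, v1, vm = policy_value(det0), policy_value(det1), policy_value(mix)
print(np.allclose(v0, v1) and np.allclose(v0, vm))  # True: same value, all optimal
```

Here the mixture `mix` is a stochastic stationary optimal policy, and `det0`, `det1` are the deterministic stationary optimal policies into which it decomposes, matching the structure asserted in the review.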