an:06889862
Zbl 1394.60085
Etessami, Kousha; Stewart, Alistair; Yannakakis, Mihalis
Greatest fixed points of probabilistic min/max polynomial equations, and reachability for branching Markov decision processes
EN
Inf. Comput. 261, Part 2, 355-382 (2018).
00403819
2018
j
60J80
branching Markov decision processes; greatest fixed point; Bellman optimality equations
Summary: We give polynomial time algorithms for quantitative (and qualitative) reachability analysis for branching Markov decision processes (BMDPs). Specifically, given a BMDP, and given an initial population, where the objective of the controller is to maximize (or minimize) the probability of eventually reaching a population that contains an object of a desired (or undesired) type, we give algorithms for approximating the supremum (infimum) reachability probability, within desired precision \(\epsilon > 0\), in time polynomial in the encoding size of the BMDP and in \(\log(1 / \epsilon)\). We furthermore give P-time algorithms for computing \(\epsilon\)-optimal strategies for both maximization and minimization of reachability probabilities. We also give P-time algorithms for all associated qualitative analysis problems, namely: deciding whether the optimal (supremum or infimum) reachability probabilities are 0 or 1. Prior to this paper, approximation of optimal reachability probabilities for BMDPs was not even known to be decidable.
Our algorithms exploit the following basic fact: we show that for any BMDP, its maximum (minimum) non-reachability probabilities are given by the greatest fixed point (GFP) solution \(g^\ast \in [0, 1]^n\) of a corresponding monotone max (min) probabilistic polynomial system of equations (max/minPPS), \(x = P(x)\), which are the Bellman optimality equations for a BMDP with non-reachability objectives. We show how to compute the GFP of max/minPPSs to desired precision in P-time.
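To illustrate the GFP characterization, the following minimal Python sketch approximates the greatest fixed point of a small max/minPPS by iterating \(x \mapsto P(x)\) starting from the all-ones vector. Since \(P\) is monotone and continuous on \([0, 1]^n\), the iterates decrease towards \(g^\ast\). This is plain value iteration, shown only as an illustration; it is not the polynomial-time algorithm of the paper, and it can converge very slowly on some instances. The encoding of the system (a list of actions per variable, each action a list of `(coefficient, exponents)` monomials) and the names `apply_pps` and `gfp_value_iteration` are assumptions made for this example.

```python
from math import prod


def apply_pps(system, x, use_max=True):
    """Evaluate P(x): for each variable, take the max (or min) over its actions.

    system[i] is a list of actions for variable i; each action is a list of
    (coefficient, exponents) monomials, with one exponent per variable.
    """
    pick = max if use_max else min
    return [
        pick(
            sum(c * prod(xj ** e for xj, e in zip(x, exps)) for c, exps in action)
            for action in actions
        )
        for actions in system
    ]


def gfp_value_iteration(system, use_max=True, tol=1e-12, max_iter=100_000):
    """Approximate the greatest fixed point by iterating P from the all-ones vector."""
    x = [1.0] * len(system)
    for _ in range(max_iter):
        nxt = apply_pps(system, x, use_max)
        if max(abs(a - b) for a, b in zip(x, nxt)) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical one-type example: x = 0.5*x^2 + 0.25
# (two offspring w.p. 1/2, no offspring w.p. 1/4, a target object w.p. 1/4).
single = [[[(0.5, (2,)), (0.25, (0,))]]]

# x = x^2 has least fixed point 0 but greatest fixed point 1:
# the target is never produced, so the non-reachability probability is 1.
doubling = [[[(1.0, (2,))]]]

# maxPPS with both of the actions above available to the controller.
choice = [[[(0.5, (2,)), (0.25, (0,))], [(1.0, (2,))]]]

g_single = gfp_value_iteration(single)
g_doubling = gfp_value_iteration(doubling)
g_choice = gfp_value_iteration(choice, use_max=True)
```

In the `doubling` system the least and greatest fixed points differ, which is why non-reachability has to be read off the GFP rather than the least fixed point.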
We also study more general \textit{branching simple stochastic games} (BSSGs) with (non-)reachability objectives. We show that: (1) the value of these games is captured by the GFP, \(g^\ast\), of a corresponding max-minPPS, \(x = P(x)\); (2) the \textit{quantitative} problem of approximating the value is in TFNP; and (3) the \textit{qualitative} problems associated with the value are all solvable in P-time.