zbMATH — the first resource for mathematics

Detection-averse optimal and receding-horizon control for Markov decision processes. (English) Zbl 1453.93243
Summary: In this paper, we consider a Markov decision process (MDP) in which the ego agent intends to hide its state from detection by an adversary while pursuing a nominal objective. After formulating the detection-averse MDP problem, we first describe a value iteration (VI) approach to exactly solve it. To overcome the “curse of dimensionality” and thus gain scalability to larger-sized problems, we then propose a receding-horizon optimization (RHO) approach to compute approximate solutions. Numerical examples are reported to illustrate and compare the VI and RHO approaches, and show the potential of the proposed problem formulation for practical applications.
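Illustration: the summary contrasts exact value iteration with a receding-horizon approximation. The following minimal sketch shows the two generic techniques on a tiny discounted MDP. It does not reproduce the paper's detection-averse formulation. The 3-state, 2-action model, the reward table, the discount factor, and all function names are hypothetical and chosen only for illustration.

```python
from typing import List

# Hypothetical toy MDP (assumption, not from the paper).
# P[a][s][t]: probability of moving from state s to state t under action a.
P: List[List[List[float]]] = [
    [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],  # action 0: slow
    [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.0, 0.0, 1.0]],  # action 1: fast
]
# R[s][a]: reward for taking action a in state s (only state 2 pays).
R: List[List[float]] = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
GAMMA = 0.9  # discount factor (assumption)
STATES = range(3)
ACTIONS = range(2)


def q_value(s: int, a: int, V: List[float]) -> float:
    """One-step Bellman backup for the pair (s, a) given value estimate V."""
    return R[s][a] + GAMMA * sum(P[a][s][t] * V[t] for t in STATES)


def value_iteration(tol: float = 1e-10, max_iter: int = 10_000) -> List[float]:
    """Sweep over all states until the value function stops changing."""
    V = [0.0 for _ in STATES]
    for _ in range(max_iter):
        V_new = [max(q_value(s, a, V) for a in ACTIONS) for s in STATES]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


def greedy_action(s: int, V: List[float]) -> int:
    """Action that maximizes the one-step backup under V."""
    return max(ACTIONS, key=lambda a: q_value(s, a, V))


def lookahead_q(s: int, a: int, h: int) -> float:
    """Value of taking a in s with h >= 1 decision steps left, zero terminal value."""
    return R[s][a] + GAMMA * sum(
        p * lookahead_value(t, h - 1) for t, p in enumerate(P[a][s]) if p > 0.0
    )


def lookahead_value(s: int, h: int) -> float:
    """Best achievable value from s over h remaining steps (0 when h == 0)."""
    if h == 0:
        return 0.0
    return max(lookahead_q(s, a, h) for a in ACTIONS)


def receding_horizon_action(s: int, horizon: int) -> int:
    """Plan over a short horizon from the current state only; return the first action."""
    return max(ACTIONS, key=lambda a: lookahead_q(s, a, horizon))


if __name__ == "__main__":
    V = value_iteration()
    print("VI values:", [round(v, 4) for v in V])
    print("VI policy:", [greedy_action(s, V) for s in STATES])
    print("RHO action at state 0, horizon 3:", receding_horizon_action(0, 3))
```

Value iteration sweeps the entire state space on every pass, which is what becomes expensive as the number of states grows. The lookahead expands only the states reachable from the current one within the horizon, and is re-solved at each step after the first action is applied.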
MSC:
93E20 Optimal stochastic control
90C40 Markov and semi-Markov decision processes