Beckenbach, Lukas; Osinenko, Pavel; Streif, Stefan A Q-learning predictive control scheme with guaranteed stability. (English) Zbl 07299569 Eur. J. Control 56, 167-178 (2020). MSC: 93B45 93D20 93C55 93C10
Wang, Tao; Luo, Minna; Wang, Na; Cui, Lili Finite-time stochastic linear quadratic optimal control based on \(Q\)-learning. (Chinese. English summary) Zbl 07295674 J. Shenyang Norm. Univ., Nat. Sci. 38, No. 3, 207-213 (2020). MSC: 49K45 93E20 49N10
Ren, Jian; Liu, Jianwei; Yang, Pu Fault-tolerant tracking control for continuous flight control system based on reinforcement learning algorithm with incremental strategy. (Chinese. English summary) Zbl 07295085 Control Theory Appl. 37, No. 7, 1429-1438 (2020). MSC: 68T05 93C40
Su, Zhaopin; Li, Mohan; Zhang, Guofu; Liu, Yang Modeling and solving the repair crew scheduling for the damaged road networks based on Q-learning. (Chinese. English summary) Zbl 07294822 Acta Autom. Sin. 46, No. 7, 1467-1478 (2020). MSC: 90B36 90C40
Sayed, Wafaa S.; Gamal, Mostafa; Abdelrazek, Moemen; El-Tantawy, Samah Towards a learning style and knowledge level-based adaptive personalized platform for an effective and advanced learning for school students. (English) Zbl 07287552 Farouk, Mohamed Hesham (ed.) et al., Recent advances in engineering math and physics. Proceedings of the international conference, RAEMP 2019, Cairo, Egypt, December 24–26, 2019. Cham: Springer (ISBN 978-3-030-39846-0/hbk; 978-3-030-39847-7/ebook). 261-273 (2020). MSC: 68T 97U
Zhu, Kun; Liu, Rong; Wang, Meiqing Reinforcement learning state and value function selection for portfolio optimization. (Chinese. English summary) Zbl 07266755 J. Fuzhou Univ., Nat. Sci. 48, No. 2, 146-151 (2020). MSC: 68T05 91G10
Li, Zhihui; Shi, Li; Yang, Lifang; Shang, Zhigang An adaptive learning rate Q-learning algorithm based on Kalman filter inspired by pigeon pecking-color learning. (English) Zbl 07240085 Pan, Linqiang (ed.) et al., Bio-inspired computing: theories and applications. 14th international conference, BIC-TA 2019, Zhengzhou, China, November 22–25, 2019. Revised selected papers. Part II. Singapore: Springer (ISBN 978-981-15-3414-0/pbk; 978-981-15-3415-7/ebook). Communications in Computer and Information Science 1160, 693-706 (2020). MSC: 68Q07
Wang, Yongjie; Yao, Zhouzhou; Wang, Chao; Ren, Jiale; Chen, Qiao The impact of intelligent transportation points system based on Elo rating on emergence of cooperation at Y intersection. (English) Zbl 1433.90034 Appl. Math. Comput. 370, Article ID 124923, 16 p. (2020). MSC: 90B06 91A80 91A26 68T05
Mosadegh, H.; Fatemi Ghomi, S. M. T.; Süer, G. A. Stochastic mixed-model assembly line sequencing problem: mathematical modeling and Q-learning based simulated annealing hyper-heuristics. (English) Zbl 1430.90300 Eur. J. Oper. Res. 282, No. 2, 530-544 (2020). MSC: 90B36 90C11 90C15 90C59
Rizvi, Syed Ali Asad; Lin, Zongli Experience replay-based output feedback Q-learning scheme for optimal output tracking control of discrete-time linear systems. (English) Zbl 1451.93203 Int. J. Adapt. Control Signal Process. 33, No. 12, 1825-1842 (2019). MSC: 93C40 93B52 93C55 93C05
Huang, Yunhan; Zhu, Quanyan Deceptive reinforcement learning under adversarial manipulations on cost signals. (English) Zbl 1440.68215 Alpcan, Tansu (ed.) et al., Decision and game theory for security. 10th international conference, GameSec 2019, Stockholm, Sweden, October 30 – November 1, 2019. Proceedings. Cham: Springer. Lect. Notes Comput. Sci. 11836, 217-237 (2019). MSC: 68T05 68M25 93C83
Kuhlmann, Renke Learning to steer nonlinear interior-point methods. (English) Zbl 1437.90148 EURO J. Comput. Optim. 7, No. 4, 381-419 (2019). MSC: 90C30 68T05 60J20 90C51
Zhang, Feng; Liu, Lingyun; Guo, Xinxin A multi-stage group decision model based on improved Q-learning. (Chinese. English summary) Zbl 1449.90232 Control Decis. 34, No. 9, 1917-1922 (2019). MSC: 90B50 68T05
Long, Mingkang; Su, Housheng; Wang, Xiaoling; Jiang, Guo-Ping; Wang, Xiaofan An iterative Q-learning based global consensus of discrete-time saturated multi-agent systems. (English) Zbl 1429.93336 Chaos 29, No. 10, 103127, 10 p. (2019). MSC: 93D50 93C55 93A16 93B47
Zhu, Wensheng; Zeng, Donglin; Song, Rui Proper inference for value function in high-dimensional Q-learning for dynamic treatment regimes. (English) Zbl 1428.62246 J. Am. Stat. Assoc. 114, No. 527, 1404-1417 (2019). MSC: 62H12 62C05 62F12 62J07
An, Liwei; Yang, Guang-Hong Data-based optimal denial-of-service attack scheduling against robust control based on Q-learning. (English) Zbl 1426.93056 Int. J. Robust Nonlinear Control 29, No. 15, 5178-5194 (2019). MSC: 93B35 94A62 93C83
Halperin, Igor The QLBS Q-Learner goes NuQLear: fitted Q iteration, inverse RL, and option portfolios. (English) Zbl 1420.91463 Quant. Finance 19, No. 9, 1543-1553 (2019). MSC: 91G20 91-08 91G10
Mu, Chaoxu; Zhao, Qian; Gao, Zhongke; Sun, Changyin Q-learning solution for optimal consensus control of discrete-time multiagent systems using reinforcement learning. (English) Zbl 1418.93250 J. Franklin Inst. 356, No. 13, 6946-6967 (2019). MSC: 93D99 93A14 93C55 93D05 05C90
Yang, Yongliang; Vamvoudakis, Kyriakos G.; Ferraz, Henrique; Modares, Hamidreza Dynamic intermittent Q-learning-based model-free suboptimal co-design of \(\mathcal{L}_2\). (English) Zbl 1418.93239 Int. J. Robust Nonlinear Control 29, No. 9, 2673-2694 (2019). MSC: 93D20 93B36 93C05
Rizvi, Syed Ali Asad; Lin, Zongli An iterative Q-learning scheme for the global stabilization of discrete-time linear systems subject to actuator saturation. (English) Zbl 1418.93223 Int. J. Robust Nonlinear Control 29, No. 9, 2660-2672 (2019). MSC: 93D15 93C55 93C05 93B52
Moodie, Erica E. M.; Stephens, David A.; Alam, Shomoita; Zhang, Mei-jie; Logan, Brent; Arora, Mukta; Spellman, Stephen; Krakow, Elizabeth F. A cure-rate model for Q-learning: estimating an adaptive immunosuppressant treatment strategy for allogeneic hematopoietic cell transplant patients. (English) Zbl 1419.62412 Biom. J. 61, No. 2, 442-453 (2019). MSC: 62P10 62N05
Rokhlin, D. B. \(Q\)-learning in a stochastic Stackelberg game between an uninformed leader and a naive follower. (English. Russian original) Zbl 07062745 Theory Probab. Appl. 64, No. 1, 41-58 (2019); translation from Teor. Veroyatn. Primen. 64, No. 1, 53-74 (2019). MSC: 60 68
Li, Yongqiang; Yang, Chengzan; Hou, Zhongsheng; Feng, Yuanjing; Yin, Chenkun Data-driven approximate Q-learning stabilization with optimality error bound analysis. (English) Zbl 1415.93219 Automatica 103, 435-442 (2019). MSC: 93D20 93B40 49N90 68T05
Qiao, Junfei; Wang, Gongming; Li, Wenjing; Chen, Min An adaptive deep Q-learning strategy for handwritten digit recognition. (English) Zbl 1434.68518 Neural Netw. 107, 61-71 (2018). MSC: 68T07 68T05 68T10
Jiang, Daniel R.; Powell, Warren B. Risk-averse approximate dynamic programming with quantile-based risk measures. (English) Zbl 1440.90084 Math. Oper. Res. 43, No. 2, 554-579 (2018). MSC: 90C39 62L20 91B06 93E35
Kofinas, Panagiotis; Dounis, Anastasios I. Fuzzy Q-learning agent for online tuning of PID controller for DC motor speed control. (English) Zbl 07150342 Algorithms (Basel) 11, No. 10, Paper No. 148, 13 p. (2018). MSC: 93 90
Forster, Richárd; Fülöp, Agnes Hierarchical clustering with deep q-learning. (English) Zbl 1412.68182 Acta Univ. Sapientiae, Inform. 10, No. 1, 86-109 (2018). MSC: 68T05 62H30 81-04 81V05
François-Lavet, Vincent; Henderson, Peter; Islam, Riashat; Bellemare, Marc G.; Pineau, Joelle An introduction to deep reinforcement learning. (English) Zbl 1448.68021 Found. Trends Mach. Learn. 11, No. 3-4, 1-145 (2018). Reviewer: Smaranda Belciug (Craiova) MSC: 68-02 68T05 68T07
Zhang, Baqun; Zhang, Min C-learning: a new classification framework to estimate optimal dynamic treatment regimes. (English) Zbl 1414.62485 Biometrics 74, No. 3, 891-899 (2018). MSC: 62P10 62H30 68T05 62L15
Rizvi, Syed Ali Asad; Lin, Zongli Output feedback Q-learning for discrete-time linear zero-sum games with application to the \(H_\infty\) control. (English) Zbl 1402.93126 Automatica 95, 213-221 (2018). MSC: 93B52 93C55 93C05 93B36 91A05 68T05 93C95
Liao, Qin; Guo, Ying; Tu, Yifeng; Zhang, Hang Fidelity-based ant colony algorithm with Q-learning of quantum system. (English) Zbl 1394.81079 Int. J. Theor. Phys. 57, No. 3, 862-876 (2018). MSC: 81P68 68Q10 68Q32 68T05 68P10
Shah, Suhail M.; Borkar, Vivek S. Q-learning for Markov decision processes with a satisfiability criterion. (English) Zbl 1386.93271 Syst. Control Lett. 113, 45-51 (2018). MSC: 93E03 90C40 68T05
Vamvoudakis, Kyriakos G.; Ferraz, Henrique Model-free event-triggered control algorithm for continuous-time linear systems with optimal performance. (English) Zbl 1378.93083 Automatica 87, 412-420 (2018). MSC: 93C65 93B40 93C05 93D20 49N25 68T05
Jalalimanesh, Ammar; Shahabi Haghighi, Hamidreza; Ahmadi, Abbas; Soltani, Madjid Simulation-based optimization of radiotherapy: agent-based modeling and reinforcement learning. (English) Zbl 07313779 Math. Comput. Simul. 133, 235-248 (2017). MSC: 92 80
Xu, Xiangwei; Wei, Zhenchun; Feng, Lin; Zhang, Yan A Q-learning and TD error based task scheduling algorithm for sensor nodes. (Chinese. English summary) Zbl 1424.68023 J. Hefei Univ. Technol., Nat. Sci. 40, No. 4, 470-475, 521 (2017). MSC: 68M20 68T05
Yu, Naigong; Wang, Chen; Mo, Fanfan; Cai, Jianxian Dynamic environment path planning based on Q-learning algorithm and genetic algorithm. (Chinese. English summary) Zbl 1399.90284 J. Beijing Univ. Technol. 43, No. 7, 1009-1016 (2017). MSC: 90C35 68T40 68T05 68T20 90C59
Han, Ke-Zhen; Feng, Jian; Cui, Xiaohong Fault-tolerant optimised tracking control for unknown discrete-time linear systems using a combined reinforcement learning and residual compensation methodology. (English) Zbl 1386.93088 Int. J. Syst. Sci., Princ. Appl. Syst. Integr. 48, No. 13, 2811-2825 (2017). MSC: 93B35 93C41 93C05 93C55 68T05
Vamvoudakis, Kyriakos G. Q-learning for continuous-time graphical games on large networks with completely unknown linear system dynamics. (English) Zbl 1386.93035 Int. J. Robust Nonlinear Control 27, No. 16, 2900-2920 (2017). MSC: 93A15 93C41 68T05
Radac, Mircea-Bogdan; Precup, Radu-Emil; Roman, Raul-Cristian Model-free control performance improvement using virtual reference feedback tuning and reinforcement Q-learning. (English) Zbl 1362.93062 Int. J. Syst. Sci., Princ. Appl. Syst. Integr. 48, No. 5, 1071-1083 (2017). MSC: 93B52 68T05 93C95 93C55 93C10
Vamvoudakis, Kyriakos G. Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach. (English) Zbl 1356.93044 Syst. Control Lett. 100, 14-20 (2017). MSC: 93C40 49M30 93C41 93C55
Dimirovski, Georgi M. Learning intelligent controls in high speed networks: synergies of computational intelligence with control and Q-learning theories. (English) Zbl 1402.68026 Sgurev, Vassil (ed.) et al., Innovative issues in intelligent systems. Cham: Springer (ISBN 978-3-319-27266-5/hbk; 978-3-319-27267-2/ebook). Studies in Computational Intelligence 623, 111-139 (2016). MSC: 68M10 68M20 68T05
Varela, Martín; Viera, Omar; Robledo, Franco A Q-learning approach for investment decisions. (English) Zbl 1407.91238 Pinto, Alberto A. (ed.) et al., Trends in mathematical economics. Dialogues between Southern Europe and Latin America. Selected papers based on the presentations at the conferences: 3rd international conference on dynamics, games and science, DGS III, on the occasion of the 50th birthday of Alberto A. Pinto, Porto, Portugal, February 17–21, 2014, the 1st Hellenic-Portuguese meeting on mathematical economics, AUEB, Athens, Greece, and the XV Jornadas Latinoamericanas de Teoría Económica, JOLATE, Guanajuato, México. Cham: Springer. 347-368 (2016). MSC: 91G10 68T05
Yu, Naigong; Mo, Fanfan Mobile robot path planning based on deep auto-encoder and \(Q\)-learning. (Chinese. English summary) Zbl 1363.68196 J. Beijing Univ. Technol. 42, No. 5, 668-673 (2016). MSC: 68T40 68T05 93C85
Chakraborty, Bibhas; Ghosh, Palash; Moodie, Erica E. M.; Rush, A. John Estimating optimal shared-parameter dynamic regimens with application to a multistage depression clinical trial. (English) Zbl 1390.62239 Biometrics 72, No. 3, 865-876 (2016). MSC: 62P10 62C12 62J02
Bhatnagar, Shalabh; Lakshmanan, K. Multiscale Q-learning with linear function approximation. (English) Zbl 1346.93265 Discrete Event Dyn. Syst. 26, No. 3, 477-509 (2016). MSC: 93C70 93B40 93E03 68T05
Abdallah, Sherief; Kaisers, Michael Addressing environment non-stationarity by repeating Q-learning updates. (English) Zbl 1360.68661 J. Mach. Learn. Res. 17, Paper No. 46, 31 p. (2016). MSC: 68T05
Gershman, Samuel J. Empirical priors for reinforcement learning models. (English) Zbl 1359.62500 J. Math. Psychol. 71, 1-6 (2016). MSC: 62P15 62F15 91E40
Zhang, Jilie; Zhang, Huaguang; Wang, Binrui; Cai, Tiaoyang Nearly data-based optimal control for linear discrete model-free systems with delays via reinforcement learning. (English) Zbl 1333.93270 Int. J. Syst. Sci., Princ. Appl. Syst. Integr. 47, No. 7, 1563-1573 (2016). MSC: 93E20 93E10 93C05 93C55 68T05
Aziz Khater, A.; El-Bardini, Mohammad; El-Rabaie, Nabila M. Embedded adaptive fuzzy controller based on reinforcement learning for DC motor with flexible shaft. (English) Zbl 1390.93475 Arab. J. Sci. Eng. 40, No. 8, 2389-2406 (2015). MSC: 93C42
Lee, Juhee; Thall, Peter F.; Ji, Yuan; Müller, Peter Bayesian dose-finding in two treatment cycles based on the joint utility of efficacy and toxicity. (English) Zbl 1373.62547 J. Am. Stat. Assoc. 110, No. 510, 711-722 (2015). MSC: 62P10 62K05 62F35
Zhao, Ying-Qi; Zeng, Donglin; Laber, Eric B.; Kosorok, Michael R. New statistical learning methods for estimating optimal dynamic treatment regimes. (English) Zbl 1373.62557 J. Am. Stat. Assoc. 110, No. 510, 583-598 (2015). MSC: 62P10 68T05 62G20
Tang, Hao; Xu, Lingling; Sun, Jing; Chen, Yingjun; Zhou, Lei Modeling and optimization control of a demand-driven, conveyor-serviced production station. (English) Zbl 1346.90314 Eur. J. Oper. Res. 243, No. 3, 839-851 (2015). MSC: 90B30 90C40
Wang, Yufang; Yan, Hongsen Adaptive dynamic scheduling strategy in knowledgeable manufacturing based on improved Q-learning. (Chinese. English summary) Zbl 1349.90448 Control Decis. 30, No. 11, 1930-1936 (2015). MSC: 90B36 68T05
Wallace, Michael P.; Moodie, Erica E. M. Doubly-robust dynamic treatment regimen estimation via weighted least squares. (English) Zbl 1419.62467 Biometrics 71, No. 3, 636-644 (2015). MSC: 62P10 92C50
Zhao, Qiming; Xu, Hao; Sarangapani, Jagannathan Finite-horizon near optimal adaptive control of uncertain linear discrete-time systems. (English) Zbl 1333.93149 Optim. Control Appl. Methods 36, No. 6, 853-872 (2015). MSC: 93C40 93C05 93C41 93C55 90C39
Cheung, Ying Kuen; Chakraborty, Bibhas; Davidson, Karina W. Sequential multiple assignment randomized trial (SMART) with adaptive randomization for quality improvement in depression treatment program. (English) Zbl 1390.62244 Biometrics 71, No. 2, 450-459 (2015). MSC: 62P10 62K05
Vamvoudakis, Kyriakos G. Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems. (English) Zbl 1336.91022 Automatica 61, 274-281 (2015). MSC: 91A23 91A06 91A10 68T05 91A26 93C40
Yau, Kok-Lim Alvin; Goh, Hock Guan; Chieng, David; Kwong, Kae Hsiang Application of reinforcement learning to wireless sensor networks: models and algorithms. (English) Zbl 1343.68211 Computing 97, No. 11, 1045-1075 (2015). MSC: 68T05
Song, Rui; Wang, Weiwei; Zeng, Donglin; Kosorok, Michael R. Penalized Q-learning for dynamic treatment regimens. (English) Zbl 1415.62054 Stat. Sin. 25, No. 3, 901-920 (2015). MSC: 62J07 62F12 62P10
Shen, Yun; Tobia, Michael J.; Sommer, Tobias; Obermayer, Klaus Risk-sensitive reinforcement learning. (English) Zbl 1410.91175 Neural Comput. 26, No. 7, 1298-1328 (2014). MSC: 91B06 91B16 91E40 92C20 92C55
Raeisy, Behrooz; Haghighi, Shapoor Golbahar; Safavi, Ali Akbar Active noise control system via multi-agent credit assignment. (English) Zbl 1305.93124 J. Intell. Fuzzy Syst. 26, No. 2, 1051-1063 (2014). MSC: 93C42 93C85
Zhu, Meiqiang; Li, Ming; Cheng, Yuhu; Zhang, Qian; Wang, Xuesong A heuristically accelerated Q-learning algorithm based on Laplacian eigenmap. (Chinese. English summary) Zbl 1313.68151 Control Decis. 29, No. 3, 425-430 (2014). MSC: 68T05 68T20
Goldberg, Yair; Song, Rui; Kosorok, Michael R. Adaptive \(Q\)-learning. (English) Zbl 1325.62073 Banerjee, M. (ed.) et al., From probability to statistics and back: high-dimensional models and processes. A Festschrift in honor of Jon A. Wellner. Including papers from the conference, Seattle, WA, USA, July 28–31, 2010. Beachwood, OH: IMS, Institute of Mathematical Statistics (ISBN 978-0-940600-83-6). Institute of Mathematical Statistics Collections 9, 150-162 (2013). MSC: 62G05 62G20 62F12
Fernandez-Gauna, Borja; Marques, Ion; Graña, Manuel Undesired state-action prediction in multi-agent reinforcement learning for linked multi-component robotic system control. (English) Zbl 1293.93551 Inf. Sci. 232, 309-324 (2013). MSC: 93C85 68T05 68T40
Yu, Huizhen; Bertsekas, Dimitri P. On boundedness of Q-learning iterates for stochastic shortest path problems. (English) Zbl 1291.90296 Math. Oper. Res. 38, No. 2, 209-227 (2013). MSC: 90C40 93E20 90C39 68W15 62L20
Chen, Xin; Chen, Gang; Cao, Weihua; Wu, Min Cooperative learning with joint state value approximation for multi-agent systems. (English) Zbl 1299.93001 J. Control Theory Appl. 11, No. 2, 149-155 (2013). MSC: 93A14 93C85 68T05 68T42
Zhang, Baqun; Tsiatis, Anastasios A.; Laber, Eric B.; Davidian, Marie Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions. (English) Zbl 1284.62508 Biometrika 100, No. 3, 681-694 (2013). MSC: 62L10 62C99 62P10 68T05 62G35 65C60
Chakraborty, Bibhas; Laber, Eric B.; Zhao, Yingqi Inference for optimal dynamic treatment regimes using an adaptive \(m\)-out-of-\(n\) bootstrap scheme. (English) Zbl 1418.62182 Biometrics 69, No. 3, 714-723 (2013). MSC: 62G09 62G05 62E20 62G15 65C60
Yu, Huizhen; Bertsekas, Dimitri P. Q-learning and policy iteration algorithms for stochastic shortest path problems. (English) Zbl 1306.90171 Ann. Oper. Res. 208, 95-132 (2013). MSC: 90C40 90C39
Yogeswaran, Mohan; Ponnambalam, S. G. Reinforcement learning: exploration-exploitation dilemma in multi-agent foraging task. (English) Zbl 1353.68244 Opsearch 49, No. 3, 223-236 (2012). MSC: 68T05
Beck, C. L.; Srikant, R. Error bounds for constant step-size \(Q\)-learning. (English) Zbl 1255.93129 Syst. Control Lett. 61, No. 12, 1203-1208 (2012). MSC: 93E03 68T05 60J20
Li, Xueping; Wang, Jiao; Sawhney, Rapinder Reinforcement learning for joint pricing, lead-time and scheduling decisions in make-to-order systems. (English) Zbl 1253.90113 Eur. J. Oper. Res. 221, No. 1, 99-109 (2012). MSC: 90B35 90B05 90C40 68M20
Lee, Jae Young; Park, Jin Bae; Choi, Yoon Ho Integral \(Q\)-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems. (English) Zbl 1254.49019 Automatica 48, No. 11, 2850-2859 (2012). MSC: 49N10 68T05 49M30
Bertsekas, Dimitri P.; Yu, Huizhen Q-learning and enhanced policy iteration in discounted dynamic programming. (English) Zbl 1243.90231 Math. Oper. Res. 37, No. 1, 66-94 (2012). MSC: 90C40 93E20 90C39 68W15 62L20
Polson, Nicholas G.; Sorensen, Morten A simulation-based approach to stochastic dynamic programming. (English) Zbl 1277.90144 Appl. Stoch. Models Bus. Ind. 27, No. 2, 151-163 (2011). MSC: 90C39 91B70
Chen, Chunlin; Dong, Daoyi; Li, Han-Xiong; Tarn, Tzyh-Jong Hybrid MDP based integrated hierarchical Q-learning. (English) Zbl 1267.68177 Sci. China, Inf. Sci. 54, No. 11, 2279-2294 (2011). MSC: 68T05 68Q32
Zhao, Yufan; Zeng, Donglin; Socinski, Mark A.; Kosorok, Michael R. Reinforcement learning strategies for clinical trials in nonsmall cell lung cancer. (English) Zbl 1274.62922 Biometrics 67, No. 4, 1422-1433 (2011). MSC: 62P10 92C50
Yao, Minghai; Qu, Xinyu; Li, Jiahe; Gu, Qinlong; Tang, Liping A study on a \(Q\)-learning algorithm based on ART 2. (Chinese. English summary) Zbl 1240.68230 Control Decis. 26, No. 2, 227-232 (2011). MSC: 68T05
Ke, Wen-De; Piao, Song-Hao; Peng, Zhi-Ping; Cai, Ze-Su; Yuan, Quan-De Cooperative Q learning method based on \(\pi \) calculus in robot soccer. (Chinese. English summary) Zbl 1215.68199 J. Comput. Appl. 31, No. 3, 654-656 (2011). MSC: 68T05 68T27 68T40
Hirashima, Yoichi A new rearrangement plan for freight cars in a train. Q-learning for minimizing the movement counts of freight cars. (English) Zbl 1209.90036 Ao, Sio-Iong (ed.) et al., Intelligent control and computer engineering. Selected papers based on the presentations at the international conference on advances in intelligent control and computer engineering, Hong Kong, China, March 17–19, 2010. New York, NY: Springer (ISBN 978-94-007-0285-1/hbk; 978-94-007-0286-8/ebook). Lecture Notes in Electrical Engineering 70, 107-118 (2011). MSC: 90B06 90C90
Kunnumkal, Sumit; Topaloglu, Huseyin A stochastic approximation method with max-norm projections and its applications to the Q-learning algorithm. (English) Zbl 06859937 ACM Trans. Model. Comput. Simul. 20, No. 3, Article No. 12, 26 p. (2010). MSC: 62L20 68T05
Chen, Yangzhou; Huang, Xu; Dai, Guiping Cooperative hunting strategy of multiple mobile robots based on new state partition. (Chinese. English summary) Zbl 1240.68404 J. Beijing Univ. Technol. 36, No. 8, 1031-1036 (2010). MSC: 68T40 93C85
Guo, Hongliang; Meng, Yan Distributed reinforcement learning for coordinate multi-robot foraging. (English) Zbl 1203.68238 J. Intell. Robot. Syst. 60, No. 3-4, 531-551 (2010). MSC: 68T40 68T05
Kim, J.-H.; Lewis, F. L. Model-free \(H_{\infty }\) control design for unknown linear discrete-time systems via Q-learning with LMI. (English) Zbl 1205.93046 Automatica 46, No. 8, 1320-1326 (2010). MSC: 93B51 93C55 68T05 93C05 93B36
Vincze, Dávid; Kovács, Szilveszter Incremental rule base creation with fuzzy rule interpolation-based Q-learning. (English) Zbl 1209.68431 Rudas, Imre J. (ed.) et al., Computational intelligence and informatics. Selected papers based on the presentations at the 10th international symposium of Hungarian researchers on computational intelligence and informatics, Budapest, Hungary, November 12–14, 2009. Berlin: Springer (ISBN 978-3-642-15219-1/hbk; 978-3-642-15220-7/ebook). Studies in Computational Intelligence 313, 191-203 (2010). MSC: 68T05
Langlois, Marina; Sloan, Robert H. Reinforcement learning via approximation of the Q-function. (English) Zbl 1213.68498 J. Exp. Theor. Artif. Intell. 22, No. 3, 219-235 (2010). MSC: 68T05
Berger, Michael; Rosenschein, Jeffrey S. When to apply the fifth commandment: the effects of parenting on genetic and learning agents. (English) Zbl 1213.68484 J. Exp. Theor. Artif. Intell. 22, No. 3, 159-195 (2010). MSC: 68T05 68W05
Szepesvári, Csaba Algorithms for reinforcement learning. (English) Zbl 1205.68320 Synthesis Lectures on Artificial Intelligence and Machine Learning 9. San Rafael, CA: Morgan & Claypool Publishers (ISBN 978-1-60845-492-1/pbk; 978-1-60845-493-8/ebook). xiii, 89 p. (2010). Reviewer: Klaus Dohmen (Mittweida) MSC: 68T05 68W05 68-02 90C39 90C40
Zhang, Yifeng; Bhattacharyya, Siddhartha; Li, Xiaoming From choice of procurement strategy to supply network configuration: an evolutionary approach. (English) Zbl 1183.90081 Int. J. Inf. Technol. Decis. Mak. 9, No. 1, 145-173 (2010). MSC: 90B10 90B06 90C59
Wang, Zhongwei; Cao, Qixin; Luan, Nan; Zhang, Lei Reactive self-rescue control for autonomous mobile robot based on reinforcement learning. (Chinese. English summary) Zbl 1212.93233 J. Shanghai Jiaotong Univ. (Chin. Ed.) 43, No. 11, 1751-1755 (2009). MSC: 93C85 68T40 68T05
Feng, Zhiyong; Liang, Litao; Tan, Li; Zhang, Ping Q-learning based heterogeneous network self-optimization for reconfigurable network with CPC assistance. (English) Zbl 1181.68193 Sci. China, Ser. F. 52, No. 12, 2360-2368 (2009). MSC: 68T05 94A12 68M10
Bonarini, Andrea; Lazaric, Alessandro; Montrone, Francesco; Restelli, Marcello Reinforcement distribution in fuzzy Q-learning. (English) Zbl 1187.68379 Fuzzy Sets Syst. 160, No. 10, 1420-1443 (2009). MSC: 68T05
Bhatnagar, Shalabh; Babu, K. Mohan New algorithms of the Q-learning type. (English) Zbl 1283.93328 Automatica 44, No. 4, 1111-1119 (2008). MSC: 93E35 68T05
Kunnumkal, Sumit; Topaloglu, Huseyin Exploiting the structural properties of the underlying Markov decision problem in the Q-learning algorithm. (English) Zbl 1243.90235 INFORMS J. Comput. 20, No. 2, 288-301 (2008). MSC: 90C40
Liu, Yunlong; Li, Renhou; Liu, Jianshu A Q-learning algorithm based on predictive state representations. (Chinese. English summary) Zbl 1199.68408 J. Xi’an Jiaotong Univ. 42, No. 12, 1472-1475, 1485 (2008). MSC: 68T37 68T05 68W05 68T20
Jiang, Jianguo; Su, Zhaopin; Zhang, Guofu; Xia, Na Agent-behavior strategy in serial multi-task coalition formation. (Chinese. English summary) Zbl 1199.91011 Control Theory Appl. 25, No. 5, 853-858 (2008). MSC: 91A06 91A12
Waltman, Ludo; Kaymak, Uzay Q-learning agents in a Cournot oligopoly model. (English) Zbl 1181.91040 J. Econ. Dyn. Control 32, No. 10, 3275-3293 (2008). MSC: 91A26 68T05 68T42 91B99
Wang, Chao; Guo, Jing; Bao, Zhen-qiang Application of improved Q learning algorithm to job shop problem. (Chinese. English summary) Zbl 1171.68684 J. Comput. Appl. 28, No. 12, 3268-3270 (2008). MSC: 68T05
Boubertakh, Hamid; Tadjine, Mohamed; Glorennec, Pierre-Yves; Labiod, Salim Optimization of fuzzy PID controllers using Q-learning algorithm. (English) Zbl 1159.93337 Arch. Control Sci. 18, No. 4, 415-435 (2008). MSC: 93C42 68T05 93B40
Zeng, Qingcheng; Yang, Zhongzhen A scheduling model and \(Q\)-learning algorithm for yard trailers at container terminals. (Chinese. English summary) Zbl 1164.90340 J. Harbin Eng. Univ. 29, No. 1, 1-4 (2008). MSC: 90B36 68T05