
Q-learning for estimating optimal dynamic treatment rules from observational data. (English. French summary) Zbl 1349.62371

Summary: The area of dynamic treatment regimes (DTR) aims to make inference about adaptive, multistage decision-making in clinical practice. A DTR is a set of decision rules, one per interval of treatment, where each decision is a function of treatment and covariate history that returns a recommended treatment. Q-learning is a popular method from the reinforcement learning literature that has recently been applied to estimate DTRs. While, in principle, Q-learning can be used for both randomized and observational data, the focus in the literature thus far has been exclusively on the randomized treatment setting. We extend the method to incorporate measured confounding covariates, using direct adjustment and a variety of propensity score approaches. The methods are examined under various settings including non-regular scenarios. We illustrate the methods in examining the effect of breastfeeding on vocabulary testing, based on data from the Promotion of Breastfeeding Intervention Trial.
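The backward-induction idea behind Q-learning with confounder adjustment can be illustrated with a small two-stage sketch. Everything below is an assumption made for illustration, not the authors' code or the `qLearn` package. Treatments are coded -1/+1, the Q-functions are linear working models, and the estimated propensity score is added as an extra regressor at each stage. This is only one of the propensity score approaches the summary alludes to; dropping the `ps1`/`ps2` columns leaves plain direct adjustment for the measured covariates. The helper names (`fit_propensity`, `q_learning_ps`, `recommend`, `simulate`) and the simulated data-generating model are hypothetical.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def fit_propensity(X, treated, n_iter=25):
    """Logistic regression by Newton-Raphson; returns fitted P(A = 1 | X).
    X must include an intercept column; `treated` is a 0/1 array."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (treated - p)                      # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])     # information matrix X'WX
        beta += np.linalg.solve(hess, grad)
    return expit(X @ beta)


def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]


def q_learning_ps(o1, a1, o2, a2, y):
    """Hypothetical two-stage Q-learning: treatments coded -1/+1, linear
    Q-functions, estimated propensity score added to each stage's regression."""
    one = np.ones_like(y)
    # Stage 2: regress Y on history, propensity score and treatment interactions.
    ps2 = fit_propensity(np.column_stack([one, o1, a1, o2]), (a2 == 1).astype(float))
    main2 = np.column_stack([one, o1, a1, o2, ps2])
    b2 = ols(np.column_stack([main2, a2, a2 * o2]), y)
    psi2 = b2[-2:]                                       # blip: psi2[0] + psi2[1] * o2
    # Pseudo-outcome: fitted Q2 at the better of a2 = -1, +1 (main part + |blip|).
    y_tilde = main2 @ b2[:-2] + np.abs(psi2[0] + psi2[1] * o2)
    # Stage 1: same regression idea, with the pseudo-outcome as the response.
    ps1 = fit_propensity(np.column_stack([one, o1]), (a1 == 1).astype(float))
    main1 = np.column_stack([one, o1, ps1])
    b1 = ols(np.column_stack([main1, a1, a1 * o1]), y_tilde)
    psi1 = b1[-2:]                                       # blip: psi1[0] + psi1[1] * o1
    return psi1, psi2


def recommend(psi, o):
    """Decision rule: treat (+1) if the estimated blip is positive, else -1."""
    return np.where(psi[0] + psi[1] * o > 0, 1, -1)


def simulate(n, seed=0):
    """Hypothetical confounded two-stage data (not the PROBIT data)."""
    rng = np.random.default_rng(seed)
    o1 = rng.normal(size=n)
    a1 = np.where(rng.random(n) < expit(o1), 1, -1)      # A1 depends on O1
    o2 = 0.5 * o1 + 0.3 * a1 + rng.normal(size=n)
    a2 = np.where(rng.random(n) < expit(o2), 1, -1)      # A2 depends on O2
    y = o1 + o2 + 0.5 * a1 + a2 * (1.0 - 0.5 * o2) + rng.normal(size=n)
    return o1, a1, o2, a2, y


o1, a1, o2, a2, y = simulate(5000)
psi1, psi2 = q_learning_ps(o1, a1, o2, a2, y)
print("stage-2 blip (true 1.0, -0.5):", np.round(psi2, 2))
print("stage-1 blip:", np.round(psi1, 2))
```

Because the simulated stage-2 model is correctly specified, the stage-2 blip estimates should land near the generating values; the stage-1 linear model is only a working approximation of the true stage-1 Q-function, so its coefficients are not checked against a known truth here.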

MSC:

62L12 Sequential estimation
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

Matching; qLearn
