
zbMATH — the first resource for mathematics

The practical utility of incorporating model selection uncertainty into prognostic models for survival data. (English) Zbl 1071.62096
Summary: Predictions of disease outcome in prognostic factor models are usually based on one selected model. However, several models often fit the data equally well, yet these models may differ substantially in the explanatory variables they include and may lead to different predictions for individual patients. For survival data, we discuss two approaches to account for model selection uncertainty in two data examples, with the main emphasis on variable selection in a proportional hazards Cox model [D. R. Cox, J. R. Stat. Soc., Ser. B 34, 187–220 (1972; Zbl 0243.62041)]. The main aim of our investigation is to establish the ways in which each of the two approaches is useful in such prognostic models.
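As a compact reference point, the Cox model and a generic model-averaged prediction can be written as below. The notation is an illustrative assumption, not taken from the paper: $\mathcal{M}$ is the set of candidate models, $x_M$ the covariates included in model $M$, $\beta_M$ its coefficients, $h_0$ the baseline hazard, $\hat S_M$ the survival prediction from model $M$, and $w_M$ the model weights. Both approaches discussed here produce predictions of this averaged form and differ mainly in how $\mathcal{M}$ and the $w_M$ are obtained.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_M^{\top} x_M\right),
\qquad
\hat S(t \mid x) = \sum_{M \in \mathcal{M}} w_M\,\hat S_M(t \mid x),
\qquad
w_M \ge 0,\quad \sum_{M \in \mathcal{M}} w_M = 1 .
```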
The first approach is Bayesian model averaging (BMA) adapted for the proportional hazards model, termed ‘approx. BMA’ here. As a new approach, we propose a method that averages over a set of possible models using weights estimated from bootstrap resampling, as proposed by S. T. Buckland et al. [Biometrics 53, No. 2, 603–618 (1997; Zbl 0885.62118)], but that additionally performs an initial screening of variables based on the inclusion frequency of each variable, in order to reduce the set of variables and the corresponding set of models. Sensible choices for some of the parameters this procedure requires still need to be investigated. The main objective of prognostic models is prediction, but the interpretation of single effects is also important, and models should be general enough to ensure transportability to other clinical centres.
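The two weighting schemes described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Approx. BMA weights are taken as proportional to exp(-BIC/2), assuming a BIC value is already available for each candidate Cox model. The bootstrap weights follow the general idea of Buckland et al. (relative frequency with which each model is selected over bootstrap replicates), preceded by a screening step on variable inclusion frequencies. The selector `toy_select`, the values `screen_threshold=0.3` and `n_boot=200`, and the toy (non-survival) data are hypothetical; a real application would run a variable selection procedure such as backward elimination on a Cox model instead.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Row = Dict[str, float]
Model = FrozenSet[str]
# Hypothetical selector interface: (bootstrap sample, candidate variables) -> selected model.
Selector = Callable[[List[Row], Sequence[str]], Model]


def bic_weights(bic: Dict[Model, float]) -> Dict[Model, float]:
    """Approx. BMA weights: w_k proportional to exp(-BIC_k / 2), normalised to sum to 1."""
    best = min(bic.values())  # shift by the minimum BIC to avoid underflow
    raw = {m: math.exp(-0.5 * (b - best)) for m, b in bic.items()}
    total = sum(raw.values())
    return {m: r / total for m, r in raw.items()}


def resample(data: List[Row], rng: random.Random) -> List[Row]:
    """Nonparametric bootstrap: draw len(data) rows with replacement."""
    return [data[rng.randrange(len(data))] for _ in range(len(data))]


def bootstrap_model_weights(data: List[Row], variables: Sequence[str], select: Selector,
                            n_boot: int = 200, screen_threshold: float = 0.3,
                            seed: int = 0) -> Tuple[List[str], Dict[Model, float]]:
    rng = random.Random(seed)
    # Step 1 (screening): inclusion frequency of each variable across replicates.
    counts: Counter = Counter()
    for _ in range(n_boot):
        counts.update(select(resample(data, rng), variables))
    kept = [v for v in variables if counts[v] / n_boot >= screen_threshold]
    # Step 2: repeat selection on the screened set and count each selected model.
    model_counts: Counter = Counter()
    for _ in range(n_boot):
        model_counts[select(resample(data, rng), kept)] += 1
    # Bootstrap weight of a model = relative frequency with which it was selected.
    return kept, {m: c / n_boot for m, c in model_counts.items()}


def model_average(weights: Dict[Model, float], predict: Callable[[Model], float]) -> float:
    """Weighted average of per-model predictions (predict is user-supplied)."""
    return sum(w * predict(m) for m, w in weights.items())


def toy_select(sample: List[Row], candidates: Sequence[str]) -> Model:
    """Hypothetical stand-in for a real selection procedure:
    keep variables whose |sample correlation| with the outcome y exceeds 0.2."""
    ys = [r["y"] for r in sample]
    my = sum(ys) / len(ys)
    chosen = set()
    for v in candidates:
        xs = [r[v] for r in sample]
        mx = sum(xs) / len(xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        if abs(sxy) / math.sqrt(sxx * syy) > 0.2:
            chosen.add(v)
    return frozenset(chosen)


# Toy data (not survival data): x1 strong effect, x2 weak, x3 and x4 pure noise.
gen = random.Random(1)
toy_data: List[Row] = []
for _ in range(100):
    row = {f"x{j}": gen.gauss(0, 1) for j in range(1, 5)}
    row["y"] = row["x1"] + 0.3 * row["x2"] + gen.gauss(0, 1)
    toy_data.append(row)

kept, weights = bootstrap_model_weights(toy_data, ["x1", "x2", "x3", "x4"], toy_select)
```

Here `kept` holds the variables that survive screening and `weights` maps each selected model (a frozenset of variable names) to its bootstrap weight; `model_average(weights, predict)` then combines per-model predictions. The screening threshold and the number of replicates are examples of the parameters whose sensible choice the summary describes as still open.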
In the data examples, we compare predictions of our new approach with those of approx. BMA, with ‘conventional’ predictions from one selected model, and with predictions from the full model. Confidence intervals are compared in one example. Comparisons are based on the partial predictive score and the Brier score. We conclude that the two model averaging methods yield similar results and are especially useful when there is a large number of potential prognostic factors, some of which most likely have no influence in a multivariable context. Although the method based on bootstrap resampling lacks formal justification and requires some ad hoc decisions, it has the additional benefit of achieving model parsimony, by reducing the number of explanatory variables, and of handling correlated variables automatically.
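For right-censored data, the Brier score at a time point t is commonly computed with inverse probability of censoring weights, as in Graf et al. (1999, reference [27] below). The summary does not state the exact variant used, so the following is a sketch assuming that weighting, with the censoring survival function G estimated by Kaplan-Meier. Tied times are handled in a simplified way, and the function names are hypothetical. Subjects with an observed event by t contribute S(t|x)^2 / G(T_i-), subjects still at risk after t contribute (1 - S(t|x))^2 / G(t), and subjects censored before t contribute zero.

```python
from typing import List, Sequence, Tuple


def censoring_km(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of G(t) = P(C > t), treating censorings (event == 0)
    as the 'events'. Returns the step points (time, G just after that time)."""
    pairs = sorted(zip(times, events))
    steps, g, at_risk, i = [], 1.0, len(pairs), 0
    while i < len(pairs):
        t, j, n_cens = pairs[i][0], i, 0
        while j < len(pairs) and pairs[j][0] == t:  # group tied times
            if pairs[j][1] == 0:
                n_cens += 1
            j += 1
        if n_cens:
            g *= 1.0 - n_cens / at_risk
            steps.append((t, g))
        at_risk -= j - i
        i = j
    return steps


def censoring_survival_at(steps, t, left_limit=False):
    """Evaluate the step function G at t, or just before t if left_limit is True."""
    value = 1.0
    for s, g in steps:
        if s < t or (s == t and not left_limit):
            value = g
        else:
            break
    return value


def brier_score(t, times, events, surv_pred):
    """IPCW Brier score at time t; surv_pred[i] is the predicted S(t | x_i)."""
    steps = censoring_km(times, events)
    total = 0.0
    for ti, di, si in zip(times, events, surv_pred):
        if ti <= t and di == 1:  # event observed by t: true status is 0
            total += si ** 2 / censoring_survival_at(steps, ti, left_limit=True)
        elif ti > t:  # still event-free at t: true status is 1
            total += (1.0 - si) ** 2 / censoring_survival_at(steps, t)
        # censored at or before t: weight 0, compensated by the IPCW terms above
    return total / len(times)
```

Lower values are better. Evaluating `brier_score` on the same data for single-model, full-model and model-averaged survival predictions gives a comparison of the kind described above. A constant prediction of 0.5 always scores 0.25, which is a useful sanity check.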

MSC:
62P10 Applications of statistics to biology and medical sciences; meta analysis
62F15 Bayesian inference
62N99 Survival analysis and censored data
References:
[1] Altman D, Statistics in Medicine 19 pp 453– (2000) · doi:10.1002/(SICI)1097-0258(20000229)19:4<453::AID-SIM350>3.0.CO;2-5
[2] Altman DG, Statistics in Medicine 8 pp 771– (1989) · doi:10.1002/sim.4780080702
[3] Blettner M, Statistics in Medicine 12 pp 1325– (1993) · doi:10.1002/sim.4780121405
[4] Breiman L, Journal of the American Statistical Association 87 pp 738– (1992) · doi:10.1080/01621459.1992.10475276
[5] Breiman L, Machine Learning 24 pp 123– (1996)
[6] Breiman L, The Annals of Statistics 24 pp 2350– (1996) · Zbl 0867.62055 · doi:10.1214/aos/1032181158
[7] Brier G, Monthly Weather Review 78 pp 1– (1950) · doi:10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2
[8] Brooks SP, Statistical Science 15 pp 357– (2000) · doi:10.1214/ss/1009213003
[9] Buckland S, Biometrics 53 pp 603– (1997) · Zbl 0885.62118 · doi:10.2307/2533961
[10] Bunnin F, Statistics and Computing 12 pp 37– (2002) · Zbl 1247.91181 · doi:10.1023/A:1013116204872
[11] Burnham K, Wildlife Research 28 pp 111– (2001) · doi:10.1071/WR99107
[12] Burnham KP, Model selection and inference. A practical information theoretic approach (1998) · doi:10.1007/978-1-4757-2917-7
[13] Burton A, British Journal of Cancer 91 pp 4– (2004) · doi:10.1038/sj.bjc.6601907
[14] Candolo C, The Statistician 52 pp 165– (2003)
[15] Chatfield C, Journal of the Royal Statistical Society: Series A 158 pp 419– (1995) · Zbl 04527846 · doi:10.2307/2983440
[16] Chatfield C, The Statistician 51 (1) pp 1– (2002)
[17] Chen C, Statistics in Medicine 4 pp 39– (1985) · doi:10.1002/sim.4780040107
[18] Cox DR, Journal of the Royal Statistical Society: Series B 34 pp 187– (1972)
[19] Davison AC, Bootstrap methods and their application (1997) · doi:10.1017/CBO9780511802843
[20] Dijkstra TK, Lecture notes in economics and mathematical systems. On model selection uncertainty and its statistical implications (1994)
[21] Proceedings of a Workshop, held in Groningen, The Netherlands, September 25-26, Heidelberg, Springer-Verlag: 17-38.
[22] Draper D, Journal of the Royal Statistical Society: Series B 57 pp 45– (1995)
[23] Efron B, American Statistician 37 pp 36– (1983)
[24] Faraway J, J Comput Graph Statist 1 pp 213– (1992)
[25] Fleming TR, Counting processes and survival analysis (1991) · Zbl 0727.62096
[26] Furnival GM, Technometrics 16 pp 499– (1974) · doi:10.1080/00401706.1974.10489231
[27] Graf E, Statistics in Medicine 18 pp 2529– (1999) · doi:10.1002/(SICI)1097-0258(19990915/30)18:17/18<2529::AID-SIM274>3.0.CO;2-5
[28] Hjorth JSU, Computer intensive statistical methods: validation, model selection and bootstrap (1994) · Zbl 0829.62001
[29] Hoeting JA, Statistical Science 14 pp 382– (1999) · Zbl 1059.62525 · doi:10.1214/ss/1009212519
[30] Holländer N, Statistics in Medicine 23 pp 1701– (2004) · doi:10.1002/sim.1611
[31] Kadane J, Journal of the American Statistical Association 99 pp 279– (2004) · Zbl 1089.62501 · doi:10.1198/016214504000000269
[32] Kass R, Journal of the American Statistical Association 90 (430) pp 773– (1995) · doi:10.1080/01621459.1995.10476572
[33] Kuk AYC, Biometrika 71 pp 587– (1984) · doi:10.1093/biomet/71.3.587
[34] Lawless J, Biometrics 34 pp 318– (1978) · doi:10.2307/2530022
[35] MacKenzie DI, Ecology 83 pp 2387– (2002) · doi:10.1890/0012-9658(2002)083[2387:HSDPBI]2.0.CO;2
[36] Madigan D, Journal of the American Statistical Association 89 pp 1535– (1994) · doi:10.1080/01621459.1994.10476894
[37] Miller AJ, Subset selection in regression (1990) · doi:10.1007/978-1-4899-2939-6
[38] Peduzzi P, Journal of Clinical Epidemiology 48 pp 1503– (1995) · doi:10.1016/0895-4356(95)00048-8
[39] Raftery A, Bayesian statistics 5 - Proceedings of the 5th Valencia International Meeting
[40] Raftery AE, Journal of the American Statistical Association 92 pp 179– (1997) · doi:10.1080/01621459.1997.10473615
[41] Sauerbrei W, Journal of the Royal Statistical Society: Series C 48 pp 313– (1999) · Zbl 0939.62114 · doi:10.1111/1467-9876.00155
[42] Sauerbrei W, Statistics in Medicine 11 pp 2093– (1992) · doi:10.1002/sim.4780111607
[43] Schwarz G, Annals of Statistics 6 pp 461– (1978) · Zbl 0379.62005 · doi:10.1214/aos/1176344136
[44] Stanley TR, Biometrical Journal 40 pp 475– (1998) · Zbl 1008.62691 · doi:10.1002/(SICI)1521-4036(199808)40:4<475::AID-BIMJ475>3.0.CO;2-#
[45] Steyerberg EW, Statistica Neerlandica 55 pp 76– (2001) · Zbl 1075.62651 · doi:10.1111/1467-9574.00157
[46] Ulm K, Biometrie und Informatik in Medizin und Biologie 20 pp 171– (1989)
[47] Van Houwelingen JC, Statistics in Medicine 9 pp 1303– (1990) · doi:10.1002/sim.4780091109
[48] Verweij PJM, Statistics in Medicine 12 pp 2305– (1993) · doi:10.1002/sim.4780122407
[49] Viallefont V, Statistics in Medicine 20 pp 3215– (2001) · doi:10.1002/sim.976
[50] Volinsky CT, Journal of the Royal Statistical Society: Series C 46 pp 433– (1997) · Zbl 0903.62093 · doi:10.1111/1467-9876.00082
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.