Optimal designs of two-phase studies. (English) Zbl 1453.62608

Summary: The two-phase design is a cost-effective sampling strategy to evaluate the effects of covariates on an outcome when certain covariates are too expensive to be measured on all study subjects. Under such a design, the outcome and inexpensive covariates are measured on all subjects in the first phase and the first-phase information is used to select subjects for measurements of expensive covariates in the second phase. Previous research on two-phase studies has focused largely on the inference procedures rather than the design aspects. We investigate the design efficiency of the two-phase study, as measured by the semiparametric efficiency bound for estimating the regression coefficients of expensive covariates. We consider general two-phase studies, where the outcome variable can be continuous, discrete, or censored, and the second-phase sampling can depend on the first-phase data in any manner. We develop optimal or approximately optimal two-phase designs, which can be substantially more efficient than the existing designs. We demonstrate the improvements of the new designs over the existing ones through extensive simulation studies and two large medical studies.
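The two-phase mechanism described in the summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the linear data-generating model, the function names, and the tail-sampling selection rule (half of the second-phase subjects from each extreme of the outcome) are hypothetical choices made for the example. They are not the optimal designs developed in the paper.

```python
import random


def simulate_phase_one(n, seed=0):
    """Simulate first-phase data: outcome y and inexpensive covariate z for all subjects.

    The expensive covariate x is generated here but treated as unobserved
    until a subject is selected for the second phase.
    """
    rng = random.Random(seed)
    subjects = []
    for i in range(n):
        z = rng.gauss(0.0, 1.0)
        x = rng.gauss(0.0, 1.0)
        # Hypothetical linear model, assumed only for this illustration.
        y = 0.5 * x + 0.3 * z + rng.gauss(0.0, 1.0)
        subjects.append({"id": i, "y": y, "z": z, "x_hidden": x})
    return subjects


def select_phase_two(subjects, n2):
    """Choose n2 subjects using first-phase data only: half from each tail of y."""
    ranked = sorted(subjects, key=lambda s: s["y"])
    lower = n2 // 2
    upper = n2 - lower
    chosen = ranked[:lower] + ranked[len(ranked) - upper:]
    return {s["id"] for s in chosen}


def measure_expensive(subjects, selected_ids):
    """Reveal x for selected subjects; it stays missing (None) for everyone else."""
    return [
        {
            "id": s["id"],
            "y": s["y"],
            "z": s["z"],
            "x": s["x_hidden"] if s["id"] in selected_ids else None,
        }
        for s in subjects
    ]


phase_one = simulate_phase_one(n=1000, seed=1)
selected = select_phase_two(phase_one, n2=200)
data = measure_expensive(phase_one, selected)
n_measured = sum(1 for row in data if row["x"] is not None)
print(n_measured)
```

The selection step reads only `y`, never the hidden covariate, which mirrors the requirement that second-phase sampling depend on first-phase data alone. Any other rule based on `y` and `z` could be substituted in `select_phase_two` without changing the rest of the sketch.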

MSC:

62K05 Optimal statistical designs
62G08 Nonparametric regression and quantile regression
62J12 Generalized linear models (logistic models)
