A multiphase design strategy for dealing with participation bias. (English) Zbl 1216.62167

Summary: A recently funded study of the impact of oral contraceptive use on the risk of bone fracture employed the randomized recruitment scheme of C. Weinberg and S. Wacholder [Biometrics 46, 963–975 (1990)]. One potential complication in the bone fracture study is differential response rates between cases and controls; participation rates in previous, related studies have been around 70%. Although data from randomized recruitment schemes may be analyzed within the two-phase study framework, ignoring differential participation may lead to biased estimates of association. To overcome this, we extend the two-phase framework with an additional stage of data collection aimed specifically at addressing differential participation. Four estimators that correct for both sampling and participation bias are proposed; two are general purpose and two are for the special case where the covariates underlying the participation mechanism are discrete. Because the fracture study is ongoing, we illustrate the methods using infant mortality data from North Carolina.
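
The bias mechanism described in the summary can be illustrated with a small numerical sketch. This is not one of the four estimators proposed in the paper; it is a generic inverse-probability-weighting illustration for a single binary exposure. All counts and probabilities below are hypothetical, and the participation probabilities are treated as known, whereas in practice they would have to be estimated, for example from an additional stage of data collection.

```python
import numpy as np


def odds_ratio(table):
    """Odds ratio from a 2x2 table indexed as table[outcome, exposure]."""
    return (table[1, 1] * table[0, 0]) / (table[1, 0] * table[0, 1])


# Hypothetical expected population counts, indexed [outcome y, exposure x].
population = np.array([[9000.0, 3000.0],   # controls: unexposed, exposed
                       [400.0, 400.0]])    # cases:    unexposed, exposed

# Assumed outcome-dependent sampling: all cases, 10% of controls.
sampling_prob = np.array([[0.1, 0.1],
                          [1.0, 1.0]])

# Assumed differential participation: exposed cases respond more often.
participation_prob = np.array([[0.7, 0.7],
                               [0.7, 0.9]])

# Expected counts among sampled subjects who actually participate.
observed = population * sampling_prob * participation_prob

# Weight each cell by the inverse of its overall inclusion probability.
weights = 1.0 / (sampling_prob * participation_prob)

true_or = odds_ratio(population)
naive_or = odds_ratio(observed)            # ignores participation
weighted_or = odds_ratio(observed * weights)

print(f"true OR     = {true_or:.3f}")
print(f"naive OR    = {naive_or:.3f}")
print(f"weighted OR = {weighted_or:.3f}")
```

In this sketch the outcome-dependent sampling alone leaves the odds ratio unchanged, because the sampling probabilities depend only on case status. The bias comes from participation depending jointly on outcome and exposure, and weighting by the inverse inclusion probabilities removes it under the stated assumptions.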


62P10 Applications of statistics to biology and medical sciences; meta analysis
92C50 Medical applications (general)
62N02 Estimation in survival analysis and censored data
65C60 Computational problems in statistics (MSC2010)




[1] Austin, The effect of response bias on the odds ratio, American Journal of Epidemiology 114 pp 137– (1981)
[2] Breslow, Logistic regression for two-stage case-control data, Biometrika 75 pp 11– (1988) · Zbl 0635.62110
[3] Breslow, Design and analysis of two-phase studies with binary outcomes applied to Wilms’ tumor prognosis, Applied Statistics 48 pp 457– (1999) · Zbl 0957.62091
[4] Breslow, Maximum likelihood estimation of logistic regression parameters under two-phase, outcome-dependent sampling, Journal of the Royal Statistical Society, Series B, Methodological 59 pp 447– (1997) · Zbl 0886.62071
[5] Breslow, On the semiparametric efficiency of logistic regression under case-control sampling, Bernoulli 6 pp 447– (2000) · Zbl 0965.62033
[6] Chatterjee, A two-stage regression model for epidemiological studies with multivariate disease classification data, Journal of the American Statistical Association 99 pp 127– (2004) · Zbl 1089.62526
[7] Chatterjee, A pseudoscore estimator for regression problems with two-phase sampling, Journal of the American Statistical Association 98 pp 158– (2003) · Zbl 1047.62031
[8] Chen, Semiparametric efficient estimation for the auxiliary outcome problem with the conditional mean model, The Canadian Journal of Statistics 32 pp 359– (2004) · Zbl 1059.62007
[9] Chen, Breast cancer relative hazard estimates from case-control and cohort designs with missing data on mammographic density, Journal of the American Statistical Association 103 pp 976– (2008) · Zbl 1205.62163
[10] Flanders, Analytic methods for two-stage case-control studies and other stratified designs, Statistics in Medicine 10 pp 739– (1991)
[11] Follmann, Multiple outputation: Inference for complex clustered data by averaging analyses from independent data, Biometrics 59 pp 420– (2003) · Zbl 1210.62158
[12] Foutz, On the unique consistent solution to likelihood equations, Journal of the American Statistical Association 72 pp 147– (1977) · Zbl 0354.62029
[13] Greenland, Causal diagrams for epidemiologic research, Epidemiology 10 pp 37– (1999)
[14] Hernán, A structural approach to selection bias, Epidemiology 15 pp 615– (2004)
[15] Lawless, Semiparametric methods for response-selective and missing data problems in regression, Journal of the Royal Statistical Society, Series B 61 pp 413– (1999) · Zbl 0915.62030
[16] Lin, Matched case-control data analysis with selection bias, Biometrics 57 pp 1245– (2001) · Zbl 1209.62306
[17] Little, Statistical Analysis of Missing Data (2002)
[18] Pearl, Causal diagrams for empirical research, Biometrika 82 pp 669– (1995) · Zbl 0860.62045
[19] Pfeiffer, On a supplemented case-control design, Biometrics 61 pp 584– (2005)
[20] R Development Core Team, R: A Language and Environment for Statistical Computing (2009)
[21] Robins, Estimation of regression coefficients when some regressors are not always observed, Journal of the American Statistical Association 89 pp 846– (1994) · Zbl 0815.62043
[22] Rothman, Modern Epidemiology (1998)
[23] Schill, Logistic analysis in case-control studies under validation sampling, Biometrika 80 pp 339– (1993) · Zbl 0783.62097
[24] Scott, Fitting regression models to case-control data by maximum likelihood, Biometrika 84 pp 57– (1997) · Zbl 1058.62505
[25] Weinberg, The design and analysis of case-control studies with biased sampling, Biometrics 46 pp 963– (1990)