Semiparametric efficient estimation for the auxiliary outcome problem with the conditional mean model. (English) Zbl 1059.62007

Summary: The authors consider semiparametric efficient estimation of parameters in the conditional mean model for a simple incomplete data structure in which the outcome of interest is observed only for a random subset of subjects but covariates and surrogate (auxiliary) outcomes are observed for all. They use optimal estimating function theory to derive the semiparametric efficient score in closed form. They show that when covariates and auxiliary outcomes are discrete, a Horvitz-Thompson type estimator with empirically estimated weights is semiparametric efficient. The authors give simulation studies validating the finite-sample behaviour of the semiparametric efficient estimator and its asymptotic variance; they demonstrate the efficiency of the estimator in realistic settings.
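To make the discrete case concrete, the following is a minimal sketch of a Horvitz-Thompson type estimator with empirically estimated weights. It is an illustration under stated assumptions, not the authors' implementation: the function name `ht_conditional_mean`, the variable names, and the identity-link linear mean model E[Y | X] = b0 + b1 X are all hypothetical choices. The weight for each validated subject is the inverse of the observed validation fraction within that subject's (covariate, auxiliary outcome) cell.

```python
import numpy as np


def ht_conditional_mean(x, s, y, r):
    """Fit E[Y | X] = b0 + b1 * X by inverse-probability-weighted least squares.

    x : discrete covariate, observed for everyone
    s : discrete auxiliary (surrogate) outcome, observed for everyone
    y : outcome of interest; entries where r is False are ignored (may be NaN)
    r : indicator that y was observed (validation indicator)

    Assumption of this sketch: every (x, s) cell contains at least one
    validated subject. Cells with no validated subjects receive zero weight,
    and the estimate is then no longer trustworthy.
    """
    x = np.asarray(x)
    s = np.asarray(s)
    y = np.asarray(y, dtype=float)
    r = np.asarray(r, dtype=bool)
    n = len(x)

    # Empirical validation fraction within each observed (x, s) cell.
    pi_hat = np.zeros(n)
    for xv in np.unique(x):
        for sv in np.unique(s):
            cell = (x == xv) & (s == sv)
            if cell.any():
                pi_hat[cell] = r[cell].mean()

    # Horvitz-Thompson weights: 1 / pi_hat for validated subjects, 0 otherwise.
    w = np.zeros(n)
    w[r] = 1.0 / pi_hat[r]

    # Solve the weighted estimating equation
    #   sum_i w_i * (1, x_i)^T * (y_i - b0 - b1 * x_i) = 0.
    design = np.column_stack([np.ones(n), x.astype(float)])
    y_obs = np.where(r, y, 0.0)  # unobserved outcomes carry zero weight
    lhs = design.T @ (w[:, None] * design)
    rhs = design.T @ (w * y_obs)
    return np.linalg.solve(lhs, rhs)


# Toy example: four subjects, one of whom has no observed outcome.
beta = ht_conditional_mean(
    x=[0, 0, 1, 1],
    s=[0, 0, 0, 0],
    y=[1.0, np.nan, 3.0, 5.0],
    r=[True, False, True, True],
)
print(beta)  # approximately [1, 3]
```

In the toy example the cell (x = 0, s = 0) has a validation fraction of 1/2, so its single validated subject gets weight 2, while both subjects in the cell (x = 1, s = 0) are validated and get weight 1. The sketch covers only the point estimate; the closed-form efficient score and the asymptotic variance described in the summary are not reproduced here.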


62D05 Sampling theory, sample surveys
62G05 Nonparametric estimation
62G20 Asymptotic properties of nonparametric inference
62J12 Generalized linear models (logistic models)
62H12 Estimation in multivariate analysis

