
Estimation of conditional prevalence from group testing data with missing covariates. (English) Zbl 1437.62437

Summary: We consider estimating the conditional prevalence of a disease from data pooled according to the group testing mechanism. Consistent estimators have been proposed in the literature, but they require the data to be available for all individuals. In infectious disease studies, where group testing is frequently applied, the covariate is often missing for some individuals. In that setting, unless the covariates are missing completely at random, applying existing techniques to the complete cases without adjusting for missingness generally does not yield consistent estimators, and finding appropriate modifications is challenging. We develop a consistent spline estimator, derive its theoretical properties, and show how to adapt local polynomial and likelihood estimators to the missing data problem. We illustrate the numerical performance of our methods on simulated and real examples.
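For readers unfamiliar with the group testing mechanism, the following minimal Python sketch shows how pooled data arise and what a pool-level likelihood looks like. It is not the authors' spline estimator and it ignores missing covariates entirely. It assumes a perfect assay, so a pool tests positive if and only if at least one member is infected, which gives P(pool positive | X_1, ..., X_k) = 1 - ∏_{i=1}^k {1 - p(X_i)}, where p(x) = P(Y = 1 | X = x) is the conditional prevalence. The logistic form of `prevalence`, its parameters `a` and `b`, the uniform covariate on [0, 2], and all function names are hypothetical choices made for illustration.

```python
import math
import random


def prevalence(x, a=-3.0, b=1.5):
    """Hypothetical conditional prevalence p(x) = P(Y = 1 | X = x), logistic in x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def pool_positive_prob(xs, p=prevalence):
    """P(pool positive | covariates xs), assuming a perfect assay:
    the pool is positive iff at least one member is infected."""
    prob_all_negative = 1.0
    for x in xs:
        prob_all_negative *= 1.0 - p(x)
    return 1.0 - prob_all_negative


def pool_log_likelihood(pools, responses, p=prevalence):
    """Log-likelihood of the observed pool results under the group testing mechanism."""
    ll = 0.0
    for xs, z in zip(pools, responses):
        q = pool_positive_prob(xs, p)
        ll += math.log(q) if z == 1 else math.log(1.0 - q)
    return ll


def simulate(n_pools, pool_size, seed=0):
    """Simulate pooled data: individual statuses are latent, only pool results are kept."""
    rng = random.Random(seed)
    pools, responses = [], []
    for _ in range(n_pools):
        xs = [rng.uniform(0.0, 2.0) for _ in range(pool_size)]
        statuses = [1 if rng.random() < prevalence(x) else 0 for x in xs]
        pools.append(xs)
        responses.append(int(any(statuses)))
    return pools, responses


def constant_prevalence(x):
    """Hypothetical alternative model that ignores the covariate."""
    return prevalence(0.0)


if __name__ == "__main__":
    pools, responses = simulate(n_pools=500, pool_size=5)
    print("pools tested:", len(pools), "positive pools:", sum(responses))
    print("log-lik, true p(x):", pool_log_likelihood(pools, responses))
    print("log-lik, constant p:", pool_log_likelihood(pools, responses, p=constant_prevalence))
```

A parametric likelihood estimator would maximize `pool_log_likelihood` over the model parameters; the summary's concern is nonparametric estimation when some covariates are missing, which this sketch does not attempt.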

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62D10 Missing data

Software:

ConfBands; SemiPar
