# zbMATH — the first resource for mathematics

Haplotype-based regression analysis and inference of case-control studies with unphased genotypes and measurement errors in environmental exposures. (English) Zbl 1274.62829
Summary: It is widely believed that risks of many complex diseases are determined by genetic susceptibilities, environmental exposures, and their interaction. N. Chatterjee and R. J. Carroll [Biometrika 92, No. 2, 399–418 (2005; Zbl 1094.62136)] developed an efficient retrospective maximum-likelihood method for analysis of case-control studies that exploits an assumption of gene-environment independence and leaves the distribution of the environmental covariates completely nonparametric. C. Spinka, R. J. Carroll and N. Chatterjee [“Analysis of case-control studies of genetic and environmental factors with missing genetic information and haplotype-phase ambiguity”, Genet. Epidemiol. 29, 108–127 (2005)] extended this approach to studies where certain types of genetic information, such as haplotype phases, may be missing on some subjects. We further extend this approach to situations where some of the environmental exposures are measured with error. Using a polychotomous logistic regression model, we allow disease status to have $$K+1$$ levels. We propose use of a pseudolikelihood and a related EM algorithm for parameter estimation. We prove consistency and derive the resulting asymptotic covariance matrix of parameter estimates when the variance of the measurement error is known and when it is estimated using replications. Inferences with measurement error corrections are complicated by the fact that the Wald test often behaves poorly in the presence of large amounts of measurement error. Likelihood-ratio (LR) techniques are known to be a good alternative. However, the LR tests are not technically correct in this setting because the likelihood function is based on an incorrect model, i.e., a prospective model in a retrospective sampling scheme. We correct standard asymptotic results to account for the fact that the LR test is based on a likelihood-type function.
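For orientation, a polychotomous logistic regression model with $$K+1$$ disease levels is commonly written in the form below. The notation is an assumption chosen for illustration and is not taken from the paper: $$D$$ is disease status with level 0 as the reference category, $$H$$ is the genetic (haplotype) information, $$X$$ is the true environmental exposure, and $$m_k$$ is a known regression function with parameters $$\beta_k$$.

$$\operatorname{pr}(D = k \mid H, X) = \frac{\exp\{\alpha_k + m_k(H, X; \beta_k)\}}{1 + \sum_{j=1}^{K} \exp\{\alpha_j + m_j(H, X; \beta_j)\}}, \qquad k = 1, \dots, K,$$

with $$\operatorname{pr}(D = 0 \mid H, X)$$ given by one minus the sum of these $$K$$ probabilities. Setting $$K = 1$$ recovers ordinary binary logistic regression.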
The performance of the proposed method is illustrated using simulation studies emphasizing the case when genetic information is in the form of haplotypes and missing data arises from haplotype-phase ambiguity. An application of our method is illustrated using a population-based case-control study of the association between calcium intake and the risk of colorectal adenoma.
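The summary mentions estimating the measurement error variance from replications. A minimal sketch of one standard way to do this is given below, assuming a classical additive error model $$W_{ij} = X_i + U_{ij}$$ with independent, mean-zero errors of common variance. The function name, the array layout, and the simulated data are hypothetical illustrations and are not the authors' implementation.

```python
import numpy as np


def error_variance_from_replicates(w):
    """Pooled within-subject variance estimate of the measurement error variance.

    Assumes (illustrative assumption) the classical additive model
    W_ij = X_i + U_ij, where `w` has shape (n_subjects, n_replicates)
    and every subject has the same number of replicates (at least 2).
    """
    w = np.asarray(w, dtype=float)
    if w.ndim != 2 or w.shape[1] < 2:
        raise ValueError("need a 2-D array with at least 2 replicates per subject")
    n_subjects, n_replicates = w.shape
    # Deviations of each replicate from its subject-specific mean.
    subject_means = w.mean(axis=1, keepdims=True)
    within_ss = ((w - subject_means) ** 2).sum()
    # Each subject contributes (n_replicates - 1) degrees of freedom.
    return within_ss / (n_subjects * (n_replicates - 1))


# Simulated example (hypothetical data): true error variance is 0.5**2 = 0.25.
rng = np.random.default_rng(0)
true_exposure = rng.normal(loc=0.0, scale=1.0, size=5000)
errors = rng.normal(loc=0.0, scale=0.5, size=(5000, 2))
observed = true_exposure[:, None] + errors
sigma2_hat = error_variance_from_replicates(observed)
```

Because subject means are subtracted before pooling, the estimate does not depend on the distribution of the true exposure, only on the replicate-to-replicate variation.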

##### MSC:
 62P10 Applications of statistics to biology and medical sciences; meta analysis
##### References:
 [1] Andersen, Asymptotic properties of conditional maximum likelihood estimators, Journal of the Royal Statistical Society, Series B 32 pp 283– (1970) · Zbl 0204.51902
 [2] Carroll, Measurement Error in Nonlinear Models (2006) · Zbl 1119.62063 · doi:10.1201/9781420010138
 [3] Chatterjee, Semiparametric maximum likelihood estimation in case-control studies of gene-environmental interactions, Biometrika 92 pp 399– (2005) · Zbl 1094.62136 · doi:10.1093/biomet/92.2.399
 [4] Chatterjee, Comment on the paper Likelihood based inference on haplotype effects in genetic association studies by D. J. Lin and D. Zhang, Journal of the American Statistical Association 102 pp 108– (2006) · doi:10.1198/016214505000000835
 [5] Cook, A simulation extrapolation method for parametric measurement error models, Journal of the American Statistical Association 89 pp 1314– (1995) · Zbl 0810.62028 · doi:10.2307/2290994
 [6] Cornfield, Proc. 3rd Berkeley Symp. Math. Statist. Prob. 4 pp 135– (1956)
 [7] Epstein, Inference of haplotype effects in case-control studies using unphased genotype data, American Journal of Human Genetics 73 pp 1316– (2003) · doi:10.1086/380204
 [8] Foutz, The performance of the likelihood ratio test when the model is incorrect, Annals of Statistics 5 (6) pp 1183– (1977) · Zbl 0391.62004 · doi:10.1214/aos/1176344003
 [9] Fuller, Measurement Error Models (1987) · doi:10.1002/9780470316665
 [10] Kent, Robust properties of likelihood ratio test, Biometrika 69 pp 19– (1982) · Zbl 0485.62031
 [11] Lin, Likelihood-based inference on haplotype effects in genetic association studies (with discussion), Journal of the American Statistical Association 101 pp 89– (2006) · Zbl 1118.62371 · doi:10.1198/016214505000000808
 [12] Peters, Association of genetic variants in the calcium-sensing receptor with risk of colorectal adenoma, Cancer Epidemiol Biomarkers Prev 13 (12) pp 2181– (2004)
 [13] Potischman, Increased risk of early stage breast cancer related to consumption of sweet foods among women less than age 45, Cancer Causes and Control 13 pp 937– (2002) · doi:10.1023/A:1021919416101
 [14] Prentice, Logistic disease incidence models and case-control studies, Biometrika 66 pp 403– (1979) · Zbl 0428.62078 · doi:10.1093/biomet/66.3.403
 [15] Roy, A note on asymptotic distribution of likelihood ratio, Calcutta Statistical Association Bulletin 1 pp 60– (1957)
 [16] Satten, Comparison of prospective and retrospective methods for haplotype inference in case-control studies, Genetic Epidemiology 27 pp 192– (2004) · doi:10.1002/gepi.20020
 [17] Schafer, Likelihood analysis for errors-in-variables regression with replicate measurements, Biometrika 83 pp 813– (1996) · Zbl 0882.62063 · doi:10.1093/biomet/83.4.813
 [18] Spinka, Analysis of case-control studies of genetic and environmental factors with missing genetic information and haplotype-phase ambiguity, Genetic Epidemiology 29 pp 108– (2005) · doi:10.1002/gepi.20085
 [19] Stefanski, Conditional scores and optimal scores in generalized linear measurement error models, Biometrika 74 pp 703– (1987) · Zbl 0632.62052
 [20] Stefanski, Estimating a nonlinear function of a normal mean, Biometrika 92 pp 732– (2005) · Zbl 1152.62318 · doi:10.1093/biomet/92.3.732
 [21] Subar, Using intake biomarkers to evaluate the extent of dietary misreporting in a large sample of adults: The Observing Protein and Energy Nutrition (OPEN) study, American Journal of Epidemiology 54 pp 426– (2003)
 [22] Wilks, The large-sample distribution of the likelihood ratio for testing composite hypothesis, Annals of Mathematical Statistics 7 pp 73– (1938) · Zbl 0018.32003
 [23] Zhang, Linear mixed models with flexible distributions of random effects for longitudinal data, Biometrics 57 pp 795– (2001) · Zbl 1209.62087 · doi:10.1111/j.0006-341X.2001.00795.x
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.