Generalized linear models with a coarsened covariate.

*(English)* Zbl 1111.62314

Summary: We consider generalized linear models with a coarsened covariate. The term 'coarsened' refers here to the case where the exact value of the covariate of interest is not fully observed; instead, only some set or grouping that contains the exact value is observed. In particular, we propose a likelihood-based method for estimating regression parameters in a generalized linear model relating the mean of the outcome to covariates. We outline Newton-Raphson and EM algorithms for obtaining maximum likelihood estimates of the regression parameters. We also compare and contrast this likelihood-based approach with two somewhat ad hoc procedures: a complete-case analysis, in which individuals with coarsened data are excluded and estimation is based on the remaining 'complete cases', and a coarsened data regression model, in which the covariate values for all the complete cases are coarsened and then included in a regression model relating the mean to the coarsened covariate. The methodology is motivated by coarsened data on the race-ethnicity categorization of patients in the US National Ambulatory Medical Care Survey, a study examining the medical care provided to patients in physicians' offices. In this study, the outcome of interest is the level of tests (none, non-invasive tests or invasive tests) ordered for the patient at the doctor's visit. One covariate of interest is a four-level discrete variable giving the patient's race-ethnicity category: white-Hispanic, white-non-Hispanic, African-American-Hispanic and African-American-non-Hispanic. However, of the 19 095 patients in the sample, only 14 955 (or 78%) have the exact category of race-ethnicity recorded; the remaining 4140 (or 22%) have race-ethnicity coarsened. For this latter group of 4140 individuals, the ethnicity is not recorded, but we know that 3683 are white and 457 are African-American.
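The EM idea mentioned in the summary can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on assumptions that go beyond the summary: a binary outcome with a logistic link (the study outcome has three levels), a single four-category covariate with a saturated multinomial distribution, and coarsening at random. All function names (`design`, `fit_weighted_logistic`, `em_coarsened`, `simulate`) are hypothetical. Each subject is expanded into one pseudo-observation per category; the E-step weights the categories that are compatible with the observed grouping, and the M-step runs a few weighted Newton-Raphson updates.

```python
import numpy as np

def design(cat, n_cat=4):
    """Intercept plus dummy indicators for categories 1..n_cat-1."""
    X = np.zeros((len(cat), n_cat))
    X[:, 0] = 1.0
    for k in range(1, n_cat):
        X[cat == k, k] = 1.0
    return X

def fit_weighted_logistic(X, y, w, beta, n_iter=5):
    """Weighted Newton-Raphson steps for logistic regression."""
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        hess = X.T @ (X * (w * p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def em_coarsened(y, sets, n_em=200):
    """EM for a logistic model with a coarsened categorical covariate.

    y    : (n,) array of 0/1 outcomes
    sets : (n, K) boolean array; sets[i, k] is True when category k is
           compatible with what was observed for subject i
    Returns (beta, alpha): regression coefficients and category probabilities.
    """
    n, n_cat = sets.shape
    rows = np.repeat(np.arange(n), n_cat)   # subject index of each pseudo-row
    cats = np.tile(np.arange(n_cat), n)     # candidate category of each pseudo-row
    X = design(cats, n_cat)
    y_long = y[rows]
    allowed = sets.ravel().astype(float)
    beta = np.zeros(n_cat)
    alpha = np.full(n_cat, 1.0 / n_cat)
    for _ in range(n_em):
        # E-step: weight of category k for subject i is proportional to
        # 1[k compatible] * P(y_i | x = k; beta) * alpha_k
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        lik = np.where(y_long == 1, p, 1.0 - p)
        w = (allowed * lik * alpha[cats]).reshape(n, n_cat)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: update category probabilities and regression coefficients
        alpha = w.mean(axis=0)
        beta = fit_weighted_logistic(X, y_long, w.ravel(), beta)
    return beta, alpha

def simulate(n, alpha, beta, frac_coarse, seed=0):
    """Simulated data; categories {0,1} and {2,3} play the role of two races."""
    rng = np.random.default_rng(seed)
    cat = rng.choice(4, size=n, p=alpha)
    p = 1.0 / (1.0 + np.exp(-design(cat) @ beta))
    y = (rng.random(n) < p).astype(float)
    sets = np.zeros((n, 4), dtype=bool)
    sets[np.arange(n), cat] = True
    # Coarsen at random: only the race group (cat // 2) stays known
    idx = np.flatnonzero(rng.random(n) < frac_coarse)
    sets[idx, 2 * (cat[idx] // 2)] = True
    sets[idx, 2 * (cat[idx] // 2) + 1] = True
    return y, cat, sets

if __name__ == "__main__":
    y, cat, sets = simulate(4000, np.array([0.1, 0.6, 0.1, 0.2]),
                            np.array([-0.5, 0.8, -0.4, 0.3]), 0.22)
    beta_hat, alpha_hat = em_coarsened(y, sets)
    print("beta:", beta_hat)
    print("alpha:", alpha_hat)
```

When every row of `sets` has exactly one `True` entry, the weights are one-hot and the procedure reduces to ordinary logistic regression on the fully observed data, which gives a simple sanity check. Standard errors are not computed in this sketch.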

##### MSC:

62J12 | Generalized linear models (logistic models)

62F10 | Point estimation

62P10 | Applications of statistics to biology and medical sciences; meta-analysis

S. Lipsitz et al., J. R. Stat. Soc., Ser. C, Appl. Stat. 53, No. 2, 279-292 (2004; Zbl 1111.62314)
