Semiparametric regression for clustered data using generalized estimating equations.

*(English)* Zbl 1072.62566

Summary: We consider estimation in a semiparametric generalized linear model for clustered data using estimating equations. Our results apply to the case where the number of observations per cluster is finite, whereas the number of clusters is large. The mean of the outcome variable \(\mu\) is of the form \(g(\mu) = \mathbf X^T \mathbf\beta + \theta(T)\), where \(g(\cdot)\) is a link function, \(\mathbf X\) and \(T\) are covariates, \(\mathbf\beta\) is an unknown parameter vector, and \(\theta(t)\) is an unknown smooth function. Kernel estimating equations proposed previously in the literature are used to estimate the infinite-dimensional nonparametric function \(\theta(t)\), and a profile-based estimating equation is used to estimate the finite-dimensional parameter vector \(\mathbf\beta\). We show that for clustered data, this conventional profile-kernel method often fails to yield a \(\sqrt n\)-consistent estimator of \(\mathbf\beta\) along with appropriate inference unless working independence is assumed or \(\theta(t)\) is artificially undersmoothed, in which case asymptotic inference is possible. To gain insight into these results, we derive the semiparametric efficient score of \(\mathbf\beta\), which is found to have a complicated form, and show that, unlike for independent data, the profile-kernel method does not yield a score function asymptotically equivalent to the semiparametric efficient score of \(\mathbf\beta\), even when the true correlation is assumed and \(\theta(t)\) is undersmoothed. We illustrate the methods with an application to infectious disease data and evaluate their finite-sample performance through a simulation study.
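
As an illustration of the profile-kernel idea described in the summary, the following is a minimal sketch for the simplest special case only: identity link \(g(\mu) = \mu\) and working independence. In that case, solving the kernel estimating equation for \(\theta(t)\) at fixed \(\mathbf\beta\) and substituting into the profile equation reduces to regressing kernel-smoothed residuals of the outcome on kernel-smoothed residuals of \(\mathbf X\). The Gaussian kernel, the local-constant (Nadaraya-Watson) smoother, the bandwidth, the simulated data, and all function names are assumptions made for illustration; none of them is taken from the paper, and the sketch does not cover non-identity links, non-independence working correlations, or variance estimation.

```python
import numpy as np


def kernel_smoother_matrix(t, h):
    """Nadaraya-Watson smoother matrix S with a Gaussian kernel.

    (S @ y)[i] is a local-constant estimate of E[y | T = t[i]].
    """
    d = (t[:, None] - t[None, :]) / h
    w = np.exp(-0.5 * d**2)
    return w / w.sum(axis=1, keepdims=True)


def profile_kernel_wi(y, X, t, h):
    """Profile-kernel estimate of (beta, theta) for the identity link
    under working independence (illustrative special case only)."""
    S = kernel_smoother_matrix(t, h)
    # For fixed beta, theta_hat(.; beta) = S (y - X beta), so the profile
    # equation for beta is least squares on the smoothed residuals.
    y_res = y - S @ y
    X_res = X - S @ X
    beta_hat, *_ = np.linalg.lstsq(X_res, y_res, rcond=None)
    theta_hat = S @ (y - X @ beta_hat)
    return beta_hat, theta_hat


def simulate(n_clusters=200, m=4, beta=(1.0, -0.5), seed=0):
    """Hypothetical clustered data: many clusters, m observations each,
    with a shared random intercept inducing within-cluster correlation."""
    rng = np.random.default_rng(seed)
    n_obs = n_clusters * m
    t = rng.uniform(0.0, 1.0, n_obs)
    # The first covariate is correlated with T, so adjusting for theta(T) matters.
    X = np.column_stack([t + rng.normal(size=n_obs), rng.normal(size=n_obs)])
    cluster_effect = np.repeat(rng.normal(scale=0.5, size=n_clusters), m)
    noise = rng.normal(scale=0.5, size=n_obs)
    y = X @ np.asarray(beta) + np.sin(2 * np.pi * t) + cluster_effect + noise
    return y, X, t


y, X, t = simulate()
beta_hat, theta_hat = profile_kernel_wi(y, X, t, h=0.1)
print("beta_hat:", beta_hat)
```

The point estimate above ignores the within-cluster correlation, which is exactly the working-independence case in which the summary states that \(\sqrt n\)-consistent estimation of \(\mathbf\beta\) remains possible. Valid standard errors would additionally require a cluster-robust (sandwich-type) variance estimate, which this sketch does not compute.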