On quantile quantile plots for generalized linear models.

*(English)* Zbl 1252.62072

Summary: The distributional assumption for a generalized linear model is often checked by plotting the ordered deviance residuals against the quantiles of a standard normal distribution. Such plots can be difficult to interpret because, even when the model is correct, the plot often deviates substantially from a straight line. To rectify this problem, M.G. Ben and V.J. Yohai [Quantile-quantile plot for deviance residuals in the generalized linear model. J. Comput. Graph. Stat. 13, No. 1, 36–47 (2004)] proposed plotting the deviance residuals against their theoretical quantiles, computed under the assumption that the model is correct. When the model is correct, such plots are closer to a straight line, which makes them much more useful for model checking. However, the quantile computation proposed by Ben and Yohai is, in general, relatively complicated to implement and computationally expensive, so that general purpose software for these plots is only available for the Poisson and binary cases, in the R package robust. As an alternative, the theoretical quantiles can be estimated simply and efficiently by repeatedly simulating new response data from the fitted model and computing the corresponding residuals. This method also provides reference bands for judging the significance of departures of QQ-plots from the ideal straight line form. A second alternative is to estimate the quantiles using quantiles of the response variable distribution according to the estimated model. This latter alternative generally has lower computational cost than the first, but does not yield QQ-plot reference bands.
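The simulation-based alternative described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Poisson model whose fitted means are already available as a plain list (`mu_hat` is a hypothetical name), keeps those fitted means fixed rather than refitting, and averages the sorted simulated residuals to estimate each quantile. The function names, the number of replicates, and the pointwise band construction are all assumptions made for illustration.

```python
import math
import random


def poisson_draw(mu, rng):
    """Draw one Poisson(mu) variate by Knuth's multiplication method (adequate for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def deviance_residual(y, mu):
    """Poisson deviance residual; the term y*log(y/mu) is taken as 0 when y == 0."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    d = 2.0 * (term - (y - mu))
    return math.copysign(math.sqrt(max(d, 0.0)), y - mu)


def simulated_qq_reference(mu_hat, n_rep=200, level=0.9, seed=1):
    """Estimate residual quantiles and a pointwise reference band by simulation.

    mu_hat: fitted means from an already fitted Poisson model (assumed given).
    Returns (quantiles, lower, upper), each of length len(mu_hat).
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(n_rep):
        # Simulate a new response vector from the fitted model ...
        y_new = [poisson_draw(m, rng) for m in mu_hat]
        # ... and store the corresponding ordered deviance residuals.
        sims.append(sorted(deviance_residual(y, m) for y, m in zip(y_new, mu_hat)))

    lo_idx = int((1 - level) / 2 * (n_rep - 1))
    hi_idx = (n_rep - 1) - lo_idx
    quantiles, lower, upper = [], [], []
    for i in range(len(mu_hat)):
        col = sorted(s[i] for s in sims)      # i-th order statistic across replicates
        quantiles.append(sum(col) / n_rep)    # average as the quantile estimate
        lower.append(col[lo_idx])
        upper.append(col[hi_idx])
    return quantiles, lower, upper
```

To draw the plot, the sorted deviance residuals of the observed data would be plotted against `quantiles`, with `lower` and `upper` overlaid as the reference band; points falling outside the band suggest a departure from the assumed distribution.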

In simulations, the quantiles produced by the new methods give results indistinguishable from the original Ben and Yohai quantile computations, but the scaling of computational cost with sample size is much improved, so that a 500-fold reduction in computation time was observed at sample size 50,000. Application of the methods to generalized linear models fitted to prostate cancer incidence data suggests that they are particularly useful in large data set cases that might otherwise be incorrectly viewed as zero-inflated. The new approaches are simple enough to implement for any exponential family distribution and for several alternative types of residuals, and this has been done for all the families available for use with generalized linear models in the basic distribution of R.

##### MSC:

62J12 | Generalized linear models (logistic models) |

62-09 | Graphical methods in statistics (MSC2010) |

62G08 | Nonparametric regression and quantile regression |

65C60 | Computational problems in statistics (MSC2010) |

*N. H. Augustin* et al., Comput. Stat. Data Anal. 56, No. 8, 2404–2409 (2012; Zbl 1252.62072)

##### References:

[1] | Ben, M.G.; Yohai, V.J., Quantile-quantile plot for deviance residuals in the generalized linear model, Journal of computational and graphical statistics, 13, 1, 36-47, (2004) |

[2] | Breslow, N.E., Day, N.E., 1987. Statistical Methods in Cancer Research. Vol. II, The Design and Analysis of Cohort Studies (IARC Scientific Publication No. 82). Lyon: International Agency for Research on Cancer. |

[3] | Chen, X.; Fu, Y-Z., Model selection for zero-inflated regression with missing covariates, Computational statistics & data analysis, 55, 1, 765-773, (2011) · Zbl 1247.62196 |

[4] | Cox, D.R.; Snell, E.J., Analysis of binary data, (1989), London: Chapman & Hall · Zbl 0729.62004 |

[5] | Garay, A.M.; Hashimoto, E.M.; Ortega, E.M.M.; Lachos, V.L., On estimation and influence diagnostics for zero-inflated negative binomial regression models, Computational statistics & data analysis, 55, 3, 1304-1318, (2011) · Zbl 1328.65029 |

[6] | Lawson, A.B., Bayesian disease mapping: hierarchical modeling in spatial epidemiology, (2001), CRC/Chapman & Hall Boca Raton |

[7] | Wang, J., Zamar, R., Marazzi, A., Yohai, V., Salibian-Barrera, M., Maronna, R., Zivot, E., Rocke, D., Martin, D., Maechler, M., Konis, K., 2010. Robust: Insightful Robust Library. R package version 0.3-11. URL: http://CRAN.R-project.org/package=robust. |

[8] | Wood, S.N., Generalized additive models: an introduction with R, (2006), CRC/Chapman & Hall Boca Raton · Zbl 1087.62082 |
