Constrained ordination analysis in the presence of zero inflation. (English) Zbl 07257888

Summary: Constrained ordination analysis, with canonical correspondence analysis (CCA) as its best-known method, is a class of popular techniques for analyzing species abundance data in ecology. These methods rely on distributional assumptions about the conditional abundance distributions. For abundance observations, the Poisson and negative binomial distributions are the most frequently used. However, many large abundance studies yield a large number of zero abundances. This can happen for several reasons. For example, in microbial community ecology the abundances of a very large number of species are nowadays often obtained by sequencing a pooled DNA sample. Because sensitivity for rare species is low, an excess of observed zeroes is to be expected. Moreover, more zeroes are expected as the number of species increases. We propose a constrained ordination method based on zero-altered count distributions (e.g., zero-inflated Poisson, hurdle models). We show how the parameters and the environmental gradients can be estimated. In simulation studies we examine the behaviour of the estimators, and we apply the method to a real data set. We conclude that in the presence of zero inflation our methods give better results than the Poisson-based approaches.
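
As a rough illustration of what a zero-inflated count distribution looks like, the sketch below evaluates the textbook zero-inflated Poisson (ZIP) probability mass function and log-likelihood for a single species with fixed parameters. It is not the ordination method of the paper: the summary does not give the paper's parameterization or how the parameters depend on environmental gradients. The function names `zip_pmf` and `zip_loglik`, the symbols `lam` (Poisson mean) and `pi` (probability of a structural zero), and the toy counts are all assumptions made for illustration.

```python
import math


def zip_pmf(y, lam, pi):
    """P(Y = y) under a zero-inflated Poisson (illustrative parameterization).

    With probability pi the observation is a structural zero; otherwise it
    is drawn from Poisson(lam). Setting pi = 0 recovers the plain Poisson.
    """
    # Poisson probability computed on the log scale for numerical stability.
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    if y == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_loglik(counts, lam, pi):
    """Log-likelihood of independent counts under ZIP(lam, pi)."""
    return sum(math.log(zip_pmf(y, lam, pi)) for y in counts)


# Hypothetical toy data with many zeroes (not taken from the paper).
counts = [0, 0, 0, 0, 0, 0, 3, 4, 2, 5]

# Plain Poisson at its maximum likelihood estimate (the sample mean).
loglik_poisson = zip_loglik(counts, lam=sum(counts) / len(counts), pi=0.0)

# A ZIP fit with hand-picked, non-optimized parameters.
loglik_zip = zip_loglik(counts, lam=3.5, pi=0.6)

print(loglik_poisson, loglik_zip)
```

On zero-heavy toy data like this, even hand-picked ZIP parameters give a higher log-likelihood than the best plain Poisson fit, which is the basic motivation for zero-altered models.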

MSC:

62-XX Statistics

Software:

R

References:

[1] Cunningham, R, Lindenmayer, D (2005) Modelling count data of rare species: some statistical issues. Ecology, 86, 1135-42. · doi:10.1890/04-0589
[2] Firth, D (1993) Bias reduction of maximum likelihood estimates. Biometrika, 80, 27-38. · Zbl 0769.62021 · doi:10.1093/biomet/80.1.27
[3] Goodall, D, Johnson, R (1982) Non-linear ordination in several dimensions: a maximum likelihood approach. Vegetatio, 48, 197-208.
[4] Goodall, D, Johnson, R (1987) Maximum-likelihood ordination: some improvements and further tests. Vegetatio, 73, 3-12. · doi:10.1007/BF00031846
[5] Greenacre, M (2007) Correspondence analysis in practice. London: Chapman and Hall/CRC. · Zbl 1198.62061 · doi:10.1201/9781420011234
[6] Huse, S, Huber, J, Morrison, H, Sogin, M, Welch, D (2007) Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biology, 8, R143. · doi:10.1186/gb-2007-8-7-r143
[7] Johnson, K, Altman, N (1999) Canonical correspondence analysis as an approximation to Gaussian ordination. Environmetrics, 10, 39-52. · doi:10.1002/(SICI)1099-095X(199901/02)10:1<39::AID-ENV334>3.0.CO;2-3
[8] Kunin, V, Engelbrektson, A, Ochman, H, Hugenholtz, P (2010) Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environmental Microbiology, 12, 118-23. · doi:10.1111/j.1462-2920.2009.02051.x
[9] Lambert, D (1992) Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34, 1-13. · Zbl 0850.62756 · doi:10.2307/1269547
[10] Martin, T, Wintle, B, Rhodes, J, Kuhnert, P, Field, S, Low-Choy, S, Tyre, A, Possingham, H (2005) Zero tolerance ecology: improving ecological inference by modelling the source of zero observations. Ecology Letters, 8, 1235-46. · doi:10.1111/j.1461-0248.2005.00826.x
[11] Martiny, J, Bohannan, B, Brown, J, Colwell, R, Fuhrman, J, Green, J, Horner-Devine, M, Kane, M, Krumins, J, Kuske, C, Morin, P, Naeem, S, Øvreås, L, Reysenbach, A, Smith, V, Staley, J (2006) Microbial biogeography: putting microorganisms on the map. Nature Reviews Microbiology, 4, 102-12. · doi:10.1038/nrmicro1341
[12] Mullahy, J (1986) Specification and testing of some modified count data models. Journal of Econometrics, 33, 341-65. · doi:10.1016/0304-4076(86)90002-3
[13] Potts, J, Elith, J (2006) Comparing species abundance models. Ecological Modelling, 199, 153-63. · doi:10.1016/j.ecolmodel.2006.05.025
[14] R Development Core Team (2011) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
[15] Takane, Y, Yanai, H, Mayekawa, S (1991) Relationships among several methods of linearly constrained correspondence analysis. Psychometrika, 56, 667-84. · Zbl 0760.62057 · doi:10.1007/BF02294498
[16] ter Braak, C (1986) Canonical correspondence analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology, 67, 1167-79. · doi:10.2307/1938672
[17] ter Braak, C (1987) The analysis of vegetation-environment relationships by canonical correspondence analysis. Vegetatio, 69, 69-77. · doi:10.1007/BF00038688
[18] ter Braak, C, Looman, C (1986) Weighted averaging, logistic regression and the Gaussian response model. Vegetatio, 65, 3-11. · doi:10.1007/BF00032121
[19] ter Braak, C, Verdonschot, P (1995) Canonical correspondence analysis and related multivariate methods in aquatic ecology. Aquatic Sciences, 57, 255-89.
[20] Väre, H, Ohtonen, R, Oksanen, J (1995) Effects of reindeer grazing on understorey vegetation in dry Pinus sylvestris forests. Journal of Vegetation Science, 6, 523-30. · doi:10.2307/3236351
[21] Ver Hoef, J, Boveng, P (2007) Quasi-Poisson vs. negative binomial regression: how should we model overdispersed count data? Ecology, 88, 2766-72.
[22] Verleyen, E, Hodgson, D, Gibson, J, Imura, S (2011) Chemical limnology in coastal east Antarctic lakes: monitoring future climate change in centres of endemism and biodiversity. Antarctic Science. Available on CJO 2011. · doi:10.1017/S0954102011000642
[23] Welsh, A, Cunningham, R, Donnelly, C, Lindenmayer, D (1996) Modelling the abundance of rare species: statistical models with extra zeroes. Ecological Modelling, 88, 297-308. · doi:10.1016/0304-3800(95)00113-1
[24] Winkelmann, R (2008) Econometric analysis of count data, 5th edn. Berlin, Germany: Springer.
[25] Yee, T (2004) A new technique for maximum-likelihood canonical ordination. Ecological Monographs, 74, 685-701. · doi:10.1890/03-0078
[26] Yee, T, Hastie, T (2003) Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15-41. · Zbl 1195.62123
[27] Zhu, M, Hastie, T, Walther, G (2005) Constrained ordination analysis with flexible response functions. Ecological Modelling, 187, 524-36. · doi:10.1016/j.ecolmodel.2005.01.049
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.