A note on marginal count distributions for diversity estimation.

Choudhary, Pankaj K. (ed.) et al., Ordered data analysis, modeling and health research methods. In honor of H. N. Nagaraja’s 60th birthday. Selected papers based on the presentations at the international conference, Austin, TX, USA, March 7–9, 2014. Cham: Springer (ISBN 978-3-319-25431-9/hbk; 978-3-319-25433-3/ebook). Springer Proceedings in Mathematics & Statistics 149, 147–153 (2015).

Summary: Our problem is to estimate the total number of classes in a population, both observed and unobserved. This is often called the species problem, where the classes are (biological) species, but the same methods apply to “single source” capture-recapture, where only the number of captures for each individual is available (as opposed to the complete capture history). The data are summarized by the frequency counts, i.e., the number of classes observed exactly once, twice, three times, and so on, in the sample. Almost every known statistical procedure uses a mixed Poisson distribution to model the frequency counts, which assumes that the class sizes were independently generated from some latent mixing distribution, and that the classes independently contributed members to the sample. To depart from these assumptions, we require different marginal distributions for the frequency counts. Here we consider distributions having probability generating functions based on generalized hypergeometric functions, first proposed by A. W. Kemp in [Sankhyā, Ser. A 30, 401–410 (1968; Zbl 0186.53004)]. We show that many of these are not mixed Poisson, and that they are useful in the species problem.
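As an illustration of the data reduction described in the summary, the following minimal Python sketch computes frequency counts from a sample of class labels. The function name `frequency_counts` and the toy sample are assumptions made for this example only and do not come from the paper.

```python
from collections import Counter


def frequency_counts(sample):
    """Return {j: f_j}, where f_j is the number of classes seen exactly j times.

    `sample` is any iterable of hashable class labels (hypothetical input format).
    """
    # Step 1: how many times each class was observed in the sample.
    times_observed = Counter(sample)
    # Step 2: how many classes share each observation count.
    f = Counter(times_observed.values())
    return dict(sorted(f.items()))


# Toy sample: classes a..e, invented for illustration.
sample = ["a", "b", "a", "c", "d", "d", "d", "e"]
counts = frequency_counts(sample)

# Number of distinct classes observed (the unobserved count f_0 is the unknown).
observed_classes = sum(counts.values())
# Total sample size, recovered from the frequency counts.
sample_size = sum(j * f_j for j, f_j in counts.items())

print(counts, observed_classes, sample_size)
```

The frequency counts are the only input such procedures need; the number of classes seen zero times is what the species problem sets out to estimate.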

##### MSC:

| Code | Description |
|---|---|
| 92D15 | Problems related to evolution |
| 62P10 | Applications of statistics to biology and medical sciences; meta analysis |
| 62F12 | Asymptotic properties of parametric estimators |
| 92B15 | General biostatistics |

