Unrooted genealogical tree probabilities in the infinitely-many-sites model. (English) Zbl 0818.92010

Summary: The infinitely-many-sites process is often used to model the sequence variability observed in samples of DNA sequences. Despite its popularity, the sampling theory of the process is rather poorly understood. We describe the tree structure underlying the model and show how this may be used to compute the probability of a sample of sequences. We show how to produce the unrooted genealogy from a set of sites in which the ancestral labeling is unknown and, from this, how to obtain the corresponding rooted genealogies.
We derive recursions for the probability of the configuration of sequences (equivalently, of trees) in both the rooted and unrooted cases. We give a computational method based on Monte Carlo recursion that provides approximants to sampling probabilities for samples of any size. Among several applications, this algorithm may be used to find maximum likelihood estimators of the substitution rate, both when the ancestral labeling of sites is known and when it is unknown.
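As a small illustration of the unknown-ancestral-labeling setting mentioned in the summary, the sketch below checks whether a sample of binary sequences is compatible with some unrooted tree under the infinitely-many-sites assumption. It uses the standard four-gamete (pairwise compatibility) test, which is not the paper's own construction or recursion; the function name `four_gamete_compatible` and the sample data are hypothetical, and sequences are assumed to be equal-length strings over the characters 0 and 1.

```python
from itertools import combinations


def four_gamete_compatible(sequences):
    """Return True if no pair of sites shows all four gametes 00, 01, 10, 11.

    Assumption: each sequence is a string of '0'/'1' characters, one per
    segregating site, with arbitrary (unknown) ancestral labeling.
    """
    if not sequences:
        return True
    n_sites = len(sequences[0])
    if any(len(s) != n_sites for s in sequences):
        raise ValueError("all sequences must have the same number of sites")
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site patterns observed in the sample.
        gametes = {(s[i], s[j]) for s in sequences}
        if len(gametes) == 4:
            # All four patterns present: the two sites cannot sit on one tree
            # with a single mutation per site.
            return False
    return True


# Hypothetical samples, for illustration only.
sample_ok = ["000", "100", "110", "001"]
sample_bad = ["00", "01", "10", "11"]

result_ok = four_gamete_compatible(sample_ok)
result_bad = four_gamete_compatible(sample_bad)
```

A sample that passes this check admits an unrooted tree on which each site mutates once; computing the probability of that tree requires the recursions and Monte Carlo method described in the paper, which this sketch does not attempt.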

MSC:

92D10 Genetics and epigenetics
65C99 Probabilistic methods, stochastic differential equations
92D15 Problems related to evolution
