Likelihood-based inference for discretely observed birth-death-shift processes, with applications to evolution of mobile genetic elements. (English) Zbl 1419.62481

Summary: Continuous-time birth-death-shift (BDS) processes are frequently used in stochastic modeling, with many applications in ecology and epidemiology. In particular, such processes can model the evolutionary dynamics of transposable elements, which are important genetic markers in molecular epidemiology. The effects of individual covariates on the birth, death, and shift rates of the process can be estimated from patient data, but inferring these rates when the process is observed discretely and at uneven intervals presents computational challenges. We propose a multi-type branching process approximation to BDS processes and develop a corresponding expectation-maximization algorithm, in which spectral techniques reduce the calculation of expected sufficient statistics to low-dimensional integration. These techniques yield an efficient and robust optimization routine for inferring the rates of the BDS process, and they apply broadly to multi-type branching processes whose rates can depend on many covariates. After rigorously testing our methodology in simulation studies, we apply it to study the intrapatient time evolution of the IS6110 transposable element, a genetic marker frequently used in estimating epidemiological clusters of Mycobacterium tuberculosis infections.
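
To make the observation scheme in the summary concrete, the following is a minimal simulation sketch, not the authors' inference method. It assumes a linear two-type model in which elements sit either at original or at new locations, each element is copied at rate `birth`, removed at rate `death`, and moved at rate `shift`, and shifts among new locations are ignored because they leave the counts unchanged. The function name `simulate_bds`, its parameters, and this two-type reduction are illustrative assumptions that the summary does not spell out.

```python
import random


def simulate_bds(n_old, n_new, birth, death, shift, obs_times, seed=None):
    """Gillespie simulation of an assumed two-type birth-death-shift model.

    Returns a list of (time, old_count, new_count) at the requested,
    possibly unevenly spaced, observation times.
    """
    rng = random.Random(seed)
    old, new = n_old, n_new
    t = 0.0
    observations = []
    for t_obs in sorted(obs_times):
        while True:
            total = old + new
            # Per-event total rates (assumed linear in the counts):
            # birth of a new copy, death of an old element,
            # death of a new element, shift of an old element.
            weights = [total * birth, old * death, new * death, old * shift]
            rate = sum(weights)
            if rate == 0.0:
                break  # absorbing state or all rates zero
            wait = rng.expovariate(rate)
            if t + wait > t_obs:
                # Next event falls after the observation time; by
                # memorylessness the wait can be redrawn from t_obs.
                break
            t += wait
            event = rng.choices(range(4), weights=weights)[0]
            if event == 0:
                new += 1
            elif event == 1:
                old -= 1
            elif event == 2:
                new -= 1
            else:
                old -= 1
                new += 1
        t = t_obs
        observations.append((t_obs, old, new))
    return observations
```

A call such as `simulate_bds(10, 0, birth=0.02, death=0.02, shift=0.01, obs_times=[0.5, 3.0, 3.2], seed=1)` returns only the counts at the three observation times. The events between them stay hidden, which is the missing-data structure an expectation-maximization approach is designed to handle.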

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62M05 Markov processes: estimation; hidden Markov models
60J27 Continuous-time Markov processes on discrete state spaces
60J85 Applications of branching processes