×

Counting and sampling gene family evolutionary histories in the duplication-loss and duplication-loss-transfer models. (English) Zbl 1435.92043

The paper presents the first results of counting and sampling evolutionary scenarios in models that take into account duplication of genes, loss of genes, and HGT. The originality of the work lies in the fact that authors believe that only a species tree is given, and thus consider all possible evolutionary histories of a given size, i.e., leading to a given number of genes.
The results include formal grammars describing this combinatorial space, as well as counting and sampling algorithms obtained using dynamic programming methods or combinatorics. On the basis of the proposed method, the exact asymptotic is obtained by the number of stories for two specific tree species, a rooted caterpillar and a complete binary tree in the DL model without markup, although the method is also applicable to any given tree of species in this model. Counting and sampling algorithms have complemented these results for other models, especially models that take into account HGT.
The experimental results give the first global view of the potential space of evolutionary stories for a given tree species. They confirm the expected fact that the introduction of HGT in the model leads to a significant increase in the space of possible stories; they also lead to an interesting observation that in a ranked DLT model, the total number of stories is asymptotically almost independent of a given species tree.

MSC:

92D15 Problems related to evolution
05A15 Exact enumeration problems, generating functions
05A16 Asymptotic enumeration
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Åkerborg, Ö.; Sennblad, B.; Arvestad, L.; Lagergren, J., Simultaneous Bayesian gene tree reconstruction and reconciliation analysis, Proc Natl Acad Sci, 106, 14, 5714-5719 (2009)
[2] Arvestad, L.; Lagergren, J.; Sennblad, B., The gene evolution model and computing its associated probabilities, J ACM, 56, 2, 7:1-7:44 (2009) · Zbl 1325.92064
[3] Ban Chan, Y.; Ranwez, V.; Scornavacca, C., Inferring incomplete lineage sorting, duplications, transfers and losses with reconciliations, J Theor Biol, 432, 1-13 (2017) · Zbl 1393.92033
[4] Banderier C, Wallner M (2016) Lattice paths with catastrophes. Discrete Math Theor Comput Sci 19(1), Sept. 2017. Full version of extended abstract with the same title appeared in the proceedings of conference on random generation of combinatorial structures—GASCom · Zbl 1400.05022
[5] Bansal, Ms; Alm, Ej; Kellis, M., Reconciliation revisited: handling multiple optima when reconciling with duplication, transfer, and loss, J Comput Biol, 20, 10, 738-754 (2013)
[6] Bansal, Ms; Kellis, M.; Kordi, M.; Kundu, S., Ranger-dtl 2.0: rigorous reconstruction of gene-family evolution by duplication, transfer and loss, Bioinformatics, 34, 18, 3214-3216 (2018)
[7] Bendkowski M, Bodini O, Dovgal S (2018) Polynomial tuning of multiparametric combinatorial samplers. In: Proceedings of the fifteenth workshop on analytic algorithmics and combinatorics, ANALCO 2018, New Orleans, LA, USA, 8-9 Jan 2018. SIAM, pp 92-106 · Zbl 1429.68153
[8] Bodini O, Ponty Y (2010) Multi-dimensional Boltzmann sampling of languages. In: Drmota M, Gittenberger B (eds) 21st International meeting on probabilistic, combinatorial, and asymptotic methods in the analysis of algorithms (AofA’10), volume DMTCS proceedings, vol. AM, pp 49-64, Vienna, Austria, June. Discrete Mathematics and Theoretical Computer Science · Zbl 1355.68151
[9] Bodini, O.; Gardy, D.; Gittenberger, B.; Gołębiewski, Z., On the number of unary-binary tree-like structures with restrictions on the unary height, Ann Comb, 22, 1, 45-91 (2018) · Zbl 1384.05034
[10] Bóna, M.; Flajolet, P., Isomorphism and symmetries in random phylogenetic trees, J Appl Probab, 46, 4, 1005-1019 (2009) · Zbl 1231.60009
[11] Degnan, Jh; Rosenberg, Na, Discordance of species trees with their most likely gene trees, PLOS Genet, 2, 5, 1-7 (2006)
[12] Degnan, J.; Rosenberg, N., Gene tree discordance, phylogenetic and the multispecies coalescent, Trends Ecol Evolut, 24, 332-340 (2009)
[13] Degnan, Jh; Salter, La, Gene tree distribution under the coalescent process, Evolution, 59, 1, 24-37 (2005)
[14] Degnan, Jh; Rosenberg, Na; Stadler, T., A characterization of the set of species trees that produce anomalous ranked gene trees, IEEE/ACM Trans Comput Biol Bioinform, 9, 6, 1558-1568 (2012)
[15] Disanto, F.; Munarini, E., Local height in weighted dyck models of random walks and the variability of the number of coalescent histories for caterpillar-shaped gene trees and species trees, SN Appl Sci, 1, 6, 578 (2019)
[16] Disanto, F.; Rosenberg, Na, On the number of ranked species trees producing anomalous ranked gene trees, IEEE/ACM Trans Comput Biol Bioinform, 11, 6, 1229-1238 (2014)
[17] Disanto, F.; Rosenberg, Na, Coalescent histories for lodgepole species trees, J Comput Biol, 22, 10, 918-929 (2015)
[18] Disanto, F.; Rosenberg, Na, Asymptotic properties of the number of matching coalescent histories for caterpillar-like families of species trees, IEEE/ACM Trans Comput Biol Bioinform, 13, 5, 913-925 (2016)
[19] Disanto, F.; Rosenberg, Na, Enumeration of ancestral configurations for matching gene trees and species trees, J Comput Biol, 24, 9, 831-850 (2017)
[20] Disanto, F.; Rosenberg, Na, On the number of non-equivalent ancestral configurations for matching, Bull Math Biol, 81, 384-407 (2019) · Zbl 1410.92072
[21] Disanto, F.; Rosenberg, Na, Enumeration of compact coalescent histories for matching gene trees and species trees, J Math Biol, 78, 155-188 (2019) · Zbl 1407.05129
[22] Disanto, F.; Miglionico, P.; Narduzzi, G., On the unranked topology of maximally probable ranked gene tree topologies, J Math Biol, 79, 4, 1205-1225 (2019) · Zbl 1423.05045
[23] Doyon, J-P; Chauve, C.; Hamel, S., Space of gene/species trees reconciliations and parsimonious models, J Comput Biol, 16, 10, 1399-1418 (2009)
[24] Doyon, J-P; Ranwez, V.; Daubin, V.; Berry, V., Models, algorithms and programs for phylogeny reconciliation, Brief Bioinform, 12, 5, 392-400 (2011)
[25] Drmota, M., Systems of functional equations, Random Struct Algorithms, 10, 1-2, 103-124 (1997) · Zbl 0869.39010
[26] Du P, Nakhleh L (2018) Species tree and reconciliation estimation under a duplication-loss-coalescence model. In: Proceedings of the 2018 ACM international conference on bioinformatics, computational biology, and health informatics, BCB’18. ACM, pp 376-385
[27] Durand, D.; Halldórsson, Bv; Vernot, B., A hybrid micro-macroevolutionary approach to gene tree reconstruction, J Comput Biol, 13, 2, 320-335 (2006)
[28] Flajolet, P.; Sedgewick, R., Analytic combinatorics (2009), Cambridge: Cambridge University Press, Cambridge · Zbl 1165.05001
[29] Flajolet, P.; Zimmermann, P.; Van Cutsem, B., A calculus for the random generation of labelled combinatorial structures, Theor Comput Sci, 132, 1-2, 1-35 (1994) · Zbl 0799.68143
[30] Flajolet P, Sipala P, Steyaert JM (1990) Analytic variations on the common subexpression problem. In: Automata, languages and programming (Coventry, 1990), volume 443 of lecture notes in computer science. Springer, New York, pp 220-234 · Zbl 0765.68048
[31] Gavryushkin, A.; Drummond, Aj, The space of ultrametric phylogenetic trees, J Theor Biol, 403, 197-208 (2016) · Zbl 1343.92343
[32] Gavryushkin, A.; Whidden, C.; Matsen, Fa, The combinatorics of discrete time-trees: theory and open problems, J Math Biol, 76, 5, 1101-1121 (2018) · Zbl 1391.92030
[33] Gittenberger, B.; Jin, Ey; Wallner, M., On the shape of random Pólya structures, Discrete Math, 341, 4, 896-911 (2018) · Zbl 1380.05180
[34] Goodman, M.; Czelusniak, J.; Moore, Gw; Romero-Herrera, Ae; Matsuda, G., Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences, Syst Zool, 28, 2, 132-163 (1979)
[35] Górecki, P.; Eulenstein, O., Drml: probabilistic modeling of gene duplications, J Comput Biol, 21, 1, 89-98 (2014)
[36] Górecki, P.; Tiuryn, J., DLS-trees: a model of evolutionary scenarios, Theor Comput Sci, 359, 1, 378-399 (2006) · Zbl 1097.68053
[37] Górecki, P.; Burleigh, Gj; Eulenstein, O., Maximum likelihood models and algorithms for gene tree evolution with duplications and losses, BMC Bioinform, 12, 1, S15 (2011)
[38] Hasić, D.; Tannier, E., Gene tree species tree reconciliation with gene conversion, J Math Biol, 78, 6, 1981-2014 (2019) · Zbl 1415.92134
[39] Jacox, E.; Chauve, C.; Szöllősi, Gj; Ponty, Y.; Scornavacca, C., eccetera: comprehensive gene tree-species tree reconciliation using parsimony, Bioinformatics, 32, 13, 2056-2058 (2016)
[40] Maddison, Wp, Gene trees in species trees, Syst Biol, 46, 3, 523-536 (1997)
[41] Nijenhuis, A.; Wilf, Hs, Combinatorial algorithms (1978), New York: Academic Press, New York
[42] OEIS Foundation Inc. (2020) The On-Line Encyclopedia of Integer Sequences. http://oeis.org
[43] Ovadia, Y.; Fielder, D.; Conow, C.; Libeskind-Hadas, R., The cophylogeny reconstruction problem is NP-complete, J Comput Biol, 18, 1, 59-65 (2011)
[44] Pei, J.; Wu, Y., STELLS2: fast and accurate coalescent-based maximum likelihood inference of species trees from gene tree topologies, Bioinformatics, 33, 12, 1789-1797 (2017)
[45] Ranwez, V.; Scornavacca, C.; Doyon, J-P; Berry, V., Inferring gene duplications, transfers and losses can be done in a discrete framework, J Math Biol, 72, 7, 1811-1844 (2016) · Zbl 1337.92144
[46] Rasmussen, Md; Kellis, M., Unified modeling of gene duplication, loss, and coalescence using a locus tree, Genome Res, 22, 4, 755-765 (2012)
[47] Rosenberg, Na, Counting coalescent histories, J Comput Biol, 14, 3, 360-377 (2007)
[48] Rosenberg, Na, Enumeration of lonely pairs of gene trees and species trees by means of antipodal cherries, Adv Appl Math, 102, 1-17 (2019) · Zbl 1401.05025
[49] Scornavacca, C.; Jacox, E.; Szöllősi, Gj, Joint amalgamation of most parsimonious reconciled gene trees, Bioinformatics, 31, 6, 841-848 (2015)
[50] Sjöstrand, J.; Tofigh, A.; Daubin, V.; Arvestad, L.; Sennblad, B.; Lagergren, J., A bayesian method for analyzing lateral gene transfer, Syst Biol, 63, 3, 409-420 (2014)
[51] Stolzer, M.; Lai, H.; Xu, M.; Sathaye, D.; Vernot, B.; Durand, D., Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees, Bioinformatics, 28, 18, i409-i415 (2012)
[52] Szöllősi, Gergely J.; Daubin, Vincent, Modeling Gene Family Evolution and Reconciling Phylogenetic Discord, Methods in Molecular Biology, 29-51 (2012), Totowa, NJ: Humana Press, Totowa, NJ
[53] Szöllősi, Gj; Rosikiewicz, W.; Boussau, B.; Tannier, E.; Daubin, V., Efficient exploration of the space of reconciled gene trees, Syst Biol, 62, 6, 901-912 (2013)
[54] Szöllősi, Gj; Tannier, E.; Lartillot, N.; Daubin, V., Lateral gene transfer from the dead, Syst Biol, 62, 3, 386-397 (2013)
[55] Szöllősi, Gj; Tannier, E.; Daubin, V.; Boussau, B., The inference of gene trees with species trees, Syst Biol, 64, 1, e42-e62 (2015)
[56] Tofigh, A.; Hallett, Mt; Lagergren, J., Simultaneous identification of duplications and lateral gene transfers, IEEE/ACM Trans Comput Biol Bioinform, 8, 2, 517-535 (2011)
[57] Wilf, Hs, A unified setting for sequencing, ranking, and selection algorithms for combinatorial objects, Adv Math, 24, 3, 281-291 (1977) · Zbl 0354.05041
[58] Wu, Y., Coalescent-based species tree inference from gene tree topologies under incomplete lineage sorting by maximum likelihood, Evolution, 66, 3, 763-775 (2012)
[59] Wu, Y., An algorithm for computing the gene tree probability under the multispecies coalescent and its application in the inference of population tree, Bioinformatics, 32, 12, i225-i233 (2016)
[60] Wu, Y-C; Rasmussen, Md; Kellis, M., Most parsimonious reconciliation in the presence of gene duplication, loss, and deep coalescence using labeled coalescent trees, Genome Res, 24, 3, 475-486 (2014)
[61] Zhang, Bo; Wu, Yi-Chieh, Coestimation of Gene Trees and Reconciliations Under a Duplication-Loss-Coalescence Model, Bioinformatics Research and Applications, 196-210 (2017), Cham: Springer International Publishing, Cham
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.