×

Best match graphs and reconciliation of gene trees with species trees. (English) Zbl 1434.05035

Summary: A wide variety of problems in computational biology, most notably the assessment of orthology, are solved with the help of reciprocal best matches. Using an evolutionary definition of best matches that captures the intuition behind the concept we clarify rigorously the relationships between reciprocal best matches, orthology, and evolutionary events under the assumption of duplication/loss scenarios. We show that the orthology graph is a subgraph of the reciprocal best match graph (RBMG). We furthermore give conditions under which an RBMG that is a cograph identifies the correct orthlogy relation. Using computer simulations we find that most false positive orthology assignments can be identified as so-called good quartets – and thus corrected – in the absence of horizontal transfer. Horizontal transfer, however, may introduce also false-negative orthology assignments.

MSC:

05C05 Trees
05C62 Graph representations (geometric and intersection representations, etc.)
05C90 Applications of graph theory
92B10 Taxonomy, cladistics, statistics in mathematical biology

Software:

OrthoMCL; eggNOG
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Altenhoff, Adrian M.; Boeckmann, Brigitte; Capella-Gutierrez, Salvador; Dalquen, Daniel A.; Deluca, Todd; Forslund, Kristoffer; Huerta-Cepas, Jaime; Linard, Benjamin; Pereira, Cécile; Pryszcz, Leszek P.; Schreiber, Fabian; Da Silva, Alan Sousa; Szklarczyk, Damian; Train, Clément-Marie; Bork, Peer; Lecompte, Odile; Von Mering, Christian; Xenarios, Ioannis; Sjölander, Kimmen; Jensen, Lars Juhl; Martin, Maria J.; Muffato, Matthieu; Gabaldón, Toni; Lewis, Suzanna E.; Thomas, Paul D.; Sonnhammer, Erik; Dessimoz, Christophe, Standardized benchmarking in the quest for orthologs, Nature Methods, 13, 5, 425-430 (2016)
[2] Altenhoff, Am; Studer, Ra; Robinson-Rechavi, M.; Dessimoz, C., Resolving the ortholog conjecture: orthologs tend to be weakly, but significantly, more similar in function than paralogs, PLoS Comp Biol, 8, e1002514 (2012)
[3] Bansal, M.; Alm, E.; Kellis, M., Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss, Bioinformatics, 28, i283-i291 (2012)
[4] Böcker, S.; Briesemeister, S.; Klau, Gw, Exact algorithms for cluster editing: evaluation and experiments, Algorithmica, 60, 316-334 (2011) · Zbl 1215.68169
[5] Böcker, S.; Dress, Awm, Recovering symbolically dated, rooted trees from symbolic ultrametrics, Adv Math, 138, 105-125 (1998) · Zbl 0912.05031
[6] Corneil, Dg; Lerchs, H.; Steward Burlingham, L., Complement reducible graphs, Discr Appl Math, 3, 163-174 (1981) · Zbl 0463.05057
[7] Dalquén, Da; Anisimova, M.; Gonnet, Gh; Dessimoz, C., ALF—A simulation framework for genome evolution, Mol Biol Evol, 29, 1115-1123 (2011)
[8] Datta, Rs; Meacham, C.; Samad, B.; Neyer, C.; Sjölander, K., Berkeley PHOG: PhyloFacts orthology group prediction web server, Nucleic Acids Res, 37, W84-W89 (2009)
[9] Dondi, R.; Lafond, M.; El-Mabrouk, N., Approximating the correction of weighted and unweighted orthology and paralogy relations, Algorithms Mol Biol, 12, 4 (2017)
[10] Doyon, Jp; Chauve, C.; Hamel, S., Space of gene/species trees reconciliations and parsimonious models, J Comp Biol, 16, 1399-1418 (2009)
[11] Doyon, Jp; Ranwez, V.; Daubin, V.; Berry, V., Models, algorithms and programs for phylogeny reconciliation, Brief Bioinform, 12, 392-400 (2011)
[12] Doyon, Jp; Scornavacca, C.; Gorbunov, Ky; Szöllősi, Gj; Ranwez, V.; Berry, V.; Tannier, E., An efficient algorithm for gene/species trees parsimonious reconciliation with losses, duplications and transfers, Comparative genomics: international workshop, RECOMB-CG 2010, 93-108 (2010), Berlin: Springer, Berlin
[13] Dufayard, Jf; Duret, L.; Penel, S.; Gouy, M.; Rechenmann, F.; Perriere, G., Tree pattern matching in phylogenetic trees: automatic search for orthologs or paralogs in homologous gene sequence databases, Bioinformatics, 21, 2596-2603 (2005)
[14] Ehrenfeucht, A.; Rozenberg, G., Theory of 2-structures, part I: clans, basic subclasses, and morphisms, Theor Comp Sci, 70, 277-303 (1990) · Zbl 0701.05051
[15] Ehrenfeucht, A.; Rozenberg, G., Theory of 2-structures, part II: representation through labeled tree families, Theor Comp Sci, 70, 305-342 (1990) · Zbl 0701.05052
[16] Fitch, Wm, Distinguishing homologous from analogous proteins, Syst Zool, 19, 99-113 (1970)
[17] Fitch, Wm, Homology: a personal view on some of the problems, Trends Genet, 16, 227-231 (2000)
[18] Gabaldón, T.; Koonin, Ev, Functional and evolutionary implications of gene orthology, Nat Rev Genet, 14, 360-366 (2013)
[19] Geiß, M.; Anders, J.; Stadler, Pf; Wieseke, N.; Hellmuth, M., Reconstructing gene trees from Fitch’s xenology relation, J Math Biol, 77, 1459-1491 (2018) · Zbl 1396.05025
[20] Geiß, M.; Chávez, E.; González Laffitte, M.; López Sánchez, A.; Stadler, Bmr; Valdivia, Di; Hellmuth, M.; Hernández Rosales, M.; Stadler, Pf, Best match graphs, J Math Biol, 78, 2015-2057 (2019) · Zbl 1415.92133
[21] Geiß, Manuela; Stadler, Peter F.; Hellmuth, Marc, Reciprocal best match graphs, Journal of Mathematical Biology, 80, 3, 865-953 (2019) · Zbl 1433.05303
[22] Górecki, P.; Tiuryn, J., DLS-trees: a model of evolutionary scenarios, Theor Comp Sci, 359, 378-399 (2006) · Zbl 1097.68053
[23] Guigó, R.; Muchnik, I.; Smith, Tf, Reconstruction of ancient molecular phylogeny, Mol Phylogenet Evol, 6, 189-213 (1996)
[24] Hellmuth, M., Biologically feasible gene trees, reconciliation maps and informative triples, Alg Mol Biol, 12, 23 (2017)
[25] Hellmuth, M.; Hernandez-Rosales, M.; Huber, Kt; Moulton, V.; Stadler, Pf; Wieseke, N., Orthology relations, symbolic ultrametrics, and cographs, J Math Biol, 66, 399-420 (2013) · Zbl 1408.05038
[26] Hellmuth, M.; Huber, K.; Moulton, V., Reconciling event-labeled gene trees with MUL-trees and species networks, J Math Biol, 79, 1885-1925 (2019) · Zbl 1423.05185
[27] Hellmuth, M.; Seemann, Cr, Alternative characterizations of Fitch’s xenology relation, J Math Biol, 79, 969-986 (2019) · Zbl 1420.92082
[28] Hellmuth, M.; Stadler, Pf; Wieseke, N., The mathematics of xenology: Di-cographs, symbolic ultrametrics, 2-structures and tree-representable systems of binary relations, J Math Biol, 75, 299-237 (2017) · Zbl 1368.05023
[29] Hellmuth, Marc; Wieseke, Nicolas, From Sequence Data Including Orthologs, Paralogs, and Xenologs to Gene and Species Trees, Evolutionary Biology, 373-392 (2016), Cham: Springer International Publishing, Cham
[30] Hellmuth, M.; Wieseke, N.; Lechner, M.; Lenhof, Hp; Middendorf, M.; Stadler, Pf, Phylogenomics with paralogs, Proc Natl Acad Sci USA, 112, 2058-2063 (2015)
[31] Hernandez-Rosales, M.; Hellmuth, M.; Wieseke, N.; Huber, Kt; Moulton, V.; Stadler, Pf, From event-labeled gene trees to species trees, BMC Bioinform, 13, S6 (2012)
[32] Hoàng, Ct; Kamiński, M.; Sawada, J.; Sritharan, R., Finding and listing induced paths and cycles, Discr Appl Math, 161, 633-641 (2013) · Zbl 1259.05094
[33] Innan, H.; Kondrashov, F., The evolution of gene duplications: classifying and distinguishing between models, Nat Rev Genet, 11, 97-108 (2010)
[34] Jamison, B.; Olariu, S., Recognizing \(P_4\)-sparse graphs in linear time, SIAM J Comput, 21, 381-406 (1992) · Zbl 0763.05093
[35] Jensen, Lj; Julien, P.; Kuhn, M.; Von Mering, C.; Muller, J.; Doerks, T.; Bork, P., eggNOG: automated construction and annotation of orthologous groups of genes, Nucleic Acids Res, 36, D250-D2504 (2008)
[36] Keller-Schmidt, S.; Klemm, K., A model of macroevolution as a branching process based on innovations, Adv Complex Syst, 15, 1250043 (2012)
[37] Koonin, E., Orthologs, paralogs, and evolutionary genomics, Ann Rev Genet, 39, 309-338 (2005)
[38] Kuhn, Ts; Mooers, Aø; Thomas, Gh, A simple polytomy resolver for dated phylogenies, Methods Ecol Evo, 2, 427-436 (2011)
[39] Lafond, M.; Dondi, R.; El-Mabrouk, N., The link between orthology relations and gene trees: a correction perspective, Algorithms Mol Biol, 11, 4 (2016)
[40] Lafond, M.; El-Mabrouk, N., Orthology and paralogy constraints: satisfiability and consistency, BMC Genom, 15, S12 (2014)
[41] Lechner, M.; Hernandez-Rosales, M.; Doerr, D.; Wieseke, N.; Thévenin, A.; Stoye, J.; Hartmann, Rk; Prohaska, Sj; Stadler, Pf, Orthology detection combining clustering and synteny for very large datasets, PLoS ONE, 9, e105015 (2014)
[42] Li, L.; Stoeckert, Cj Jr; Roos, Ds, OrthoMCL: identification of ortholog groups for eukaryotic genomes, Genome Res, 13, 2178-2189 (2003)
[43] Liu, Y.; Wang, J.; Guo, J.; Chen, J., Complexity and parameterized algorithms for cograph editing, Theor Comp Sci, 461, 45-54 (2012) · Zbl 1253.68179
[44] Nichio, Btl; Marchaukoski, Jn; Raittz, Rt, New tools in orthology analysis: a brief review of promising perspectives, Front Genet, 8, 165 (2017)
[45] Nøjgaard, N.; Geiß, M.; Merkle, D.; Stadler, Pf; Wieseke, N.; Hellmuth, M., Time-consistent reconciliation maps and forbidden time travel, Alg Mol Biol, 13, 2 (2018)
[46] Page, Rdm; Charleston, Ma, Reconciled trees and incongruent gene and species trees, DIMACS Ser Discrete Mathematics and Theor Comput Sci, 37, 57-70 (1997) · Zbl 0892.92011
[47] Purvis, A.; Garland, T. Jr, Polytomies in comparative analyses of continuous characters, Syst Biol, 42, 569-575 (1993)
[48] Roth, Acj; Gonnet, Gh; Dessimoz, C., Algorithm of OMA for large-scale orthology inference, BMC Bioinform, 9, 518 (2008)
[49] Rusin, Ly; Lyubetskaya, E.; Gorbunov, Ky; Lyubetsky, V., Reconciliation of gene and species trees, BioMed Res Int, 2014, 642089 (2014)
[50] Sayyari, E.; Mirarab, S., Testing for polytomies in phylogenetic species trees using quartet frequencies, Genes, 9, E132 (2018)
[51] Sonnhammer, E. L. L.; Gabaldon, T.; Sousa Da Silva, A. W.; Martin, M.; Robinson-Rechavi, M.; Boeckmann, B.; Thomas, P. D.; Dessimoz, C., Big data and other challenges in the quest for orthologs, Bioinformatics, 30, 21, 2993-2998 (2014)
[52] Stadler PF, Geiß M, Schaller D, López A, Gonzalez Laffitte M, Valdivia D, Hellmuth M, Hernandez Rosales M (2020) From best hits to best matches. Tech Rep 2001.00958, arXiv · Zbl 07340472
[53] Storm, Ce; Sonnhammer, El, Automated ortholog inference from phylogenetic trees and calculation of orthology reliability, Bioinformatics, 18, 92-99 (2002)
[54] Studer, Ra; Robinson-Rechavi, M., How confident can we be that orthologs are similar, but paralogs differ?, Trends Genet, 25, 210-216 (2009)
[55] Tatusov, Rl; Koonin, Ev; Lipman, Dj, A genomic perspective on protein families, Science, 278, 631-637 (1997)
[56] Tofigh, A.; Hallett, M.; Lagergren, J., Simultaneous identification of duplications and lateral gene transfers, IEEEACM Trans Comput Biol Bioinform, 8, 517-535 (2011)
[57] Vernot, B.; Stolzer, M.; Goldman, A.; Durand, D., Reconciliation with non-binary species trees, J Comput Biol, 15, 981-1006 (2008)
[58] Vilella, Aj; Severin, J.; Ureta-Vidal, A.; Heng, L.; Durbin, R.; Birney, E., EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates, Genome Res, 19, 327-335 (2009)
[59] Zallot, R.; Harrison, Kj; Kolaczkowski, B.; De Crécy-Lagard, V., Functional annotations of paralogs: a blessing and a curse, Life, 6, 39 (2016)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.