×

A general framework for gene tree correction based on duplication-loss reconciliation. (English) Zbl 1443.92128

Schwartz, Russell (ed.) et al., 17th international workshop on algorithms in bioinformatics, WABI 2017, Boston, MA, USA, August 21–23, 2017. Proceedings. Wadern: Schloss Dagstuhl – Leibniz Zentrum für Informatik. LIPIcs – Leibniz Int. Proc. Inform. 88, Article 8, 14 p. (2017).
Summary: Due to the key role played by gene trees and species phylogenies in biological studies, it is essential to have as much confidence as possible on the available trees. As phylogenetic tools are error-prone, it is a common task to use a correction method for improving an initial tree. Various correction methods exist. In this paper we focus on those based on the duplication-loss reconciliation model. The polytomy resolution approach consists in contracting weakly supported branches and then refining the obtained non-binary tree in a way minimizing a reconciliation distance with the given species tree. On the other hand, the supertree approach takes as input a set of separated subtrees, either obtained for separated orthology groups or by removing the upper branches of an initial tree to a certain level, and amalgamating them in an optimal way preserving the topology of the initial trees. The two classes of problems have always been considered as two separate fields, based on apparently different models. In this paper we give a unifying view showing that these two classes of problems are in fact special cases of a more general problem that we call LabelGTC, whose input includes a \(0-1\) edge-labelled gene tree to be corrected. Considering a tree as a set of triplets, we also formulate the TripletGTC problem whose input includes a set of gene triplets that should be preserved in the corrected tree. These two general models allow to unify, understand and compare the principles of the duplication-loss reconciliation-based tree correction approaches. We show that LabelGTC is a special case of TripletGTC. We then develop appropriate algorithms allowing to handle these two general correction problems.
For the entire collection see [Zbl 1372.68022].

MSC:

92D15 Problems related to evolution
92D10 Genetics and epigenetics

Software:

TreeFix; ecceTERA
PDFBibTeX XMLCite
Full Text: DOI

References:

[2] .If all edges of E are labelled 0, then by Lemma 7, all the edges of E are terminal edges. Since E covers {\it T }, then all edges of {\it E}({\it T }) {\it E}(T ) are labelled 0. So, by Theorem 1, the problem is reduced to SGT.
[3] .{\it T rp}1 =S{{\it T rp}({\it T}{\it i}) | {\it T}{\it i}∈ T } ;
[4] B. Boussau, G.J. Szöllősi, L. Duret, M. Gouy, E. Tannier., and V. Daubin. Genome-scale coestimation of species and gene trees. {\it Genome Research}, 23:323-330, 2013.
[5] K. Chen, D. Durand, and M. Farach-Colton. Notung: Dating gene duplications using gene family trees. {\it Journal of Computational Biology}, 7:429-447, 2000.
[6] .{\it T rp}2 = {{\it T rp}({\it x, y}) | ({\it x, y}) {\it is a terminal edge of T }} ;
[7] .Otherwise, let {\it T}0= {\it LabelGT C}({\it S, T, }T {\it , l}), and for any edge ({\it x}{\it i}{\it , y}{\it i}) of E labelled 1 and pre served in {\it T}0, let {\it y}{\it i}0 be the node of {\it T}0 such that L({\it y}{\it i}0) = L({\it y}{\it i}). Since {\it T}|L({\it y} {\it i})is a complete subtree of {\it T }, then {\it T}0[{\it y}0{\it i}] is also a complete subtree of {\it T}0 that can be constructed inde pendently of the remaining of the tree {\it T}0 as {\it T}0[{\it y}{\it i}0] = {\it LabelGT C}({\it S, T }[{\it y}{\it i}]{\it , }T|L({\it y}{\it i}){\it , l}|{\it E}({\it T }[{\it y}{\it i}])) (computed at Step 2.a. of the algorithm). Next, the obtained subtrees must be adequately grafted on branches of a supergenetree of the remaining trees {{\it T }[{\it y}{\it i}] ∈ T | ({\it x}{\it i}{\it , y}{\it i}) ∈ E {\it and l}({\it x}{\it i}{\it , y}{\it i}) = 0}. In this case, the problem is also reduced to the SGT problem on T∗= {{\it T }[{\it y}{\it i}] : ({\it x}{\it i}{\it , y}{\it i}) ∈ E {\it and l}({\it x}{\it i}{\it , y}{\it i}) = 0} ∪ {{\it y}∗{\it i}: ({\it x}{\it i}{\it , y}{\it i}) ∈ E {\it and l}({\it x}{\it i}{\it , y}{\it i}) = 0} where each tree {\it T}0[{\it y}{\it i}0] computed at Step 2.a. is contracted into a single leaf node {\it y}{\it i}∗associated to the leafset L({\it T}0[{\it y}0{\it i}]) (computed at Steps 2.b. to 2.e. of the algorithm). The worst case of the algorithm is the stop case where it is directly reduced to the SGT algorithm whose time complexity is in {\it O}(4{\it k}{\it .}({\it n }+ 1){\it k}{\it .k}).J
[8] .{\it T rp}3 = {{\it T rp}({\it x, y}) | ({\it x, y}) {\it is a non terminal edge of T }}. So a tree {\it T}0 for L({\it T }) preserves the topology of all trees {\it T}{\it i}∈ T iff it preserves {\it T rp}1 ; It preserves all terminal edges of {\it E}({\it T }) iff it preserves {\it T rp}2 and it preserves all non-terminal edges of {\it E}({\it T }) iff {\it T rp}3. The combination of the inclusion or exclusion of {\it T rp}1, {\it T rp}2 and {\it T rp}3 in {\it T rp }(except the case where {\it T rp }= ∅) results in the 7 cases of the theorem.J
[9] J. Felsenstein. Phylogenies from molecular sequences: Inference and reliability. {\it Ann. Review} {\it Genet.}, 22:521-565, 1988.
[10] E. Jacox, C. Chauve, G.J. Szollosi, Y. Ponty, and C. Scornavacca. ecceTERA: compre hensive gene tree-species tree reconciliation using parsimony. {\it Bioinformatics}, 32(13):2056 2058, 2016.
[11] M. Lafond, E. Noutahi, and N. El-Mabrouk. Efficient non-binary gene tree resolution with weighted reconciliation cost. In {\it Combinatorial Pattern Matching, LIPIcs-Leibniz In-} {\it ternational Proceedings in Informatics}, volume 54. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016. · Zbl 1380.68484
[12] M. Lafond, A. Ouangraoua, and N. El-Mabrouk. Reconstructing a supergenetree minimiz ing reconciliation. {\it BMC-Genomics}, 16:S4, 2015. Special issue of RECOMB-CG 2015.
[13] N. El-Mabrouk M. Lafond, C. Chauve and A. Ouangraoua. Gene tree construction and correction using supertree and reconciliation. In {\it Asia Pacific Bioinformatics Conference}, 2017. soon in IEEE/ACM TCBB.
[14] :13
[15] N. Nguyen, S. Mirarab, and T. Warnow. MRL and SuperFine+MRL: new supertree meth ods. {\it J. Algo. for Mol. Biol.}, 7(3), 2012.
[16] :14
[17] Thi Hau Nguyen, Vincent Ranwez, Stéphanie Pointet, Anne-Muriel Arigon Chifolleau, Jean Philippe Doyon, and Vincent Berry. Reconciliation and local gene tree rearrangement can be of mutual profit. {\it Algorithms Mol Biol}, 8(1):12, 2013. doi:10.1186/1748-7188-8-12.
[18] E. Noutahi, M. Semeria, M. Lafond, J. Seguin, L. Gueguen, N. El-Mabrouk, and E. Tannier. Efficient gene tree correction guided by genome evolution. {\it Plos.One}, 11(8), 2016.
[19] M. D. Rasmussen and M. Kellis. A bayesian approach for fast and accurate gene tree reconstruction. {\it Molecular Biology and Evolution}, 28(1):273- 290, 2011.
[20] Jamal S. M. Sabir, Robert K. Jansen, Dhivya Arasappan, Virginie Calderon, Emmanuel Noutahi, Chunfang Zheng, Seongjun Park, Meshaal J. Sabir, Mohammed N. Baeshen, Nahid H. Hajrah, Mohammad A. Khiyami, Nabih A. Baeshen, Abdullah Y. Obaid, Ab dulrahman L. Al-Malki, David Sankoff, Nadia El-Mabrouk, and Tracey A. Ruhlman. The nuclear genome of Rhazya stricta and the evolution of alkaloid diversity in a medically relevant clade of apocynaceae. {\it Nature Scientific Reports}, 6(33782), 2016.
[21] F. Schreiber, M. Patricio, M. Muffato, M. Pignatelli, and A. Bateman. Treefam v9: a new website, more species and orthology-on-the-fly. {\it Nucleic Acids Research}, 2013. doi: 10.1093/nar/gkt1055.
[22] C. Scornavacca, L. van Iersel, S. Kelk, and D. Bryant. The agreement problem for unrooted phylogenetic trees is FPT. {\it Journal of Graph Algorithms and Applications}, 18(3):385 - 392, 2014. · Zbl 1295.05239
[23] A. J. Vilella, J. Severin, A. Ureta-Vidal, L. Heng, R. Durbin, and E. Birney. EnsemblCom para gene trees: Complete, duplication-aware phylogenetic trees in vertebrates. {\it Genome} {\it Research}, 19:327-335, 2009.
[24] Y. C. Wu, M. D. Rasmussen, M. S. Bansal, and M. Kellis. TreeFix: Statistically informed gene tree error correction using species trees. {\it Systematic Biology}, 62(1):110- 120, 2013.
[25] Y. Zheng and L. Zhang. Reconciliation with non-binary gene trees revisited. In {\it Lecture} {\it Notes in Computer Science}, volume 8394, pages 418-432, 2014. Proceedings of RECOMB.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.