×

Counting, generating, analyzing and sampling tree alignments. (English) Zbl 1403.05013

Summary: Pairwise ordered tree alignment are combinatorial objects that appear in important applications, such as RNA secondary structure comparison. However, the usual representation of tree alignments as supertrees is ambiguous, i.e., two distinct supertrees may induce identical sets of matches between identical pairs of trees. This ambiguity is uninformative, and detrimental to any probabilistic analysis.
In this work, we consider tree alignments up to equivalence. Our first result is a precise asymptotic enumeration of tree alignments, obtained from a context-free grammar by mean of basic analytic combinatorics. Our second result focuses on alignments between two given ordered trees \(S\) and \(T\). By refining our grammar to align specific trees, we obtain a decomposition scheme for the space of alignments, and use it to design an efficient dynamic programming algorithm for sampling alignments under the Gibbs-Boltzmann probability distribution. This generalizes existing tree alignment algorithms, and opens the door for a probabilistic analysis of the space of suboptimal alignments.

MSC:

05A15 Exact enumeration problems, generating functions
05C90 Applications of graph theory
92D99 Genetics and population dynamics
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Abboud, A.; Williams, V. V.; Weimann, O.; Esparza, J.; Fraigniaud, P.; Husfeldt, T.; Koutsoupias, E., consequences of faster alignment of sequences, Automata, Languages, and Programming: 41st International Colloquium, ICALP 2014, 39-51, (2014), Springer, Berlin Heidelberg · Zbl 1409.68348
[2] Andrade, H.; Area, I.; Nieto, J. J.; Torres, A., The number of reduced alignments between two DNA sequences, BMC Bioinformatics, 15, 94, (2014)
[3] Blin, G.; Denise, A.; Dulucq, S.; Herrbach, C.; Touzet, H., Alignments of RNA structures, IEEE/ACM Trans. Comput. Biology Bioinform., 7, 2, 309-322, (2010)
[4] Chauve, C.; Courtiel, J.; Ponty, Y.; Botón-Fernández, M.; Martín-Vide, C.; Santander-Jiménez, S.; Vega-Rodríguez, M. A., counting, generating and sampling tree alignments, Algorithms for Computational Biology - Third International Conference, AlCoB 2016, 9702, 53-64, (2016), Springer · Zbl 1346.92048
[5] Denise, A.; Ponty, Y.; Termier, M., Controlled non-uniform random generation of decomposable structures, Theoret. Comput. Sci., 411, 40-42, 3527-3552, (2010) · Zbl 1273.05232
[6] Do, C.; Gross, S.; Batzoglou, S.; Apostolico, A.; Guerra, C.; Istrail, S.; Pevzner, P.; Waterman, M., Research in Computational Molecular Biology, 3909, Contralign: discriminative training for protein sequence alignment, 160-174, (2006), Springer, Berlin Heidelberg · Zbl 1302.92098
[7] Dress, A.; Morgenstern, B.; Stoye, J., The number of standard and of effective multiple alignments, Applied Mathematics Letters, 11, 4, 43-49, (1998) · Zbl 0936.92022
[8] Drmota, M., Systems of functional equations, Random Struct. Alg., 10, 1-2, 103-124, (1997) · Zbl 0869.39010
[9] P. Duchon, P. Flajolet, G. Louchard and G. Schaeffer, Boltzmann samplers for the random generation of combinatorial structures, Combinatorics, Probablity, and Computing13(4-5) (2004) 577-625, Special issue on Analysis of Algorithms. · Zbl 1081.65007
[10] P. Flajolet, P. Zimmermann and B. Van Cutsem, Calculus for the random generation of labelled combinatorial structures, Theoretical Computer Science132 (1994) 1-35, A preliminary version is available in INRIA Research Report RR-1830. · Zbl 0799.68143
[11] Flajolet, P.; Fusy, É.; Pivoteau, C., Boltzmann sampling of unlabelled structures, Proceedings of the Meeting on Analytic Algorithmics and Combinatorics, 201-211, (2007) · Zbl 1430.68164
[12] Flajolet, P.; Sedgewick, R., Analytic Combinatorics, (2009), Cambridge University Press, Cambridge · Zbl 1165.05001
[13] Giegerich, R.; Touzet, H., Modeling dynamic programming problems over sequences and trees with inverse coupled rewrite systems, Algorithms, 7, 1, 62-144, (2014) · Zbl 1461.68251
[14] Herrbach, C.; Denise, A.; Dulucq, S., Average complexity of the Jiang-Wang-Zhang pairwise tree alignment algorithm and of a RNA secondary structure alignment algorithm, Theoret. Comput. Sci., 411, 26-28, 2423-2432, (2010) · Zbl 1208.68241
[15] Höchsmann, M.; Töller, T.; Giegerich, R.; Kurtz, S., Local similarity in RNA secondary structures., Proc IEEE Comput Soc Bioinform Conf., 2, 159-168, (2003)
[16] Höchsmann, M.; Voss, B.; Giegerich, R., Pure multiple RNA secondary structure alignments: A progressive profile approach, IEEE/ACM Trans. Comput. Biol. Bioinformatics, 1, 53-62, (2004)
[17] Jiang, T.; Wang, L.; Zhang, K., Alignment of trees — an alternative to tree edit, Theoret. Comput. Sci., 143, 1, 137-148, (1995) · Zbl 0873.68150
[18] Needleman, S.; Wunsch, C., A general method applicable to the search for similarities in the amino acid sequence of two proteins, J. Mol. Biol., 48, 443-453, (1970)
[19] Y. Ponty and C. Saule, A combinatorial framework for designing (pseudoknotted) RNA algorithms, Algorithms in Bioinformatics - 11th International Workshop, WABI 2011, Saarbrücken, Germany, September 5-7, 2011. Proceedings, eds. T. M. Przytycka and M. Sagot, Lecture Notes in Computer Science6833 (Springer, 2011), pp. 250-269.
[20] Sauthoff, G.; Janssen, S.; Giegerich, R., Bellman’s gap: A declarative language for dynamic programming, Proceedings of the 13th International ACM SIGPLAN Symposium on Principles and Practices of Declarative Programming, PPDP’11, 29-40, (2011), ACM, New York, NY, USA
[21] Schirmer, S.; Giegerich, R., Forest alignment with affine gaps and anchors, applied in RNA structure comparison, Theoret. Comput. Sci., 483, 51-67, (2013) · Zbl 1292.68184
[22] Torres, A.; Cabada, A.; Nieto, J. J., An exact formula for the number of alignments between two DNA sequences, DNA Seq., 14, 427-430, (2003)
[23] Vingron, M.; Argos, P., Determination of reliable regions in protein sequence alignments, Protein Engineering, 3, 7, 565-569, (1990)
[24] Waterman, M. S., Introduction to Computational Biology: Maps, Sequences, and Genomes, (1995), CRC Press · Zbl 0831.92011
[25] Wilf, H. S., A unified setting for sequencing, ranking, and selection algorithms for combinatorial objects, Advances in Mathematics, 24, 281-291, (1977) · Zbl 0354.05041
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.