×

Parametric analysis of alignment and phylogenetic uncertainty. (English) Zbl 1214.92057

Summary: To infer a phylogenetic tree from a set of DNA sequences, typically a multiple alignment is first used to obtain homologous bases. The inferred phylogeny can be very sensitive to how the alignment was created. We develop tools for analyzing the robustness of phylogeny to perturbations in alignment parameters in the S. B. Needleman and C. D. Wunsch (NW) algorithm [J. Mol. Biol. 48, No. 3, 443–453 (1970)]. Our main tool is parametric alignment, with novel improvements that are of general interest in parametric inference. Using parametric alignment and a Gaussian distribution on alignment parameters, we derive probabilities of optimal alignment summaries and inferred phylogenies. We apply our method to analyze intronic sequences from \(Drosophila\) flies. We show that phylogeny estimates can be sensitive to the choice of alignment parameters, and that parametric alignment elucidates the relationship between alignment parameters and reconstructed trees.

MSC:

92D15 Problems related to evolution
92C40 Biochemistry, molecular biology
62P10 Applications of statistics to biology and medical sciences; meta analysis
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Bárány, I., & Larman, D. G. (1998). The convex hull of the integer points in a large ball. Math. Ann., 312, 167–181. · Zbl 0927.52020 · doi:10.1007/s002080050217
[2] Beerenwinkel, N., Pachter, L., Sturmfels, B., Elena, S., & Lenski, R. (2007). Analysis of epistatic interactions and fitness landscapes using a new geometric approach. BMC Evol. Biol., 7(1), 60. · doi:10.1186/1471-2148-7-60
[3] Carrillo, H., & Lipman, D. (1988). The multiple sequence alignment problem in biology. SIAM J. Appl. Math., 48(5), 1073–1082. · Zbl 0652.92001 · doi:10.1137/0148063
[4] Chiarmonte, F., Yap, V. B., & Miller, W. (2002). Scoring pairwise genomic sequence alignments. Pacific Symp Biocomput, (7), 115–126.
[5] Daskalakis, C., & Roch, S. (2010). Alignment-free phylogenetic reconstruction. In Proceedings of RECOMB 2010. To appear. · Zbl 1377.92060
[6] Dewey, C. N., Huggins, P. M., Woods, K., Sturmfels, B., & Pachter, L. (2006). Parametric alignment of Drosophila genomes. PLoS Comput. Biol., 2(6), e73. · doi:10.1371/journal.pcbi.0020073
[7] Dobkin, D., Edelsbrunner, H., & Yap, C. K. (1990). Probing convex polytopes. In Cox & Wilfong (Eds.), Autonomous robot vehicles (pp. 326–341). New York: Springer.
[8] Edelsbrunner, H. (1987). Algorithms in combinatorial geometry. New York: Springer. · Zbl 0634.52001
[9] Felsenstein, J. (1985). Confidence limits on phylogenies: an approach using the bootstrap. Evolution, 39(4), 783–791. · doi:10.2307/2408678
[10] Fernández-Baca, D., Seppalainen, T., & Slutzi, G. (2004). Parametric multiple sequence alignment and phylogeny construction. J. Discrete Algorithms, 2(2), 271–287. · Zbl 1115.92045 · doi:10.1016/S1570-8667(03)00078-9
[11] Fernández-Baca, D., & Venkatachalam, B. (2006). Parametric sequence alignment. In S. Aluru (Ed.), Handbook of computational molecular biology. New York: Chapman & Hall.
[12] Gawrilow, E., & Joswig, M. (2000). Polymake: an approach to modular software design in computational geometry. In G. Kalai & G. M. Ziegler (Eds.), Proceedings of the 17th annual symposium on computational geometry (pp. 43–74). Basel: Birkhäuser. · Zbl 0960.68182
[13] Gusfield, D., Balasubramanian, K., & Naor, D. (1994). Parametric optimization and sequence alignment. Algorithmica, 12, 312–326. · Zbl 0802.92016 · doi:10.1007/BF01185430
[14] Gusfield, D., & Stelling, P. (1996). Parametric and inverse-parametric sequence alignment with XPARAL. Methods Enzymol., 266, 481–494. · doi:10.1016/S0076-6879(96)66030-3
[15] Guyon, F., Brochier-Armanet, C., & Guénoche, A. (2009). Comparison of alignment free string distances for complete genome phylogeny. Adv. Data Anal. Classif., 3, 95–108. · Zbl 1284.92073 · doi:10.1007/s11634-009-0041-z
[16] Hein, J. J. (1990). A unified approach to phylogenies and alignments. Methods Enzymol., 183, 625–644.
[17] Higgins, D., Thompson, J., Gibson, T., & Thompson, J. D. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680. · doi:10.1093/nar/22.22.4673
[18] Huggins, P. (2006). iB4e: A software framework for parametrizing specialized LP problems. In A. Iglesias & N. Takayama (Eds.), Proceedings of ICMS 2006 (pp. 245–247). New York: Springer. · Zbl 1230.90130
[19] Huggins, P. (2008). Polytopes in computational biology. PhD dissertation, University of California, Berkeley.
[20] Huggins, P., Pachter, L., & Sturmfels, B. (2007). Towards the Human Genotope. Bull. Math. Biol., 69(8), 2723–2725. · Zbl 1245.92019 · doi:10.1007/s11538-007-9244-7
[21] Konagurthu, A. S., & Stuckey, P. J. (2006). Optimal sum-of-pairs multiple sequence alignment using incremental Carrillo-and-Lipman bounds. J. Bioinform. Comput. Biol., 13(3), 668–685.
[22] Liu, K., Raghavan, S., Nelesen, S., Linder, C. R., & Warnow, T. (2009). Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science, 324(5934), 561–1564. · doi:10.1126/science.1171243
[23] Lunter, G., Miklos, I., Drummond, A., Jensen, J. L., & Hein, J. (2005). Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinform., 6, 83. · doi:10.1186/1471-2105-6-83
[24] Needleman, S. B., & Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48(3), 443–453. · doi:10.1016/0022-2836(70)90057-4
[25] Pachter, L., & Sturmfels, B. (2004). Parametric inference for biological sequence analysis. Proc. Natl. Acad. Sci. USA, 101(46), 16138–16143. · Zbl 1075.62101 · doi:10.1073/pnas.0406011101
[26] Pachter, L., & Sturmfels, B. (Eds.) (2005). Algebraic statistics for computational biology. Cambridge: Cambridge University Press. · Zbl 1108.62118
[27] Pollard, D. A., Moses, A. M., Iyer, V. N., & Eisen, M. B. (2006). Widespread discordance of gene trees with species trees in Drosophila: evidence for incomplete lineage sorting. PLoS Genetics, 2(10), e173. · doi:10.1371/journal.pgen.0020173
[28] Redelings, B. D., & Suchard, M. A. (2005). Joint Bayesian estimation of alignment and phylogeny. Syst. Biol., 54(3), 401–418. · doi:10.1080/10635150590947041
[29] Sankoff, D. (1975). Minimal mutation trees of sequences. SIAM J. Appl. Math., 78, 35–42. · Zbl 0315.05101 · doi:10.1137/0128004
[30] Sankoff, D., Cedergren, R. J., & Lapalme, G. (1976). Frequency of insertion–deletion, transversion, and transition in the evolution of 5S ribosomal RNA. J. Mol. Evol., 7, 133–149. · doi:10.1007/BF01732471
[31] States, D. J., Gish, W., & Altschul, S. F. (1991). Improved sensitivity of nucleic acid database searches using application-specific scoring matrices. Methods Enzymol., 3(1), 66–70. · doi:10.1016/S1046-2023(05)80165-3
[32] Suchard, M. A., & Redelings, B. D. (2006). Bali-Phy: simultaneous Bayesian inference of alignment and phylogeny. Bioinformatics, 22(16), 2047–2048. · doi:10.1093/bioinformatics/btl175
[33] Swafford, D. (2007). Paup*. http://paup.csit.fsu.edu/ .
[34] Vinzant, C. (2009). Lower bounds for optimal alignments of binary sequences. Discrete Appl. Math., 157(15), 3341–3346. · Zbl 1226.92026 · doi:10.1016/j.dam.2009.06.028
[35] Waterman, M. S., Eggert, M., & Lander, E. (1992). Parametric sequence comparisons. Proc. Natl. Acad. Sci. USA, 89(13), 6090–6093. · doi:10.1073/pnas.89.13.6090
[36] Vinga, S., & Almeida, J. (2003). Alignment-free sequence comparison–a review. Bioinformatics, 19(4), 513–523. · doi:10.1093/bioinformatics/btg005
[37] Wheeler, W. C. (1995). Sequence alignment, parameter sensitivity, and the phylogenetic analysis of molecular data. Syst. Biol., 44(3), 321–331.
[38] Ziegler, G. M. (1995). Lectures on polytopes. New York: Springer. · Zbl 0823.52002
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.