
A non-local gap-penalty for profile alignment. (English) Zbl 0851.92012

Summary: An alignment of biological sequences is typically longer than the mean length of its component sequences, because gaps are inserted into the alignment. When such an alignment is used as a profile for aligning further sequences (or profiles), it is biased toward additional sequences that match the length of the profile rather than the mean length of the sequences in the profile, since aligning these entails fewer (or smaller) insertions and so avoids gap-penalties.
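The length discrepancy described above can be illustrated with a small sketch. The toy alignment and the helper name `length_bias` are assumptions made for illustration only; they do not come from the paper.

```python
def length_bias(alignment, gap="-"):
    """Return (profile_length, mean_sequence_length) for a gapped alignment.

    `alignment` is a list of equal-length strings; `gap` marks an inserted gap.
    """
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    # Length of each component sequence once its gaps are removed.
    ungapped = [len(row.replace(gap, "")) for row in alignment]
    return width, sum(ungapped) / len(ungapped)


# Hypothetical three-sequence alignment (not taken from the paper).
toy = [
    "ACD-EFG--HIK",
    "ACDQEFG--H-K",
    "AC--EFGLMHIK",
]
profile_len, mean_len = length_bias(toy)
print(f"profile length = {profile_len}, mean sequence length = {mean_len:.2f}")
```

In this toy case the profile has more columns than any single sequence has residues, so a new sequence as long as the profile could be aligned without opening any gaps, which is the bias the summary describes.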
An algorithm is described that corrects this bias by monitoring, for every pair of positions, how well the mean separations in the two profiles correspond as they are aligned. The correction was incorporated into a standard dynamic programming algorithm through a modification of the gap-penalty. Unlike other approaches, this modification is not local: it takes the overall alignment of the sequences into consideration. Consequently, the algorithm is not guaranteed to find the optimal alignment, but tests suggest that close approximations are obtained. The method was tested on protein families by measuring the area of the region (phase) of parameter space that yields the correct multiple alignment. No improvement (increase in phase area) was found for a family that required few gaps to be aligned correctly. However, for highly gapped alignments, a 50% increase in area was obtained with one family, and the correct alignment was found for another that could not be aligned with the unbiased method.
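As a rough illustration of what "mean separation" between two profile positions might mean, the sketch below counts, for each sequence, the residues lying between two columns and averages the counts. This is only one possible reading of the summary, not the paper's algorithm: the function names, the exact definition of separation, and the toy profiles are all assumptions.

```python
def mean_separation(profile, i, j, gap="-"):
    """Mean number of residues per sequence in columns i+1..j (0-based, i < j).

    Assumed definition for illustration; the paper may define it differently.
    """
    if not 0 <= i < j < len(profile[0]):
        raise ValueError("need 0 <= i < j < profile width")
    counts = [sum(ch != gap for ch in row[i + 1 : j + 1]) for row in profile]
    return sum(counts) / len(counts)


def separation_mismatch(prof_a, i, j, prof_b, p, q):
    """Absolute difference in mean separation for column pairs (i, j) and (p, q)."""
    return abs(mean_separation(prof_a, i, j) - mean_separation(prof_b, p, q))


# Hypothetical profiles (not from the paper): prof_a is heavily gapped.
prof_a = ["ACD-EFG--HIK", "ACDQEFG--H-K", "AC--EFGLMHIK"]
prof_b = ["ACDEFGHIK", "ACDEFGH-K"]

sep_a = mean_separation(prof_a, 0, 11)  # first to last column of prof_a
sep_b = mean_separation(prof_b, 0, 8)   # first to last column of prof_b
mismatch = separation_mismatch(prof_a, 0, 11, prof_b, 0, 8)
print(f"column distances: {11 - 0} vs {8 - 0}")
print(f"mean separations: {sep_a:.2f} vs {sep_b:.2f} (mismatch {mismatch:.2f})")
```

In this toy case the column distances differ by three positions, while the mean separations differ by less than one residue. A penalty driven by mean separation rather than by column count would therefore treat the two spans as nearly equivalent; how such a term is folded into the dynamic programming recursion is specific to the paper and is not reproduced here.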

MSC:

92D20 Protein sequences, DNA sequences
92C40 Biochemistry, molecular biology
90C90 Applications of mathematical programming

Software:

CLUSTAL; ClustalW
