×

Characterizing the reconstruction and enumerating the patterns of DNA sequences with repeats. (English) Zbl 1128.92011

Summary: A repeat in a DNA sequence is a substring that appears more than once. In DNA sequencing, the occurrence of repeats may hinder the unique reconstruction. In addition, the number of possible reconstructions depends on the pattern of repeats in a DNA sequence. R. Arratia et al. [Discrete Appl. Math. 104, No. 1–3, 63–96 (2000; Zbl 0997.92014)] studied the patterns of DNA sequences with twofold repeats that result in \(k\)-way reconstructions. In this paper, multiple-fold repeats, including twofold repeats, are considered. For each pattern of DNA repeats, the possible reconstructions of the DNA sequence are enumerated by its reduced digraph. Then the reconstructions of DNA sequences with repeats are characterized using the pattern graphs. Finally, for DNA sequences with \(n\) repeats, the patterns of DNA repeats resulting in \(k\)-way reconstruction are enumerated.

MSC:

92C40 Biochemistry, molecular biology
05C90 Applications of graph theory
05A99 Enumerative combinatorics

Citations:

Zbl 0997.92014

Software:

RepeatAssembler
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Arratia R, Bollobás B, Coppersmith D, Sorkin GB (2001) Euler circuits and sequencing by hybridization. Discret Appl Math 104:63–96 · Zbl 0997.92014
[2] Blazewicz J, Kasprak M (2003) Complexity of DNA sequencing by hybridization. Theor Comput Sci 290:1459–1473 · Zbl 1044.68065
[3] Blazewicz J, Hertz A, Kobler D, de Werra D (1999) On some properties of DNA graphs. Discret Appl Math 98:1–19 · Zbl 0939.05078
[4] Blazewicz J, Formanowicz P, Kasprzak M, Schuurman P, Woeginger GJ (2002) DNA sequencing, Eulerian graphs, and the exact perfect matching problem. In: Lecture notes in computer science, vol 2573, pp 13–24 · Zbl 1022.68089
[5] Bollobas B (1998) Modern graph theory. In: Graduate texts in mathematics, vol 184. Springer, New York
[6] Farman ML, Gilkerson JW, Jaromczyk JW, Staben C (2004) RepeatAssembler: a package for annotation of full-length repetitive DNA sequences in fungal genomes. In: Proc. IEEE computational systems bioinformatics conference, pp 464–467
[7] Hubbell E (2001) Multiplex sequencing by hybridization. J Comput Biol 8:141–149
[8] Jiang T, Li M (1996) DNA sequencing and string learning. Math Syst Theory 29:387–405 · Zbl 1111.68458
[9] Kececioglu JD, Myers EW (1995) Combinatorial algorithms for DNA sequence assembly. Algorithmica 13:7–51 · Zbl 0831.92013
[10] Lysov YP, Florentiev VL, Khorlyn AA, Khrapko KR, Shick VV, Mirzabekov AD (1988) Determination of the nucleotide sequence of DNA using hybridization with oligonucleotides–a new method. Dokl Acad Sci USSR 303:1508–1511
[11] Pevzner PA (1989) l-Tuple DNA sequencing: computer analysis. J Biomol Struct Dyn 7:63–73
[12] Pevzner PA (1995) DNA physical mapping and alternating Eulerian circuits in colored graphs. Algorithmica 13:77–105 · Zbl 0840.92011
[13] Pevzner PA (2000) Computational molecular biology: an algorithmic approach. MIT Press, Cambridge · Zbl 0972.92011
[14] Setubal J, Meidanis J (1997) Introduction to computational molecular Biology. PWS
[15] Tucker A (2002) Applied combinatorics, 4th edn. Wiley, New York · Zbl 1238.05001
[16] Ukkonen E (1992) Approximate string-matching with q-grams and maximal matches. Theor Comput Sci 92:191–211 · Zbl 0747.68026
[17] van Lint JH, Wilson RM (2001) A course in combinatorics, 2nd edn. Cambridge University Press, Cambridge · Zbl 0980.05001
[18] West DB (2001) Introduction to graph theory, 2nd edn. Prentice Hall, New York
[19] Zhang Y, Waterman MS (2003) An Eulerian path approach to global multiple alignment for DNA sequences. J Comput Biol 10:803–819
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.