×

zbMATH — the first resource for mathematics

An Eulerian path approach to DNA fragment assembly. (English) Zbl 0993.92018
Summary: For the last 20 years, fragment assembly in DNA sequencing followed the ‘overlap-layout-consensus’ paradigm that is used in all currently available assembly tools. Although this approach proved useful in assembling clones, it faces difficulties in genomic shotgun assembly. We abandon the classical ‘overlap-layout-consensus’ approach in favor of a new EULER algorithm that, for the first time, resolves the 20-year-old ‘repeat problem’ in fragment assembly. Our main result is the reduction of the fragment assembly to a variation of the classical Eulerian path problem that allows one to generate accurate solutions of large-scale sequencing problems. EULER, in contrast to the CELERA assembler, does not mask such repeats but uses them instead as a powerful fragment assembly tool.

MSC:
92C40 Biochemistry, molecular biology
05C45 Eulerian and Hamiltonian graphs
92D20 Protein sequences, DNA sequences
05C90 Applications of graph theory
Software:
CAP3; EULER
PDF BibTeX XML Cite
Full Text: DOI Link
References:
[1] Bonfield, Nucleic Acids Research 23 (24) pp 4992– (1995) · doi:10.1093/nar/23.24.4992
[2] 1 pp 9– (1995) · doi:10.1089/gst.1995.1.9
[3] ALGORITHMICA 13 pp 7– (1995) · Zbl 0831.92013 · doi:10.1007/BF01188580
[4] Myers, Journal of computational biology : a journal of computational molecular cell biology 2 (2) pp 275– (1995) · doi:10.1089/cmb.1995.2.275
[5] Huang, Genome Research 9 (9) pp 868– (1999) · doi:10.1101/gr.9.9.868
[6] Gordon, Genome Research 8 (3) pp 195– (1998) · doi:10.1101/gr.8.3.195
[7] Drmanac, Genomics 4 (2) pp 114– (1989) · doi:10.1016/0888-7543(89)90290-5
[8] Lysov, Doklady Akademii Nauk. Rossiyskaya Akademiya Nauk 303 (6) pp 1508– (1988)
[9] Pevzner, Journal of biomolecular structure & dynamics 7 (1) pp 63– (1989)
[10] Idury, Journal of computational biology : a journal of computational molecular cell biology 2 (2) pp 291– (1995) · doi:10.1089/cmb.1995.2.291
[11] Parkhill, Nature; Physical Science (London) 403 (6770) pp 665– (2000) · doi:10.1038/35001088
[12] Parkhill, Nature; Physical Science (London) 404 (6777) pp 502– (2000) · doi:10.1038/35006655
[13] Bolotin, Antonie van Leeuwenhoek 76 (1-4) pp 27– (1999) · doi:10.1023/A:1002048720611
[14] Roach, Genomics 26 (2) pp 345– (1995) · doi:10.1016/0888-7543(95)80219-C
[15] Weber, Genome Research 7 (5) pp 401– (1997) · doi:10.1101/gr.7.5.401
[16] Fleischmann, Science 269 (5223) pp 496– (1995) · doi:10.1126/science.7542800
[17] Myers, Science 287 (5461) pp 2196– (2000) · doi:10.1126/science.287.5461.2196
[18] Churchill, Genomics 14 (1) pp 89– (1992) · doi:10.1016/S0888-7543(05)80288-5
[19] Lander, Nature; Physical Science (London) 409 (6822) pp 860– (2001) · doi:10.1038/35057062
[20] Venter, Science 291 (5507) pp 1304– (2001) · doi:10.1126/science.1058040
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.