
DUC-curve, a highly compact 2D graphical representation of DNA sequences and its application in sequence alignment. (English) Zbl 1400.92401

Summary: A highly compact and simple 2D graphical representation of DNA sequences, named DUC-Curve, is constructed by mapping the four nucleotides onto a unit circle in a cyclic order. DUC-Curve can directly reveal nucleotide and dinucleotide compositions and microsatellite structure in DNA sequences, and it can also be used for DNA sequence alignment. Taking the geometric center vectors of DUC-Curves as sequence descriptors, we perform similarity analysis on the first exons of \(\beta\)-globin genes of 11 species, the oncogene TP53 of 27 species, and 24 influenza A viruses. The results are reasonable and illustrate that the proposed method is effective for sequence comparison and can at least play a complementary role in classification and clustering problems.
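
The summary does not give the exact construction, so the following is only a minimal sketch of the general idea, under stated assumptions: the four nucleotides are placed at equally spaced points on the unit circle in the hypothetical cyclic order A, C, G, T, the "geometric center" is taken to be the plain mean of the mapped points, and two sequences are compared by the Euclidean distance between their centers. The function names and the angle assignment are illustrative and are not taken from the paper.

```python
import math

# ASSUMPTION: cyclic order A -> C -> G -> T at angles 0, pi/2, pi, 3*pi/2.
# The summary only says "a unit circle with a cyclic order"; the real
# DUC-Curve assignment may differ.
ANGLES = {"A": 0.0, "C": math.pi / 2, "G": math.pi, "T": 3 * math.pi / 2}


def unit_circle_points(seq):
    """Map each nucleotide to its assumed point on the unit circle."""
    return [(math.cos(ANGLES[b]), math.sin(ANGLES[b])) for b in seq.upper()]


def geometric_center(seq):
    """Return the mean (x, y) of the mapped points (hypothetical descriptor)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    pts = unit_circle_points(seq)
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def center_distance(seq_a, seq_b):
    """Euclidean distance between the geometric centers of two sequences."""
    xa, ya = geometric_center(seq_a)
    xb, yb = geometric_center(seq_b)
    return math.hypot(xa - xb, ya - yb)


if __name__ == "__main__":
    # Toy sequences, not data from the paper.
    print(center_distance("ATGGTGCACC", "ATGGTGCATC"))
```

With this simplified descriptor the center depends only on nucleotide composition, so it cannot distinguish sequences that are permutations of one another; the descriptor in the paper may well carry more positional information than this sketch does.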

MSC:

92D20 Protein sequences, DNA sequences
68W32 Algorithms on strings
