
A novel DNA sequence similarity calculation based on simplified pulse-coupled neural network and Huffman coding. (English) Zbl 1400.92398

Summary: A novel method for calculating DNA sequence similarity is proposed, based on a simplified pulse-coupled neural network (S-PCNN) and Huffman coding. The DNA sequence is first transformed into a numerical sequence by a Huffman-based coding scheme in which the triplet code serves as the coding unit. The method then uses the firing characteristics of the S-PCNN neurons to extract features from the encoded sequence, and it can handle DNA sequences of different lengths. In detail: according to the characteristics of the S-PCNN and of the DNA primary sequence, the sequence is encoded with the Huffman coding method; the S-PCNN is then used to extract the oscillation time sequence (OTS) of the encoded sequence, and relevant features are obtained at the same time; finally, the similarities or dissimilarities of the DNA sequences are determined by Euclidean distance. To verify the accuracy of the method, different data sets were used for testing. The experimental results show that the proposed method is effective.
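The three stages named in the summary (Huffman coding of triplets, OTS extraction with an S-PCNN, Euclidean distance) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the summary does not give the details, so everything specific here is an assumption: the use of the Huffman code length as the numerical value of a triplet, the one-dimensional nearest-neighbour linking, the parameter values `beta`, `alpha`, `v_theta` and `iterations`, the normalisation of the OTS by sequence length, and all function names.

```python
import heapq
import math
from collections import Counter
from itertools import count


def triplets(seq):
    """Split a DNA string into non-overlapping triplets; a 1-2 base tail is dropped."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def huffman_code_lengths(symbols):
    """Return {symbol: Huffman code length} built from the symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(f, next(tie), {s: 0}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def spcnn_ots(signal, iterations=30, beta=0.2, alpha=0.3, v_theta=5.0):
    """Run a generic 1-D simplified PCNN; return the number of firing neurons per step.

    Assumed model: U = S * (1 + beta * L), with L the number of neighbours that
    fired in the previous step, and a threshold that decays by exp(-alpha) and
    jumps by v_theta when the neuron fires.
    """
    n = len(signal)
    theta = [1.0] * n
    fired = [0] * n
    ots = []
    for _ in range(iterations):
        new_fired = []
        for i in range(n):
            left = fired[i - 1] if i > 0 else 0
            right = fired[i + 1] if i < n - 1 else 0
            u = signal[i] * (1.0 + beta * (left + right))
            new_fired.append(1 if u > theta[i] else 0)
        theta = [math.exp(-alpha) * t + v_theta * y for t, y in zip(theta, new_fired)]
        fired = new_fired
        ots.append(sum(fired))
    return ots


def features(seq, iterations=30):
    """Map a DNA string to a fixed-length feature vector (assumed: normalised OTS)."""
    trip = triplets(seq)
    if not trip:
        raise ValueError("sequence must contain at least one full triplet")
    lengths = huffman_code_lengths(trip)
    max_len = max(lengths.values())
    signal = [lengths[t] / max_len for t in trip]  # assumed numerical encoding
    ots = spcnn_ots(signal, iterations=iterations)
    return [c / len(signal) for c in ots]  # length-independent scaling


def dissimilarity(seq_a, seq_b):
    """Euclidean distance between the feature vectors of two DNA strings."""
    return math.dist(features(seq_a), features(seq_b))
```

Because the number of S-PCNN iterations is fixed, the feature vector has the same length for every input, which is one simple way to compare sequences of different lengths; the paper may achieve this differently.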

MSC:

92D20 Protein sequences, DNA sequences

References:

[1] Hamori, E.; Ruskin, J., H-curves, a novel method of representation of nucleotide series especially suited for long DNA-sequences, J. Biol. Chem., 258, 1318-1327 (1983)
[2] He, P. A.; Wei, J.; Yao, Y.; Tie, Z., A novel graphical representation of proteins and its application, Physica A, 391, 93-99 (2012)
[3] He, P. A.; Li, D.; Zhang, Y.; Wang, X.; Yao, Y., A 3D graphical representation of protein sequences based on the Gray code, J. Theoret. Biol., 304, 81-87 (2012) · Zbl 1397.92528
[4] Iranmanesh, A.; Nejati, R., A new algorithm for the graph coloring by real-time PCR, J. Comput. Theor. Nanosci., 10, 2487 (2013)
[5] Li, C.; Ma, H.; Zhou, Y.; Wang, X. L.; Zheng, X. Q., Similarity analysis of DNA sequences based on the weighted pseudo-entropy, J. Comput. Chem., 32, 675-680 (2011)
[6] Dai, Q.; Yan, Z. F.; Shi, Z. X.; Liu, X. Q.; Yao, Y. H.; He, P. G., Study of LZ-word distribution and its application for sequence comparison, J. Theoret. Biol., 336, 52-60 (2013) · Zbl 1411.92231
[7] Wąż, P.; Bielińska-Wąż, D., 3D-dynamic representation of DNA sequences, J. Mol. Model., 20, 2141 (2014)
[8] Wąż, P.; Bielińska-Wąż, D., Non-standard similarity/dissimilarity analysis of DNA sequences, Genomics, 104, 464-471 (2014)
[9] Stan, C.; Cristescu, C. P.; Scarlat, E. I., Similarity analysis for DNA sequences based on chaos game representation case study: The albumin, J. Theoret. Biol., 267, 513-518 (2010) · Zbl 1414.92203
[10] Qi, X. Q.; Wen, J.; Qi, Z. H., New 3D graphical representation of DNA sequence based on dual nucleotides, J. Theoret. Biol., 249, 681-690 (2007) · Zbl 1453.92233
[11] Nandy, A., A new graphical representation and analysis of DNA-sequence structure: I. Methodology and application to globin genes, Current Sci., 66, 309-314 (1994)
[12] Yao, Y. H.; Nan, X. Y.; Wang, T. M., A new 2D graphical representation—Classification curve and the analysis of similarity/dissimilarity of DNA sequences, J. Mol. Struct.: THEOCHEM, 764, 101-108 (2006)
[13] Liao, B.; Xiang, Q. L.; Cai, L. J.; Cao, Z., A new graphical coding of DNA sequence and its similarity calculation, Physica A, 392, 4663-4667 (2013) · Zbl 1395.92105
[14] Gonzalez, D. L.; Giannerini, S.; Rosa, R., The non-power model of the genetic code: a paradigm for interpreting genomic information, Phil. Trans. R. Soc. A, 374, 2063, 20150062 (2016) · Zbl 1404.92137
[15] Gonzalez, D. L.; Giannerini, S.; Rosa, R., Detecting structure in parity binary sequences, IEEE Eng. Med. Biol. Mag., 25, 1, 69-81 (2006)
[16] Gonzalez, D. L., Can the genetic code be mathematically described?, Med. Sci. Monit. Int. Med. J. Exp. Clin. Res., 10, 4, 11-17 (2004)
[17] Peng, C.-K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Sciortino, F.; Simons, M.; Stanley, H. E., Fractal landscape analysis of DNA walks, Physica A, 191, 25-29 (1992)
[18] Buldyrev, S. V.; Dokholyan, N. V.; Goldberger, A. L., Analysis of DNA sequences using methods of statistical physics, Physica A, 249, 430-438 (1998)
[19] Arques, D. G.; Michel, C. J., A complementary circular code in the protein coding genes, J. Theoret. Biol., 182, 45-58 (1996)
[20] Hou, W. B.; Pan, Q. H.; He, M. F., A novel representation of DNA sequence based on CMI coding, Physica A, 409, 87-96 (2014) · Zbl 1395.92102
[21] Yin, C. C.; Chen, Y.; Yau, S. S.T., A measure of DNA sequence similarity by Fourier transform with applications on hierarchical clustering, J. Theoret. Biol., 359, 18-28 (2014) · Zbl 1412.92252
[22] Jeong, B. S.; Bari, A. T.M. G.; Reaz, M. R.; Jeon, S.; Lim, C. G.; Choi, H. J., Codon-based encoding for DNA sequence analysis, Methods, 67, 373-379 (2014)
[23] Zhang, J. H.; Wang, R. H.; Bai, F. L.; Zheng, J. S., A quasi-MQ EMD method for similarity analysis of DNA sequences, Appl. Math. Lett., 24, 2052-2058 (2011) · Zbl 1229.92036
[24] Bai, F. L.; Zhang, J. H.; Zheng, J. S., Similarity analysis of DNA sequences based on the EMD method, Appl. Math. Lett., 24, 232-237 (2011) · Zbl 1201.92029
[25] Gunasinghe, U.; Alahakoon, D.; Bedingfield, S., Extraction of high quality k-words for alignment-free sequence comparison, J. Theoret. Biol., 358, 31-51 (2014) · Zbl 1412.92216
[26] Bai, F. L.; Liu, Y. Z., A representation of DNA primary sequences by random walk, Math. Biosci., 209, 282-291 (2007) · Zbl 1120.92018
[27] Li, C.; Yu, X. Q.; Helal, N. D., Similarity analysis of DNA sequences based on codon usage, Chem. Phys. Lett., 459, 172-174 (2008)
[28] Subashini, M. M.; Sahoo, S. K., Pulse coupled neural networks and its applications, Expert Syst. Appl., 41, 3965-3974 (2014)
[29] Duan, X. H.; Cao, J. J.; Liu, J., Application research of modified PCNN model in multispectral and panchromatic images fusion, Mod. Electron. Tech., 37, 3, 55-60 (2014), (in Chinese)
[30] Fu, J. C.; Chen, C. C.; Chai, J. W.; Wong, S. T.C.; Li, I. C., Image segmentation by EM-based adaptive pulse coupled neural networks in brain magnetic resonance imaging, Comput. Med. Imaging Graph., 34, 308-320 (2010)
[32] Samir Elons, A.; Abull-ela, Magdy; Tolba, M. F., A proposed PCNN features quality optimization technique for pose-invariant 3D Arabic sign language recognition, Appl. Soft Comput., 13, 1646-1660 (2013)
[33] Li, H. H.; Jin, X.; Yang, N.; Yang, Z., The recognition of landed aircrafts based on PCNN model and affine moment invariants, Pattern Recognit. Lett., 51, 23-29 (2015)
[34] Hong, Q.; Zhang, Y., A new algorithm for finding the shortest paths using PCNNs, Chaos Solitons Fractals, 33, 4, 1220-1229 (2007) · Zbl 1137.90698
[35] Li, X. J.; Ma, Y. D.; Feng, X. W., Self-adaptive autowave pulse-coupled neural network for shortest-path problem, Neurocomputing, 115, 63-71 (2013)
[36] Wang, Y. N.; Ge, J.; Zhang, H.; Zhou, B. W., Intelligent injection liquid particle inspection machine based on two-dimensional Tsallis Entropy with modified pulse-coupled neural networks, Eng. Appl. Artif. Intell., 24, 625-637 (2011)
[37] Wang, Zh. B.; Ma, Y. D.; Cheng, F. Y.; Yang, L. Z., Review of pulse-coupled neural networks, Image Vis. Comput., 28, 5-13 (2010)
[38] Syed, U. A.; Kunwar, F.; Iqbal, M., Guided autowave pulse coupled neural network (GAPCNN) based real time path planning and an obstacle avoidance scheme for mobile robots, Robot. Auton. Syst., 62, 474-486 (2014)
[39] Zhao, C. H.; Shao, G. F.; Ma, L. J., Image fusion algorithm based on redundant-lifting NSWMDA and adaptive PCNN, Optik, 125, 6247-6255 (2014)
[40] Zhou, Dongming; Nie, Rencan; Zhao, Dongfeng, Analysis of autowave characteristics for competitive pulse coupled neural network and its application, Neurocomputing, 72, 2331-2336 (2009)
[41] Nie, Rencan; Zhou, Dongming; He, Min; Jin, Xin; Yu, Jiefu, Facial feature extraction using frequency map series in PCNN, J. Sensors, 2016, 1-9 (2016), Article ID 5491341
[42] Jin, Xin; Nie, Rencan; Zhou, Dongming; Wang, Quan; He, Kangjian, Multifocus color image fusion based on NSST and PCNN, J. Sensors, 2016, 1-12 (2016), Article ID 8359602
[43] Huffman, D. A., A method for the construction of minimum-redundancy codes, Proc. IRE, 40, 1098-1101 (1952) · Zbl 0137.13605
[44] Golin, M.; Mathieu, C.; Young, N. E., Huffman coding with letter costs: a linear-time approximation scheme, SIAM J. Comput., 41, 3, 684-713 (2012) · Zbl 1248.94045
[45] Wu, J. Z.; Wang, Y. J.; Ding, L. P.; Liao, X. F., Improving performance of network covert timing channel through Huffman coding, Math. Comput. Modelling, 55, 69-79 (2012) · Zbl 1245.94092
[46] Liao, B.; Zhang, Y. S.; Ding, K. Q.; Wang, T. M., Analysis of similarity/dissimilarity of DNA sequences based on a condensed curve representation, J. Mol. Struct.: THEOCHEM, 717, 199-203 (2005)
[47] Yang, X.; Wang, T., Linear regression model of short \(k\)-word: a similarity distance suitable for biological sequences with various lengths, J. Theoret. Biol., 337, 61-70 (2013) · Zbl 1411.92239
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.