
Predicting structural classes of proteins by incorporating their global and local physicochemical and conformational properties into general Chou’s PseAAC. (English) Zbl 1406.92452

Summary: In this study, I introduce novel global and local 0D-protein descriptors based on a statistical quantity named ‘total sum of squares’ (TSS). This quantity is the sum of the squared differences between amino acid property values and their arithmetic mean. As an extension, the ‘amino acid-types’ and ‘amino acid-groups’ formalisms are used to describe zones of interest in proteins. To assess the effectiveness of the proposed descriptors, a nearest neighbor model was built to predict the four major protein structural classes. The model achieves a success rate of 98.53% on the jackknife cross-validation test, which is superior to other reported methods despite the simplicity of the predictor. It also achieves an average success rate of 98.35% across the different cross-validation tests performed. A Kappa statistic of 0.98 clearly distinguishes the model from a random predictor. These results demonstrate that the proposed descriptors not only reflect relevant biochemical information related to the structural classes of proteins but also allow appropriate interpretability. The current method may therefore play a supplementary role to other existing approaches for predicting protein structural class and other protein attributes.
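The quantities named in the summary can be illustrated with a minimal Python sketch. It is not the paper's implementation. The Kyte-Doolittle hydropathy index stands in for the amino acid property, and the helper names (`tss`, `global_tss`, `local_tss`, `loo_nearest_neighbor`, `cohen_kappa`) are hypothetical. The local descriptor is assumed to be the TSS over residues of a chosen type or group, with the mean taken over those residues only; the summary does not specify this detail. The jackknife test is taken to be leave-one-out cross-validation.

```python
from collections import Counter

# Kyte-Doolittle hydropathy index, used here only as an example property
# (assumption: the summary does not say which properties the paper uses).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def tss(values):
    """Total sum of squares: squared deviations from the arithmetic mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def global_tss(seq, prop=KD):
    """TSS of one property over every residue of the sequence."""
    return tss([prop[a] for a in seq])


def local_tss(seq, residues, prop=KD):
    """TSS restricted to residues in a chosen type or group (assumed form)."""
    return tss([prop[a] for a in seq if a in residues])


def loo_nearest_neighbor(X, y):
    """Leave-one-out 1-NN predictions using squared Euclidean distance."""
    preds = []
    for i, xi in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])),
        )
        preds.append(y[nearest])
    return preds


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance agreement.

    Assumes chance agreement is below 1, i.e. more than one class occurs.
    """
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (observed - expected) / (1 - expected)
```

In use, each protein would be mapped to a vector of such descriptors (for example one global TSS plus several local TSS values per property), the vectors passed to `loo_nearest_neighbor` together with the known class labels, and the predictions scored with `cohen_kappa`. A kappa near 0 indicates chance-level agreement and a kappa near 1 indicates near-perfect agreement.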

MSC:

92D20 Protein sequences, DNA sequences
62P10 Applications of statistics to biology and medical sciences; meta analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)
