Identify Gram-negative bacterial secreted protein types by incorporating different modes of PSSM into Chou’s general PseAAC via Kullback-Leibler divergence. (English) Zbl 1406.92196

Summary: Gram-negative bacterial secreted proteins are crucial for bacterial pathogenesis because they mediate the interaction between bacteria and their environments. Identifying bacterial secreted proteins is therefore an important step in research on various diseases and the corresponding drugs. In this paper, we develop a feature design model named ACCP-KL-NMF, which fuses PSSM-based auto-cross correlation analysis for feature extraction with a nonnegative matrix factorization algorithm based on Kullback-Leibler divergence for dimensionality reduction. A 150-dimensional feature vector is thereby constructed on the training set. A support vector machine is then adopted as the classifier, and the jackknife test, regarded as the most objective, is chosen for evaluating the accuracy. The ACCP-KL-NMF model achieves satisfactory overall accuracy on the test set and also outperforms three other existing models. The numerical experimental results show that our model is effective and reliable for identifying Gram-negative bacterial secreted protein types. Moreover, it is anticipated that the proposed model could be beneficial for other biological sequences in future research.
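
The two feature-design steps named in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`acc_features`, `kl_nmf`), the lag range, the samples-by-features orientation of the matrix, and the random initialization are all assumptions made for illustration. The first function computes auto- and cross-covariance terms between PSSM columns; the second applies the multiplicative update rules of Lee and Seung for the generalized Kullback-Leibler divergence. Because the factorization needs nonnegative input, the covariance features would first have to be rescaled (for example by min-max scaling), which is also an assumption here.

```python
import math
import random


def acc_features(pssm, max_lag):
    """Auto-cross covariance of PSSM columns for lags 1..max_lag (illustrative)."""
    n_rows, n_cols = len(pssm), len(pssm[0])
    means = [sum(row[j] for row in pssm) / n_rows for j in range(n_cols)]
    feats = []
    for lag in range(1, max_lag + 1):
        for i in range(n_cols):
            for j in range(n_cols):  # i == j: auto term; i != j: cross term
                s = sum((pssm[t][i] - means[i]) * (pssm[t + lag][j] - means[j])
                        for t in range(n_rows - lag))
                feats.append(s / (n_rows - lag))
    return feats


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def kl_divergence(v, wh, eps=1e-12):
    """Generalized KL divergence D(V || WH)."""
    return sum(x * math.log((x + eps) / (y + eps)) - x + y
               for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def kl_nmf(v, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor nonnegative V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    history = []
    for _ in range(n_iter):
        # Update H with W fixed.
        wh = matmul(w, h)
        ratio = [[v[i][u] / (wh[i][u] + eps) for u in range(m)] for i in range(n)]
        for a in range(rank):
            col_sum = sum(w[i][a] for i in range(n)) + eps
            for u in range(m):
                h[a][u] *= sum(w[i][a] * ratio[i][u] for i in range(n)) / col_sum
        # Update W with the new H fixed.
        wh = matmul(w, h)
        ratio = [[v[i][u] / (wh[i][u] + eps) for u in range(m)] for i in range(n)]
        for a in range(rank):
            row_sum = sum(h[a]) + eps
            for i in range(n):
                w[i][a] *= sum(h[a][u] * ratio[i][u] for u in range(m)) / row_sum
        history.append(kl_divergence(v, matmul(w, h)))
    return w, h, history
```

In this orientation each row of `W` would serve as the reduced feature vector of one protein, so a rank of 150 would correspond to the 150-dimensional vector mentioned in the summary. The classifier and the jackknife (leave-one-out) evaluation are left out of the sketch.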

MSC:

92C40 Biochemistry, molecular biology
92D20 Protein sequences, DNA sequences
68T05 Learning and adaptive systems in artificial intelligence

References:

[1] Ahmad, K.; Waris, M.; Hayat, M., Prediction of protein submitochondrial locations by incorporating dipeptide composition into Chou’s general pseudo amino acid composition, J. Membr. Biol., 249, 293-304 (2016)
[2] Altschul, S. F.; Madden, T. L.; Schäffer, A. A., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res., 25, 3389-3402 (1997)
[3] Bendtsen, J. D.; Kiemer, L.; Fausboll, A., Non-classical protein secretion in bacteria, BMC Microbiol., 5, 58-70 (2005)
[4] Blocker, A.; Komoriya, K.; Aizawa, S., Type III secretion systems and bacterial flagella: insights into their function from structural similarities, Proc. Natl. Acad. Sci. USA, 100, 3027-3030 (2003)
[5] Boeckmann, B.; Bairoch, A.; Apweiler, R., The Swiss-Prot protein knowledgebase and its supplement TrEMBL in 2003, Nucleic Acids Res., 31, 365-370 (2003)
[6] Bu, W. S.; Feng, Z. P.; Zhang, Z., Prediction of protein (domain) structural classes based on amino-acid index, Eur. J. Biochem., 266, 1043-1049 (1999)
[7] Buttner, D.; Bonas, U., Common infection strategies of plant and animal pathogenic bacteria, Curr. Opin. Plant Biol., 6, 312-319 (2003)
[8] Chang, C. C.; Lin, C. J., LIBSVM: a library for support vector machines (2001)
[9] Chen, C.; Chen, L. X.; Zou, X. Y., Predicting protein structural class based on multi-features fusion, J. Theor. Biol., 253, 388-392 (2008) · Zbl 1398.92196
[10] Chen, J.; Xu, H. M.; He, P. A., A multiple information fusion method for predicting subcellular locations of two different types of bacterial protein simultaneously, BioSystems, 139, 37-45 (2016)
[11] Chen, W.; Feng, P. M.; Lin, H., iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition, Nucleic Acids Res., 41, e68 (2013)
[12] Chen, W.; Feng, P. M.; Yang, H., iRNA-AI: identifying the adenosine to inosine editing sites in RNA sequences, Oncotarget, 8, 4208-4217 (2017)
[13] Chen, W.; Lei, T. Y.; Jin, D. C., PseKNC: a flexible web-server for generating pseudo k-tuple nucleotide composition, Anal. Biochem., 456, 53-60 (2014)
[14] Chen, W.; Lin, H.; Chou, K. C., Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences, Mol. Biosyst., 11, 2620-2634 (2015)
[15] Cheng, X.; Xiao, X.; Chou, K. C., pLoc-mGneg: predict subcellular localization of gram-negative bacterial proteins by deep gene ontology learning via general PseAAC, Genomics (2017)
[16] Cheng, X.; Xiao, X.; Chou, K. C., pLoc-mPlant: predict subcellular localization of multi-location plant proteins by incorporating the optimal GO information into general pseAAC, Mol. Biosyst., 13, 1722-1727 (2017)
[17] Cheng, X.; Xiao, X.; Chou, K. C., pLoc-mVirus: predict subcellular localization of multi-location virus proteins via incorporating the optimal GO information into general pseAAC, Gene, 628, 315-321 (2017)
[18] Cheng, X.; Xiao, X.; Chou, K. C., pLoc-mEuk: predict subcellular localization of multi-label eukaryotic proteins by extracting the key GO information into general PseAAC, Genomics, 110, 50-58 (2018)
[20] Cheng, X.; Zhao, S. G.; Lin, W. Z., pLoc-mAnimal: predict subcellular localization of animal proteins with both single and multiple sites, Bioinformatics, 33, 3524-3531 (2017)
[21] Cheng, X.; Zhao, S. G.; Xiao, X., iATC-mHyb: a hybrid multi-label classifier for predicting the classification of anatomical therapeutic chemicals, Oncotarget, 8, 58494-58503 (2017)
[22] Cheng, X.; Zhao, S. G.; Xiao, X., iATC-mISF: a multi-label classifier for predicting the classes of anatomical therapeutic chemicals, Bioinformatics, 33, 341-346 (2017)
[23] de Chial, M.; Ghysels, B.; Beatson, S. A., Identification of type II and type III pyoverdine receptors from Pseudomonas aeruginosa, Microbiology, 149, 821-831 (2003)
[24] Chou, K. C., Prediction of protein cellular attributes using pseudo amino acid composition, Proteins: Struct. Funct. Bioinform., 43, 246-255 (2001)
[25] Chou, K. C., Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes, Bioinformatics, 21, 10-19 (2005)
[26] Chou, K. C., Some remarks on protein attribute prediction and pseudo amino acid composition (50th anniversary year review), J. Theor. Biol., 273, 236-247 (2011) · Zbl 1405.92212
[27] Chou, K. C., Some remarks on predicting multi-label attributes in molecular biosystems, Mol. Biosyst., 9, 1092-1100 (2013)
[28] Chou, K. C., Impacts of bioinformatics to medicinal chemistry, Med. Chem., 11, 218-234 (2015)
[29] Chou, K. C., An unprecedented revolution in medicinal chemistry driven by the progress of biological science, Curr. Top. Med. Chem., 17, 2337-2358 (2017)
[30] Chou, K. C.; Shen, H. B., Review: recent progress in protein subcellular location prediction, Anal. Biochem., 370, 1-16 (2007)
[31] Dehzangi, A.; Heffernan, R.; Sharma, A., Gram-positive and gram-negative protein subcellular localization by incorporating evolutionary-based descriptors into Chou’s general pseAAC, J. Theor. Biol., 364, 284-294 (2015) · Zbl 1405.92092
[32] Desvaux, M.; Hebraud, M.; Talon, R., Secretion and subcellular localizations of bacterial proteins: a semantic awareness issue, Trends Microbiol., 17, 139-145 (2009)
[33] Ding, S. Y.; Zhang, S. L., A gram-negative bacterial secreted protein types prediction method based on PSI-BLAST profile, BioMed Res. Int., 3206741, 1-5 (2016)
[34] Dou, Y. C.; Yao, B.; Zhang, C., PhosphoSVM: prediction of phosphorylation sites by integrating various protein sequence attributes with a support vector machine, Amino Acids, 46, 1459-1469 (2014)
[35] Fan, G. L.; Li, Q. Z., Predict mycobacterial proteins subcellular locations by incorporating pseudo-average chemical shift into the general form of Chou’s pseudo amino acid composition, J. Theor. Biol., 304, 88-95 (2012) · Zbl 1397.92186
[36] Feng, P.; Ding, H.; Yang, H., iRNA-PseColl: Identifying the occurrence sites of different RNA modifications by incorporating collective effects of nucleotides into PseKNC, Mol. Ther. Nucleic Acids, 7, 155-163 (2017)
[37] Huang, C.; Yuan, J. Q., Using radial basis function on the general form of Chou’s pseudo amino acid composition and PSSM to predict subcellular locations of proteins with both single and multiple sites, BioSystems, 113, 50-57 (2013)
[38] Jia, J.; Liu, Z.; Xiao, X., iCar-PseCp: identify carbonylation sites in proteins by Monte Carlo sampling and incorporating sequence coupled effects into general PseAAC, Oncotarget, 7, 34558-34570 (2016)
[39] Jia, J. H.; Liu, Z.; Xiao, X., iPPI-Esml: an ensemble classifier for identifying the interactions of proteins by incorporating their physicochemical properties and wavelet transforms into PseAAC, J. Theor. Biol., 377, 47-56 (2015)
[40] Kawashima, S.; Kanehisa, M., AAindex: amino acid index database, Nucleic Acids Res., 28 (2000)
[41] Konkel, M. E.; Kim, B. J.; Rivera-Amill, V., Bacterial secreted proteins are required for the internalization of Campylobacter jejuni into cultured mammalian cells, Mol. Microbiol., 32, 691-701 (1999)
[42] Lee, D. D.; Seung, H. S., Learning the parts of objects by nonnegative matrix factorization, Nature, 401, 788-791 (1999) · Zbl 1369.68285
[43] Lee, D. D.; Seung, H. S., Algorithms for non-negative matrix factorization, Advances in Neural Information Processing Systems, 556-562 (2001), MIT Press
[44] Lee, V. T.; Schneewind, O., Review: protein secretion and the pathogenesis of bacterial infections, Genes Dev., 15, 1725-1752 (2001)
[45] Li, S. L.; Li, H.; Li, M. F., Improved prediction of lysine acetylation by support vector machines, Protein Pept. Lett., 16, 977-983 (2009)
[46] Liu, B.; Fang, L.; Liu, F., Identification of real microRNA precursors with a pseudo structure status composition approach, PLoS ONE, 10, e0121501 (2015)
[47] Liu, B.; Fang, L.; Long, R., iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition, Bioinformatics, 32, 362-369 (2016)
[48] Liu, B.; Fang, L. Y.; Liu, F., Identification of real microRNA precursors with a pseudo structure status composition approach, PLoS ONE, 10, e0121501 (2015)
[49] Liu, B.; Liu, F.; Wang, X., Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences, Nucleic Acids Res., 43, W65-W71 (2015)
[50] Liu, B.; Long, R.; Chou, K. C., iDHS-EL: identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework, Bioinformatics, 32, 2411-2418 (2016)
[51] Liu, B.; Wang, S. Y.; Long, R., iRSpot-EL: identify recombination spots with an ensemble learning approach, Bioinformatics, 33, 35-41 (2017)
[52] Liu, B.; Wu, H.; Chou, K. C., Pse-in-One 2.0: an improved package of web servers for generating various modes of pseudo components of DNA, RNA, and protein sequences, Nat. Sci., 9, 67-91 (2017)
[53] Liu, B.; Yang, F.; Chou, K. C., 2L-piRNA: a two-layer ensemble classifier for identifying piwi-interacting RNAs and their function, Mol. Ther. Nucleic Acids, 7, 267-277 (2017)
[54] Liu, B.; Yang, F.; Huang, D. S., iPromoter-2L: a two-layer predictor for identifying promoters and their types by multi-window-based PseKNC, Bioinformatics, 34, 33-40 (2018)
[55] Liu, L. M.; Xu, Y.; Chou, K. C., iPGK-PseAAC: identify lysine phosphoglycerylation sites in proteins by incorporating four different tiers of amino acid pairwise coupling information into the general pseAAC, Med. Chem., 13, 552-559 (2017)
[56] Liu, Z.; Xiao, X.; Yu, D. J., pRNAm-PC: predicting N6-methyladenosine sites in RNA sequences via physical-chemical properties, Anal. Biochem., 497, 60-67 (2016)
[57] Meher, P. K.; Sahu, T. K.; Saini, V., Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou’s general PseAAC, Sci. Rep., 7, 42362 (2017)
[58] Mei, J.; Zhao, J., Prediction of HIV-1 and HIV-2 proteins by using Chou’s pseudo amino acid compositions and different classifiers, Sci. Rep., 8, 2359 (2018)
[59] Mudrak, B.; Kuehn, M. J., Specificity of the type II secretion systems of enterotoxigenic Escherichia coli and Vibrio cholerae for heat-labile enterotoxin and cholera toxin, J. Bacteriol., 192, 1902-1911 (2010)
[60] Niu, S.; Hu, L. L.; Zheng, L. L., Predicting protein oxidation sites with feature selection and analysis approach, J. Biomol. Struct. Dyn., 29, 650-658 (2012)
[61] Omori, K.; Idei, A., Gram-negative bacterial ATP-binding cassette protein exporter family and diverse secretory proteins, J. Biosci. Bioeng., 95, 1-12 (2003)
[62] Pruitt, K. D.; Tatusova, T.; Klimke, W., NCBI reference sequences: current status, policy and new initiatives, Nucleic Acids Res., 37, 32-36 (2009)
[63] Pundhir, S.; Kumar, A., SSPred: a prediction server based on SVM for the identification and classification of proteins involved in bacterial secretion systems, Bioinformation, 6, 380-382 (2011)
[64] Qiu, W. R.; Jiang, S. Y.; Xu, Z. C., iRNAm5c-pseDNC: identifying RNA 5-methylcytosine sites by incorporating physical-chemical properties into pseudo dinucleotide composition, Oncotarget, 8, 41178-41188 (2017)
[65] Qiu, W. R.; Sun, B. Q.; Xiao, X., iPTM-mLys: identifying multiple lysine PTM sites and their different types, Bioinformatics, 32, 3116-3123 (2016)
[66] Qiu, W. R.; Sun, B. Q.; Xiao, X., iKcr-PseEns: identify lysine crotonylation sites in histone proteins with pseudo components and ensemble classifier, Genomics (2017)
[67] Qiu, W. R.; Sun, B. Q.; Xiao, X., iPhos-PseEvo: identifying human phosphorylated proteins by incorporating evolutionary information into general pseAAC via grey system theory, Mol. Inform., 36 (2017)
[68] Shen, H. B., Large-scale predictions of gram-negative bacterial protein subcellular locations, J. Proteome Res., 5, 3420-3428 (2006)
[69] Shen, H. B., Recent advances in developing web-servers for predicting protein attributes, Nat. Sci., 1, 63-92 (2009)
[70] Shen, H. B., Gneg-mPLoc: a top-down strategy to enhance the quality of predicting subcellular localization of gram-negative bacterial proteins, J. Theor. Biol., 264, 326-333 (2010) · Zbl 1406.92211
[71] Song, J.; Li, F.; Takemoto, K., An integrative approach for inferring catalytic residues using sequence, structural and network features in a machine learning framework, J. Theor. Biol., 443, 125-137 (2018) · Zbl 06898995
[72] Song, J.; Wang, Y.; Li, F., iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites, Brief. Bioinform. (2018)
[73] Su, Q.; Lu, W.; Du, D., Prediction of the aquatic toxicity of aromatic compounds to Tetrahymena pyriformis through support vector regression, Oncotarget, 8, 49359-49369 (2017)
[74] UniProt Consortium, The universal protein resource (UniProt), Nucleic Acids Res., 36, 190-195 (2008)
[75] Vapnik, V., Statistical Learning Theory (1998), Wiley: Wiley New York · Zbl 0935.62007
[76] Wang, J.; Yang, B.; Revote, J., POSSUM: a bioinformatics toolkit for generating numerical sequence feature descriptors based on PSSM profiles, Bioinformatics, 33, 2756-2758 (2017)
[77] Wang, J. R.; Wang, C.; Cao, J. J., Prediction of protein structural classes for low-similarity sequences using reduced PSSM and position-based secondary structural features, Gene, 554, 241-248 (2015)
[78] Xiao, X.; Cheng, X.; Su, S., pLoc-mGpos: incorporate key gene ontology information into general PseAAC for predicting subcellular localization of gram-positive bacterial proteins, Nat. Sci., 9, 331-349 (2017)
[79] Xiao, X.; Wu, Z. C.; Chou, K. C., A multi-label classifier for predicting the subcellular localization of gram-negative bacterial proteins with both single and multiple sites, PLoS ONE, 6, e20592 (2011)
[80] Xu, Y.; Shao, X. J.; Wu, L. Y., iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins, PeerJ, 1, e171 (2013)
[81] Xu, Y.; Wang, Z.; Li, C., iPreny-PseAAC: identify C-terminal cysteine prenylation sites in proteins by incorporating two tiers of sequence couplings into PseAAC, Med. Chem., 13, 544-551 (2017)
[82] Yang, J. Y.; Chen, X., Improving taxonomy-based protein fold recognition by using global and local features, Proteins, 79, 2053-2064 (2011)
[83] Yao, Y. H.; Shi, Z. X.; Dai, Q., Apoptosis protein subcellular location prediction based on position-specific scoring matrix, J. Comput. Theor. Nanosci., 11, 2073-2078 (2014)
[84] Yu, L. Z.; Guo, Y. Z.; Zhang, Z., SecretP: a new method for predicting mammalian secreted proteins, Peptides, 31, 574-578 (2010)
[85] Yu, L. Z.; Luo, J. S.; Guo, Y. Z., In silico identification of gram-negative bacterial secreted proteins from primary sequence, Comput. Biol. Med., 43, 1177-1181 (2013)
[86] Zhang, S. L.; Duan, X., Prediction of protein subcellular localization with oversampling approach and Chou’s general PseAAC, J. Theor. Biol., 437, 239-250 (2018) · Zbl 1394.92047
[87] Zhang, Y. N.; Yu, D. J.; Li, S. S., Predicting protein-ATP binding sites from primary sequence through fusing bi-profile sampling of multi-view features, BMC Bioinf., 13, 1-11 (2012)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible without claiming completeness or a perfect matching.