
Sequence-based discrimination of protein-RNA interacting residues using a probabilistic approach. (English) Zbl 1369.92035

Summary: Protein interactions with ribonucleic acids (RNA) are well known to be crucial for a wide range of cellular processes such as transcriptional regulation, protein synthesis or translation, and post-translational modifications. Identification of the RNA-interacting residues can provide insights into these processes and aid in relevant biotechnological manipulations. Owing to their potential in combating diseases and in industrial production, several computational attempts have been made over the years using sequence- and structure-based information. Recent comparative studies suggest that, despite these developments, many problems remain with respect to the usability, prerequisites, and accessibility of various tools, thereby calling for an alternative approach and a complementary perspective in the prediction scenario. With this motivation, in this paper, we propose the use of a simple yet efficient conditional probabilistic approach, based on the local occurrence of amino acids in the interacting region in a non-numeric sequence feature space, for discriminating between RNA-interacting and non-interacting residues. The proposed method has been tested for robustness using a cross-estimation method, showing an MCC of 0.341 and an \(F\)-measure of 66.84%. Upon exploring large-scale applications using benchmark datasets available to date, this approach showed an encouraging performance comparable with the state of the art. The software is available at https://github.com/ABCgrp/DORAEMON.
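
For readers unfamiliar with the two figures of merit quoted in the summary, the following minimal sketch shows how the Matthews correlation coefficient (MCC) and the \(F\)-measure are conventionally computed from per-residue binary labels. It uses the standard textbook definitions; the function names and the toy labels are illustrative assumptions and are not taken from the paper or from the DORAEMON software.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = interacting, 0 = non-interacting)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def f_measure(tp, fp, fn):
    """F-measure (harmonic mean of precision and recall), as a fraction in [0, 1]."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


if __name__ == "__main__":
    # Hypothetical per-residue labels for a short sequence (illustration only).
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    print("MCC:", mcc(tp, fp, tn, fn))
    print("F-measure:", f_measure(tp, fp, fn))
```

MCC ranges from -1 to 1 and remains informative when interacting residues are much rarer than non-interacting ones, which is why it is commonly reported alongside the \(F\)-measure for binding-site prediction.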

MSC:

92C40 Biochemistry, molecular biology
62P10 Applications of statistics to biology and medical sciences; meta analysis
