Expected distance between terminal nucleotides of RNA secondary structures. (English) Zbl 1304.05073

Summary: A. M. Yoffe et al. [“The ends of a large RNA molecule are necessarily close”, Nucleic Acids Res. 39, No. 1, 292–299 (2011)] used the programs RNAfold [resp. RNAsubopt] from the Vienna RNA Package to calculate the distance between the \(5^{\prime}\) and \(3^{\prime}\) ends of the minimum free energy secondary structure [resp. thermal equilibrium structures] of viral and random RNA sequences. Here, the \(5^{\prime}\)–\(3^{\prime}\) distance is defined to be the length of the shortest path from the \(5^{\prime}\) node to the \(3^{\prime}\) node in the undirected graph whose edge set consists of edges \(\{i, i + 1\}\) corresponding to covalent backbone bonds and of edges \(\{i, j\}\) corresponding to canonical base pairs. From repeated simulations and a heuristic theoretical argument, Yoffe et al. [loc. cit.] conclude that the \(5^{\prime}\)–\(3^{\prime}\) distance is less than a fixed constant, independent of RNA sequence length.
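The distance defined above can be illustrated with a minimal Python sketch. It assumes the structure is given in dot-bracket notation (the format written by the Vienna RNA Package) and runs a breadth-first search over backbone and base-pair edges. The function name `end_to_end_distance` is hypothetical, and the sketch is not taken from the paper.

```python
from collections import deque


def end_to_end_distance(dot_bracket):
    """Shortest-path length from the 5' node to the 3' node.

    Nodes are positions 0..n-1; edges are backbone bonds {i, i+1}
    and the base pairs encoded by matching parentheses.
    """
    n = len(dot_bracket)
    if n == 0:
        raise ValueError("empty structure")

    # Parse the dot-bracket string into a partner table.
    partner = [None] * n
    stack = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")

    # Breadth-first search from the 5' end (node 0) to the 3' end (node n-1).
    dist = [-1] * n
    dist[0] = 0
    queue = deque([0])
    while queue:
        i = queue.popleft()
        if i == n - 1:
            return dist[i]
        for k in (i - 1, i + 1, partner[i]):
            if k is not None and 0 <= k < n and dist[k] < 0:
                dist[k] = dist[i] + 1
                queue.append(k)
    raise RuntimeError("3' end not reached")  # unreachable: the backbone is connected
```

For instance, an unfolded chain `"....."` has distance 4, while the hairpin `"(...)"` closes the two ends with a single base-pair edge and has distance 1.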
In this paper, we provide a rigorous mathematical framework to study the expected distance between the \(5^{\prime}\) and \(3^{\prime}\) ends of an RNA sequence. We present recurrence relations that precisely define this expected distance, both for the Turner nearest neighbor energy model and for a simple homopolymer model first defined by Stein and Waterman. We implement dynamic programming algorithms to compute (rather than approximate by repeated application of the Vienna RNA Package) the expected distance between the \(5^{\prime}\) and \(3^{\prime}\) ends of a given RNA sequence with respect to the Turner energy model. Using methods of analytic combinatorics, which depend on complex analysis, we prove that the asymptotic expected \(5^{\prime}\)–\(3^{\prime}\) distance \({\langle d_n \rangle}\) of length \(n\) homopolymers is approximately equal to the constant 5.47211, while the asymptotic distance is 6.771096 if hairpins have a minimum of 3 unpaired bases and the probability that any two positions can form a base pair is 1/4. Finally, we analyze the \(5^{\prime}\)–\(3^{\prime}\) distance for secondary structures from the STRAND database, and conclude that the \(5^{\prime}\)–\(3^{\prime}\) distance is correlated with RNA sequence length.
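For the homopolymer model, the expected distance for a finite length \(n\) can be computed exactly with a short dynamic program. The sketch below is an illustration under stated assumptions, not the recurrences of the paper. It assumes every pseudoknot-free structure is weighted by `pair_weight` raised to its number of base pairs (uniform when the weight is 1), and it uses the observation that a shortest path runs along the exterior loop, so a structure with \(u\) unpaired exterior bases and \(p\) outermost base pairs has distance \(u + 2p - 1\). The names `expected_end_distance`, `theta` and `pair_weight` are hypothetical.

```python
from fractions import Fraction


def expected_end_distance(n, theta=1, pair_weight=Fraction(1)):
    """Exact expected 5'-3' distance over all structures on n nodes.

    theta: minimum number of unpaired bases enclosed by any base pair.
    pair_weight: multiplicative weight per base pair (assumed model).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    pair_weight = Fraction(pair_weight)

    # Z[m]: total weight of structures on m nodes.
    # E[m]: weighted sum of exterior-loop sizes (u + 2p) over those structures.
    Z = [Fraction(0)] * (n + 1)
    E = [Fraction(0)] * (n + 1)
    Z[0] = Fraction(1)

    for m in range(1, n + 1):
        # Case 1: node m is unpaired and adds one exterior base.
        z = Z[m - 1]
        e = E[m - 1] + Z[m - 1]
        # Case 2: node m pairs with node j; the enclosed region j+1..m-1
        # holds m-j-1 >= theta nodes and contributes nothing to the
        # exterior loop, while the pair itself contributes two nodes.
        for j in range(1, m - theta):
            inner = Z[m - j - 1]
            z += pair_weight * Z[j - 1] * inner
            e += pair_weight * (E[j - 1] + 2 * Z[j - 1]) * inner
        Z[m], E[m] = z, e

    # Distance = (number of exterior nodes) - 1.
    return E[n] / Z[n] - 1
```

With `theta=1` and unit weight, `Z` reproduces the Stein and Waterman counts 1, 1, 1, 2, 4, 8, 17, and the structures `"..."` and `"(.)"` for `n = 3` give an expected distance of 3/2. Evaluating the function for growing `n` gives finite-length values that can be set against the asymptotic constants quoted above.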

MSC:

05C30 Enumeration in graph theory
49L20 Dynamic programming in optimal control and differential games
90C39 Dynamic programming
92C40 Biochemistry, molecular biology

References:

[1] Andronescu M, Bereg V, Hoos HH, Condon A (2008) RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinform 9: 340 · doi:10.1186/1471-2105-9-340
[2] Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S, Fagan P, Marvin J, Padilla D, Ravichandran V, Schneider B, Thanki N, Weissig H, Westbrook JD, Zardecki C (2002) The protein data bank. Acta Crystallogr D Biol Crystallogr 58(Pt 6 No 1): 899–907 · doi:10.1107/S0907444902003451
[3] Berman HM, Westbrook J, Feng Z, Iype L, Schneider B, Zardecki C (2003) The nucleic acid database. Methods Biochem Anal 44: 199–216
[4] Cormen T, Leiserson C, Rivest R (1990) Algorithms. McGraw-Hill, New York
[5] Corver J, Lenches E, Smith K, Robison RA, Sando T, Strauss EG, Strauss JH (2003) Fine mapping of a cis-acting sequence element in yellow fever virus RNA that is required for RNA replication and cyclization. J Virol 77(3): 2265–2270 · doi:10.1128/JVI.77.3.2265-2270.2003
[6] Darty K, Denise A, Ponty Y (2009) VARNA: interactive drawing and editing of the RNA secondary structure. Bioinformatics 25(15): 1974–1975 · Zbl 05744129 · doi:10.1093/bioinformatics/btp250
[7] Flajolet P, Sedgewick R (2009) Analytic combinatorics. Cambridge University Press, Cambridge. ISBN 978-0-521-89806-5 · Zbl 1165.05001
[8] Gallie DR (1991) The cap and poly(A) tail function synergistically to regulate mRNA translational efficiency. Genes Dev 5(11): 2108–2116 · doi:10.1101/gad.5.11.2108
[9] Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A (2009) Rfam: updates to the RNA families database. Nucleic Acids Res 37(Database): D136–D140 · Zbl 05746514 · doi:10.1093/nar/gkn766
[10] Gerland U, Bundschuh R, Hwa T (2001) Force-induced denaturation of RNA. Biophys J 81: 1324–1332 · doi:10.1016/S0006-3495(01)75789-X
[11] Gutell R, Lee J, Cannone J (2002) The accuracy of ribosomal RNA comparative structure models. Curr Opin Struct Biol 12: 301–310 · doi:10.1016/S0959-440X(02)00339-1
[12] Hofacker I (2003) Vienna RNA secondary structure server. Nucleic Acids Res 31(13): 3429–3431 · Zbl 05435843 · doi:10.1093/nar/gkg599
[13] Hofacker IL, Schuster P, Stadler PF (1998) Combinatorics of RNA secondary structures. Discret Appl Math 88:207–237. http://citeseer.nj.nec.com/1454.html · Zbl 0918.05004
[14] Hopcroft JE, Ullman JD (1969) Formal languages and their relation to automata. Addison-Wesley, Reading · Zbl 0196.01701
[15] Hsu MT, Parvin JD, Gupta S, Krystal M, Palese P (1987) Genomic RNAs of influenza viruses are held in a circular conformation in virions and in infected cells by a terminal panhandle. Proc Natl Acad Sci USA 84(22): 8140–8144 · doi:10.1073/pnas.84.22.8140
[16] Kneller EL, Rakotondrafara AM, Miller WA (2006) Cap-independent translation of plant viral RNAs. Virus Res 119(1): 63–75 · doi:10.1016/j.virusres.2005.10.010
[17] Lorenz WA, Ponty Y, Clote P (2008) Asymptotics of RNA shapes. J Comput Biol 15(1): 31–63 · doi:10.1089/cmb.2006.0153
[18] McCaskill J (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29: 1105–1119 · doi:10.1002/bip.360290621
[19] Miller WA, White KA (2006) Long-distance RNA–RNA interactions in plant virus gene expression and replication. Annu Rev Phytopathol 44: 447–467 · doi:10.1146/annurev.phyto.44.070505.143353
[20] Nussinov R, Jacobson AB (1980) Fast algorithm for predicting the secondary structure of single stranded RNA. Proc Natl Acad Sci USA 77(11): 6309–6313 · doi:10.1073/pnas.77.11.6309
[21] Sprinzl M, Horn C, Brown M, Ioudovitch A, Steinberg S (1998) Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res 26: 148–153 · Zbl 05437027 · doi:10.1093/nar/26.1.148
[22] Stein PR, Waterman MS (1978) On some new sequences generalizing the Catalan and Motzkin numbers. Discret Math 26: 261–272 · Zbl 0405.10009 · doi:10.1016/0012-365X(79)90033-5
[23] Xia T, SantaLucia J, Burkard M, Kierzek R, Schroeder S, Jiao X, Cox C, Turner D (1998) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs. Biochemistry 37(42): 14719–14735
[24] Yoffe AM, Prinsen P, Gelbart WM, Ben-Shaul A (2011) The ends of a large RNA molecule are necessarily close. Nucleic Acids Res 39(1): 292–299 · doi:10.1093/nar/gkq642
[25] Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31(13): 3406–3415 · Zbl 05437421 · doi:10.1093/nar/gkg595
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.