The following assignments should be solved using the tool
IntaRNA
. As some advanced IntaRNA
options are
required, you will need to install the tool locally. There are several
ways of installing IntaRNA
locally (conda, docker,
binaries, etc…).
Please follow the instructions suitable for your operating system on GitHub.
Additionally, you will need to download the query.fa and target.fa files. These files contain the sequences used for RNA-RNA interaction prediction.
Use IntaRNA
to predict an interaction between the query
and the target sequences given within the FASTA
file.
To see the available input parameters use the --help
(-h
) or --fullhelp
options.
Hybridization:
IntaRNA -q query.fa -t target.fa --acc=N
Accessibility:
IntaRNA -q query.fa -t target.fa
When only considering hybridization minimization, the resulting RNA-RNA interaction (RRI) prediction consists of several intermolecular stackings and the interaction region is very long. On the other hand, if we additionally consider the accessibility of the interaction region, the resulting prediction consists of only one (or a few) stable intermolecular base pair stackings.
When comparing the interaction energies, the energy for the hybridization-only mode is much lower than the accessibility-based one. The longer the resulting interaction, the lower the energy of \(E_{hybrid}\) and thus the more stable the model. The hybridization approach ignores the potentially necessary dissolving of intramolecular base pairs. This dissolving of intramolecular base pairs is considered in the accessibility-based approach by adding ED (positive) values. If we would take the long interaction result of the hybridization approach and evaluate it using the accessibility-based approach, we would end up with the same \(E_{hybrid}\). But the ED terms would be very high, thus leading to an overall higher energy \(E\) compared to the result from the accessibility based approach. The accessibility-based approach looks for the optimal trade-off between \(E_{hybrid}\) and ED values, leading to an overall better (and biologically more realistic) result. In other words, the energy \(E\) would be overall higher, but the resulting interaction would be more reasonable as it takes into account intramolecular base pairing.
We want to investigate the effect of using seed constraints in
RNA-RNA interaction prediction. Change the output of
IntaRNA
to CSV
format and increase the number
of sub-optimal predictions that should be returned. Additionally, make
sure that you use the exact prediction mode.
Using the two sequences from before, try out the following seed constraints:
To see the available input parameters use the --help
(-h
) or --fullhelp
options.
Testing several seed lengths:
IntaRNA -q query.fa -t target.fa --mode M --noSeed
IntaRNA -q query.fa -t target.fa --mode M --seedBP 2
IntaRNA -q query.fa -t target.fa --mode M --seedBP 4
IntaRNA -q query.fa -t target.fa --mode M --seedBP 7
IntaRNA -q query.fa -t target.fa --mode M --seedBP 9
Counting the resulting sub-optimal sequences:
IntaRNA -q query.fa -t target.fa --mode M --noSeed --outMode C -n 1000 | tail +2 | wc -l
IntaRNA -q query.fa -t target.fa --mode M --seedBP 2 --outMode C -n 1000 | tail +2 | wc -l
IntaRNA -q query.fa -t target.fa --mode M --seedBP 4 --outMode C -n 1000 | tail +2 | wc -l
IntaRNA -q query.fa -t target.fa --mode M --seedBP 7 --outMode C -n 1000 | tail +2 | wc -l
IntaRNA -q query.fa -t target.fa --mode M --seedBP 9 --outMode C -n 1000 | tail +2 | wc -l
Check the runtime:
time IntaRNA -q query.fa -t target.fa --mode M --noSeed --outMode C -n 1000 | wc -l
time IntaRNA -q query.fa -t target.fa --mode M --seedBP 2 --outMode C -n 1000 | wc -l
You can additionally add --outDeltaE 1
to restrict the
values of sub-optimal interaction energies accepted in the output.
Observation 1:
Counting the number of suboptimal interactions shows the concept of
using a seed will reduce the runtime. --noSeed
results in
the highest amount of sub-optimal interaction preditions Using a larger
seed reduces the amount of possible interactions, thus reducing the
search-space and the runtime of IntaRNA
. If the seed
becomes too large, no feasible interactions will be found anymore with
an energy value \(< 0\).
Observation 2:
We have seen in the lecture that in the biological process starts by forming a stable seed interaction which is then extended into a longer more stable interaction (e.g. sRNA or miRNA interaction). Therefore, using a seed can help to remove some of the biologically less likely sub-optimal interactions.
Observation 3:
Comparing interaction predictions resulting from using seeds of length \(2\) and \(4\) shows that some short interactions may not sum up to an energy value below \(0\), and are therefore not predicted. Thus we sometimes observe less sequences for very low seed lengths. The higher the initial seed length is set, the fewer interactions are predicted as fewer interactions are possible.