Extend the given Zuker recursions to the folding of an alignment of \(K\) sequences, as done by RNAalifold.
For the matrix \(W\):
Init: \(W_{ij} = 0\), with \(i+m\geq j\)
Recursion for entries \(W_{ij}\), with \(i+m < j\): \[ W_{ij} = \min \begin{cases} W_{i,j-1}\\ W_{i+1,j}\\ V_{ij} \end{cases} \]
For the matrix \(V\):
Init: \(V_{ij} = \infty\), with \(i+m\geq j\)
Recursion for entries \(V_{ij}\), with \(i+m < j\): \[ V_{ij} = \min \begin{cases} eH(i,j)\\ V_{i+1,j-1} + eS(i,j)\\ \min\limits_{\substack{i<i'<j'<j,\\ i'-i+j-j'>2}} V_{i'j'} + eL(i,j,i',j') \end{cases} \]
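For an alignment of \(K\) sequences \(S_1, \dots, S_K\), the recursion for \(W\) and both initialisations stay unchanged, since \(W\) contains no energy terms. In the recursion for \(V\), every energy term is summed over all \(K\) sequences, and the conservation score \(\gamma(i,j)\) of columns \(i\) and \(j\), weighted by \(\beta\), is added:
\[ V_{ij} = \beta\,\gamma(i,j) + \min \begin{cases} \sum_{1 \leq l \leq K} eH(i,j, S_{l})\\ V_{i+1,j-1} + \sum_{1 \leq l \leq K} eS(i,j, S_{l})\\ \min\limits_{\substack{i<i'<j'<j,\\ i'-i+j-j'>2}} V_{i'j'} + \sum_{1 \leq l \leq K} eL(i,j,i',j', S_{l}) \end{cases} \]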
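The extended recursion can be sketched in Python as below. This is a minimal illustration, not RNAalifold itself: it assumes 0-based column indices, treats the gap character like any other letter, and uses hypothetical toy energy functions (`toy_eH`, `toy_eS`, `toy_eL`) in place of a real nearest-neighbour energy model. The function name `alifold` and all energy values are illustrative only; `gamma` is passed in as a callable so that any conservation score can be plugged in.

```python
import math

# Assumption: Watson-Crick and G-U wobble pairs count as complementary
COMPLEMENTARY = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def alifold(aln, eH, eS, eL, gamma, beta=1.0, m=0):
    """Fill W and V for an alignment of K equal-length sequences.

    eH(i, j, s), eS(i, j, s) and eL(i, j, ip, jp, s) are per-sequence
    energies; gamma(i, j) is the conservation score of columns i and j.
    """
    n = len(aln[0])
    INF = math.inf
    V = [[INF] * n for _ in range(n)]  # init: V_ij = inf for i + m >= j
    W = [[0.0] * n for _ in range(n)]  # init: W_ij = 0 for i + m >= j
    for d in range(m + 1, n):          # d = j - i, so that i + m < j
        for i in range(n - d):
            j = i + d
            # Each case sums the per-sequence energy over all K sequences
            hairpin = sum(eH(i, j, s) for s in aln)
            stack = V[i + 1][j - 1] + sum(eS(i, j, s) for s in aln)
            loop = INF
            for ip in range(i + 1, j):
                for jp in range(ip + 1, j):
                    if ip - i + j - jp > 2 and V[ip][jp] < INF:
                        e = V[ip][jp] + sum(eL(i, j, ip, jp, s) for s in aln)
                        loop = min(loop, e)
            V[i][j] = beta * gamma(i, j) + min(hairpin, stack, loop)
            W[i][j] = min(W[i][j - 1], W[i + 1][j], V[i][j])
    return W, V


def toy_eH(i, j, s):
    return 3.0  # hypothetical constant hairpin penalty per sequence


def toy_eS(i, j, s):
    # Hypothetical: -2 per sequence in which both stacked pairs can form
    ok = (s[i], s[j]) in COMPLEMENTARY and (s[i + 1], s[j - 1]) in COMPLEMENTARY
    return -2.0 if ok else 0.0


def toy_eL(i, j, ip, jp, s):
    return 0.5 * ((ip - i - 1) + (j - jp - 1))  # hypothetical 0.5 per unpaired base


aln = ["GGGAAACCC", "GGGAAACCC"]
W, V = alifold(aln, toy_eH, toy_eS, toy_eL, gamma=lambda i, j: 0.0, m=3)
print(W[0][-1])
```

Because \(\beta\gamma(i,j)\) is added outside the minimum, it could equally be added to each of the three cases. With a single sequence and \(\beta = 0\) the sketch reduces to the given single-sequence recursion; for two identical copies of the toy hairpin every summed term doubles, so the minimum is \(-2\) rather than \(-1\).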
Consider the following alignment, used as input for RNAalifold. No minimum loop length is imposed, i.e. \(m = 0\).
AAGUUUCG
AAGCUUCG
AAC-AUG-
Compute the following conservation scores.
\[ \begin{align*} \gamma(1,6) &= \\ \gamma(2,5) &= \\ \gamma(3,7) &= \\ \gamma(4,8) &= \\ \gamma(7,8) &= \end{align*} \]
Here \(a_{\ell i}\) denotes the character of sequence \(\ell\) in alignment column \(i\), "complementary" means a Watson-Crick or GU wobble pair, and \(h\) is the Hamming distance of two characters:
\[ \begin{align*} h(x,y)= \begin{cases} 1 & x \neq y\\ 0 & x=y \end{cases} \end{align*} \]
\[ \begin{align*} \gamma(i,j)= & \;-\hspace{-6pt}\sum_{1\leq\ell<\ell'\leq K} \begin{cases} h(a_{\ell i},a_{\ell' i}) + h(a_{\ell j},a_{\ell' j}) & \text{$a_{\ell i} - a_{\ell j}$, $a_{\ell' i} - a_{\ell' j}$ both complementary}\\ 0 & \text{otherwise} \end{cases}\\ & + \delta \sum_{1\leq\ell\leq K} \begin{cases} 0 & \text{$a_{\ell i} - a_{\ell j}$ complementary}\\ 0.25 & \text{$a_{\ell i}, a_{\ell j}$ are both gaps}\\ 1 & \text{otherwise}. \end{cases} \end{align*} \]
\[ \begin{align*} \gamma(1,6)&= -[0+0+0] + \delta(0+0+0) = 0\\ \gamma(2,5)&= -[0+0+0] + \delta(0+0+1) = \delta\\ \gamma(3,7)&= -[0+2+2] + \delta(0+0+0) = -4\\ \gamma(4,8)&= -[1+0+0] + \delta(0+0+0.25) = -1+0.25\delta\\ \gamma(7,8)&= -[0+0+0] + \delta(0+0+1) = \delta \end{align*} \]
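The hand computations of \(\gamma\) above can be checked with a short Python sketch. It assumes that Watson-Crick and GU pairs count as complementary and, since \(\delta\) is left symbolic, returns the covariation part and the coefficient of \(\delta\) separately; the names `gamma_terms` and `ALN` are illustrative.

```python
from itertools import combinations

# Assumption: "complementary" means Watson-Crick or G-U wobble pairs
COMPLEMENTARY = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
GAP = "-"


def h(x, y):
    """Hamming distance of two characters: 1 if they differ, else 0."""
    return 0 if x == y else 1


def gamma_terms(aln, i, j):
    """Return (covariation, penalty) for 1-based columns i and j, so that
    gamma(i, j) = covariation + delta * penalty."""
    col_i = [s[i - 1] for s in aln]
    col_j = [s[j - 1] for s in aln]
    compl = [(a, b) in COMPLEMENTARY for a, b in zip(col_i, col_j)]

    # Covariation: sum over sequence pairs l < l' that can both form (i, j)
    cov = 0
    for l, lp in combinations(range(len(aln)), 2):
        if compl[l] and compl[lp]:
            cov += h(col_i[l], col_i[lp]) + h(col_j[l], col_j[lp])

    # Penalty: 0 if complementary, 0.25 if both gaps, 1 otherwise
    penalty = 0.0
    for a, b, ok in zip(col_i, col_j, compl):
        if not ok:
            penalty += 0.25 if a == GAP and b == GAP else 1.0
    return -cov, penalty


ALN = ["AAGUUUCG", "AAGCUUCG", "AAC-AUG-"]
for i, j in [(1, 6), (2, 5), (3, 7), (4, 8), (7, 8)]:
    cov, pen = gamma_terms(ALN, i, j)
    print(f"gamma({i},{j}) = {cov} + {pen} * delta")
```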
Why is \(\gamma(3,7) < \gamma(1,6)\)?
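\(\gamma(3,7)\) is smaller because of the covariation term. In sequences 1 and 2, columns 3 and 7 form the base pair GC, whereas sequence 3 has CG: both positions have mutated, yet the base pair, and hence the secondary structure, is preserved (a compensatory mutation). Each of the sequence pairs \((1,3)\) and \((2,3)\) therefore contributes \(h(G,C)+h(C,G)=2\), giving \(\gamma(3,7)=-4\). Columns 1 and 6 contain the same pair AU in all three sequences, so there is no covariation evidence and \(\gamma(1,6)=0\).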
What is the intuition behind the covariation and penalty terms in the conservation score?
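The covariation term favours base pairs that have been maintained although the sequence has mutated: every pair of sequences that can both form \((i,j)\) but differ in column \(i\) or \(j\) lowers \(\gamma(i,j)\). Such consistent or compensatory mutations are evidence that the structure, rather than the sequence, is conserved. The penalty term disfavours base pairs that are not supported by all sequences: each sequence whose characters in columns \(i\) and \(j\) are not complementary adds \(\delta\), or only \(0.25\delta\) if both are gaps. Since \(\beta\gamma(i,j)\) is added to \(V_{ij}\) and energies are minimized, for \(\beta > 0\) a lower \(\gamma(i,j)\) makes the pair \((i,j)\) more favourable.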
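For illustration, consider three hypothetical column pairs \((i,j)\) in an alignment of \(K=2\) sequences (not part of the alignment above), where sequence 1 carries GC and sequence 2 carries the pair given in parentheses:
\[ \begin{align*} \text{compensatory (CG)}: \quad \gamma(i,j) &= -[h(G,C)+h(C,G)] + \delta(0+0) = -2\\ \text{consistent (GU)}: \quad \gamma(i,j) &= -[h(G,G)+h(C,U)] + \delta(0+0) = -1\\ \text{inconsistent (AC)}: \quad \gamma(i,j) &= -[0] + \delta(0+1) = \delta \end{align*} \]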