Exercise sheet 2: Zuker’s algorithm

Question 1

What is represented by \(m\) in the above recursion equations?

Answer

m is part of the condition for the initialization. Do you know of a case where the initialization was extended?

Answer

The minimal loop length (i.e. number of unpaired bases in loop).

Question 2

Can we reduce the initialization of \(W\)?

Answer

There were two ways to include a minimal loop length as a condition in your DP-folding algorithm.

Answer

Yes, it is sufficient to initialize \(W_{ii} = 0\).

Question 3

What is computed in \(W_{1n}\)?

Answer

You would start your traceback in this cell.

Answer

The entry \(W_{1n}\) contains the minimum free energy of all secondary structures containing hairpin, internal loops and stackings. But only for structures that have one external base pair, as the \(W\) matrix has no decomposition case. (the W = W + V case is missing)

Question 4

What is the complexity for computing \(W_{1n}\) by dynamic programming?

Answer

Which is the case with the highest amount of indices that need to be checked?

Answer

The complexity is \(O(n^{4})\). For each of the \(O(n^{2})\) matrix entries the minimum of \(O(n^{2})\) internal loops has to be selected.

Question 5

Why is the time complexity of Zuker only \(O(n^{3})\)? How can you modify this recursion accordingly? Will the algorithm still solve exactly the same problem? How does the time complexity change?

Answer

In this case, we do not have a multiloop case.

Answer

The Zuker algorithm is restricted to secondary structures containing only internal loops up to a maximum size of L. That way for each matrix entry only \(O(n)\) internal loops have to be checked, leading to a reduced complexity of \(O(n^{3})\). The same modification can be done for this algorithm. By restricting the maximum size of internal loops the modified algorithm is working on a subset of the original structure ensemble, so the problem differs slightly. Time complexity goes down from \(O(n^4)\) to \(O(n^2)\).

Question 6

Modified versions of this simplified Zuker recursion are used in practice to search entire genomes (\(n>10^6\)) for stable hairpin loop structures. How can we further adapt the recursion in order to improve the runtime when looking for short hairpin loops in an entire genome?

Answer

As we are only interested in short, stable hairpin loops, we can restrict the hairpin size.

This means that we can constraint the distance between \(i\) and \(j\). In other words, we can consider only a window from \(i\) up to a constant length, which will make \(j\) a constant value. Thus \(j\) will not be considered in the time complexity calculation. The space complexity is reduced to \(O(n * \text{constant})\) for our \(W\) and \(V\) matrices. (As we only need to consider a constant number of columns for each row of our matrix.)

This reduces the overall time complexity to \(O(n)\)

Question 7

How does the recursion differ from the final \(V\)-recursion given in the lecture? Identify structures that can be predicted by the Zuker’s algorithm but not by this algorithm.

Answer

Consider the final case of the recursion, which is added in addition to the recursion cases from Exercise 1.

Answer

In this recursion the computation of the multi-loop parts is not yet done in a separate recursion an the general multiloop energy model is used instead of the simplified one. The new algorithm is able to predict structures including stacking, hairpin loops, internal loops and multiloops having exactly \(2\) inner base pairs within loop. For example it won’t be possible to predict a structure having a multiloop with \(3\) inner base pairs using this new algorithm.

Question 8

What is the time and space complexity of this algorithm?

Answer

Check how many additional indices need to be checked.

Answer

Space: \(O(n^{2})\). Time: \(O(n^{6})\), for each entry in matrix \(V O(n^{4})\) different multiloops have to be checked.

Question 9

Can you modify the \(V\)-recursion again to support 3-multiloops with an energy contribution of \(eM(i,j,i_{1},j_{1},i_{2},j_{2},i_{3},j_{3})\)? How does this generalize to \(k\)-multiloops for arbitrary \(k\) and how does it correlate with time complexity? Why does the Zuker algorithm handle \(k\)-multiloops by introducing a further matrix \(WM\) (and only approximate the multiloop energies)?

Answer

A modified \(V\)-recursion that supports 3-multiloops is:

\[ V_{ij} = \min \begin{cases} eH(i,j)\\ V_{i+1{j-1}} + eS(i,j)\\ min_{i<i^{'}<j^{'}<j}V_{i^{'}{j^{'}}}+eL(i,j,i^{'},j^{'})\\ min_{i<i_{1}<j_{1}<i_{2}<j_{2}<i_{3}<j_{3}<j}V_{i_{1}{j_{1}}}+V_{i_{2}{j_{2}}}+V_{i_{3}{j_{3}}}+eM(i,j,i_{1},j_{1},i_{2},j_{2},i_{3},j_{3})\\ \end{cases} \]

Thus, for each additional branch (enclosed helix), time complexity increases by \(O(n^2)\), i.e. we get \(O(n^2)\cdot O(n^{2k})\) overall time complexity for \(V\) when considering multi loops with up to \(k\) helices. By introducing the matrix \(WM\) and only approximate the multiloop energies, Zuker can reduce its total runtime to \(O(n^3)\).

Question 10

Study the energy contributions of the individual loops. Which loops have typically positive energy contributions?

Answer

Typically in unpaired regions (Hairpin loop, Internal loop, and Multi loop)

Question 11

What is changing in the loop energy contribution if the temperature is stepwise increased (e.g. up to 70\(^\circ\) Celsius)?

Answer

A temperature increase also increases the energy contributions. e.g.:

temp 37: Interior loop (  1, 73) GC; (  2, 72) GC:  -330
temp 50: Interior loop (  1, 73) GC; (  2, 72) GC:  -287
temp 70: Interior loop (  1, 73) GC; (  2, 72) GC:  -222

Exercise sheet 2: Zuker’s algorithm

Exercise 1

1.1

Hide

Hint

Solution

1.2

Hide

Hint

Solution

1.3

Hide

Hint

Solution

1.4

Hide

Hint

Solution

1.5

Hide

Hint

Solution

1.6

Hide

Solution

Exercise 2

2.1

Hide

Hint

Solution

2.2

Hide

Hint

Solution

2.3

Hide

Solution

Exercise 3 - Loop decomposition and energy evaluation

3.1

Hide

Solution

3.2

Hide

Solution