Exercise 1 - Smith-Waterman

Consider the following sequences \(S1\) and \(S2\), a similarity scoring via \(s(x,y)\) and a gap cost function \(g(k)\).

\[\begin{align} S1 &= TCCGA\\ S2 &= TACGCAGA\\ s(x, y) &= \begin{cases} +1 &if\ x=y\\ 0 &else \end{cases}\\ g(k) &= -k \end{align}\]

1a)

Compute the local alignment matrix \(S_{i,j}\) for the given sequence.

Hide
Matrix

Si,j

-

T

A

C

G

C

A

G

A

-

0

0

0

0

0

0

0

0

0

T

0

C

0

C

0

G

0

A

0

Hint: Formulae

\[\begin{align} H_{0,0} &= 0\\ H_{i,0} &= 0\\ H_{0,j} &= 0\\ H_{i,j} &= max \begin{cases} H_{i-1, j-1} + s(a_i, b_j)\\ H_{i-1,j} + s(a_i, -)\\ H_{i, j-1} + s(-, b_j)\\ 0 \end{cases} \end{align}\]

Solution

Si,j

-

T

A

C

G

C

A

G

A

-

0

0

0

0

0

0

0

0

0

T

0

1

0

0

0

0

0

0

0

C

0

0

1

1

0

1

0

0

0

C

0

0

0

2

1

1

1

0

0

G

0

0

0

1

3

2

1

2

1

A

0

0

1

0

2

3

3

2

3

1b)

Give all optimal local alignments and the according score.

Hide
Solution

score: 3

TCCG    TCCGA   TCCG-A    CCGA
|:||    |:||:   |:|| |    |:||
TACG    TACGC   TACGCA    CAGA

Exercise 2 - Affine gap costs

2a)

Which of these statements are correct?

Statements
Solution

2b)

You want to extend the Smith-Waterman algorithm for local alignment to more general gap scoring functions \(g(k)\) (where \(k\) denotes the gap length).

The following recursions were created analogously to the Waterman-Smith-Beyer algorithm. Which of these (if any) represents a variant of the Smith-Waterman algorithm that allows for an arbitrary gap scoring function?

\[ D_{0,0} = 0,\ D_{i,0}=0,\ D_{0,j}=0 \]

\[\begin{align} D_{i,j} &\stackrel{(i)}{=} max \begin{cases} D_{i-1, j-1} + s(a_i, b_j)\\ max_{1 \leq k \leq j} D_{i,j-k} + g(1)\\ max_{1 \leq k \leq i} D_{i-k,j} + g(1)\\ 0 \end{cases} &D_{i,j} &\stackrel{(ii)}{=} max \begin{cases} D_{i-1, j-1} + s(a_i, b_j)\\ max_{1 \leq k \leq j} D_{i,j-k} + g(k)\\ max_{1 \leq k \leq i} D_{i-k,j} + g(k)\\ 0 \end{cases}\\ \\ D_{i,j} &\stackrel{(iii)}{=} min \begin{cases} D_{i-1, j-1} + s(a_i, b_j)\\ max_{1 \leq k \leq j} D_{i,j-k} + g(k)\\ max_{1 \leq k \leq i} D_{i-k,j} + g(k)\\ 0 \end{cases} &D_{i,j} &\stackrel{(iv)}{=} min \begin{cases} D_{i-1, j-1} + s(a_i, b_j)\\ max_{1 \leq k \leq j} D_{i,j-k} + g(1)\\ max_{1 \leq k \leq i} D_{i-k,j} + g(1)\\ 0 \end{cases}\\ \\ D_{i,j} &\stackrel{(v)}{=} max \begin{cases} D_{i-1, j-1} + s(a_i, b_j)\\ max_{1 \leq k \leq j} D_{i,j-k} + g(1)\\ max_{1 \leq k \leq i} D_{i-k,j} + g(1)\\ -1 \end{cases} \end{align}\]

Hide
Solution

Only Formula \(ii\) allows for an arbitrary gap scoring.

Note

Distance scores are not suited for local alignment scoring, thus all minimization approaches are not suitable. Further \(g(1)\) does not represent an affine scoring.

2c)

The following recursions were created analogously to the Gotoh algorithm. Which of these (if any) represents a variant of the Smith-Waterman algorithm that allows for an affine gap scoring function?

\[ D_{0,0} = 0,\ D_{i,0}=0,\ D_{0,j}=0 \] \[\begin{align} D_{i,j} &\stackrel{(i)}{=} max \begin{cases} D_{i-1, j-1} + s(a_i, b_j)\\ P_{i,j}\\ Q_{i,j}\\ 0 \end{cases} \begin{aligned} P_{i,j} &= max \begin{cases} D_{i-1,j} + g(1)\\ P_{i-1, j} + \beta \end{cases}\\ Q_{i,j} &= max \begin{cases} D_{i,j-1} + g(1)\\ Q_{i, j-1} + \beta \end{cases}\\ \end{aligned}\\ \\ D_{i,j} &\stackrel{(ii)}{=} max \begin{cases} D_{i-1, j-1} + s(a_i, b_j)\\ P_{i,j}\\ Q_{i,j}\\ 0 \end{cases} \begin{aligned} P_{i,j} &= max \begin{cases} D_{i-1,j} + g(k)\\ P_{i-1, j} + \beta \end{cases}\\ Q_{i,j} &= max \begin{cases} D_{i,j-1} + g(k)\\ Q_{i, j-1} + \beta \end{cases}\\ \end{aligned}\\ \\ D_{i,j} &\stackrel{(iii)}{=} min \begin{cases} D_{i-1, j-1} + s(a_i, b_j)\\ P_{i,j}\\ Q_{i,j}\\ 0 \end{cases} \begin{aligned} P_{i,j} &= min \begin{cases} D_{i-1,j} + g(k)\\ P_{i-1, j} + \beta \end{cases}\\ Q_{i,j} &= min \begin{cases} D_{i,j-1} + g(k)\\ Q_{i, j-1} + \beta \end{cases}\\ \end{aligned}\\ \\ D_{i,j} &\stackrel{(iv)}{=} min \begin{cases} D_{i-1, j-1} + s(a_i, b_j)\\ P_{i,j}\\ Q_{i,j}\\ 0 \end{cases} \begin{aligned} P_{i,j} &= min \begin{cases} D_{i-1,j} + g(1)\\ P_{i-1, j} + \beta \end{cases}\\ Q_{i,j} &= min \begin{cases} D_{i,j-1} + g(1)\\ Q_{i, j-1} + \beta \end{cases}\\ \end{aligned} \end{align}\]

Hide
Solution

Only Formula \(i\) allows for an affine gap scoring.

Note

Distance scores are not suited for local alignment scoring, thus all minimization approaches are not suitable. Therefore \(iii\) and \(iv\) don’t represent the recursion.

According to the algorithm we only need to put gap introduction score into the \(D\) matrix. Thus \(ii\) is also wrong.


Exercise 3 - Programming assignment

Programming assignments are available via Github Classroom and contain automatic tests.

We recommend doing these assignments since they will help you to further understand this topic.

Access the Github Classroom link: Programming Assignment: Sheet 05.


Preparing for the Exam

Below you find an interactive tool that allows you to create dynamic programming matrices to practice for the exam. You can fill in the table and check which values are correct via the Check Matrix button. Correctly filled cells will turn green.

Once you filled in the cells, you can trigger the traceback mode toggle. This will disable the input fields. However, you can now select cells that are part of a traceback. While in traceback mode you can click the Check Traceback & Alignment button. If you completely marked a correct traceback it will get a green border. Else marked cells will get a red border.

It will further check whether the alignment in the alignment section below matches your traceback (only if your traceback is correct). If this is the case, the alignment will turn green else it will turn red.

Warning

This tool is in beta and might contain bugs.






Alignment