Exercise 1

You are given the sequences \(a\), \(b\) and \(c\)

\[ a = CACCGG\\ b = ACCAAG\\ c = AACACC\\ \]

The pairwise optimal alignments \(A(x, y)\) of the set of sequences \(S\) were calculated as:

a: CACCG_G          a: __CACCGG     b: ACCAAG
    |||: |               ||||          |:||::
b: _ACCAAG          c: AACACC__     c: AACACC

1a)

Calculate the primary library (\(L\))

Hide
Formulae

init: \(L^{x,y}_{i,j}=0\)

\(\forall\) alignments \(A\) of sequences \(x\) and \(y\) of the set \(S\).

\(weight(A) = \dfrac{number\ of\ matches}{min(len(x), len(y))} * 100\)

\(\forall\) aligned positions \(i,j\) with \(1 \leq i \leq len(x)\) and \(1 \leq j \leq len(y)\)

\(L^{x,y}_{i,j} = L^{x,y}_{i,j} + weight(A)\)

Solution

\(L^{a,b}_{2,1}=L^{a,b}_{3,2}=L^{a,b}_{4,3}=L^{a,b}_{6,6}=100*\frac{4}{6} = 66.6\bar{6} \approx 67\) and all other \(L^{a,b}_{i,j}=0\)

\(L^{a,c}_{1,3}=L^{a,c}_{2,4}=L^{a,c}_{3,5}=L^{a,c}_{4,6}=100*\frac{4}{6} = 66.6\bar{6} \approx 67\) and all other \(L^{a,c}_{i,j}=0\)

\(L^{b,c}_{1,1}=L^{b,c}_{3,3}=L^{b,c}_{4,4}=100*\frac{3}{6}=50\) and all other \(L^{b,c}_{i,j}=0\)

1b)

Calculate the extended library (\(EL\))

Hide
Formulae

\(EL^{x,y}_{i,j}= L^{x,y}_{i,j} + \sum_{z \in S \setminus \{\ x, y\}\ } \sum_{1 \leq k \leq len(z)} min(L^{x,z}_{i,k}, L^{z,y}_{k,j})\)

Solution

The original Library doesn’t change as there are no edges enforcing certain connections. Hence

\(EL^{x,y}_{i,j}= L^{x,y}_{i,j} \qquad \forall L^{x,y}_{i,j}\neq 0\)

and the following weights are added:

a: __CACCGG     
     ||||       
c: AACACC__     
   |:||::
b: ACCAAG
     **

\(EL^{a,b}_{1,3}=EL^{a,b}_{2,4}=50\)

a: CACCG_G     
    |||: |    
b: _ACCAAG   
    |:||::
c:  AACACC
    * *

\(EL^{a,c}_{2,1}=EL^{a,c}_{4,3}=50\)

b:    ACCAAG
      |||: |    
a:   CACCG_G     
     |||| 
c: AACACC
      ***

\(EL^{b,c}_{1,4}=EL^{b,c}_{2,5}=EL^{b,c}_{3,6}=67\)

1c)

Realign the sequences \(b\) and \(c\) using \(EL\) for scoring and gap costs and mismatch costs of 0

Hide
Formulae

\[\begin{eqnarray*} i &\in& [1,|x|] \\ j &\in& [1,|y|] \\ L(0,0) &=& 0 \\ L(i,0) &=& w(x_i,-)*i \quad \text{ or } = L(i-1,0)+w(x_i,-)\\ L(0,j) &=& w(-,y_j)*j \quad \text{ or } = L(0,j-1)+w(-,y_j)\\ L(i,j) &=& max \begin{cases} L(i-1,j)+w(x_i,-) \\ L(i,j-1)+w(-,y_j) \\ L(i-1,j-1)+w(x_i,y_j) \end{cases}\\ w(x_i,y_j) &=& \begin{cases} EL^{x, y}_{i, j} & \text{match-score if } (x_i=y_j)\\ 0 & \text{insert/deletion-score if } (x_i = - \lor y_j =-)\\ 0 & \text{mismatch-score else } (x_i\neq y_j) \\ \end{cases} \\ \end{eqnarray*}\]

Solution

-

-

A

C

C

A

A

G

-

0

0

0

0

0

0

0

A

0

50

50

50

50

50

50

A

0

50

50

50

50

50

50

C

0

50

50

100

100

100

100

A

0

67

67

100

150

150

150

C

0

67

133

133

150

150

150

C

0

67

133

200

200

200

200

Alignment:

b: ___ACCAAG
      |||
c: AACACC___

1d)

Do the other alignments \(a\)-\(b\) and \(a\)-\(c\) change? Provide arguments, without calculating new alignments.

Hide
Solution

No. The newly added alignment scores in \(EL\) represent edges that are incompatible with the current best alignments and can not score higher.

1e)

Sketch a Guide Tree (either Sketch or Newick format)

Hide
Solution

Newick: \(((a,c),b)\) or \(((a,b),c)\) or \(((b,c),a)\)

1f)

Perform a progressive alignment by aligning sequence \(b\) to the already existing alignment \(A(a, c)\). To score a match between \(b\) and \(A(a, c)\) use the sum \(EL^{a,b} + EL^{b,c}\) with the correct indices. Show the resulting multiple sequence alignment.

Hide
Solution

-

-

-

-

C

A

C

C

G

G

-

-

A

A

C

A

C

C

-

-

-

0

0

0

0

0

0

0

0

0

A

0

50

50

50

133

133

133

133

133

C

0

50

50

50

133

267

267

267

267

C

0

50

50

150

150

267

400

400

400

A

0

50

50

150

250

267

400

400

400

A

0

50

50

150

250

267

400

400

400

G

0

50

50

150

250

267

400

400

467

One Possible Alignment:

a: __CACC_GG
     ||||
c: AACACC___
      |||
b: ___ACCAAG