You are given the sequences \(a\), \(b\) and \(c\)
\[ a = CACCGG\\ b = ACCAAG\\ c = AACACC\\ \]
The pairwise optimal alignments \(A(x, y)\) of the set of sequences \(S\) were calculated as:
a: CACCG_G a: __CACCGG b: ACCAAG
|||: | |||| |:||::
b: _ACCAAG c: AACACC__ c: AACACC
Calculate the primary library (\(L\))
init: \(L^{x,y}_{i,j}=0\)
\(\forall\) alignments \(A\) of sequences \(x\) and \(y\) of the set \(S\).
\(weight(A) = \dfrac{number\ of\ matches}{min(len(x), len(y))} * 100\)
\(\forall\) aligned positions \(i,j\) with \(1 \leq i \leq len(x)\) and \(1 \leq j \leq len(y)\)
\(L^{x,y}_{i,j} = L^{x,y}_{i,j} + weight(A)\)
\(L^{a,b}_{2,1}=L^{a,b}_{3,2}=L^{a,b}_{4,3}=L^{a,b}_{6,6}=100*\frac{4}{6} = 66.6\bar{6} \approx 67\) and all other \(L^{a,b}_{i,j}=0\)
\(L^{a,c}_{1,3}=L^{a,c}_{2,4}=L^{a,c}_{3,5}=L^{a,c}_{4,6}=100*\frac{4}{6} = 66.6\bar{6} \approx 67\) and all other \(L^{a,c}_{i,j}=0\)
\(L^{b,c}_{1,1}=L^{b,c}_{3,3}=L^{b,c}_{4,4}=100*\frac{3}{6}=50\) and all other \(L^{b,c}_{i,j}=0\)
Calculate the extended library (\(EL\))
\(EL^{x,y}_{i,j}= L^{x,y}_{i,j} + \sum_{z \in S \setminus \{\ x, y\}\ } \sum_{1 \leq k \leq len(z)} min(L^{x,z}_{i,k}, L^{z,y}_{k,j})\)
The original Library doesn’t change as there are no edges enforcing certain connections. Hence
\(EL^{x,y}_{i,j}= L^{x,y}_{i,j} \qquad \forall L^{x,y}_{i,j}\neq 0\)
and the following weights are added:
a: __CACCGG
||||
c: AACACC__
|:||::
b: ACCAAG
**
\(EL^{a,b}_{1,3}=EL^{a,b}_{2,4}=50\)
a: CACCG_G
|||: |
b: _ACCAAG
|:||::
c: AACACC
* *
\(EL^{a,c}_{2,1}=EL^{a,c}_{4,3}=50\)
b: ACCAAG
|||: |
a: CACCG_G
||||
c: AACACC
***
\(EL^{b,c}_{1,4}=EL^{b,c}_{2,5}=EL^{b,c}_{3,6}=67\)
Realign the sequences \(b\) and \(c\) using \(EL\) for scoring and gap costs and mismatch costs of 0
\[\begin{eqnarray*} i &\in& [1,|x|] \\ j &\in& [1,|y|] \\ L(0,0) &=& 0 \\ L(i,0) &=& w(x_i,-)*i \quad \text{ or } = L(i-1,0)+w(x_i,-)\\ L(0,j) &=& w(-,y_j)*j \quad \text{ or } = L(0,j-1)+w(-,y_j)\\ L(i,j) &=& max \begin{cases} L(i-1,j)+w(x_i,-) \\ L(i,j-1)+w(-,y_j) \\ L(i-1,j-1)+w(x_i,y_j) \end{cases}\\ w(x_i,y_j) &=& \begin{cases} EL^{x, y}_{i, j} & \text{match-score if } (x_i=y_j)\\ 0 & \text{insert/deletion-score if } (x_i = - \lor y_j =-)\\ 0 & \text{mismatch-score else } (x_i\neq y_j) \\ \end{cases} \\ \end{eqnarray*}\]
- | - | A | C | C | A | A | G |
---|---|---|---|---|---|---|---|
- | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
A | 0 | 50 | 50 | 50 | 50 | 50 | 50 |
A | 0 | 50 | 50 | 50 | 50 | 50 | 50 |
C | 0 | 50 | 50 | 100 | 100 | 100 | 100 |
A | 0 | 67 | 67 | 100 | 150 | 150 | 150 |
C | 0 | 67 | 133 | 133 | 150 | 150 | 150 |
C | 0 | 67 | 133 | 200 | 200 | 200 | 200 |
Alignment:
b: ___ACCAAG
|||
c: AACACC___
Do the other alignments \(a\)-\(b\) and \(a\)-\(c\) change? Provide arguments, without calculating new alignments.
No. The newly added alignment scores in \(EL\) represent edges that are incompatible with the current best alignments and can not score higher.
Sketch a Guide Tree (either Sketch or Newick format)
Newick: \(((a,c),b)\) or \(((a,b),c)\) or \(((b,c),a)\)
Perform a progressive alignment by aligning sequence \(b\) to the already existing alignment \(A(a, c)\). To score a match between \(b\) and \(A(a, c)\) use the sum \(EL^{a,b} + EL^{b,c}\) with the correct indices. Show the resulting multiple sequence alignment.
- | - | - | - | C | A | C | C | G | G |
---|---|---|---|---|---|---|---|---|---|
- | - | A | A | C | A | C | C | - | - |
- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
A | 0 | 50 | 50 | 50 | 133 | 133 | 133 | 133 | 133 |
C | 0 | 50 | 50 | 50 | 133 | 267 | 267 | 267 | 267 |
C | 0 | 50 | 50 | 150 | 150 | 267 | 400 | 400 | 400 |
A | 0 | 50 | 50 | 150 | 250 | 267 | 400 | 400 | 400 |
A | 0 | 50 | 50 | 150 | 250 | 267 | 400 | 400 | 400 |
G | 0 | 50 | 50 | 150 | 250 | 267 | 400 | 400 | 467 |
One Possible Alignment:
a: __CACC_GG
||||
c: AACACC___
|||
b: ___ACCAAG