Exercise 1 - Point Accepted Mutation (PAM)

We want to calculate the \(PAM_1\) matrix from the following two pairwise alignments of DNA sequences: a aligned with b, and c aligned with d.

a = AAGTACTTT   c = AGGTAACACGTTTAGTCA
b = AAATTCCTA   d = AGTTTACCGGGTTAATCA

Tip: To solve a) and b), create a combined alignment of two concatenated sequences a’ and b’, built from the two initial alignments and their symmetric counterparts:

a’ = a + c + b + d

b’ = b + d + a + c

The order of the concatenated blocks does not matter as long as each block stays aligned with its partner, because the counts do not depend on the column position.

Unless otherwise stated, round all results to 4 decimal places.

1a)

Calculate the nucleotide frequencies \(r_x\)

Hint: Formulae

\[ r_x = \frac{\#\{x \in (a')\}}{\lvert a' \rvert} \]

Solution

\[\begin{align} r_A &= 0.3333 \\ r_C &= 0.1667 \\ r_T &= 0.3333 \\ r_G &= 0.1667 \end{align}\]
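The frequencies can be checked with a few lines of Python. This is only a sketch: the variable names (`a_prime`, `counts`, `r`) are illustrative and not part of the exercise.

```python
from collections import Counter

a = "AAGTACTTT"
b = "AAATTCCTA"
c = "AGGTAACACGTTTAGTCA"
d = "AGTTTACCGGGTTAATCA"

# Combined sequence a' as suggested in the tip (hypothetical variable name).
a_prime = a + c + b + d

# Relative frequency of each nucleotide in a', rounded to 4 decimal places.
counts = Counter(a_prime)
r = {x: round(counts[x] / len(a_prime), 4) for x in "ACTG"}
print(r)  # {'A': 0.3333, 'C': 0.1667, 'T': 0.3333, 'G': 0.1667}
```

Because a’ contains every sequence exactly once, counting in b’ instead would give the same frequencies.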

1b)

Calculate the symmetric mutation matrix \(E(x,y)\).

Hint: Formulae

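\[\begin{align} e_{xy} = \frac{1}{\lvert a' \rvert} \, \#\left\{ \begin{pmatrix} x \\ y \end{pmatrix} \in \begin{pmatrix} a' \\ b' \end{pmatrix} \right\} \end{align}\]

That is, \(e_{xy}\) is the number of alignment columns with \(x\) in a’ and \(y\) in b’, divided by the alignment length.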

Intermediate Values

Non-normalized counts, i.e. \(e_{xy}\) multiplied by \(|a'|\):

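\[
\begin{array}{c|cccc}
\text{nt.} & A & C & T & G \\ \hline
A & 12 & 1 & 3 & 2 \\
C & - & 6 & 1 & 1 \\
T & - & - & 12 & 2 \\
G & - & - & - & 4
\end{array}
\]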

Solution

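\[
\begin{array}{c|cccc}
\text{nt.} & A & C & T & G \\ \hline
A & 0.2222 & 0.0185 & 0.0556 & 0.0370 \\
C & - & 0.1111 & 0.0185 & 0.0185 \\
T & - & - & 0.2222 & 0.0370 \\
G & - & - & - & 0.0741
\end{array}
\]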
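The intermediate counts and the matrix \(E\) can be reproduced by pairing the columns of the two combined sequences. The following is a minimal sketch; the names `pair_counts` and `E` are illustrative choices.

```python
from collections import Counter

NT = "ACTG"
a = "AAGTACTTT"
b = "AAATTCCTA"
c = "AGGTAACACGTTTAGTCA"
d = "AGTTTACCGGGTTAATCA"

# Combined alignment: every original column appears once in each direction,
# which makes the resulting counts symmetric.
a_prime = a + c + b + d
b_prime = b + d + a + c

# Count how often nucleotide x (in a') is aligned with y (in b').
pair_counts = Counter(zip(a_prime, b_prime))

# Normalize by the alignment length to obtain e_xy.
n = len(a_prime)
E = {(x, y): round(pair_counts[(x, y)] / n, 4) for x in NT for y in NT}

for x in NT:
    print(x, [pair_counts[(x, y)] for y in NT])
# A [12, 1, 3, 2]
# C [1, 6, 1, 1]
# T [3, 1, 12, 2]
# G [2, 1, 2, 4]
```

The printout shows the full symmetric matrix; the tables above list only its upper triangle.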

1c)

Calculate the non-normalized PAM matrix \(S\) as \(10 \cdot \log_{10}(\text{odds})\), using the previously determined \(r\) values and the \(E\) matrix. Round to integers.

Hint: Formulae

\[\begin{align} S_{xy} = 10 \log_{10} \left( \frac{e_{xy}}{r_x r_y} \right) \end{align}\]

Solution

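\[
\begin{array}{c|cccc}
\text{nt.} & A & C & T & G \\ \hline
A & 3 & -5 & -3 & -2 \\
C & - & 6 & -5 & -2 \\
T & - & - & 3 & -2 \\
G & - & - & - & 4
\end{array}
\]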
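As a cross-check, the log-odds scores can be computed directly from the pair counts of 1b. This sketch hard-codes the upper-triangular counts from the intermediate table; the dictionary layout and names (`UPPER`, `counts`, `S`) are assumptions made for illustration.

```python
import math

NT = "ACTG"

# Upper-triangular pair counts taken from the intermediate table in 1b.
UPPER = {"AA": 12, "AC": 1, "AT": 3, "AG": 2,
         "CC": 6, "CT": 1, "CG": 1,
         "TT": 12, "TG": 2,
         "GG": 4}

# Expand to the full symmetric matrix.
counts = {}
for pair, value in UPPER.items():
    x, y = pair
    counts[(x, y)] = value
    counts[(y, x)] = value

n = sum(counts.values())  # alignment length |a'|

# The row sums of the symmetric count matrix give the nucleotide frequencies.
r = {x: sum(counts[(x, y)] for y in NT) / n for x in NT}

# S_xy = 10 * log10(e_xy / (r_x * r_y)), rounded to integers.
S = {(x, y): round(10 * math.log10((counts[(x, y)] / n) / (r[x] * r[y])))
     for x in NT for y in NT}

for x in NT:
    print(x, [S[(x, y)] for y in NT])
# A [3, -5, -3, -2]
# C [-5, 6, -5, -2]
# T [-3, -5, 3, -2]
# G [-2, -2, -2, 4]
```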

1d)

Given the sequences \(a = ACC\) and \(b = ATT\), compute the optimal Needleman-Wunsch alignments using:

  1. The general similarity scoring function.

\[ s^{g}(x, y) = \begin{cases} 5 & \text{if } x = y \\ -2 & \text{if } x \neq y \text{ or gapped} \end{cases} \]

  2. The PAM1-based similarity scoring function, with \(s_{x,y}\) taken from the matrix \(S\) of 1c.

\[ s^{PAM}(x, y) = \begin{cases} -2 & \text{if } x \text{ or } y \text{ gapped} \\ s_{x,y} & \text{otherwise (match/mismatch)} \end{cases} \]

Hint: Matrices

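\(S^{g}_{i,j}\) (rows: \(a = ACC\), columns: \(b = ATT\))

\[
\begin{array}{c|cccc}
 & - & A & T & T \\ \hline
- & 0 & -2 & -4 & -6 \\
A & -2 & 5 & 3 & 1 \\
C & -4 & 3 & 3 & 1 \\
C & -6 & 1 & 1 & 1
\end{array}
\]

\(S^{PAM}_{i,j}\) (rows: \(a = ACC\), columns: \(b = ATT\))

\[
\begin{array}{c|cccc}
 & - & A & T & T \\ \hline
- & 0 & -2 & -4 & -6 \\
A & -2 & 3 & 1 & -1 \\
C & -4 & 1 & -1 & -3 \\
C & -6 & -1 & -3 & -5
\end{array}
\]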

Solution

general scoring:

ACC
ATT

PAM scoring (score \(-5\); one of several co-optimal alignments):

ACC--
A--TT
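Both dynamic programming matrices and the alignments can be reproduced with a small Needleman-Wunsch sketch. The function name, the score dictionary `S_UPPER` (holding the values of \(S\) from 1c) and the tie-breaking order in the traceback (diagonal, then left, then up) are assumptions made for this example; a different tie-breaking order returns a different co-optimal alignment where several exist.

```python
def needleman_wunsch(a, b, score, gap=-2):
    """Fill the global-alignment matrix and trace back one optimal alignment."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(D[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          D[i - 1][j] + gap,
                          D[i][j - 1] + gap)

    # Traceback with a fixed preference: diagonal, then left, then up.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif j > 0 and D[i][j] == D[i][j - 1] + gap:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
        else:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
    return D, "".join(reversed(top)), "".join(reversed(bottom))


def general_score(x, y):
    # Match 5, mismatch -2 (gaps are handled by the gap parameter).
    return 5 if x == y else -2


# Upper triangle of the matrix S from 1c.
S_UPPER = {"AA": 3, "AC": -5, "AT": -3, "AG": -2,
           "CC": 6, "CT": -5, "CG": -2,
           "TT": 3, "TG": -2, "GG": 4}


def pam_score(x, y):
    # Symmetric lookup in the upper-triangular table.
    return S_UPPER.get(x + y, S_UPPER.get(y + x))


D_g, top_g, bot_g = needleman_wunsch("ACC", "ATT", general_score)
D_p, top_p, bot_p = needleman_wunsch("ACC", "ATT", pam_score)
print(top_g, bot_g)  # ACC ATT
print(top_p, bot_p)  # ACC-- A--TT
```

With the PAM-based scores, a C/T mismatch (\(-5\)) costs more than two gaps (\(-4\)), which is why the second alignment avoids pairing C with T.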

1e)

Calculate the normalization factor \(\gamma\) based on \(E\) .

Hint: Formulae

\[ 0.01 = \gamma \sum_{x \neq y} e_{xy} = \gamma \left(1 - \sum_{x} e_{xx}\right) \]

Solution

\[ \gamma = 0.027 \]
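The value follows from the diagonal counts of 1b (12, 6, 12 and 4 out of 54 columns). A short sketch, with illustrative variable names:

```python
# Diagonal pair counts from the intermediate table in 1b and the alignment length.
diagonal_counts = {"A": 12, "C": 6, "T": 12, "G": 4}
n = 54

# Fraction of columns that show a mutation: 1 - sum_x e_xx = 20/54.
mutation_fraction = 1 - sum(diagonal_counts.values()) / n

# gamma scales this fraction down to 1 accepted mutation per 100 positions.
gamma = 0.01 / mutation_fraction
print(round(gamma, 4))  # 0.027
```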

1f)

Calculate the mutation rate matrix \(P\).

Hint: Formulae

\[ p_{xy} = \frac{e_{xy}}{r_x} \]

Solution

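\[
\begin{array}{c|cccc}
\text{nt.} & A & C & T & G \\ \hline
A & 0.6667 & 0.0555 & 0.1668 & 0.1110 \\
C & 0.1110 & 0.6665 & 0.1110 & 0.1110 \\
T & 0.1668 & 0.0555 & 0.6667 & 0.1110 \\
G & 0.2220 & 0.1110 & 0.2220 & 0.4445
\end{array}
\]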
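The values in this table are consistent with dividing the rounded \(e_{xy}\) of 1b by the rounded \(r_x\) of 1a, which the sketch below assumes. With exact fractions some last digits would differ (for example \(p_{AC} = 1/18 \approx 0.0556\)). The names `E_UPPER`, `e` and `P` are illustrative.

```python
NT = "ACTG"

# Rounded frequencies from 1a and rounded upper-triangular E from 1b.
r = {"A": 0.3333, "C": 0.1667, "T": 0.3333, "G": 0.1667}
E_UPPER = {"AA": 0.2222, "AC": 0.0185, "AT": 0.0556, "AG": 0.0370,
           "CC": 0.1111, "CT": 0.0185, "CG": 0.0185,
           "TT": 0.2222, "TG": 0.0370,
           "GG": 0.0741}


def e(x, y):
    # Symmetric lookup: e_xy = e_yx.
    return E_UPPER.get(x + y, E_UPPER.get(y + x))


# p_xy = e_xy / r_x, one row per original nucleotide x.
P = {x: [round(e(x, y) / r[x], 4) for y in NT] for x in NT}

for x in NT:
    print(x, P[x])
```

Unlike \(E\), the matrix \(P\) is not symmetric, because each row is divided by a different \(r_x\).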

1g)

Calculate the normalized mutation rate matrix \(P'\) using \(P\) and the normalization factor \(\gamma\).

Hint: Formulae

\[\begin{align*} p'_{xy} &= \gamma \, p_{xy} \quad \text{for } x \neq y \\ p'_{xx} &= 1 - \sum_{y \neq x} p'_{xy} \end{align*}\]

Solution

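\[
\begin{array}{c|cccc}
\text{nt.} & A & C & T & G \\ \hline
A & 0.9910 & 0.0015 & 0.0045 & 0.0030 \\
C & 0.0030 & 0.9910 & 0.0030 & 0.0030 \\
T & 0.0045 & 0.0015 & 0.9910 & 0.0030 \\
G & 0.0060 & 0.0030 & 0.0060 & 0.9850
\end{array}
\]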
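The two-step rule (scale the off-diagonal entries, then fill the diagonal so that each row sums to 1) can be sketched as follows. The sketch starts from the rounded \(P\) of 1f and \(\gamma = 0.027\); the name `P_norm` for \(P'\) is an illustrative choice.

```python
NT = "ACTG"
gamma = 0.027

# Mutation rate matrix P from 1f (rows and columns in the order A, C, T, G).
P = {
    "A": [0.6667, 0.0555, 0.1668, 0.1110],
    "C": [0.1110, 0.6665, 0.1110, 0.1110],
    "T": [0.1668, 0.0555, 0.6667, 0.1110],
    "G": [0.2220, 0.1110, 0.2220, 0.4445],
}

P_norm = {}
for i, x in enumerate(NT):
    # Off-diagonal entries: p'_xy = gamma * p_xy.
    row = [round(gamma * p, 4) for p in P[x]]
    # Diagonal entry: whatever is left so that the row sums to 1.
    row[i] = 0.0
    row[i] = round(1 - sum(row), 4)
    P_norm[x] = row

for x in NT:
    print(x, P_norm[x])
```

Note that the diagonal of \(P\) is not scaled by \(\gamma\); it is recomputed from the scaled off-diagonal entries.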

1h)

Determine \(PAM_1\) from the normalized mutation rate matrix \(P'\) as \(10 \cdot \log_{10}(\text{odds})\). Round to integers.

Hint: Formulae

\[ PAM1_{xy} = 10 \log_{10} \left(\frac{p'_{xy}}{r_y}\right) \]

Solution

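\[
\begin{array}{c|cccc}
\text{nt.} & A & C & T & G \\ \hline
A & 5 & -20 & -19 & -17 \\
C & - & 8 & -20 & -17 \\
T & - & - & 5 & -17 \\
G & - & - & - & 8
\end{array}
\]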
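The log-odds step can be checked with the sketch below, which assumes the rounded \(P'\) of 1g and the rounded frequencies of 1a as inputs. The names `P_norm` and `PAM1` are illustrative.

```python
import math

NT = "ACTG"

# Rounded frequencies from 1a, in the order A, C, T, G.
r = [0.3333, 0.1667, 0.3333, 0.1667]

# Normalized mutation rate matrix P' from 1g.
P_norm = [
    [0.9910, 0.0015, 0.0045, 0.0030],
    [0.0030, 0.9910, 0.0030, 0.0030],
    [0.0045, 0.0015, 0.9910, 0.0030],
    [0.0060, 0.0030, 0.0060, 0.9850],
]

# PAM1_xy = 10 * log10(p'_xy / r_y), rounded to integers.
PAM1 = [[round(10 * math.log10(P_norm[i][j] / r[j])) for j in range(4)]
        for i in range(4)]

for x, row in zip(NT, PAM1):
    print(x, row)
# A [5, -20, -19, -17]
# C [-20, 8, -20, -17]
# T [-19, -20, 5, -17]
# G [-17, -17, -17, 8]
```

Although \(P'\) itself is not symmetric, dividing by \(r_y\) restores the symmetry of the scores, so listing the upper triangle is sufficient.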

1i)

Determine \(PAM_2\). (round to integer)

Hint: Formulae

\[ PAM(m)_{xy} = 10 \log_{10} \left(\frac{p^{(m)}_{xy}}{r_y}\right) \text{ with } p^{(m)}_{xy} \text{ the entry } (x, y) \text{ of } (P')^m \]

Hint: \((P')^2\)

\[ (P')^2 = \begin{bmatrix} 0.9910 & 0.0015 & 0.0045 & 0.003 \\ 0.0030 & 0.9910 & 0.0030 & 0.003 \\ 0.0045 & 0.0015 & 0.9910 & 0.003 \\ 0.0060 & 0.0030 & 0.0060 & 0.985 \\ \end{bmatrix} \times \begin{bmatrix} 0.9910 & 0.0015 & 0.0045 & 0.003 \\ 0.0030 & 0.9910 & 0.0030 & 0.003 \\ 0.0045 & 0.0015 & 0.9910 & 0.003 \\ 0.0060 & 0.0030 & 0.0060 & 0.985 \\ \end{bmatrix} = \begin{bmatrix} 0.98212375 & 0.00298875 & 0.0089415 & 0.005946 \\ 0.0059775 & 0.982099 & 0.0059775 & 0.005946 \\ 0.0089415 & 0.00298875 & 0.98212375 & 0.005946 \\ 0.011892 & 0.005946 & 0.011892 & 0.97027 \\ \end{bmatrix} \]

Solution

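\[
\begin{array}{c|cccc}
\text{nt.} & A & C & T & G \\ \hline
A & 5 & -17 & -16 & -14 \\
C & - & 8 & -17 & -14 \\
T & - & - & 5 & -14 \\
G & - & - & - & 8
\end{array}
\]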
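For general \(m\), the matrix power and the log-odds step can be combined. This is a plain-Python sketch without external libraries; the helper names `mat_mul`, `mat_pow` and `pam` are hypothetical, and the inputs are the rounded \(P'\) of 1g and \(r\) of 1a.

```python
import math

# Rounded frequencies from 1a, in the order A, C, T, G.
r = [0.3333, 0.1667, 0.3333, 0.1667]

# Normalized mutation rate matrix P' from 1g.
P_norm = [
    [0.9910, 0.0015, 0.0045, 0.0030],
    [0.0030, 0.9910, 0.0030, 0.0030],
    [0.0045, 0.0015, 0.9910, 0.0030],
    [0.0060, 0.0030, 0.0060, 0.9850],
]


def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(M, m):
    """Compute M^m for m >= 1 by repeated multiplication."""
    result = M
    for _ in range(m - 1):
        result = mat_mul(result, M)
    return result


def pam(m):
    """PAM(m)_xy = 10 * log10((P'^m)_xy / r_y), rounded to integers."""
    Pm = mat_pow(P_norm, m)
    return [[round(10 * math.log10(Pm[i][j] / r[j])) for j in range(4)]
            for i in range(4)]


P2 = mat_pow(P_norm, 2)
PAM2 = pam(2)
for row in PAM2:
    print(row)
# [5, -17, -16, -14]
# [-17, 8, -17, -14]
# [-16, -17, 5, -14]
# [-14, -14, -14, 8]
```

Calling `pam(1)` gives the \(PAM_1\) scores of 1h, since \((P')^1 = P'\).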

Exercise 2 - Programming assignment

Programming assignments are available via GitHub Classroom and contain automatic tests.

We recommend doing these assignments, since they will help you understand this topic further.

Access the GitHub Classroom link: Programming Assignment: Sheet 08.