A phylogenetic tree (also phylogeny or evolutionary tree 1) is a branching diagram or a tree showing the evolutionary relationships among various biological species or other entities based upon similarities and differences in their physical or genetic characteristics. All life on Earth is part of a single phylogenetic tree, indicating common ancestry.

https://en.wikipedia.org/wiki/Phylogenetic_tree

UPGMA (Unweighted Pair Group Method with Arithmetic Mean) is a simple agglomerative or hierarchical clustering method used in bioinformatics for the creation of phylogenetic trees. UPGMA assumes a constant rate of evolution (molecular clock hypothesis), and is not a well-regarded method for inferring phylogenetic trees unless this assumption has been tested and justified for the data set being used.

https://en.wikipedia.org/wiki/UPGMA

Exercise 1 - WPGMA

Note

Distances for a merged cluster \(e\), where \(e = c \cup d\):

\[ WPGMA: dist(x, e) = \dfrac{dist(x,c) + dist(x,d)}{2} \]

In the following steps we calculate the evolutionary tree using WPGMA and the pairwise distances in the following distance matrix.

Di,j

a

b

c

d

e

a

0

3

12

12

9

b

0

13

13

10

c

0

6

7

d

0

7

e

0

1a)

Which leaves should be selected first?

Hide
Hint
Solution

1b)

Calculate the corresponding distance for the set of leaves from a).

Hide
Solution

1.5

1c)

Fill in the distance matrix with the correct distances form the set of leaves (aka. internal node) from a) to all other leaves.

Hide
Solution

Di,j

{ab}

c

d

e

{a,b}

0

12.5

12.5

9.5

c

0.0

6.0

7.0

d

0.0

7.0

e

0.0

1d)

Which nodes are joined next given the correct distance matrix from c)?

Hide
Hint
Solution

1e)

Fill in a distance matrix with the remaining nodes and leaves.

Hide
Solution

Di,j

{ab}

{c,d}

e

{a,b}

0

12.5

9.5

{c,d}

0.0

7.0

e

0.0

1f)

What does the subpart of the tree look like in Newick format after selecting and joining your answer from e)

Note

The following answers will be given in Newick format. Feel free to inspect them using an online tool.

Hide
Hint
Solution

1g)

Following the approach from the previous exercises, what does the whole tree look like.

Hide
Hint
Solution

Exercise 2 - UPGMA

2a)

Imagine using UPGMA instead of WPGMA for construction of a tree. Which of the following statements is True?

Statements
Hint: Formula

\[ UPGMA: dist(x, e) = \dfrac{|c|dist(x,c) + |d|dist(x,d)}{|c| + |d|} \]

Solution

Exercise 3 - Ultrametric

3a)

Which of the following distance matrices are ultrametric?

Di,j

a

b

c

d

e

a

0

3

12

12

9

b

0

13

13

10

c

0

6

7

d

0

7

e

0

Di,j

a

b

c

d

e

a

0

2

4

6

8

b

0

4

6

8

c

0

6

8

d

0

8

e

0

Di,j

a

b

c

d

e

a

0

10

17

16

16

b

0

15

14

14

c

0

9

15

d

0

14

e

0

Di,j

a

b

c

d

e

a

0

2

2

2

2

b

0

4

4

4

c

0

6

6

d

0

8

e

0

Hide
Hint

Note

Definition Ultra-Metric:

\[\begin{align} w(x,y) &= 0 \leftrightarrow x = y\ &\text{(identity)}\\ w(x, y) &= w(y, x)\ &\text{(symmetric)}\\ w (x, z) &\leq w (x, y ) + w (y , z) &\text{(triangle inequality)}\\ w (x, z) &\leq max\{ w (x, y ), w (y , z)\} &\text{(strong triangle inequality)} \end{align}\]

Solution

2)


Exercise 4 - Programming assignment

Programming assignments are available via Github Classroom and contain automatic tests.

We recommend doing these assignments since they will help you to further understand this topic.

Access the Github Classroom link: Programming Assignment: Sheet 10.


  1. Felsenstein, Joseph, and Joseph Felenstein. Inferring phylogenies. Vol. 2. Sunderland, MA: Sinauer associates, 2004.↩︎