You are given the text \(T =\) CAGTAGTAGC.
Draw the corresponding suffix tree!
Describe the steps of a counting query for \(P =\) TAG.
Describe the steps of a reporting query for \(P =\) AG.
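As a rough illustration of both queries, the following sketch builds an uncompressed suffix trie for \(T\). This is a simplification: a real suffix tree merges unary paths into single edges, but the query steps are the same. The names `Node`, `build_suffix_trie`, `count` and `report` are hypothetical, positions are assumed to be 1-based, and `$` is assumed not to occur in the text. A counting query walks down from the root along \(P\) and reads the number of leaves below the node it reaches; a reporting query additionally collects the leaf labels in that subtree.

```python
class Node:
    def __init__(self):
        self.children = {}        # character -> child Node
        self.leaf_count = 0       # number of leaves (suffixes) below this node
        self.suffix_start = None  # 1-based start of the suffix, set on leaves only


def build_suffix_trie(text):
    """Insert every suffix of text + '$' into a trie (quadratic, for illustration)."""
    text += "$"
    root = Node()
    for start in range(len(text)):
        node = root
        node.leaf_count += 1
        for ch in text[start:]:
            node = node.children.setdefault(ch, Node())
            node.leaf_count += 1
        node.suffix_start = start + 1
    return root


def locate(root, pattern):
    """Follow pattern from the root; return the node reached or None on a mismatch."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


def count(root, pattern):
    """Counting query: number of occurrences of pattern."""
    node = locate(root, pattern)
    return 0 if node is None else node.leaf_count


def report(root, pattern):
    """Reporting query: sorted 1-based start positions of all occurrences."""
    node = locate(root, pattern)
    if node is None:
        return []
    positions, stack = [], [node]
    while stack:
        current = stack.pop()
        if current.suffix_start is not None:
            positions.append(current.suffix_start)
        stack.extend(current.children.values())
    return sorted(positions)


root = build_suffix_trie("CAGTAGTAGC")
print(count(root, "TAG"))   # 2
print(report(root, "AG"))   # [2, 5, 8]
```

Storing the leaf count in every node is one possible design choice; it lets the counting query stop as soon as \(P\) has been matched, while the reporting query still has to visit the whole subtree below that node.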
Draw a generalized suffix tree for the sequences \(A =\) CCATG and \(B =\) CATG.
Concatenate the two sequences using a unique character for splitting, e.g. CCATG#CATG$.
Don't forget to include suffix links!
\(sl(v) = w\) if \(\overline{v} = cb\) and \(\overline{w} = b\), where \(c\) is a character and \(b\) is a string.
Remember: \(\overline{v}\) denotes the concatenation of all edge labels on the path from the root to \(v\).
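To make the definition concrete, the sketch below lists the suffix links for the concatenated string CCATG#CATG$ without building the tree. It relies on the fact that the internal nodes of a suffix tree correspond to the substrings that are followed by at least two different characters, and it assumes the string ends in a unique terminator. The function names are hypothetical, and the root is represented by the empty string.

```python
from collections import defaultdict


def internal_nodes(s):
    """Path labels of all internal nodes (including the root '') of the suffix tree of s."""
    followers = defaultdict(set)
    n = len(s)
    for i in range(n):
        for j in range(i, n):
            # the substring s[i:j] is followed by the character s[j]
            followers[s[i:j]].add(s[j])
    # a substring labels an internal node iff it branches to the right
    return {label for label, chars in followers.items() if len(chars) >= 2}


def suffix_links(s):
    """Map the path label cb of each internal node v to the label b of sl(v)."""
    nodes = internal_nodes(s)
    links = {v: v[1:] for v in nodes if v}   # drop the first character c
    # every link target must itself be an internal node
    assert all(w in nodes for w in links.values())
    return links


for v, w in sorted(suffix_links("CCATG#CATG$").items()):
    print(repr(v), "->", repr(w))
```

For this input the links are C to the root, CATG to ATG, ATG to TG, TG to G, and G to the root.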
Find the Maximal Unique Matches (MUMs) of the sequences \(A =\) CCATG and \(B =\) CATG using the tree from A).
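CATG is the only MUM: the node \(v\) with \(\overline{v} =\) CATG has exactly one leaf from each sequence below it, so CATG is unique in both sequences and right-maximal, and no suffix link points to it, so the match cannot be extended to the left.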
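The result can be cross-checked with a brute-force enumeration that does not use the tree at all. The sketch below is only a sanity check under stated assumptions: the names `occurrences` and `mums` are hypothetical, positions are reported 1-based, and `#` and `$` are used as boundary sentinels that are assumed not to occur in either sequence.

```python
def occurrences(text, pattern):
    """0-based start positions of all (possibly overlapping) occurrences."""
    return [i for i in range(len(text) - len(pattern) + 1)
            if text.startswith(pattern, i)]


def mums(a, b):
    """All maximal unique matches of a and b as (match, pos_in_a, pos_in_b), 1-based."""
    result = []
    for i in range(len(a)):
        for j in range(i + 1, len(a) + 1):
            m = a[i:j]
            occ_a, occ_b = occurrences(a, m), occurrences(b, m)
            if len(occ_a) != 1 or len(occ_b) != 1:
                continue  # not unique in both sequences
            k = occ_b[0]
            # neighbouring characters; distinct sentinels at the sequence boundaries
            left_a = a[i - 1] if i > 0 else "#"
            left_b = b[k - 1] if k > 0 else "$"
            right_a = a[j] if j < len(a) else "#"
            right_b = b[k + len(m)] if k + len(m) < len(b) else "$"
            if left_a != left_b and right_a != right_b:
                result.append((m, i + 1, k + 1))
    return result


print(mums("CCATG", "CATG"))   # [('CATG', 2, 1)]
```

Shorter common substrings such as ATG or TG are rejected because both occurrences are preceded by the same character, which matches the suffix link argument.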
Draw a suffix tree for the sequence \(A =\) ACGCACGCG.
Find all maximal pairs of length at least 2.
ACGC: \((1,5,4)\)
CG: \((2,8,2), (6,8,2)\)
Why is C: \((2, 8, 1)\) not a maximal pair?
It is not right-maximal: both occurrences of C are followed by G. This can be seen since CG: \((2, 8, 2)\) already covers the indices 2 and 8 with a longer match.
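The maximal pairs can also be verified by brute force. The sketch below assumes that a triple \((i, j, l)\) stands for two 1-based start positions \(i < j\) and the match length \(l\), which is how the solutions above read. The function name `maximal_pairs` is hypothetical. For every pair of start positions whose preceding characters differ (left-maximal), the match is extended to the right as far as possible (right-maximal).

```python
def maximal_pairs(s, min_len=1):
    """All maximal pairs (i, j, length) of s with 1-based positions i < j."""
    n = len(s)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            # left-maximal: the preceding characters differ, or i is the first position
            if i > 0 and s[i - 1] == s[j - 1]:
                continue
            # extend to the right until a mismatch or the end of s
            length = 0
            while j + length < n and s[i + length] == s[j + length]:
                length += 1
            if length >= min_len:
                pairs.append((i + 1, j + 1, length))
    return pairs


s = "ACGCACGCG"
print(maximal_pairs(s, min_len=2))      # [(1, 5, 4), (2, 8, 2), (6, 8, 2)]
print((2, 8, 1) in maximal_pairs(s))    # False
```

Because the extension always runs to the first mismatch, the pair starting at positions 2 and 8 is reported with length 2 and never with length 1.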