You are given accession number NM_000667.3. Use the BLAST web server to find out about the gene that belongs to this accession number (choose nucleotide BLAST and the database reference RNA sequences (refseq_rna)).
Which gene is it, and in which organism?
Gene: Alcohol Dehydrogenase 1A
Organism: Homo sapiens (human)
Which other organisms does it seem to be highly conserved in?
Many more…
You are given a nucleotide query sequence \(q\) = \(\texttt{ATAC}\), and a nucleotide database sequence \(s\) = \(\texttt{ATAAAACGGGGGG}\). The word-size \(k=2\). Use a simple scoring scheme that assigns a score of \(2\) for a match and a score of \(-1\) for a mismatch.
Generate all \(k\)-length words of the query sequence.
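The word generation step can be sketched in a few lines of Python. This is only a minimal illustration; the function name `kmers` is hypothetical and not part of the exercise or of any BLAST tool.

```python
def kmers(seq, k):
    """Return (offset, word) pairs for all overlapping k-length words of seq."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


# Query from the exercise, word size k = 2
print(kmers("ATAC", 2))  # [(0, 'AT'), (1, 'TA'), (2, 'AC')]
```

A sequence of length \(n\) yields \(n - k + 1\) overlapping words, so the query of length 4 gives three words for \(k=2\).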
List all possible words for the first \(k\)-length word (AT) that have a score of at least \(T_1=1\).
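One way to enumerate such a neighborhood is to score every possible \(k\)-length word against the query word and keep those that reach the threshold. The sketch below assumes the match/mismatch scheme from the exercise and the DNA alphabet ACGT; the names `score` and `neighborhood` are hypothetical helpers.

```python
from itertools import product

MATCH, MISMATCH = 2, -1  # scoring scheme from the exercise


def score(a, b):
    """Ungapped score of two equal-length words, position by position."""
    return sum(MATCH if x == y else MISMATCH for x, y in zip(a, b))


def neighborhood(word, threshold, alphabet="ACGT"):
    """All words of the same length scoring at least `threshold` against `word`."""
    candidates = ("".join(p) for p in product(alphabet, repeat=len(word)))
    return [w for w in candidates if score(word, w) >= threshold]


print(neighborhood("AT", 1))
# ['AA', 'AC', 'AG', 'AT', 'CT', 'GT', 'TT']
```

With \(k=2\), a word scores 4 for two matches, 1 for one match, and \(-2\) for none, so \(T_1=1\) keeps every word that agrees with AT in at least one position.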
Scan the database for exact matches for the words from question 2B.
AA at positions 2, 3, and 4; AC at position 5; AT at position 0 (0-based positions in \(s\)).
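The scan can be reproduced with a short script that slides a window of length \(k\) over the database and records where each word occurs. This is a sketch under the assumption of 0-based positions; `find_hits` is a hypothetical helper, and the word list is the set of 2-mers that score at least 1 against AT under the exercise's scheme.

```python
def find_hits(database, words):
    """Map each word to the 0-based positions where it occurs exactly in database."""
    k = len(words[0])  # assumes all words share the same length
    hits = {w: [] for w in words}
    for i in range(len(database) - k + 1):
        window = database[i:i + k]
        if window in hits:
            hits[window].append(i)
    return hits


s = "ATAAAACGGGGGG"
words = ["AA", "AC", "AG", "AT", "CT", "GT", "TT"]
for word, positions in find_hits(s, words).items():
    print(word, positions)
# AA [2, 3, 4]
# AC [5]
# AG []
# AT [0]
# CT []
# GT []
# TT []
```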
Extend the exact matches that you found in question 2C to the left/right and report all MSPs with a score greater than \(4\).
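AA:
Pos 2: query ATA vs. database AAA, score 3 (2 - 1 + 2)
Pos 3: query ATAC vs. database AAAC, score 5 (2 - 1 + 2 + 2)
Pos 4: query AT vs. database AA, score 1 (2 - 1)
AT:
Pos 0: query ATA vs. database ATA, score 6 (2 + 2 + 2)
AC:
Pos 5: query AT vs. database AC, score 1 (2 - 1)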
The MSPs with a score greater than 4 start in the database sequence at index 0 (score 6) and index 3 (score 5).
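The extension step can be checked with a small ungapped-extension sketch. It is a simplification: each hit is extended to the ends of the sequences and the best-scoring segment is kept, without the drop-off cutoff that real BLAST applies. The names `pair_score`, `best_extension`, and `extend_hit` are hypothetical, and the hit positions are those found for the query word AT (query offset 0).

```python
MATCH, MISMATCH = 2, -1  # scoring scheme from the exercise


def pair_score(x, y):
    return MATCH if x == y else MISMATCH


def best_extension(pairs):
    """Best cumulative score over all prefixes of `pairs`, and that prefix length."""
    best, best_len, running = 0, 0, 0
    for n, (x, y) in enumerate(pairs, start=1):
        running += pair_score(x, y)
        if running > best:
            best, best_len = running, n
    return best, best_len


def extend_hit(query, database, q_start, d_start, k):
    """Extend a seed hit without gaps; return (database start, length, score)."""
    seed = sum(pair_score(a, b) for a, b in
               zip(query[q_start:q_start + k], database[d_start:d_start + k]))
    right, r_len = best_extension(zip(query[q_start + k:], database[d_start + k:]))
    left, l_len = best_extension(zip(reversed(query[:q_start]),
                                     reversed(database[:d_start])))
    return d_start - l_len, l_len + k + r_len, seed + left + right


q, s = "ATAC", "ATAAAACGGGGGG"
segments = [extend_hit(q, s, 0, d, 2) for d in (0, 2, 3, 4, 5)]
print(segments)
# [(0, 3, 6), (2, 3, 3), (3, 4, 5), (4, 2, 1), (5, 2, 1)]
print([seg for seg in segments if seg[2] > 4])
# [(0, 3, 6), (3, 4, 5)]
```

Only the segments starting at database positions 0 and 3 exceed a score of 4, which agrees with the hand-computed result.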
What happens if we vary the parameters k and \(T_1\)?
For the programming tasks, please follow the instructions given in GitHub Classroom at the following link:
https://classroom.github.com/a/HoQv5vpm