CSC277 - Project 3
Strings and Motifs

Assigned Feb 10th 11 a.m.
Due Feb 19th 9:45 a.m.


Part 1 - String Matching Algorithms

Using the Knuth-Morris-Pratt Exact String Matching algorithm, what is the prefix array for the pattern GARGANTUAN? Show the steps used to find this pattern in the string PANGALACTICGARGLEGARGANTUANBLASTER.

Part 2 - Brute Force Median String Algorithm

Write a program (in either Java or Python) to implement the BruteForceMedianSearch algorithm on page 112 of the textbook. You must also implement the TotalDistance function. You need to augment this algorithm to also save the starting positions of the found motif. Your output should be in the form of a profile matrix and the resulting consensus string.

Part 3 - Greedy Motif Algorithm

Write a program (in either Java or Python) to implement the GreedyMotifSearch algorithm on page 136 of the textbook. Your output should be in the form of a profile matrix and the resulting consensus string.

Testing

Test you above two algorithm on the 15 sequences found in the sample DNA file provided, searching for a motif of length 8, and report your output. Was there a large difference in the consensus string found?

Turn in your code, profile and consensus output, and writeup for part 1 in your csc277 directory on the cs.centenary.edu server.