CSC277 - Project 3
Protein Alignment


Algorithm

Write a program that calculates the best (highest-scoring) global alignment between two protein sequences. Your program should be run as follows:

pwalign <sequence1> <sequence2> <score-matrix> <gap-penalty>
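One standard way to compute the best global alignment is the Needleman-Wunsch dynamic-programming algorithm. The Python sketch below assumes a linear gap penalty, where `gap` is a positive number subtracted once for every gap position, and a scoring table `score` keyed by residue pairs. The function name `global_align` and these parameter conventions are illustrative assumptions, not requirements of the assignment.

```python
def global_align(a, b, score, gap):
    """Return (best_score, aligned_a, aligned_b) for a global alignment of a and b.

    Assumptions: `score` maps (residue, residue) pairs to numbers, and `gap`
    is a positive linear penalty charged once per gap position.
    """
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning the prefixes a[:i] and b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] - gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] - gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score[(a[i - 1], b[j - 1])],  # pair a[i-1] with b[j-1]
                F[i - 1][j] - gap,  # a[i-1] against a gap
                F[i][j - 1] - gap,  # b[j-1] against a gap
            )
    # Trace back from F[n][m], re-deriving which move produced each cell.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score[(a[i - 1], b[j - 1])]:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif j == 0 or (i > 0 and F[i][j] == F[i - 1][j] - gap):
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, with a toy table that gives +1 for a match and -1 for a mismatch, `global_align("ACGT", "AGT", score, 1)` returns `(2, "ACGT", "A-GT")`. On ties, this traceback prefers a residue pair, then a gap in the second sequence.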

Your input sequences will be in FASTA format, and your scoring matrix will be either PAM250 or BLOSUM62.
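Reading the inputs takes two small parsers. This sketch assumes UniProt-style FASTA headers (such as `>sp|P42212|GFP_AEQVI ...`), keeping the last `|`-separated field of the first word as the sequence name, and matrix files in the common NCBI text layout: `#` comment lines, a header row of residue letters, then one row per residue that starts with its letter. The function names `read_fasta` and `read_matrix` are hypothetical; each takes an iterable of lines, so an open file can be passed directly.

```python
def read_fasta(lines):
    """Return (name, sequence) for the first record in FASTA-formatted lines."""
    name, residues = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                break  # only the first record is used
            words = line[1:].split()
            # Assumed UniProt header ">sp|P42212|GFP_AEQVI ...": keep the last field.
            name = words[0].split("|")[-1] if words else ""
        else:
            residues.append(line)
    return name, "".join(residues).upper()


def read_matrix(lines):
    """Parse an NCBI-style substitution matrix into a {(row, col): score} dict."""
    cols, table = None, {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if cols is None:
            cols = fields  # header row of residue letters
            continue
        row = fields[0]
        for col, value in zip(cols, fields[1:]):
            table[(row, col)] = int(value)  # assumes integer scores
    return table
```

Typical use would be `with open(path) as f: name, seq = read_fasta(f)`, and likewise for the matrix file.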

Helpful Data Structures

The internal structure of your program could be organized in the following way:

These can be used by the following methods:

Output

Your program should output the resulting alignment in a format similar to the output of the EMBOSS Align website, as in the example below. Between the two sequence lines is a markup line that uses a space for a mismatch or a gap, '.' for a small positive score (greater than 0 but no more than 1.0), ':' for a similarity that scores more than 1.0, and '|' for an identity, where both sequences have the same residue regardless of its score.
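The markup rules translate directly into a per-column function. The sketch below also lays out blocks of 50 columns with start and end residue positions, approximating the sample. The names `markup_line` and `format_alignment`, the column widths, and the use of `-` for every gap (the sample shows end gaps as blank space) are assumptions, not an exact EMBOSS reimplementation.

```python
def markup_char(x, y, score):
    """Return the markup symbol for one aligned column (x over y)."""
    if x == "-" or y == "-":
        return " "  # gap
    if x == y:
        return "|"  # identity, regardless of its score
    s = score[(x, y)]
    if s > 1.0:
        return ":"  # similarity scoring more than 1.0
    if s > 0:
        return "."  # small positive score
    return " "  # mismatch: zero or negative score


def markup_line(aligned_a, aligned_b, score):
    """Build the markup line for two equal-length aligned strings."""
    return "".join(markup_char(x, y, score) for x, y in zip(aligned_a, aligned_b))


def format_alignment(name_a, aligned_a, name_b, aligned_b, score, width=50):
    """Lay out an alignment in blocks of `width` columns, EMBOSS-style."""
    out, pos_a, pos_b = [], 0, 0  # residues of each sequence printed so far
    for k in range(0, len(aligned_a), width):
        chunk_a, chunk_b = aligned_a[k:k + width], aligned_b[k:k + width]
        start_a, start_b = pos_a + 1, pos_b + 1
        pos_a += len(chunk_a) - chunk_a.count("-")
        pos_b += len(chunk_b) - chunk_b.count("-")
        # 13-char name column + 7-char start column; longer names shift the layout.
        out.append(f"{name_a:<13}{start_a:>7} {chunk_a:<{width}} {pos_a:>6}")
        out.append(" " * 21 + markup_line(chunk_a, chunk_b, score))
        out.append(f"{name_b:<13}{start_b:>7} {chunk_b:<{width}} {pos_b:>6}")
        out.append("")  # blank line between blocks
    return out
```

Printing `"\n".join(format_alignment(...))` gives one block per 50 alignment columns, each sequence line ending with the position of its last residue printed so far.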

GFP_AEQVI          1     MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKF     46
                         :.|       .:|....::|.||||.|..:|:|||:...|...:|.
NFCP_ANESU         1 MASFLKK-------TMPFKTTIEGTVNGHYFKCTGKGEGNPFEGTQEMKI     43

GFP_AEQVI         47 -ICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQE     95
                      :...|.||..:..|.|:..||.:.|.:|...:.  |:||.:.|||:..|
NFCP_ANESU        44 EVIEGGPLPFAFHILSTSCMYGSKTFIKYVSGIP--DYFKQSFPEGFTWE     91

GFP_AEQVI         96 RTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHK-----    140
                     ||..::|.|......:...:||.||.::::.|.:|..||.::.:|     
NFCP_ANESU        92 RTTTYEDGGFLTAHQDTSLDGDCLVYKVKILGNNFPADGPVMQNKAGRWE    141

GFP_AEQVI        141 --LEYNYNSHNVY----IMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQ    184
                       .|..|....|.    :||.|...|       ||.    :..|...|:.
NFCP_ANESU       142 PATEIVYEVDGVLRGQSLMALKCPGG-------RHL----TCHLHTTYRS    180

GFP_AEQVI        185 NTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMD    234
                     ..|.  ..:.:|..|:              .||.:.:           |:
NFCP_ANESU       181 KKPA--SALKMPGFHF--------------EDHRIEI-----------ME    203

GFP_AEQVI        235 ELYK                             238
                     |:.|                         
NFCP_ANESU       204 EVEKGKCYKQYEAAVGRYCDAAPSKLGHN    232

Extra Credit

Implement the two extensions discussed in class: local alignment, to find the best-scoring sub-match, and affine gap penalties.
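Both extensions change mainly the recurrence. For local alignment (Smith-Waterman), each cell is floored at 0 and the traceback starts from the highest-scoring cell. For affine gaps (Gotoh's method), three matrices track whether an alignment ends in a residue pair or in a gap in either sequence. This sketch assumes a gap of length k costs `gap_open + (k - 1) * gap_extend` and returns only the affine score, leaving traceback out for brevity; the function names are illustrative.

```python
NEG = float("-inf")  # marks states that are impossible


def local_align(a, b, score, gap):
    """Smith-Waterman: return (best_score, sub_a, sub_b) for the best local match."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0,  # start a fresh alignment instead of going negative
                H[i - 1][j - 1] + score[(a[i - 1], b[j - 1])],
                H[i - 1][j] - gap,
                H[i][j - 1] - gap,
            )
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell until a cell scoring 0 is reached.
    out_a, out_b, i, j = [], [], bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score[(a[i - 1], b[j - 1])]:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def affine_global_score(a, b, score, gap_open, gap_extend):
    """Best global score when a gap of length k costs gap_open + (k - 1) * gap_extend."""
    n, m = len(a), len(b)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with a[i-1] paired to b[j-1]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with a[i-1] against a gap
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with a gap against b[j-1]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = -gap_open - (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = score[(a[i - 1], b[j - 1])]
            M[i][j] = pair + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs gap_open; continuing the same gap costs gap_extend.
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend, Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend, X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

When `gap_open` equals `gap_extend`, the affine score reduces to the linear-gap score, which is a convenient sanity check against a linear-gap global alignment.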

Testing

Test your algorithm on the two sequences in the provided files P42212.fasta and Q9GZ28.fasta.

Turn in your code, along with sample output from your tests, by placing them in your csc277 directory on the cs.centenary.edu server.