CSC207 - Lab 7
Mutation is the Word

Assigned October 1st 2 p.m.
Due October 6th 2 p.m.


Overview

In this lab we will make extensive use of sentinel loops (while loops). We will explore mutation of DNA through playing the game Doublets.

Materials

Description

We have seen how the information of life can be encoded into DNA through chemistry and biology. From DNA we transcribe into RNA and then translate into Proteins. But this is a static process; how did life evolve from single-celled organisms into the diversity of life we see today? The answer lies in a process called mutation.

Separate from the DNA-RNA-Protein process are two separate processes for cell duplication called mitosis and meiosis. Both processes involve a cell replicating its DNA, and it is within this replication that mutations can occur. Perhaps there was a copying mistake, or a random x-ray damaged part of the DNA; in any case, the DNA code we begin with can undergo small alterations. Each mutation during this replication process could possibly have a large impact of the earlier transcription and translation process, such that we could have a new stop codon within a gene, or we could place a different amino acid into our protein chain. While some changes may be very harmful and lead to genetic diseases, others could provide an unexpected benefit and new functionality. These underlying mutations introduce variety into the gene pool for a population, and allow natural selection and evolution to occur.

While the language of DNA was not understood until 1953 by James Watson and Francis Crick with the help of data from Rosalind Franklin, similar properties of mutation were noticed nearly 100 years earlier by Charles Lutwidge Dodgson (aka Lewis Carroll) , who is most famous for his novel "Alice in Wonderland." In an article for Vanity Fair, Carroll created a puzzle called Doublets, which explored the effects of mutation on English words. His rules description is as follows:

The rules of the Puzzle are simple enough. Two words are proposed, of the same length; and the Puzzle consists in linking these together by interposing other words, each of which shall differ from the next word in one letter only. That is to say, one letter may be changed in one of the given words, then one letter in the word so obtained, and so on, till we arrive at the other given word. The letters must not be interchanged among themselves, but each must keep to its own place. As an example, the word 'head' may be changed into 'tail' by interposing the words 'heal, teal, tell, tall'. I call the given words 'a Doublet', the interposed words 'Links', and the entire series 'a Chain', of which I here append an example:

H E A D
h e a l
t e a l
t e l l
t a l l
T A I L

It is, perhaps, needless to state that it is de rigueur that the links should be English words, such as might be used in good society.

(More details on the connection between Lewis Carroll and genetics can be found in an article by David B. Searls entitled From Jabberwocky to Genome: Lewis Carroll and Computational Biology).

Our task for this lab is to create a Python program that lets a user play the Doublets game and enforces all the rules given above by Lewis Carroll.

Step 1 - Analyze the Problem, Create Specifications

We will tackle this puzzle using the software engineering phases. First, analyze the problem to see what steps we can take to model this inside a computer. Try to answer the following questions:

Write your answers in a file called lab7_evaluation.txt.

Step 2 - Develop Pseudocode

With your analysis from answering these questions, develop an algorithmic solution using pseudocode. This should model the interactions you expect for your computer to take, but is written in English instead of Python.

To demonstrate the eventual behavior of the program, here is a sample run of the Doublets game.

What is the starting word? CAT
What is the ending word? DOG
Start   = CAT
Current = CAT
End     = DOG
Which character do you want to change? (the first character is 1) 3
What is your new character? B
Current = CAB
End     = DOG
Which character do you want to change? (the first character is 1) 2
What is your new character? O
Current = COB
End     = DOG
Which character do you want to change? (the first character is 1) 3
What is your new character? G
Current = COG
End     = DOG
Which character do you want to change? (the first character is 1) 1
What is your new character? D
Solution path found in 4 steps.
CAT -> CAB -> COB -> COG -> DOG

A second run of the program is shown below, where the user made many mistakes.

What is the starting word? log
What is the ending word? worm
The lengths are not equal. Please try again.
What is the ending word? sfu
sfu is not a word. Please try again.
What is the ending word? bug
Start   = LOG
Current = LOG
End     = BUG
Which character do you want to change? (the first character is 1) 0
0 is out of range. Please try again.
Which character do you want to change? (the first character is 1) 4
4 is out of range. Please try again.
Which character do you want to change? (the first character is 1) 1
What is your new character? 4
4og is not a valid word
Current = LOG
End     = BUG
Which character do you want to change? (the first character is 1) 1
What is your new character? b
Current = BOG
End     = BUG
Which character do you want to change? (the first character is 1) x
x is not a number. Please try again.
Which character do you want to change? (the first character is 1) 2
What is your new character? uu
uu is more than one character. Please try again.
Which character do you want to change? (the first character is 1) 2
What is your new character? u
Solution path found in 2 steps.
LOG -> BOG -> BUG

Add the pseudocode to your lab7_evaluation.txt.

Step 3 - Implement Incrementally

Download the text file english-words.txt and the python module dictionary.py files to your working directory (in Magale 1A this will be C:\Python2.5\). The dictionary module contains one function valid_word(word, file), which will return True if the word is found in the file and False otherwise.

Now take your pseduocode description and begin to implement your program in Python. Your program should be named doublets.py. Make sure to import the dictionary module before trying to use the valid_word function. Write your code in small sections and test incrementally.

The approach we will take is to start with your basic functions and build your program from the bottom up. Write each function individually and then test to make sure it will provide the functionality you need. In this way you will gradually build your whole Python program and know that each piece will work as you need.

Step 4 - Test, Debug and Maintain

With incremental implementation, Steps 3 and 4 should go hand in hand. Never write more than 5-10 lines of code before testing your changes. First make your program work with perfect input, and then expand it to work with erroneous user input.

Once your program is working, try out some of the transformations below

Future Thoughts

Your doublets.py program lets the user chose what character to change at each step, and then verifies that each new word is a valid word. Can you think of a way to always find the minimum number of steps to transform one word to another? How will you know if a path between two words can be found?

What to Hand In

Log in to cs.centenary.edu through either Secure FTP or WinSCP using your cs login and password. Create a subdirectory from csc207 called lab7. Copy your doublets.py project into this directory. Make sure you have followed the Python Style Guide, and have run your project through the Automated Style Checker.

You must hand in:


© Mark Goadrich 2007, Centenary College of Louisiana