CSC207 - Lab 12
Predictive Text Messaging

Assigned Oct 28th 2 p.m.
Due Nov 2nd 12 p.m.


Overview

This lab will make extensive use of dictionaries to demonstrate how text entry on cell phones is feasible.

Materials

Description

One of the most popular features of any cell phone is text messaging. Predictive text is a fast and easy text entry system for cell phones; the most common algorithms are T9 and iTap. This lab will examine the details of predictive text algorithms and some of their unexpected consequences.

As society moves to smartphones like the iPhone and Palm Pre with software QWERTY keyboards, this interesting piece of computer science will become a forgotten artifact, but the lessons of predictive text will still be applicable when we are faced with similar limitations.

Step 1

The standard telephone pad includes 12 keys: 10 for the digits 0-9, plus the * and # symbols. Along with the digit printed on each key, most keys also carry three to four letters, in the following pattern:

Number  Letters
1
2       ABC
3       DEF
4       GHI
5       JKL
6       MNO
7       PQRS
8       TUV
9       WXYZ

Write a function called text_to_nums(text) that translates a given string of text into the numbers that must be pressed on the keypad to create this text. You should use a dictionary with the letters as keys and the numbers as values. Be sure to remove all punctuation from the text, convert all incoming text to uppercase, and translate the space character as "*".
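One possible shape for this function is sketched below, as a starting point rather than a reference solution. The names KEYPAD and LETTER_TO_DIGIT are illustrative choices, and silently dropping digits along with punctuation is an assumption, since the lab only mentions punctuation.

```python
# Keypad layout from the table above: digit -> letters printed on that key.
KEYPAD = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
          "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ"}

# Invert the layout so that letters are the keys and digits are the values.
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def text_to_nums(text):
    """Return the keypad presses for text, with each space encoded as '*'."""
    presses = []
    for char in text.upper():
        if char == " ":
            presses.append("*")
        elif char in LETTER_TO_DIGIT:
            presses.append(LETTER_TO_DIGIT[char])
        # Anything else (punctuation, and by assumption digits) is dropped.
    return "".join(presses)


print(text_to_nums("Hello, world!"))  # 43556*96753
```

Building the letter dictionary by inverting the keypad table keeps the layout in one place, so a typo in the mapping is easier to spot.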

Step 2

Translation from letters to numbers is relatively straightforward. However, the task of translating back from numbers into letters is more challenging. Given the numbers 2665, it is not immediately clear which word the user intended, since each digit maps to three or four letters.

Textonyms are words that are typed with the same sequence of number keys. For example, "cool" is mapped to 2665, which is also the number for "book".

To automatically make a choice between these two words we need some more information. Predictive text algorithms use the relative likelihood of the words in the English language. We will be using statistics gathered from the British National Corpus, a 100 million word collection of samples of written and spoken language from a wide range of sources. We have a BNC word frequency list, with each line listing the word count followed by the word and some extra information on the part of speech.
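A minimal sketch of reading the frequency list follows. It assumes each line is whitespace-separated with the count in the first field and the word in the second, as described above; the exact file layout should be checked against the real list. The function name load_frequencies is hypothetical, and the sample lines use made-up counts, not real BNC figures.

```python
from collections import Counter


def load_frequencies(lines):
    """Return a Counter mapping each lowercase word to its total count."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank lines and lines whose first field is not a count.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        word = fields[1].lower()
        # Keep only purely alphabetic words, since nothing else can be
        # typed with the letter keys. The same word may appear once per
        # part of speech, so the counts are summed.
        if word.isalpha():
            counts[word] += int(fields[0])
    return counts


# Made-up sample lines for illustration only.
sample = ["120 book nn1", "30 book vvb", "45 cool aj0"]
frequencies = load_frequencies(sample)
print(frequencies["book"], frequencies["cool"])
```

Because the function accepts any iterable of lines, an open file object can be passed in directly.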

Now we can translate a number back into its most likely word choice. Since "book" is more frequent in this corpus than "cool", it will be the first word returned when typing in those numbers.

Write a function called nums_to_text(nums) which will translate a given string of numbers separated by "*" into the most likely words that created this string of numbers. Use a dictionary with the numbers as keys and the most likely word as the value.
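The sketch below shows one way to build the number-to-word dictionary and use it. Passing the dictionary in as a second parameter is an assumption made to keep the example self-contained; the lab's signature takes only nums. The helper names word_to_key and build_best_words are hypothetical, and the counts are toy values, not real BNC figures.

```python
KEYPAD = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
          "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ"}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def word_to_key(word):
    """Return the digit sequence for a purely alphabetic word."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.upper())


def build_best_words(frequencies):
    """Map each digit sequence to its most frequent word."""
    best = {}
    for word, count in frequencies.items():
        key = word_to_key(word)
        # On a tie, the word seen first is kept.
        if key not in best or count > frequencies[best[key]]:
            best[key] = word
    return best


def nums_to_text(nums, best_words):
    """Translate '*'-separated digit sequences into the most likely words."""
    words = []
    for key in nums.split("*"):
        if key:
            # Unknown sequences become "?" so that gaps stay visible.
            words.append(best_words.get(key, "?"))
    return " ".join(words)


# Toy counts for illustration only.
toy_frequencies = {"book": 3, "cool": 1, "a": 5}
print(nums_to_text("2*2665", build_best_words(toy_frequencies)))  # a book
```

Building the dictionary once and reusing it avoids rescanning the whole word list for every message.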

Use your function to translate the following numbers back into the most likely text.

Step 3

Write a separate function called textonyms(text) to find all textonyms for a given word.
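One approach, sketched here under stated assumptions, is to group the whole vocabulary by digit sequence and then look up the group for the given word. The second parameter, the helper group_by_key, and the choice to exclude the word itself from its own textonyms are all assumptions not fixed by the lab.

```python
from collections import defaultdict

KEYPAD = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
          "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ"}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def word_to_key(word):
    """Return the digit sequence for a purely alphabetic word."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.upper())


def group_by_key(words):
    """Map each digit sequence to the list of words that share it."""
    groups = defaultdict(list)
    for word in words:
        groups[word_to_key(word)].append(word.lower())
    return groups


def textonyms(text, groups):
    """Return the other words typed with the same keys as text."""
    same_key = groups.get(word_to_key(text), [])
    return [word for word in same_key if word != text.lower()]


groups = group_by_key(["book", "cool", "cook", "the"])
print(textonyms("cool", groups))  # ['book', 'cook']
```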

Use your function to find the textonyms of the following words:

Incorporate this into your nums_to_text function above, such that all possible text messages are printed out.
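Listing every possible message amounts to taking one candidate word per digit sequence in every combination. The sketch below uses itertools.product for this; the function name all_messages and the groups parameter (a dictionary from digit sequence to a list of words) are assumptions made for illustration.

```python
from itertools import product


def all_messages(nums, groups):
    """Return every message that the '*'-separated digits could spell."""
    # One list of candidate words per digit sequence; unknown sequences
    # contribute a single "?" placeholder.
    choices = [groups.get(key, ["?"]) for key in nums.split("*") if key]
    return [" ".join(combo) for combo in product(*choices)]


groups = {"2665": ["book", "cool"], "47": ["is"]}
for message in all_messages("2665*47", groups):
    print(message)
```

The number of messages is the product of the group sizes, so it grows quickly for long texts made of short, ambiguous words.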

Step 4

So far we have been translating complete texts. One other way to speed up our entry of words is to add some predictive element to our text entry. For example, if a user types in 637, your program should prompt them that "message" is the most likely word. This way you need fewer key presses to make longer words.

Write a function called predict_text() to allow the user to enter numbers and create a text message. This function will repeatedly ask the user for individual numbers, and after each number, will display the most likely word that starts with the sequence of numbers entered so far. If the user hits enter without a number, add this most likely word to a growing text message and start the next word from scratch. If the user hits enter without a number twice in a row, print the written text message for the user and exit the function.
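The loop described above could be sketched as follows. The frequencies and read parameters are assumptions added so the example can run without a real word list or keyboard input; the lab's predict_text() takes no arguments. The helper predict_word is hypothetical, and the counts are toy values. Note that in this sketch a single empty entry before any digits have been typed also ends the message.

```python
KEYPAD = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
          "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ"}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def word_to_key(word):
    """Return the digit sequence for a purely alphabetic word."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.upper())


def predict_word(digits, frequencies):
    """Return the most frequent word whose keys start with digits, or None."""
    candidates = [w for w in frequencies if word_to_key(w).startswith(digits)]
    if not candidates:
        return None
    return max(candidates, key=frequencies.get)


def predict_text(frequencies, read=input):
    """Build a message from digits entered one at a time and return it."""
    message = []
    digits = ""
    while True:
        entry = read("Next number (Enter to accept): ").strip()
        if entry:
            digits += entry
            print(digits, "->", predict_word(digits, frequencies))
        elif digits:
            # First empty entry: accept the current prediction, falling
            # back to the raw digits when no word matches.
            word = predict_word(digits, frequencies)
            message.append(word if word is not None else digits)
            digits = ""
        else:
            # Empty entry with no pending digits: the message is done.
            break
    text = " ".join(message)
    print(text)
    return text


# Toy counts for illustration only; a scripted reader replaces the keyboard.
toy_frequencies = {"message": 5, "is": 4, "need": 2}
scripted = iter(["6", "3", "7", "", "4", "7", "", ""])
predict_text(toy_frequencies, read=lambda prompt: next(scripted))
```

Scanning every word on each key press is simple but slow for a full word list; precomputing a dictionary from each prefix to its best word is one possible speedup.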

Extensions

Our algorithm above is feasible, but can be improved in many ways. Suggest a few improvements and what changes they will require in your code.

Evaluation

Write up your answers for each of the steps above in a file called lab12_evaluation.txt.

What to Hand In

Log in to cs.centenary.edu through either Secure FTP or WinSCP using your cs login and password. Create a subdirectory from csc207 called lab12. Copy your substitution.py project into this directory. Make sure you have followed the Python Style Guide, and have run your project through the Automated Style Checker.

You must hand in:


© Mark Goadrich 2009, Centenary College of Louisiana