CSCI 270 - Lab 2
Corpus Creation


Description

In this class we will be analyzing various expressions of the human condition through language, art, and music. To assist us and make our work personally valuable, you will be gathering text, images, and sound files for the class to analyze in subsequent labs.

2.1 - Poem

First, it will be useful to have a small dataset for exploring our algorithms and generating examples.

Find a short poem or set of song lyrics, written in English, that holds meaning for you.

Save your poem as a plain text document with a short, meaningful file name and the file extension .txt. The file should be no larger than 20KB.
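If you would like to check your poem file before handing it in, the following is a minimal sketch in Python. The function name check_poem_file is hypothetical, and the sketch makes two assumptions that are not part of the lab: that 1KB means 1024 bytes, and that a plain text file is one that decodes as UTF-8.

```python
from pathlib import Path

# Assumption: 1 KB = 1024 bytes. Use 20 * 1000 for a stricter reading.
MAX_POEM_BYTES = 20 * 1024


def check_poem_file(path, max_bytes=MAX_POEM_BYTES):
    """Return a list of problems found; an empty list means the file looks fine."""
    p = Path(path)
    problems = []

    # The lab asks for the .txt file extension.
    if p.suffix.lower() != ".txt":
        problems.append("file extension is not .txt")

    data = p.read_bytes()

    # The lab asks for a file no larger than 20KB.
    if len(data) > max_bytes:
        problems.append(f"file is {len(data)} bytes, over the {max_bytes}-byte limit")

    # Assumption: plain text means the bytes decode as UTF-8.
    # A word processor document will usually fail this check.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8 text")

    return problems
```

Calling check_poem_file("poem.txt"), where poem.txt stands in for your own file name, returns an empty list when none of these checks fail.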

2.2 - Book

To find statistical patterns in text data, we need a large amount of text.

Find a book/novel/treatise that has meaning to you, stored in an electronic format. You should either find the novel available without cost, or purchase the ebook (look for versions without DRM so we can access the raw data).

Save your book as a plain text document with a short, meaningful file name and the file extension .txt. The file should be at least 150KB.

If your book is in another format, such as .epub, it must be converted to a plain text document. Calibre can assist with this conversion.
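Calibre includes a command-line tool, ebook-convert, that picks the output format from the output file's extension. The sketch below wraps it in Python and then checks the 150KB minimum. The function names are hypothetical, the sketch assumes Calibre is installed with ebook-convert on your PATH, and it assumes 1KB means 1024 bytes.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: 1 KB = 1024 bytes.
MIN_BOOK_BYTES = 150 * 1024


def conversion_command(source):
    """Build the ebook-convert command that writes a .txt file next to the source."""
    src = Path(source)
    return ["ebook-convert", str(src), str(src.with_suffix(".txt"))]


def convert_to_text(source):
    """Run the conversion and return the output path. Requires Calibre."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; install Calibre or add it to PATH")
    command = conversion_command(source)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return Path(command[-1])


def is_large_enough(path, min_bytes=MIN_BOOK_BYTES):
    """Return True when the file meets the minimum size asked for in 2.2."""
    return Path(path).stat().st_size >= min_bytes
```

For a file named book.epub (a placeholder name), convert_to_text("book.epub") would write book.txt in the same folder, and is_large_enough on that result reports whether it reaches the minimum size. Open the converted file afterward to confirm the text came through cleanly.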

2.3 - Image

Find an image of art stored in an electronic file format (PNG, JPEG, TIFF, BMP, etc.) that has meaning to you. The image must be at least HD resolution (1920 x 1080 pixels).
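Most image viewers will report an image's dimensions. As an illustration of where that information lives in the file, here is a small sketch that reads the width and height from a PNG header using only the Python standard library. It handles PNG files only, the function names are hypothetical, and it assumes the requirement means a width of at least 1920 and a height of at least 1080; a portrait-oriented image would need the comparison adjusted.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_WIDTH, MIN_HEIGHT = 1920, 1080


def png_dimensions(path):
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as 4-byte big-endian unsigned integers.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", header[16:24])


def is_hd(path):
    """Return True when the PNG is at least 1920 pixels wide and 1080 pixels tall."""
    width, height = png_dimensions(path)
    return width >= MIN_WIDTH and height >= MIN_HEIGHT
```

Other formats such as JPEG or TIFF store their dimensions differently, so this sketch will raise ValueError for them.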

2.4 - Instrumental Music

Find a piece of instrumental music (no singing) stored in an electronic file format (MP3, MIDI, etc.) that has meaning to you.

2.5 - Reflection

For each of your selections, answer the following questions:

Write this reflection as a plain text document as well, without using any word processing software, and save it as reflection.txt.

What to Hand In

Turn in one file for each of the above steps on Moodle.
© Mark Goadrich, Hendrix College