CSCI 270 - Lab 2
Corpus Creation
Description
In this class we will be analyzing various expressions of the human condition through
language, art, and music. To assist us and make our work personally valuable, you will
be gathering text, images, and sound files for the class to analyze in subsequent labs.
2.1 - Poem
First, it will be useful to have a small dataset for exploring our algorithms and generating
examples.
Find a short poem or song lyrics written in English that hold meaning for you.
Save your poem as a plain text document with a short, meaningful file name (with file extension .txt).
Your file size for this poem should be no more than 20KB.
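One way to check a file against a size limit before handing it in is sketched below. This is only an illustration, not a required step: the helper name size_ok is made up for this example, and it assumes 1 KB means 1024 bytes, which the lab does not specify.

```python
import os

# Assumption: 1 KB = 1024 bytes. The lab does not say which convention applies.
KB = 1024


def size_ok(path, min_kb=None, max_kb=None):
    """Return True if the file at `path` falls within the given bounds (in KB).

    Either bound may be left as None to skip that check.
    """
    size = os.path.getsize(path)  # file size in bytes
    if min_kb is not None and size < min_kb * KB:
        return False
    if max_kb is not None and size > max_kb * KB:
        return False
    return True
```

For a poem saved under the hypothetical name poem.txt, size_ok("poem.txt", max_kb=20) tests the 20KB ceiling. The same helper can test a minimum size by passing min_kb instead.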
2.2 - Book
To find statistical patterns in text data, we need a large amount of text.
Find a book/novel/treatise that has meaning to you, stored in an electronic format. You should either find the novel
available without cost, or purchase the ebook (look for versions without DRM so we can access the
raw data).
Save your book as a plain text document with a short, meaningful file name (with file extension .txt).
Your file size for this document should be no less than 150KB.
If your book is in another format, such as .epub, it must be converted to a plain text document.
Calibre can assist with this conversion.
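Calibre ships with a command-line tool, ebook-convert, that picks the output format from the output file's extension. The sketch below wraps it in Python and assumes Calibre is installed and ebook-convert is on your PATH. The function names and the file names book.epub and book.txt are placeholders chosen for this example.

```python
import shutil
import subprocess


def build_convert_command(src, dst):
    """Build the argument list for Calibre's ebook-convert tool."""
    return ["ebook-convert", src, dst]


def convert_to_text(src, dst):
    """Convert an ebook to the format implied by `dst` (e.g. .txt)."""
    # Assumption: Calibre is installed and its tools are on PATH.
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_convert_command(src, dst), check=True)
```

Calling convert_to_text("book.epub", "book.txt") would then run the equivalent of typing ebook-convert book.epub book.txt in a terminal. Conversion will generally not work on files protected by DRM.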
2.3 - Image
Find an image of art stored in an electronic file format (PNG, JPEG, TIFF, BMP, etc.) that has meaning to you.
This file must be at least Full HD resolution (1920 x 1080 pixels).
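If you want to confirm an image's dimensions without opening an image editor, the sketch below reads the width and height straight from a PNG file's header using only the standard library. It handles PNG only. The function names are made up for this example, and treating the requirement as "width at least 1920 and height at least 1080" is an assumption about how the rule applies, particularly for portrait-oriented images.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the 8 bytes that begin every PNG file


def png_dimensions(path):
    """Return (width, height) in pixels, read from the PNG IHDR chunk."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as 4-byte big-endian unsigned integers.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def is_hd(width, height, min_w=1920, min_h=1080):
    """Assumption: both dimensions must meet the stated minimums."""
    return width >= min_w and height >= min_h
```

For other formats such as JPEG or TIFF, an image viewer or an imaging library is the easier route.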
2.4 - Instrumental Music
Find a piece of instrumental music (no singing) stored in an electronic file format (MP3, MIDI, etc.) that has meaning to you.
2.5 - Reflection
For each of your selections, answer the following questions:
- What makes this selection interesting to you?
- What do you estimate is the reading level for this text? (only answer for Poem and Book)
- Would you say this selection conveys an overall positive or negative sentiment?
- Formulate a research question you hope to answer by analyzing this selection computationally.
This reflection must also be written as a plain text document, not using any word processing
software, and saved as reflection.txt.
What to Hand In
Turn in one file for each of the above steps on Moodle.
© Mark Goadrich, Hendrix College