CSC 440 - Homework 4
Decision Tree


Assume you are using the following features to represent examples:

     SHAPE        possible values:     Circle, Ellipse, Square, Triangle

     AGE          possible values:     Young, Old

     WORTH        possible values:     Low, High

(Since each feature value starts with a different letter, we'll use that initial letter as shorthand, e.g., 'C' for Circle.)

Our task will be binary-valued, and we'll use '+' and '-' as our category labels.

Here is our TRAIN set:


     SHAPE = C    AGE = Y   WORTH = L    CATEGORY = -

     SHAPE = E    AGE = O   WORTH = L    CATEGORY = -

     SHAPE = C    AGE = Y   WORTH = H    CATEGORY = -

     SHAPE = C    AGE = O   WORTH = H    CATEGORY = -

     SHAPE = S    AGE = O   WORTH = H    CATEGORY = +

     SHAPE = E    AGE = Y   WORTH = L    CATEGORY = +

     SHAPE = E    AGE = Y   WORTH = H    CATEGORY = +

And our TEST set:


     SHAPE = C    AGE = O   WORTH = H     CATEGORY = -

     SHAPE = C    AGE = Y   WORTH = L     CATEGORY = -

     SHAPE = T    AGE = Y   WORTH = H     CATEGORY = +

     SHAPE = E    AGE = Y   WORTH = L     CATEGORY = +

     SHAPE = E    AGE = O   WORTH = H     CATEGORY = +

First, apply the decision-tree algorithm in Fig. 18.5 of the text (we'll call this algorithm ID3 from now on) to the TRAIN set. Show all your work.
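As a sanity check on the entropy arithmetic you'll do by hand, the following minimal sketch computes the Shannon entropy ID3 uses to score splits (the `entropy` helper name and counts-based signature are my own choices, not part of the assignment):

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a label distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# The TRAIN set above has 4 examples labelled '-' and 3 labelled '+'.
print(round(entropy([4, 3]), 3))  # ~0.985 bits at the root
```

The same helper scores any subset of examples a split produces, so it covers every node of the hand computation.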

When multiple features tie for best, choose the one whose name comes earliest in alphabetical order (e.g., AGE before SHAPE before WORTH). When there is a tie in computing MajorityValue, choose '-'.
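The selection step, including the alphabetical tie-break, can be sketched as below. This is only a check for the root split; the data encoding and the helper names (`TRAIN`, `info_gain`, `best_feature`) are my own, not the text's:

```python
import math
from collections import Counter

# TRAIN set from the assignment, as (features, label) pairs.
TRAIN = [
    ({"SHAPE": "C", "AGE": "Y", "WORTH": "L"}, "-"),
    ({"SHAPE": "E", "AGE": "O", "WORTH": "L"}, "-"),
    ({"SHAPE": "C", "AGE": "Y", "WORTH": "H"}, "-"),
    ({"SHAPE": "C", "AGE": "O", "WORTH": "H"}, "-"),
    ({"SHAPE": "S", "AGE": "O", "WORTH": "H"}, "+"),
    ({"SHAPE": "E", "AGE": "Y", "WORTH": "L"}, "+"),
    ({"SHAPE": "E", "AGE": "Y", "WORTH": "H"}, "+"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(examples, feature):
    """Information gain from splitting `examples` on `feature`."""
    labels = [lab for _, lab in examples]
    remainder = 0.0
    for value in {feats[feature] for feats, _ in examples}:
        subset = [lab for feats, lab in examples if feats[feature] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

def best_feature(examples, features):
    """Highest-gain feature; since max() returns the first maximal element,
    sorting the names first breaks ties alphabetically, per the rule above."""
    return max(sorted(features), key=lambda f: info_gain(examples, f))

for f in ["AGE", "SHAPE", "WORTH"]:
    print(f, round(info_gain(TRAIN, f), 3))
print("root split:", best_feature(TRAIN, ["AGE", "SHAPE", "WORTH"]))
```

Running this only confirms the root choice; the recursive calls on each subset are still yours to show by hand.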

Show the confusion matrix that results from applying your induced decision tree to the examples in the TEST set.
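One way to tally a binary confusion matrix is sketched below. The `demo` pairs are placeholders of my own, not the answer; substitute the five TEST labels and your tree's predictions:

```python
from collections import Counter

def confusion_matrix(pairs, labels=("+", "-")):
    """Tally (actual, predicted) pairs into a 2x2 confusion matrix."""
    counts = Counter(pairs)
    print("           pred +  pred -")
    for a in labels:
        print(f"actual {a}  {counts[(a, '+')]:6d}  {counts[(a, '-')]:6d}")
    return counts

# Hypothetical (actual, predicted) pairs -- replace with your results.
demo = [("+", "+"), ("+", "-"), ("-", "-")]
confusion_matrix(demo)
```

Each row is a true category and each column a predicted one, so correct classifications fall on the diagonal.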