SHAPE possible values: Circle, Ellipse, Square, Triangle
AGE possible values: Young, Old
WORTH possible values: Low, High
(Since each feature value starts with a different letter, for shorthand we'll just use that initial letter, e.g. 'C' for Circle.)
Our task will be binary valued, and we'll use '+' and '-' as our category labels.
Here is our TRAIN set:
SHAPE  AGE  WORTH  CATEGORY
  C     Y     L       -
  E     O     L       -
  C     Y     H       -
  C     O     H       -
  S     O     H       +
  E     Y     L       +
  E     Y     H       +
And our TEST set:
SHAPE  AGE  WORTH  CATEGORY
  C     O     H       -
  C     Y     L       -
  T     Y     H       +
  E     Y     L       +
  E     O     H       +
First, apply the decision-tree algorithm in Fig 18.5 of the text (we'll call this algorithm ID3 from now on) to the TRAIN set. Show all your work.
When multiple features tie for best, choose the one whose name comes earliest in alphabetical order (e.g., AGE before SHAPE before WORTH). When there is a tie in computing MajorityValue, choose '-'.
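To check your hand-worked answer, the ID3 procedure with these tie-breaking rules can be sketched in Python as follows. This is only a sketch, not a substitute for showing your work; the function names and the tuple-based tree representation are my own choices, not from the text.

```python
from collections import Counter
from math import log2

# TRAIN set transcribed from the assignment: (feature dict, category label)
TRAIN = [
    ({'AGE': 'Y', 'SHAPE': 'C', 'WORTH': 'L'}, '-'),
    ({'AGE': 'O', 'SHAPE': 'E', 'WORTH': 'L'}, '-'),
    ({'AGE': 'Y', 'SHAPE': 'C', 'WORTH': 'H'}, '-'),
    ({'AGE': 'O', 'SHAPE': 'C', 'WORTH': 'H'}, '-'),
    ({'AGE': 'O', 'SHAPE': 'S', 'WORTH': 'H'}, '+'),
    ({'AGE': 'Y', 'SHAPE': 'E', 'WORTH': 'L'}, '+'),
    ({'AGE': 'Y', 'SHAPE': 'E', 'WORTH': 'H'}, '+'),
]

def entropy(examples):
    """Entropy (in bits) of the label distribution over the examples."""
    total = len(examples)
    counts = Counter(label for _, label in examples)
    return -sum(c / total * log2(c / total) for c in counts.values())

def info_gain(examples, feature):
    """Expected reduction in entropy from splitting on `feature`."""
    total = len(examples)
    remainder = 0.0
    for value in {ex[feature] for ex, _ in examples}:
        subset = [(ex, lbl) for ex, lbl in examples if ex[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(examples) - remainder

def majority_value(examples):
    """Most common label; ties go to '-' per the assignment's rule."""
    counts = Counter(label for _, label in examples)
    return '+' if counts['+'] > counts['-'] else '-'

def id3(examples, features):
    """Return a leaf label, or (feature, {value: subtree}) for internal nodes."""
    labels = {lbl for _, lbl in examples}
    if len(labels) == 1:             # pure node: return the single label
        return labels.pop()
    if not features:                 # no features left: majority vote
        return majority_value(examples)
    # sorted() + max() keeps the alphabetically-first feature on a gain tie
    best = max(sorted(features), key=lambda f: info_gain(examples, f))
    branches = {}
    for value in {ex[best] for ex, _ in examples}:
        subset = [(ex, lbl) for ex, lbl in examples if ex[best] == value]
        branches[value] = id3(subset, [f for f in features if f != best])
    return (best, branches)

tree = id3(TRAIN, ['AGE', 'SHAPE', 'WORTH'])
print(tree)
```

Note that this sketch only creates branches for feature values that actually occur in the training subset, so a test example with an unseen value (such as SHAPE = T) needs an explicit fallback when you classify.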
Show the confusion matrix that results from applying your induced Decision Tree to the examples in the TEST set.
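Once you have a predicted label for each TEST example, a small helper like the one below tabulates the matrix. The `example_pairs` list is made up purely to illustrate the input format; substitute the (actual, predicted) pairs your own induced tree produces.

```python
from collections import Counter

def confusion_matrix(pairs, labels=('+', '-')):
    """Count (actual, predicted) pairs; rows are actual, columns predicted."""
    counts = Counter(pairs)
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}

# Placeholder pairs ONLY -- replace with your tree's predictions on TEST.
example_pairs = [('+', '+'), ('+', '-'), ('-', '-'), ('-', '-'), ('-', '+')]
for actual, row in confusion_matrix(example_pairs).items():
    print('actual', actual, '->', row)
```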