In this example, we use a small data set with the response variable Play, which indicates whether a golfer plays (1) or does not play (0) golf under weather conditions measured by Temperature, Humidity, Outlook (Sunny (0), Overcast (1), Rainy (2)), and Wind (False (0), True (1)). A decision tree is generated by each of the C45 and ALACART classes. Because the data set is small, the control parameters are adjusted and no cross-validation or pruning is performed. The maximal trees are printed using DecisionTree.printDecisionTree. Notice that C45 splits on Outlook, then Humidity and Wind, while ALACART splits on Outlook, then Temperature.
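The choice of Outlook as the first split can be checked by hand. The following is a minimal sketch, not part of the JMSL API: it computes the textbook information gain (entropy reduction) for the two categorical predictors, Outlook and Wind, on the same 14 observations. The class name GolfSplitGain and its helper methods are hypothetical, and the exact criterion that C45 applies internally (for example, a gain ratio) is an assumption not confirmed here. Continuous predictors would also need a threshold search, which is omitted.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class for illustration only; it does not use JMSL.
class GolfSplitGain {
    // Columns: Outlook, Temperature, Humidity, Wind, Play (same coding as the example).
    static final double[][] GOLF_XY = {
        {0, 85, 85, 0, 0}, {0, 80, 90, 1, 0}, {1, 83, 78, 0, 1},
        {2, 70, 96, 0, 1}, {2, 68, 80, 0, 1}, {2, 65, 70, 1, 0},
        {1, 64, 65, 1, 1}, {0, 72, 95, 0, 0}, {0, 69, 70, 0, 1},
        {2, 75, 80, 0, 1}, {0, 75, 70, 1, 1}, {1, 72, 90, 1, 1},
        {1, 81, 75, 0, 1}, {2, 71, 80, 1, 0}
    };

    // Shannon entropy, in bits, of a two-class node with the given class counts.
    static double entropy(int n0, int n1) {
        int n = n0 + n1;
        double h = 0.0;
        for (int c : new int[] {n0, n1}) {
            if (c > 0) {
                double p = (double) c / n;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    // Information gain from splitting into one child per level of a categorical column.
    static double gain(double[][] xy, int col, int responseCol) {
        Map<Integer, int[]> countsByLevel = new TreeMap<>();
        int[] total = new int[2];
        for (double[] row : xy) {
            int y = (int) row[responseCol];
            countsByLevel.computeIfAbsent((int) row[col], k -> new int[2])[y]++;
            total[y]++;
        }
        double remainder = 0.0;
        for (int[] c : countsByLevel.values()) {
            // Weight each child's entropy by its share of the observations.
            remainder += (double) (c[0] + c[1]) / xy.length * entropy(c[0], c[1]);
        }
        return entropy(total[0], total[1]) - remainder;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT,
            "Gain(Outlook) = %.3f", gain(GOLF_XY, 0, 4)));
        System.out.println(String.format(Locale.ROOT,
            "Gain(Wind)    = %.3f", gain(GOLF_XY, 3, 4)));
    }
}
```

Under this measure, Outlook yields a gain of about 0.247 bits against about 0.048 bits for Wind, mainly because every Overcast observation has Play = 1, so that child node is pure.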
import com.imsl.datamining.decisionTree.*;

public class DecisionTreeEx1 {

    public static void main(String[] args) throws Exception {

        // Column index of the response variable (Play).
        int golfResponseIdx = 4;

        // Columns: Outlook, Temperature, Humidity, Wind, Play.
        double[][] golfXY = {
            {0, 85, 85, 0, 0},
            {0, 80, 90, 1, 0},
            {1, 83, 78, 0, 1},
            {2, 70, 96, 0, 1},
            {2, 68, 80, 0, 1},
            {2, 65, 70, 1, 0},
            {1, 64, 65, 1, 1},
            {0, 72, 95, 0, 0},
            {0, 69, 70, 0, 1},
            {2, 75, 80, 0, 1},
            {0, 75, 70, 1, 1},
            {1, 72, 90, 1, 1},
            {1, 81, 75, 0, 1},
            {2, 71, 80, 1, 0}
        };

        DecisionTree.VariableType[] golfVarType = {
            DecisionTree.VariableType.CATEGORICAL,
            DecisionTree.VariableType.QUANTITATIVE_CONTINUOUS,
            DecisionTree.VariableType.QUANTITATIVE_CONTINUOUS,
            DecisionTree.VariableType.CATEGORICAL,
            DecisionTree.VariableType.CATEGORICAL
        };

        // Labels used only when printing the trees.
        String[] names = {
            "Outlook", "Temperature", "Humidity", "Wind", "Play"
        };
        String[] classNames = {"Don't Play", "Play"};
        String[] varLevels = {"Sunny", "Overcast", "Rainy", "False", "True"};

        // Fit and print the C4.5 tree.
        C45 dt = new C45(golfXY, golfResponseIdx, golfVarType);
        dt.setMinObsPerChildNode(2);
        dt.setMinObsPerNode(3);
        dt.setMaxNodes(50);
        dt.fitModel();
        System.out.println("\n\nDecision Tree using Method C4.5:");
        dt.printDecisionTree(null, names, classNames, varLevels, true);

        // Fit and print the ALACART tree.
        ALACART adt = new ALACART(golfXY, golfResponseIdx, golfVarType);
        adt.setMinObsPerChildNode(2);
        adt.setMinObsPerNode(3);
        adt.setMaxNodes(50);
        adt.fitModel();
        System.out.println("\n\nDecision Tree using Method ALACART:");
        adt.printDecisionTree(null, names, classNames, varLevels, true);
    }
}
Decision Tree using Method C4.5:

Decision Tree:

Node 0: Cost = 0.357, N= 14, Level = 0, Child nodes: 1 4 5
P(Y=0)= 0.357
P(Y=1)= 0.643
Predicted Y: Play
   Node 1: Cost = 0.143, N= 5, Level = 1, Child nodes: 2 3
   Rule: Outlook in: { Sunny }
   P(Y=0)= 0.600
   P(Y=1)= 0.400
   Predicted Y: Don't Play
      Node 2: Cost = 0.000, N= 2, Level = 2
      Rule: Humidity <= 77.500
      P(Y=0)= 0.000
      P(Y=1)= 1.000
      Predicted Y: Play
      Node 3: Cost = 0.000, N= 3, Level = 2
      Rule: Humidity > 77.500
      P(Y=0)= 1.000
      P(Y=1)= 0.000
      Predicted Y: Don't Play
   Node 4: Cost = 0.000, N= 4, Level = 1
   Rule: Outlook in: { Overcast }
   P(Y=0)= 0.000
   P(Y=1)= 1.000
   Predicted Y: Play
   Node 5: Cost = 0.143, N= 5, Level = 1, Child nodes: 6 7
   Rule: Outlook in: { Rainy }
   P(Y=0)= 0.400
   P(Y=1)= 0.600
   Predicted Y: Play
      Node 6: Cost = 0.000, N= 3, Level = 2
      Rule: Wind in: { False }
      P(Y=0)= 0.000
      P(Y=1)= 1.000
      Predicted Y: Play
      Node 7: Cost = 0.000, N= 2, Level = 2
      Rule: Wind in: { True }
      P(Y=0)= 1.000
      P(Y=1)= 0.000
      Predicted Y: Don't Play

Decision Tree using Method ALACART:

Decision Tree:

Node 0: Cost = 0.357, N= 14, Level = 0, Child nodes: 1 4 5
P(Y=0)= 0.357
P(Y=1)= 0.643
Predicted Y: Play
   Node 1: Cost = 0.143, N= 5, Level = 1, Child nodes: 2 3
   Rule: Outlook in: { Sunny }
   P(Y=0)= 0.600
   P(Y=1)= 0.400
   Predicted Y: Don't Play
      Node 2: Cost = 0.000, N= 2, Level = 2
      Rule: Humidity <= 77.500
      P(Y=0)= 0.000
      P(Y=1)= 1.000
      Predicted Y: Play
      Node 3: Cost = 0.000, N= 3, Level = 2
      Rule: Humidity > 77.500
      P(Y=0)= 1.000
      P(Y=1)= 0.000
      Predicted Y: Don't Play
   Node 4: Cost = 0.000, N= 4, Level = 1
   Rule: Outlook in: { Overcast }
   P(Y=0)= 0.000
   P(Y=1)= 1.000
   Predicted Y: Play
   Node 5: Cost = 0.143, N= 5, Level = 1, Child nodes: 6 7
   Rule: Outlook in: { Rainy }
   P(Y=0)= 0.400
   P(Y=1)= 0.600
   Predicted Y: Play
      Node 6: Cost = 0.000, N= 3, Level = 2
      Rule: Wind in: { False }
      P(Y=0)= 0.000
      P(Y=1)= 1.000
      Predicted Y: Play
      Node 7: Cost = 0.000, N= 2, Level = 2
      Rule: Wind in: { True }
      P(Y=0)= 1.000
      P(Y=1)= 0.000
      Predicted Y: Don't Play
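The node statistics in the listing above can be reproduced directly from the data. The sketch below is an illustration that does not use JMSL: for the root node and for each Outlook level it counts the observations, computes the class proportions, and computes a cost equal to the number of observations the node's majority-class prediction misclassifies, divided by the total sample size of 14. That cost formula is an assumption inferred from the printed values (for example, 5/14 = 0.357 at the root and 2/14 = 0.143 for Sunny), not a documented definition. The class name GolfNodeSummary and its methods are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper class for illustration only; it does not use JMSL.
class GolfNodeSummary {
    // Columns: Outlook, Temperature, Humidity, Wind, Play (same coding as the example).
    static final double[][] GOLF_XY = {
        {0, 85, 85, 0, 0}, {0, 80, 90, 1, 0}, {1, 83, 78, 0, 1},
        {2, 70, 96, 0, 1}, {2, 68, 80, 0, 1}, {2, 65, 70, 1, 0},
        {1, 64, 65, 1, 1}, {0, 72, 95, 0, 0}, {0, 69, 70, 0, 1},
        {2, 75, 80, 0, 1}, {0, 75, 70, 1, 1}, {1, 72, 90, 1, 1},
        {1, 81, 75, 0, 1}, {2, 71, 80, 1, 0}
    };
    static final int RESPONSE = 4;

    // Returns {N, P(Y=0), P(Y=1), cost} for the rows whose column `col` equals `level`.
    // Pass col = -1 to select every row (the root node).
    static double[] summarize(double[][] xy, int col, int level) {
        int[] counts = new int[2];
        for (double[] row : xy) {
            if (col < 0 || (int) row[col] == level) {
                counts[(int) row[RESPONSE]]++;
            }
        }
        int n = counts[0] + counts[1];
        // The node predicts its majority class, so the minority count is misclassified.
        int misclassified = Math.min(counts[0], counts[1]);
        return new double[] {
            n,
            (double) counts[0] / n,
            (double) counts[1] / n,
            (double) misclassified / xy.length  // assumed cost: share of the whole sample
        };
    }

    static void print(String label, double[] s) {
        System.out.println(String.format(Locale.ROOT,
            "%-8s N=%2d  P(Y=0)=%.3f  P(Y=1)=%.3f  Cost=%.3f",
            label, (int) s[0], s[1], s[2], s[3]));
    }

    public static void main(String[] args) {
        String[] outlook = {"Sunny", "Overcast", "Rainy"};
        print("Root", summarize(GOLF_XY, -1, 0));
        for (int level = 0; level < outlook.length; level++) {
            print(outlook[level], summarize(GOLF_XY, 0, level));
        }
    }
}
```

The Humidity threshold of 77.500 in Node 2 and Node 3 is consistent with a midpoint rule: among the five Sunny observations, the humidity values are 70, 70, 85, 90, and 95, and 77.5 lies halfway between 70 and 85.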