In this example, we use a small data set with the response variable Play, which indicates whether a golfer plays (1) or does not play (0) golf under weather conditions measured by Outlook (Sunny (0), Overcast (1), Rainy (2)), Temperature, Humidity, and Wind (False (0), True (1)). A decision tree is generated by the C45 class and by the ALACART class. Because the data set is small, the control parameters are relaxed and no cross-validation or pruning is performed. The maximal trees are printed using DecisionTree.printDecisionTree. Notice that C45 splits on Outlook, then on Humidity and Wind, while ALACART splits on Outlook, then on Temperature.
import com.imsl.datamining.decisionTree.*;

public class DecisionTreeEx1 {

    public static void main(String[] args) throws Exception {
        // Column 4 of golfXY holds the response variable, Play.
        int golfResponseIdx = 4;

        // Columns: Outlook, Temperature, Humidity, Wind, Play
        double[][] golfXY = {
            {0, 85, 85, 0, 0},
            {0, 80, 90, 1, 0},
            {1, 83, 78, 0, 1},
            {2, 70, 96, 0, 1},
            {2, 68, 80, 0, 1},
            {2, 65, 70, 1, 0},
            {1, 64, 65, 1, 1},
            {0, 72, 95, 0, 0},
            {0, 69, 70, 0, 1},
            {2, 75, 80, 0, 1},
            {0, 75, 70, 1, 1},
            {1, 72, 90, 1, 1},
            {1, 81, 75, 0, 1},
            {2, 71, 80, 1, 0}
        };

        DecisionTree.VariableType[] golfVarType = {
            DecisionTree.VariableType.CATEGORICAL,              // Outlook
            DecisionTree.VariableType.QUANTITATIVE_CONTINUOUS,  // Temperature
            DecisionTree.VariableType.QUANTITATIVE_CONTINUOUS,  // Humidity
            DecisionTree.VariableType.CATEGORICAL,              // Wind
            DecisionTree.VariableType.CATEGORICAL               // Play
        };

        String[] names = {"Outlook", "Temperature", "Humidity", "Wind", "Play"};
        String[] classNames = {"Don't Play", "Play"};
        String[] varLevels = {"Sunny", "Overcast", "Rainy", "False", "True"};

        // Fit the maximal C4.5 tree; the node-size limits are relaxed
        // because the data set is small.
        C45 dt = new C45(golfXY, golfResponseIdx, golfVarType);
        dt.setMinObsPerChildNode(2);
        dt.setMinObsPerNode(3);
        dt.setMaxNodes(50);
        dt.fitModel();

        System.out.println("\n\nDecision Tree using Method C4.5:");
        dt.printDecisionTree(null, names, classNames,
                varLevels, true);

        // Fit the maximal ALACART (CART) tree with the same settings.
        ALACART adt = new ALACART(golfXY, golfResponseIdx, golfVarType);
        adt.setMinObsPerChildNode(2);
        adt.setMinObsPerNode(3);
        adt.setMaxNodes(50);
        adt.fitModel();

        System.out.println("\n\nDecision Tree using Method ALACART:");
        adt.printDecisionTree(null, names, classNames,
                varLevels, true);
    }
}
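A fitted tree can also be used to classify new observations. The following fragment is not part of the original example; it is a minimal sketch that would be appended at the end of main, assuming DecisionTree exposes a predict(double[][]) method returning predicted response values (consult the DecisionTree class reference for the exact signature):

// Hypothetical follow-on: classify a new day (Sunny, 70 degrees,
// 65% humidity, no wind). The response column is ignored during
// prediction, so a placeholder value of 0 is supplied. Following the
// C4.5 tree printed below (Sunny, Humidity <= 77.5), the predicted
// class would be "Play".
double[][] newObs = {{0, 70, 65, 0, 0}};
double[] pred = dt.predict(newObs);  // assumed API; see the class reference
System.out.println("Predicted class: " + classNames[(int) pred[0]]);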
Output

Decision Tree using Method C4.5:

Decision Tree:

Node 0: Cost = 0.357, N= 14, Level = 0, Child nodes: 1 4 5
P(Y=0)= 0.357
P(Y=1)= 0.643
Predicted Y: Play
   Node 1: Cost = 0.143, N= 5, Level = 1, Child nodes: 2 3
   Rule: Outlook in: { Sunny }
   P(Y=0)= 0.600
   P(Y=1)= 0.400
   Predicted Y: Don't Play
      Node 2: Cost = 0.000, N= 2, Level = 2
      Rule: Humidity <= 77.500
      P(Y=0)= 0.000
      P(Y=1)= 1.000
      Predicted Y: Play
      Node 3: Cost = 0.000, N= 3, Level = 2
      Rule: Humidity > 77.500
      P(Y=0)= 1.000
      P(Y=1)= 0.000
      Predicted Y: Don't Play
   Node 4: Cost = 0.000, N= 4, Level = 1
   Rule: Outlook in: { Overcast }
   P(Y=0)= 0.000
   P(Y=1)= 1.000
   Predicted Y: Play
   Node 5: Cost = 0.143, N= 5, Level = 1, Child nodes: 6 7
   Rule: Outlook in: { Rainy }
   P(Y=0)= 0.400
   P(Y=1)= 0.600
   Predicted Y: Play
      Node 6: Cost = 0.000, N= 3, Level = 2
      Rule: Wind in: { False }
      P(Y=0)= 0.000
      P(Y=1)= 1.000
      Predicted Y: Play
      Node 7: Cost = 0.000, N= 2, Level = 2
      Rule: Wind in: { True }
      P(Y=0)= 1.000
      P(Y=1)= 0.000
      Predicted Y: Don't Play
Decision Tree using Method ALACART:

[ALACART tree output not reproduced here; as noted above, the maximal ALACART tree splits on Outlook, then on Temperature.]
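The two methods choose different secondary splits because they rank candidate splits by different criteria: C4.5 uses the entropy-based information gain (gain ratio), while ALACART, following CART, uses the Gini index and binary splits. The standalone sketch below (illustrative only; it does not use the JMSL API, and it evaluates the three-way split on Outlook rather than the binary partitions ALACART actually considers) computes both criteria for the root-level split on this data:

// Standalone illustration of the two split criteria on the golf data.
// Class counts by Outlook: Sunny 2 Play / 3 Don't, Overcast 4 / 0,
// Rainy 3 / 2; overall 9 Play / 5 Don't.
public class SplitCriteria {
    static double log2(double x) { return Math.log(x) / Math.log(2.0); }

    // Binary entropy of a class proportion p
    static double entropy(double p) {
        return (p == 0.0 || p == 1.0)
                ? 0.0 : -(p * log2(p) + (1 - p) * log2(1 - p));
    }

    // Gini impurity of a class proportion p
    static double gini(double p) { return 2.0 * p * (1.0 - p); }

    public static void main(String[] args) {
        double[] n = {5, 4, 5};                   // observations per Outlook level
        double[] pPlay = {2.0/5, 4.0/4, 3.0/5};   // P(Play) within each level

        double childEntropy = 0.0, childGini = 0.0;
        for (int i = 0; i < n.length; i++) {
            childEntropy += n[i] / 14 * entropy(pPlay[i]);
            childGini += n[i] / 14 * gini(pPlay[i]);
        }
        // Information gain ~ 0.247 bits; Gini decrease ~ 0.116
        System.out.printf("Information gain: %.3f%n", entropy(9.0/14) - childEntropy);
        System.out.printf("Gini decrease:    %.3f%n", gini(9.0/14) - childGini);
    }
}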