Example 1: DecisionTree

In this example, we use a small data set whose response variable, Play, indicates whether a golfer plays (1) or does not play (0) golf under weather conditions measured by Temperature, Humidity, Outlook (Sunny (0), Overcast (1), Rainy (2)), and Wind (False (0), True (1)). A decision tree is generated by each of the C45 and ALACART classes. Because the data set is small, the control parameters are adjusted, and no cross-validation or pruning is performed. The maximal trees are printed using DecisionTree.printDecisionTree. Notice that C45 splits on Outlook, then Humidity and Wind, while ALACART splits on Outlook, then Temperature.
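
To give some intuition for why Outlook is chosen at the root, the following sketch scores the two categorical predictors by information gain and gain ratio, the entropy-based measures usually associated with C4.5. It is a standalone illustration that does not use the library: the class name GolfSplitSketch and its methods are hypothetical, and the library's internal calculations may differ in detail. Temperature and Humidity are continuous and would need a threshold search, so they are not scored here.

```java
import java.util.Arrays;

/** Illustrative sketch (not library code): entropy-based scores for categorical splits. */
final class GolfSplitSketch {

    // Columns: Outlook, Temperature, Humidity, Wind, Play (same rows as golfXY).
    static final double[][] GOLF_XY = {
        {0, 85, 85, 0, 0}, {0, 80, 90, 1, 0}, {1, 83, 78, 0, 1},
        {2, 70, 96, 0, 1}, {2, 68, 80, 0, 1}, {2, 65, 70, 1, 0},
        {1, 64, 65, 1, 1}, {0, 72, 95, 0, 0}, {0, 69, 70, 0, 1},
        {2, 75, 80, 0, 1}, {0, 75, 70, 1, 1}, {1, 72, 90, 1, 1},
        {1, 81, 75, 0, 1}, {2, 71, 80, 1, 0}
    };

    /** Shannon entropy (base 2) of a vector of counts; zero counts contribute nothing. */
    static double entropy(int[] counts) {
        int total = Arrays.stream(counts).sum();
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    /**
     * Returns {information gain, gain ratio} for splitting a binary response
     * on every level of the categorical column col.
     */
    static double[] splitScores(double[][] xy, int col, int nLevels, int responseCol) {
        int[] classCounts = new int[2];
        int[] levelCounts = new int[nLevels];
        int[][] joint = new int[nLevels][2];
        for (double[] row : xy) {
            int level = (int) row[col];
            int y = (int) row[responseCol];
            classCounts[y]++;
            levelCounts[level]++;
            joint[level][y]++;
        }
        // Weighted average of the class entropy within each level.
        double conditional = 0.0;
        for (int l = 0; l < nLevels; l++) {
            conditional += (double) levelCounts[l] / xy.length * entropy(joint[l]);
        }
        double gain = entropy(classCounts) - conditional;
        double splitInfo = entropy(levelCounts); // nonzero when at least two levels occur
        return new double[] {gain, gain / splitInfo};
    }

    public static void main(String[] args) {
        double[] outlook = splitScores(GOLF_XY, 0, 3, 4);
        double[] wind = splitScores(GOLF_XY, 3, 2, 4);
        System.out.printf("Outlook: gain = %.3f, gain ratio = %.3f%n", outlook[0], outlook[1]);
        System.out.printf("Wind:    gain = %.3f, gain ratio = %.3f%n", wind[0], wind[1]);
    }
}
```

On this data, Outlook scores higher than Wind on both measures, which is consistent with Outlook appearing as the first split.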


import com.imsl.datamining.decisionTree.*;

public class DecisionTreeEx1 {

    public static void main(String[] args) throws Exception {

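        // Column 4 (Play) is the response; columns 0-3 are the predictors.
        int golfResponseIdx = 4;
        // Columns: Outlook, Temperature, Humidity, Wind, Play
        double[][] golfXY = {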
            {0, 85, 85, 0, 0},
            {0, 80, 90, 1, 0},
            {1, 83, 78, 0, 1},
            {2, 70, 96, 0, 1},
            {2, 68, 80, 0, 1},
            {2, 65, 70, 1, 0},
            {1, 64, 65, 1, 1},
            {0, 72, 95, 0, 0},
            {0, 69, 70, 0, 1},
            {2, 75, 80, 0, 1},
            {0, 75, 70, 1, 1},
            {1, 72, 90, 1, 1},
            {1, 81, 75, 0, 1},
            {2, 71, 80, 1, 0}
        };

        DecisionTree.VariableType[] golfVarType = {
            DecisionTree.VariableType.CATEGORICAL,
            DecisionTree.VariableType.QUANTITATIVE_CONTINUOUS,
            DecisionTree.VariableType.QUANTITATIVE_CONTINUOUS,
            DecisionTree.VariableType.CATEGORICAL,
            DecisionTree.VariableType.CATEGORICAL
        };

        String[] names = {
            "Outlook", "Temperature", "Humidity", "Wind", "Play"
        };
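        String[] classNames = {"Don't Play", "Play"};
        // Labels for the categorical predictors, in column order:
        // Outlook levels 0-2, then Wind levels 0-1.
        String[] varLevels = {"Sunny", "Overcast", "Rainy", "False", "True"};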

        C45 dt = new C45(golfXY, golfResponseIdx, golfVarType);
        // Relax the node-size limits because the data set has only 14 rows.
        dt.setMinObsPerChildNode(2);
        dt.setMinObsPerNode(3);
        dt.setMaxNodes(50);
        dt.fitModel();

        System.out.println("\n\nDecision Tree using Method C4.5:");
        dt.printDecisionTree(null, names, classNames,
                varLevels, true);

        ALACART adt = new ALACART(golfXY, golfResponseIdx, golfVarType);
        adt.setMinObsPerChildNode(2);
        adt.setMinObsPerNode(3);
        adt.setMaxNodes(50);
        adt.fitModel();

        System.out.println("\n\nDecision Tree using Method ALACART:");
        adt.printDecisionTree(null, names, classNames,
                varLevels, true);
    }
}

Output



Decision Tree using Method C4.5:

Decision Tree:


Node 0: Cost = 0.357, N= 14, Level = 0, Child nodes:  1  4  5 
P(Y=0)= 0.357
P(Y=1)= 0.643
Predicted Y:  Play 
   
Node 1: Cost = 0.143, N= 5, Level = 1, Child nodes:  2  3 
   Rule:  Outlook  in: { Sunny }
    P(Y=0)= 0.600
    P(Y=1)= 0.400
    Predicted Y:  Don't Play 
      
Node 2: Cost = 0.000, N= 2, Level = 2
      Rule:  Humidity       <= 77.500
        P(Y=0)= 0.000
        P(Y=1)= 1.000
        Predicted Y:  Play 
      
Node 3: Cost = 0.000, N= 3, Level = 2
      Rule:  Humidity       > 77.500
        P(Y=0)= 1.000
        P(Y=1)= 0.000
        Predicted Y:  Don't Play 
   
Node 4: Cost = 0.000, N= 4, Level = 1
   Rule:  Outlook  in: { Overcast }
    P(Y=0)= 0.000
    P(Y=1)= 1.000
    Predicted Y:  Play 
   
Node 5: Cost = 0.143, N= 5, Level = 1, Child nodes:  6  7 
   Rule:  Outlook  in: { Rainy }
    P(Y=0)= 0.400
    P(Y=1)= 0.600
    Predicted Y:  Play 
      
Node 6: Cost = 0.000, N= 3, Level = 2
      Rule:  Wind  in: { False }
        P(Y=0)= 0.000
        P(Y=1)= 1.000
        Predicted Y:  Play 
      
Node 7: Cost = 0.000, N= 2, Level = 2
      Rule:  Wind  in: { True }
        P(Y=0)= 1.000
        P(Y=1)= 0.000
        Predicted Y:  Don't Play 


Decision Tree using Method ALACART:

Decision Tree:


Node 0: Cost = 0.357, N= 14, Level = 0, Child nodes:  1  4  5 
P(Y=0)= 0.357
P(Y=1)= 0.643
Predicted Y:  Play 
   
Node 1: Cost = 0.143, N= 5, Level = 1, Child nodes:  2  3 
   Rule:  Outlook  in: { Sunny }
    P(Y=0)= 0.600
    P(Y=1)= 0.400
    Predicted Y:  Don't Play 
      
Node 2: Cost = 0.000, N= 2, Level = 2
      Rule:  Humidity       <= 77.500
        P(Y=0)= 0.000
        P(Y=1)= 1.000
        Predicted Y:  Play 
      
Node 3: Cost = 0.000, N= 3, Level = 2
      Rule:  Humidity       > 77.500
        P(Y=0)= 1.000
        P(Y=1)= 0.000
        Predicted Y:  Don't Play 
   
Node 4: Cost = 0.000, N= 4, Level = 1
   Rule:  Outlook  in: { Overcast }
    P(Y=0)= 0.000
    P(Y=1)= 1.000
    Predicted Y:  Play 
   
Node 5: Cost = 0.143, N= 5, Level = 1, Child nodes:  6  7 
   Rule:  Outlook  in: { Rainy }
    P(Y=0)= 0.400
    P(Y=1)= 0.600
    Predicted Y:  Play 
      
Node 6: Cost = 0.000, N= 3, Level = 2
      Rule:  Wind  in: { False }
        P(Y=0)= 0.000
        P(Y=1)= 1.000
        Predicted Y:  Play 
      
Node 7: Cost = 0.000, N= 2, Level = 2
      Rule:  Wind  in: { True }
        P(Y=0)= 1.000
        P(Y=1)= 0.000
        Predicted Y:  Don't Play 
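
The node summaries in the C4.5 tree can be checked by hand. The sketch below recomputes N, P(Y=0), P(Y=1), and Cost for Nodes 0 through 3 directly from the data. It assumes that Cost is the number of observations misclassified at the node divided by the total number of observations (14); that reading is inferred from the printed values rather than taken from the library's definition. The class name GolfNodeSketch and its method are hypothetical.

```java
/** Illustrative sketch (not library code): recompute node summaries from the golf data. */
final class GolfNodeSketch {

    // Columns: Outlook, Temperature, Humidity, Wind, Play (same rows as golfXY).
    static final double[][] GOLF_XY = {
        {0, 85, 85, 0, 0}, {0, 80, 90, 1, 0}, {1, 83, 78, 0, 1},
        {2, 70, 96, 0, 1}, {2, 68, 80, 0, 1}, {2, 65, 70, 1, 0},
        {1, 64, 65, 1, 1}, {0, 72, 95, 0, 0}, {0, 69, 70, 0, 1},
        {2, 75, 80, 0, 1}, {0, 75, 70, 1, 1}, {1, 72, 90, 1, 1},
        {1, 81, 75, 0, 1}, {2, 71, 80, 1, 0}
    };

    /**
     * Returns {N, P(Y=0), P(Y=1), cost} for the rows whose Outlook equals
     * outlook (any Outlook when outlook is negative) and whose Humidity lies
     * in the interval (humLo, humHi]. The node must contain at least one row.
     */
    static double[] nodeSummary(double[][] xy, int outlook, double humLo, double humHi) {
        int n = 0;
        int[] counts = new int[2];
        for (double[] row : xy) {
            boolean inNode = (outlook < 0 || (int) row[0] == outlook)
                    && row[2] > humLo && row[2] <= humHi;
            if (inNode) {
                n++;
                counts[(int) row[4]]++;
            }
        }
        // Predicting the majority class misclassifies the minority count.
        int misclassified = Math.min(counts[0], counts[1]);
        // Assumption: cost is scaled by the size of the whole data set.
        double cost = (double) misclassified / xy.length;
        return new double[] {n, (double) counts[0] / n, (double) counts[1] / n, cost};
    }

    static void print(String label, double[] s) {
        System.out.printf("%s: N = %.0f, P(Y=0) = %.3f, P(Y=1) = %.3f, Cost = %.3f%n",
                label, s[0], s[1], s[2], s[3]);
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        print("Node 0 (all rows)", nodeSummary(GOLF_XY, -1, -inf, inf));
        print("Node 1 (Sunny)", nodeSummary(GOLF_XY, 0, -inf, inf));
        print("Node 2 (Sunny, Humidity <= 77.5)", nodeSummary(GOLF_XY, 0, -inf, 77.5));
        print("Node 3 (Sunny, Humidity > 77.5)", nodeSummary(GOLF_XY, 0, 77.5, inf));
    }
}
```

The Humidity values among the Sunny observations are 70, 70, 85, 90, and 95, so the printed threshold of 77.500 is the midpoint between 70 and 85, the two adjacent values where the class changes.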