Example 2: DecisionTree

This example applies the C4.5 method (class C45) to simulated categorical data and demonstrates printing the fitted tree structure both with generated labels and with custom labels.


import com.imsl.datamining.decisionTree.*;

public class DecisionTreeEx2 {

    public static void main(String[] args) throws Exception {

        // Columns 0-1 are categorical predictors; column 2 is the
        // categorical response (4 classes: 0, 1, 2, 3).
        double[][] xy = {
            {2, 0, 2},
            {1, 0, 0},
            {2, 1, 3},
            {0, 1, 0},
            {1, 2, 0},
            {2, 2, 3},
            {2, 2, 3},
            {0, 1, 0},
            {0, 0, 0},
            {0, 1, 0},
            {1, 2, 0},
            {2, 0, 2},
            {0, 2, 0},
            {2, 0, 1},
            {0, 0, 0},
            {2, 0, 1},
            {1, 0, 0},
            {0, 2, 0},
            {2, 0, 1},
            {1, 2, 0},
            {0, 2, 2},
            {2, 1, 3},
            {1, 1, 0},
            {2, 2, 3},
            {1, 2, 0},
            {2, 2, 3},
            {2, 0, 1},
            {2, 1, 3},
            {1, 2, 0},
            {1, 1, 0}
        };

        DecisionTree.VariableType[] varType = {
            DecisionTree.VariableType.CATEGORICAL,
            DecisionTree.VariableType.CATEGORICAL,
            DecisionTree.VariableType.CATEGORICAL
        };

        String responseName = "Response";
        String[] names = {"Var1", "Var2"};              // predictor names
        String[] classNames = {"c1", "c2", "c3", "c4"}; // response class labels
        String[] varLabels = {"L1", "L2", "L3",         // levels of Var1
                              "A", "B", "C"};           // levels of Var2

        // The response variable is in column 2.
        C45 dt = new C45(xy, 2, varType);
        dt.setMinObsPerChildNode(5); // each child node must contain at least 5 observations
        dt.setMinObsPerNode(10);     // a node must contain at least 10 observations to be split
        dt.setMaxNodes(50);          // upper limit on the number of nodes in the tree
        dt.fitModel();

        System.out.println("\nGenerated labels:");
        dt.printDecisionTree(true);
        System.out.println("\nCustom labels:");
        dt.printDecisionTree(responseName, names,
                classNames, varLabels, false);
    }
}
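The root-node statistics in the output below can be verified by hand: P(Y=k) is simply the fraction of the 30 rows whose response equals k, and the node cost is the misclassification rate of the majority-class prediction. The following is a minimal plain-Java sketch (no IMSL dependency); the class and method names here are illustrative, not part of the library.

```java
// Illustrative only: recomputes the root-node class probabilities and
// majority-class prediction that C45 prints, using plain Java.
// The 30x3 data and the response column (index 2) come from the example above.
public class RootNodeStats {

    static final double[][] XY = {
        {2,0,2},{1,0,0},{2,1,3},{0,1,0},{1,2,0},{2,2,3},{2,2,3},{0,1,0},
        {0,0,0},{0,1,0},{1,2,0},{2,0,2},{0,2,0},{2,0,1},{0,0,0},{2,0,1},
        {1,0,0},{0,2,0},{2,0,1},{1,2,0},{0,2,2},{2,1,3},{1,1,0},{2,2,3},
        {1,2,0},{2,2,3},{2,0,1},{2,1,3},{1,2,0},{1,1,0}
    };

    // Returns P(Y=k) for k = 0..3 over all rows.
    static double[] classProbabilities() {
        double[] p = new double[4];
        for (double[] row : XY) {
            p[(int) row[2]] += 1.0 / XY.length;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] p = classProbabilities();
        int best = 0;
        for (int k = 1; k < p.length; k++) {
            if (p[k] > p[best]) best = k;
        }
        for (int k = 0; k < p.length; k++) {
            System.out.printf("P(Y=%d)= %.3f%n", k, p[k]);
        }
        // Node cost = misclassification rate of the majority prediction.
        System.out.printf("Predicted Y: %d, Cost = %.3f%n", best, 1.0 - p[best]);
    }
}
```

Running this reproduces the root-node lines of the output: P(Y=0)= 0.533, P(Y=1)= 0.133, P(Y=2)= 0.100, P(Y=3)= 0.233, predicted class 0, cost 0.467.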

Output


Generated labels:

Decision Tree:


Node 0: Cost = 0.467, N= 30, Level = 0, Child nodes:  1  2  3 
P(Y=0)= 0.533
P(Y=1)= 0.133
P(Y=2)= 0.100
P(Y=3)= 0.233
Predicted Y:   0 
   
Node 1: Cost = 0.033, N= 8, Level = 1
   Rule: X0  in: { 0 }
    P(Y=0)= 0.875
    P(Y=1)= 0.000
    P(Y=2)= 0.125
    P(Y=3)= 0.000
    Predicted Y:   0 
   
Node 2: Cost = 0.000, N= 9, Level = 1
   Rule: X0  in: { 1 }
    P(Y=0)= 1.000
    P(Y=1)= 0.000
    P(Y=2)= 0.000
    P(Y=3)= 0.000
    Predicted Y:   0 
   
Node 3: Cost = 0.200, N= 13, Level = 1
   Rule: X0  in: { 2 }
    P(Y=0)= 0.000
    P(Y=1)= 0.308
    P(Y=2)= 0.154
    P(Y=3)= 0.538
    Predicted Y:   3 

Custom labels:

Decision Tree:


Node 0: Cost = 0.467, N= 30, Level = 0, Child nodes:  1  2  3 
P(Y=0)= 0.533
P(Y=1)= 0.133
P(Y=2)= 0.100
P(Y=3)= 0.233
Predicted Response:  c1 
   
Node 1: Cost = 0.033, N= 8, Level = 1
   Rule:  Var1  in: { L1 }
    P(Y=0)= 0.875
    P(Y=1)= 0.000
    P(Y=2)= 0.125
    P(Y=3)= 0.000
    Predicted Response:  c1 
   
Node 2: Cost = 0.000, N= 9, Level = 1
   Rule:  Var1  in: { L2 }
    P(Y=0)= 1.000
    P(Y=1)= 0.000
    P(Y=2)= 0.000
    P(Y=3)= 0.000
    Predicted Response:  c1 
   
Node 3: Cost = 0.200, N= 13, Level = 1
   Rule:  Var1  in: { L3 }
    P(Y=0)= 0.000
    P(Y=1)= 0.308
    P(Y=2)= 0.154
    P(Y=3)= 0.538
    Predicted Response:  c4 