decisionTreePrint

Prints a decision tree.

Synopsis

decisionTreePrint (tree)

Required Arguments

structure tree (Input)
An estimated decision tree.

Optional Arguments

respName, char (Input)

An array of length 1 containing a character string representing the name of the response variable.

Default: respName[0] = "Y".

varNames, char[] (Input)

An array of length tree.npreds containing character strings representing the names of the predictors.

Default: varNames[0] = "X0", varNames[1] = "X1", etc.

classNames, char[] (Input)

An array of length tree.nclasses containing character strings representing the names of the different classes in Y, assuming Y is of categorical type.

Default: classNames[0] = "0", classNames[1] = "1", etc.

categNames, char[] (Input)

An array of length tree.predNvalues[0] + tree.predNvalues[1] + ... + tree.predNvalues[tree.npreds-1] containing character strings representing the names of the different category levels for each predictor of categorical type.

Default: categNames[0] = "0", categNames[1] = "1", etc.
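For example, with the two categorical predictors used in the example below, each having three levels (so that tree.predNvalues is [3, 3]), the flat array lists the level names of the first predictor followed by those of the second:

# Level names for predictor 0 come first, then those of predictor 1;
# the total length equals the sum of tree.predNvalues.
categNames = ["L1", "L2", "L3",  # levels of predictor 0
              "A", "B", "C"]     # levels of predictor 1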

printMax, boolean (Input)

If True, the maximal tree is printed regardless of any pruning information.

Default: The printed tree accounts for pruning.
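For instance, to print every node of the maximal tree regardless of pruning (as in the example below):

decisionTreePrint(tree, printMax=True)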

Description

Function decisionTreePrint provides a convenient way to quickly inspect the structure of the tree. More elaborate visualization methods or summaries can be written for the decision tree structure described in Structure Definitions for function decisionTree and in Figure 13.1 in the Overview section.
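As a starting point for such a summary, the fields documented on this page can be read directly from the returned structure. A minimal sketch, using only those fields:

# Report basic dimensions of an estimated tree, using only the
# structure fields referenced on this page.
print("number of predictors:", tree.npreds)
print("number of classes:   ", tree.nclasses)
print("levels per predictor:", tree.predNvalues)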

Comments

1. The nodes are labeled in the order in which the tree was grown. In other words, the first child of the root node is labeled Node 1, the first child node of Node 1 is labeled Node 2, and so on, until the branch stops growing. Numbering then continues at the most recent split one level up.

2. If the tree has fewer than five levels, each new level is indented. Otherwise, there is no indentation.

Example

This example operates on simulated categorical data.

from __future__ import print_function
from pyimsl.stat.decisionTree import decisionTree
from pyimsl.stat.decisionTreePrint import decisionTreePrint
from pyimsl.stat.decisionTreeFree import decisionTreeFree

xy = [[2, 0, 2],
      [1, 0, 0],
      [2, 1, 3],
      [0, 1, 0],
      [1, 2, 0],
      [2, 2, 3],
      [2, 2, 3],
      [0, 1, 0],
      [0, 0, 0],
      [0, 1, 0],
      [1, 2, 0],
      [2, 0, 2],
      [0, 2, 0],
      [2, 0, 1],
      [0, 0, 0],
      [2, 0, 1],
      [1, 0, 0],
      [0, 2, 0],
      [2, 0, 1],
      [1, 2, 0],
      [0, 2, 2],
      [2, 1, 3],
      [1, 1, 0],
      [2, 2, 3],
      [1, 2, 0],
      [2, 2, 3],
      [2, 0, 1],
      [2, 1, 3],
      [1, 2, 0],
      [1, 1, 0]]

responseColIdx = 2             # column of xy containing the response
varType = [0, 0, 0]            # all three columns are categorical
control = [5, 10, 10, 50, 10]  # tree growth control parameters
names = ["Var1", "Var2"]
classNames = ["c1", "c2", "c3", "c4"]
responseName = ["Response"]
varLevels = ["L1", "L2", "L3", "A", "B", "C"]

tree = decisionTree(xy, responseColIdx, varType,
                    control=control)

print("Generated labels:")
decisionTreePrint(tree, printMax=True)

print("\nCustom labels:")
decisionTreePrint(tree, printMax=True,
                  varNames=names, classNames=classNames,
                  categNames=varLevels, respName=responseName)

decisionTreeFree(tree)

Output

Generated labels:

Decision Tree:

Node 0: Cost = 0.467, N= 30, Level = 0, Child nodes:  1  2  3 
P(Y=0)= 0.533
P(Y=1)= 0.133
P(Y=2)= 0.100
P(Y=3)= 0.233
Predicted Y:   0 
   Node 1: Cost = 0.033, N= 8, Level = 1
   Rule: X0  in: { 0 }
    P(Y=0)= 0.875
    P(Y=1)= 0.000
    P(Y=2)= 0.125
    P(Y=3)= 0.000
    Predicted Y:   0 
   Node 2: Cost = 0.000, N= 9, Level = 1
   Rule: X0  in: { 1 }
    P(Y=0)= 1.000
    P(Y=1)= 0.000
    P(Y=2)= 0.000
    P(Y=3)= 0.000
    Predicted Y:   0 
   Node 3: Cost = 0.200, N= 13, Level = 1
   Rule: X0  in: { 2 }
    P(Y=0)= 0.000
    P(Y=1)= 0.308
    P(Y=2)= 0.154
    P(Y=3)= 0.538
    Predicted Y:   3 

Custom labels:

Decision Tree:

Node 0: Cost = 0.467, N= 30, Level = 0, Child nodes:  1  2  3 
P(Y=0)= 0.533
P(Y=1)= 0.133
P(Y=2)= 0.100
P(Y=3)= 0.233
Predicted Response:  c1 
   Node 1: Cost = 0.033, N= 8, Level = 1
   Rule:  Var1  in: { L1 }
    P(Y=0)= 0.875
    P(Y=1)= 0.000
    P(Y=2)= 0.125
    P(Y=3)= 0.000
    Predicted Response:  c1 
   Node 2: Cost = 0.000, N= 9, Level = 1
   Rule:  Var1  in: { L2 }
    P(Y=0)= 1.000
    P(Y=1)= 0.000
    P(Y=2)= 0.000
    P(Y=3)= 0.000
    Predicted Response:  c1 
   Node 3: Cost = 0.200, N= 13, Level = 1
   Rule:  Var1  in: { L3 }
    P(Y=0)= 0.000
    P(Y=1)= 0.308
    P(Y=2)= 0.154
    P(Y=3)= 0.538
    Predicted Response:  c4