decision_tree_print
Prints a decision tree.
Synopsis
#include <imsls.h>
void imsls_f_decision_tree_print (Imsls_f_decision_tree *tree, ..., 0)
The type double function is imsls_d_decision_tree_print.
Required Arguments
imsls_f_decision_tree *tree (Input)
An estimated decision tree.
Synopsis with Optional Arguments
#include <imsls.h>
void imsls_f_decision_tree_print (Imsls_f_decision_tree *tree,
IMSLS_RESP_NAME, char *response_name,
IMSLS_VAR_NAMES, char *var_names[],
IMSLS_CLASS_NAMES, char *class_names[],
IMSLS_CATEG_NAMES, char *categ_names[],
IMSLS_PRINT_MAX,
0)
Optional Arguments
IMSLS_RESP_NAME, char *response_name (Input)
An array of length 1 containing a pointer to a character string representing the name of the response variable.
Default: response_name[0] = "Y".
IMSLS_VAR_NAMES, char *var_names[] (Input)
An array of length tree->npreds containing pointers to character strings representing the names of the predictors.
Default: var_names[0]="X0", var_names[1]="X1", etc.
IMSLS_CLASS_NAMES, char *class_names[] (Input)
An array of length tree->nclasses containing pointers to character strings representing the names of the different classes in Y, assuming Y is of categorical type.
Default: class_names[0]="0", class_names[1]="1", etc.
IMSLS_CATEG_NAMES, char *categ_names[] (Input)
An array of length tree->pred_nvalues[0] + tree->pred_nvalues[1] + … + tree->pred_nvalues[tree->npreds-1] containing pointers to character strings representing the names of the different category levels for each predictor of categorical type.
Default: categ_names[0]="0", categ_names[1]="1", etc.
IMSLS_PRINT_MAX, (Input)
If present, the maximal tree is printed despite any pruning information.
Default: Accounts for pruning.
Description
Function imsls_f_decision_tree_print provides a convenient way to quickly inspect the structure of the tree. More elaborate visualizations or summaries can be built on the decision tree structure described in Structure Definitions for function decision_tree and shown in Figure 22 of the Overview section.
Comments
1. The nodes are labeled as the tree was grown. In other words, the first child of the root node is labeled Node 1, the first child node of Node 1 is labeled Node 2, and so on, until the branch stops growing. The numbering continues with the most recent split one level up.
2. If the tree has fewer than five levels, each new level is indented. Otherwise, there is no indentation.
Example
This example operates on simulated categorical data.
 
#include <imsls.h>
#include <stdio.h>
 
int main()
{
    float xy[30 * 3] = {
        2, 0, 2,
        1, 0, 0,
        2, 1, 3,
        0, 1, 0,
        1, 2, 0,
        2, 2, 3,
        2, 2, 3,
        0, 1, 0,
        0, 0, 0,
        0, 1, 0,
        1, 2, 0,
        2, 0, 2,
        0, 2, 0,
        2, 0, 1,
        0, 0, 0,
        2, 0, 1,
        1, 0, 0,
        0, 2, 0,
        2, 0, 1,
        1, 2, 0,
        0, 2, 2,
        2, 1, 3,
        1, 1, 0,
        2, 2, 3,
        1, 2, 0,
        2, 2, 3,
        2, 0, 1,
        2, 1, 3,
        1, 2, 0,
        1, 1, 0
    };

    int n = 30;
    int ncols = 3;
    int response_col_idx = 2;
    int var_type[] = {0, 0, 0};          /* all variables are categorical */
    int control[] = {5, 10, 10, 50, 10};

    const char *names[] = {"Var1", "Var2"};
    const char *class_names[] = {"c1", "c2", "c3", "c4"};
    const char *response_name = "Response";
    const char *var_levels[] = {"L1", "L2", "L3", "A", "B", "C"};
    Imsls_f_decision_tree *tree = NULL;

    tree = imsls_f_decision_tree(n, ncols, xy, response_col_idx, var_type,
        IMSLS_CONTROL, control,
        0);

    printf("\nGenerated labels:\n");
    imsls_f_decision_tree_print(tree,
        IMSLS_PRINT_MAX,
        0);

    printf("\nCustom labels:\n");
    imsls_f_decision_tree_print(tree,
        IMSLS_RESP_NAME, &response_name,
        IMSLS_VAR_NAMES, names,
        IMSLS_CATEG_NAMES, var_levels,
        IMSLS_CLASS_NAMES, class_names,
        IMSLS_PRINT_MAX,
        0);

    imsls_f_decision_tree_free(tree);
}
Output
 
Generated labels:
 
Decision Tree:
 
Node 0: Cost = 0.467, N= 30, Level = 0, Child nodes: 1 2 3
P(Y=0)= 0.533
P(Y=1)= 0.133
P(Y=2)= 0.100
P(Y=3)= 0.233
Predicted Y: 0
   Node 1: Cost = 0.033, N= 8, Level = 1
   Rule: X0 in: { 0 }
   P(Y=0)= 0.875
   P(Y=1)= 0.000
   P(Y=2)= 0.125
   P(Y=3)= 0.000
   Predicted Y: 0
   Node 2: Cost = 0.000, N= 9, Level = 1
   Rule: X0 in: { 1 }
   P(Y=0)= 1.000
   P(Y=1)= 0.000
   P(Y=2)= 0.000
   P(Y=3)= 0.000
   Predicted Y: 0
   Node 3: Cost = 0.200, N= 13, Level = 1
   Rule: X0 in: { 2 }
   P(Y=0)= 0.000
   P(Y=1)= 0.308
   P(Y=2)= 0.154
   P(Y=3)= 0.538
   Predicted Y: 3
 
Custom labels:
 
Decision Tree:
 
Node 0: Cost = 0.467, N= 30, Level = 0, Child nodes: 1 2 3
P(Y=0)= 0.533
P(Y=1)= 0.133
P(Y=2)= 0.100
P(Y=3)= 0.233
Predicted Response c1
   Node 1: Cost = 0.033, N= 8, Level = 1
   Rule: Var1 in: { L1 }
   P(Y=0)= 0.875
   P(Y=1)= 0.000
   P(Y=2)= 0.125
   P(Y=3)= 0.000
   Predicted Response c1
   Node 2: Cost = 0.000, N= 9, Level = 1
   Rule: Var1 in: { L2 }
   P(Y=0)= 1.000
   P(Y=1)= 0.000
   P(Y=2)= 0.000
   P(Y=3)= 0.000
   Predicted Response c1
   Node 3: Cost = 0.200, N= 13, Level = 1
   Rule: Var1 in: { L3 }
   P(Y=0)= 0.000
   P(Y=1)= 0.308
   P(Y=2)= 0.154
   P(Y=3)= 0.538
   Predicted Response c4