public abstract class DecisionTreeInfoGain extends DecisionTree implements Serializable, Cloneable
Abstract class that extends DecisionTree for classes that use an information gain criterion.
| Modifier and Type | Class and Description |
|---|---|
| static class | DecisionTreeInfoGain.GainCriteria: Specifies which information gain criteria to use in determining the best split at each node. |
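The gain criterion selects the impurity measure used when scoring candidate splits. As a rough standalone illustration in plain Java, independent of this class, Shannon entropy and the Gini index are shown below as two common measures; only SHANNON_ENTROPY is confirmed by this page as a GainCriteria constant.

```java
// Standalone sketch of two common impurity measures that an information-gain
// criterion may be based on; this is not the library's implementation.
public final class ImpurityMeasures {

    // Shannon entropy: -sum p_c * log2(p_c) over classes with p_c > 0.
    static double shannonEntropy(double[] classProportions) {
        double h = 0.0;
        for (double p : classProportions) {
            if (p > 0.0) {
                h -= p * (Math.log(p) / Math.log(2.0));
            }
        }
        return h;
    }

    // Gini index: 1 - sum p_c^2.
    static double giniIndex(double[] classProportions) {
        double sumSq = 0.0;
        for (double p : classProportions) {
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.25, 0.25};
        System.out.println("entropy = " + shannonEntropy(p)); // 1.5 bits
        System.out.println("gini    = " + giniIndex(p));      // 0.625
    }
}
```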
Nested classes/interfaces inherited from class DecisionTree:
DecisionTree.MaxTreeSizeExceededException, DecisionTree.PruningFailedToConvergeException, DecisionTree.PureNodeException

Nested classes/interfaces inherited from class PredictiveModel:
PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType

| Constructor and Description |
|---|
| DecisionTreeInfoGain(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType): Constructs a DecisionTree object for a single response variable and multiple predictor variables. |
| Modifier and Type | Method and Description |
|---|---|
| protected double | information(int[] x, int[] y, double[] classCounts, double[] weights, boolean xInfo): Returns the expected information of a variable y over a partition determined by the variable x. |
| protected abstract int | selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, int[] splitPartition): Abstract method for selecting the next split variable and split definition for the node. |
| void | setGainCriteria(DecisionTreeInfoGain.GainCriteria gainCriteria): Specifies which criteria to use in gain calculations in order to determine the best split at each node. |
| void | setUseRatio(boolean ratio): Sets the flag to use or not use the gain ratio instead of the gain to determine the best split. |
| boolean | useGainRatio(): Returns whether or not the gain ratio is to be used instead of the gain to determine the best split. |
Methods inherited from class DecisionTree:
fitModel, getCostComplexityValues, getDecisionTree, getFittedMeanSquaredError, getMaxDepth, getMaxNodes, getMeanSquaredPredictionError, getMinObsPerChildNode, getMinObsPerNode, getNumberOfComplexityValues, getNumberOfSets, isAutoPruningFlag, predict, predict, predict, printDecisionTree, printDecisionTree, pruneTree, setAutoPruningFlag, setConfiguration, setCostComplexityValues, setMaxDepth, setMaxNodes, setMinCostComplexityValue, setMinObsPerChildNode, setMinObsPerNode

Methods inherited from class PredictiveModel:
getClassCounts, getCostMatrix, getMaxNumberOfCategories, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isMustFitModelFlag, isUserFixedNClasses, setClassCounts, setCostMatrix, setMaxNumberOfCategories, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setWeights

Constructor Detail

public DecisionTreeInfoGain(double[][] xy,
                            int responseColumnIndex,
                            PredictiveModel.VariableType[] varType)

Constructs a DecisionTree object for a single response variable and multiple predictor variables.

Parameters:
xy - a double matrix with rows containing the observations on the predictor variables and one response variable.
responseColumnIndex - an int specifying the column index of the response variable.
varType - a PredictiveModel.VariableType array containing the type of each variable.
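Since DecisionTreeInfoGain is abstract, it is constructed through a concrete subclass. The sketch below is a minimal, unverified usage outline: the subclass name C45, the package names, the PredictiveModel.VariableType constants, and the fitModel/predict return types are assumptions about the surrounding JMSL API rather than something this page documents.

```java
import com.imsl.datamining.PredictiveModel;
import com.imsl.datamining.decisionTree.C45;                  // assumed concrete subclass
import com.imsl.datamining.decisionTree.DecisionTreeInfoGain;

public class DecisionTreeInfoGainExample {
    public static void main(String[] args) throws Exception {
        // Each row: two quantitative predictors followed by a categorical response (column 2).
        double[][] xy = {
            {1.0, 2.5, 0},
            {1.5, 1.0, 0},
            {3.2, 4.1, 1},
            {3.8, 3.9, 1}
        };

        // Assumed VariableType constants; adjust to the actual enum names in your JMSL version.
        PredictiveModel.VariableType[] varType = {
            PredictiveModel.VariableType.QUANTITATIVE_CONTINUOUS,
            PredictiveModel.VariableType.QUANTITATIVE_CONTINUOUS,
            PredictiveModel.VariableType.CATEGORICAL
        };

        C45 tree = new C45(xy, 2, varType);  // response column index = 2
        tree.setGainCriteria(DecisionTreeInfoGain.GainCriteria.SHANNON_ENTROPY); // documented default
        tree.setUseRatio(true);              // select splits by gain ratio rather than raw gain
        tree.fitModel();                     // inherited from DecisionTree
        double[] predictions = tree.predict();   // assumed to return one prediction per row
        System.out.println("first prediction: " + predictions[0]);
    }
}
```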
Method Detail

protected double information(int[] x,
                             int[] y,
                             double[] classCounts,
                             double[] weights,
                             boolean xInfo)

Returns the expected information of a variable y over a partition determined by the variable x. Given a data subset S containing both variables x and y, let S_1, S_2, ..., S_k denote the partition of S determined by x. The expected information is

$$\mathrm{info}_x(y) = \sum_{i=1}^{k} \frac{|S_i|}{|S|}\,\mathrm{info}(y \mid S_i)$$

where info(y | S_i) is the information of y within the subset S_i, measured according to the selected DecisionTreeInfoGain.GainCriteria.

Parameters:
x - an int array of length xy.length containing values of a predictor or an indicator vector defining the partition of the observations.
y - an int array of length xy.length containing the values of the response variable.
classCounts - a double array containing the counts for each class of the response variable, when it is categorical.
weights - a double array used to indicate which subset of the observations belong in the current node.
xInfo - a boolean indicating whether the information about x is computed using a simple frequency estimate or using prior probabilities:

| Value | Method |
|---|---|
| true | simple frequency estimate |
| false | prior probabilities |

Returns:
a double indicating the information uncertainty.
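As a plain-Java illustration of the quantity described above, and not the library's implementation, the following sketch computes the expected information of y over the partition induced by x using Shannon entropy with a simple frequency estimate of the class probabilities; all helper names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch: expected information info_x(y) = sum_i (|S_i|/|S|) * entropy(y | S_i).
public final class ExpectedInformation {

    static double entropyFromCounts(Map<Integer, Integer> counts, int total) {
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2.0));
        }
        return h;
    }

    // x defines the partition; y is the categorical response.
    static double expectedInformation(int[] x, int[] y) {
        Map<Integer, Map<Integer, Integer>> partition = new HashMap<>();
        Map<Integer, Integer> partitionSize = new HashMap<>();
        for (int i = 0; i < x.length; i++) {
            partition.computeIfAbsent(x[i], k -> new HashMap<>())
                     .merge(y[i], 1, Integer::sum);
            partitionSize.merge(x[i], 1, Integer::sum);
        }
        double info = 0.0;
        for (Map.Entry<Integer, Map<Integer, Integer>> e : partition.entrySet()) {
            int size = partitionSize.get(e.getKey());
            info += ((double) size / x.length) * entropyFromCounts(e.getValue(), size);
        }
        return info;
    }

    public static void main(String[] args) {
        int[] x = {0, 0, 1, 1, 1, 1};
        int[] y = {0, 0, 0, 1, 1, 1};
        System.out.println(expectedInformation(x, y)); // (4/6) * entropy(1/4, 3/4) ≈ 0.541
    }
}
```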
protected abstract int selectSplitVariable(double[][] xy,
                                           double[] classCounts,
                                           double[] parentFreq,
                                           double[] splitValue,
                                           int[] splitPartition)

Abstract method for selecting the next split variable and split definition for the node.

Specified by:
selectSplitVariable in class DecisionTree

Parameters:
xy - a double matrix containing the data.
classCounts - a double array containing the counts for each class of the response variable, when it is categorical.
parentFreq - a double array used to indicate which subset of the observations belong in the current node.
splitValue - a double array representing the resulting split point if the selected variable is quantitative.
splitPartition - an int array indicating the resulting split partition if the selected variable is categorical.

Returns:
an int specifying the column index of the split variable in xy.
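To make the selection step concrete, here is a standalone sketch with a simplified signature, not an override of the abstract method above: it scores each categorical predictor by its information gain, gain(x) = info(y) - info_x(y), and returns the column with the largest gain.

```java
// Standalone sketch: choose the predictor column whose split maximizes information gain.
// Assumes all predictors and the response are categorical and coded as small integers.
public final class SplitSelectionSketch {

    static double entropy(int[] y) {
        java.util.Map<Integer, Integer> counts = new java.util.HashMap<>();
        for (int v : y) counts.merge(v, 1, Integer::sum);
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / y.length;
            h -= p * (Math.log(p) / Math.log(2.0));
        }
        return h;
    }

    static double expectedInformation(int[] x, int[] y) {
        java.util.Map<Integer, java.util.List<Integer>> groups = new java.util.HashMap<>();
        for (int i = 0; i < x.length; i++) {
            groups.computeIfAbsent(x[i], k -> new java.util.ArrayList<>()).add(y[i]);
        }
        double info = 0.0;
        for (java.util.List<Integer> g : groups.values()) {
            int[] sub = g.stream().mapToInt(Integer::intValue).toArray();
            info += ((double) sub.length / x.length) * entropy(sub);
        }
        return info;
    }

    // Returns the column index (into xy) of the predictor with the largest gain.
    static int selectSplitVariable(int[][] xy, int responseColumnIndex) {
        int n = xy.length;
        int[] y = new int[n];
        for (int i = 0; i < n; i++) y[i] = xy[i][responseColumnIndex];
        double parentInfo = entropy(y);

        int bestColumn = -1;
        double bestGain = Double.NEGATIVE_INFINITY;
        for (int col = 0; col < xy[0].length; col++) {
            if (col == responseColumnIndex) continue;
            int[] x = new int[n];
            for (int i = 0; i < n; i++) x[i] = xy[i][col];
            double gain = parentInfo - expectedInformation(x, y);
            if (gain > bestGain) {
                bestGain = gain;
                bestColumn = col;
            }
        }
        return bestColumn;
    }

    public static void main(String[] args) {
        int[][] xy = {
            {0, 0, 0}, {0, 1, 0}, {1, 0, 1}, {1, 1, 1}
        };
        System.out.println(selectSplitVariable(xy, 2)); // column 0 perfectly predicts the response
    }
}
```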
public void setGainCriteria(DecisionTreeInfoGain.GainCriteria gainCriteria)

Parameters:
gainCriteria - a DecisionTreeInfoGain.GainCriteria specifying which criteria to use in gain calculations in order to determine the best split at each node.

Default: gainCriteria = DecisionTreeInfoGain.GainCriteria.SHANNON_ENTROPY
public void setUseRatio(boolean ratio)

Parameters:
ratio - a boolean indicating whether the gain ratio is to be used: true results in the gain ratio being used and false indicates the gain is to be used.

Default: useRatio = false
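For reference, one common formulation of the gain ratio (Quinlan's C4.5) divides the information gain by the split information of the partition itself; whether this class uses exactly this normalization is not stated on this page. A small standalone computation with hypothetical helper names:

```java
// Standalone sketch: gain ratio = gain / splitInfo, where splitInfo is the
// entropy of the partition sizes themselves (independent of the response).
public final class GainRatioSketch {

    static double log2(double v) { return Math.log(v) / Math.log(2.0); }

    // partitionSizes[i] = number of observations falling in branch i of the candidate split.
    static double splitInfo(int[] partitionSizes) {
        int total = 0;
        for (int s : partitionSizes) total += s;
        double si = 0.0;
        for (int s : partitionSizes) {
            if (s > 0) {
                double p = (double) s / total;
                si -= p * log2(p);
            }
        }
        return si;
    }

    static double gainRatio(double gain, int[] partitionSizes) {
        double si = splitInfo(partitionSizes);
        return si == 0.0 ? 0.0 : gain / si;
    }

    public static void main(String[] args) {
        // A split producing branches of sizes 4 and 2 with an information gain of 0.459 bits.
        System.out.println(gainRatio(0.459, new int[] {4, 2})); // 0.459 / 0.918 ≈ 0.5
    }
}
```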
public boolean useGainRatio()

Returns:
a boolean indicating if the gain ratio is to be used: true results in the gain ratio being used and false indicates the gain is to be used.