public class ALACART extends DecisionTreeInfoGain implements DecisionTreeSurrogateMethod, Serializable, Cloneable
Generates a decision tree using the CART™ method of Breiman, Friedman, Olshen and Stone (1984). CART™ stands for Classification and Regression Trees and applies to categorical or quantitative type variables.
Only binary splits are considered for categorical variables. That is, if X has values {A, B, C, D}, only splits into two subsets are considered: for example, {A} and {B, C, D}, or {A, B} and {C, D}, are allowed, but a three-way split defined by {A}, {B}, and {C, D} is not.
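The candidate splits described above can be enumerated directly. The following is a minimal sketch, not ALACART's internal code: the class name `BinaryPartitions` and the method `binarySplits` are hypothetical names chosen for illustration. It fixes the first category on the left side so that each unordered pair of subsets is listed once, giving 2^(k-1) - 1 candidate splits for k categories.

```java
import java.util.ArrayList;
import java.util.List;

class BinaryPartitions {
    /** Returns every binary split of the values as a pair {left, right} of non-empty lists. */
    static List<List<List<String>>> binarySplits(String[] values) {
        List<List<List<String>>> splits = new ArrayList<>();
        int k = values.length;
        if (k < 2) {
            return splits; // fewer than two categories cannot be split
        }
        // The first value always goes left, so each unordered pair appears once.
        // The mask selects which of the remaining k-1 values join it on the left.
        // The all-ones mask is excluded because it would leave the right side empty.
        for (int mask = 0; mask < (1 << (k - 1)) - 1; mask++) {
            List<String> left = new ArrayList<>();
            List<String> right = new ArrayList<>();
            left.add(values[0]);
            for (int i = 1; i < k; i++) {
                if ((mask & (1 << (i - 1))) != 0) {
                    left.add(values[i]);
                } else {
                    right.add(values[i]);
                }
            }
            splits.add(List.of(left, right));
        }
        return splits;
    }

    public static void main(String[] args) {
        for (List<List<String>> s : binarySplits(new String[] {"A", "B", "C", "D"})) {
            System.out.println(s.get(0) + " | " + s.get(1));
        }
    }
}
```

For the four values {A, B, C, D} this lists seven splits, including {A} | {B, C, D} and {A, B} | {C, D}. The count grows exponentially with the number of categories, which is one reason an exhaustive search like this is only practical for predictors with few categories.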
For classification problems, ALACART uses a criterion similar to
information gain, called impurity. The method searches for the split that
reduces the node impurity the most. For a given set of data S at a
node, the node impurity for a C-class categorical response is a function of
the class probabilities:
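$$I(S) = \phi\big(p(1 \mid S),\, p(2 \mid S),\, \ldots,\, p(C \mid S)\big)$$
The measure function $\phi(\cdot)$ should be 0 for "pure" nodes, where all Y are in the same class, and maximum when Y is uniformly distributed across the classes.
As only binary splits of a subset S are considered (S1, S2 such that $S = S_1 \cup S_2$ and $S_1 \cap S_2 = \emptyset$), the reduction in impurity when splitting S into S1, S2 is
$$\Delta I = I(S) - q_1 I(S_1) - q_2 I(S_2)$$
where $q_j$, $j = 1, 2$, is the proportion of the observations in S that fall in $S_j$.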
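To make the reduction in impurity concrete, here is a minimal sketch that evaluates it for one candidate binary split from the class counts in the two child nodes. It assumes the child weights are the fractions of the node's observations sent to each child, and it uses entropy and the Gini index (one minus the sum of squared class probabilities) as two example impurity measures. The class `ImpurityReduction` and its methods are hypothetical names for illustration and are not part of the ALACART API.

```java
import java.util.function.ToDoubleFunction;

class ImpurityReduction {
    /** Converts class counts into class probabilities (assumes a positive total). */
    static double[] probabilities(double[] counts) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        double[] p = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            p[i] = counts[i] / total;
        }
        return p;
    }

    /** Gini index: 0 for a pure node, largest for uniform class probabilities. */
    static double gini(double[] p) {
        double sumSq = 0.0;
        for (double pi : p) {
            sumSq += pi * pi;
        }
        return 1.0 - sumSq;
    }

    /** Entropy in bits; classes with zero probability contribute nothing. */
    static double entropy(double[] p) {
        double h = 0.0;
        for (double pi : p) {
            if (pi > 0.0) {
                h -= pi * Math.log(pi) / Math.log(2.0);
            }
        }
        return h;
    }

    /** Impurity of the parent minus the weighted impurities of the two non-empty children. */
    static double reduction(double[] counts1, double[] counts2, ToDoubleFunction<double[]> phi) {
        double n1 = 0.0;
        double n2 = 0.0;
        double[] parent = new double[counts1.length];
        for (int i = 0; i < counts1.length; i++) {
            parent[i] = counts1[i] + counts2[i];
            n1 += counts1[i];
            n2 += counts2[i];
        }
        double q1 = n1 / (n1 + n2); // fraction of the node sent to the first child
        double q2 = n2 / (n1 + n2); // fraction of the node sent to the second child
        return phi.applyAsDouble(probabilities(parent))
                - q1 * phi.applyAsDouble(probabilities(counts1))
                - q2 * phi.applyAsDouble(probabilities(counts2));
    }

    public static void main(String[] args) {
        double[] child1 = {40, 10}; // class counts in the first child
        double[] child2 = {10, 40}; // class counts in the second child
        System.out.println("Gini reduction:    " + reduction(child1, child2, ImpurityReduction::gini));
        System.out.println("Entropy reduction: " + reduction(child1, child2, ImpurityReduction::entropy));
    }
}
```

A split search would evaluate this quantity for every candidate split of every predictor and keep the largest value. Passing the impurity measure as a function keeps the reduction formula independent of the measure chosen.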
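The gain criteria and the reduction in impurity
are similar concepts and are equivalent when I is entropy and when only
binary splits are considered. Another popular measure for the impurity at a
node is the Gini index, given by
$$I(S) = \sum_{i \neq j} p(i \mid S)\, p(j \mid S) = 1 - \sum_{i=1}^{C} p(i \mid S)^2$$
where $p(i \mid S)$ denotes the probability of class i, among the C classes, at node S.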
If Y is an ordered response or continuous, the problem is a regression
problem. ALACART generates the tree using the same steps, except
that the node-level measures, or loss functions, are the mean squared error (MSE)
or mean absolute deviation (MAD) rather than node impurity measures.
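For a regression response, the same split search can be sketched with a loss function in place of an impurity measure. The example below is an illustration under stated assumptions, not ALACART's implementation: it takes MSE to be the mean squared deviation from the node mean, takes MAD to be the mean absolute deviation from the node median, and searches the midpoints between consecutive distinct predictor values for the threshold that reduces the weighted MSE the most. All names (`RegressionSplit`, `mse`, `mad`, `bestSplit`) are hypothetical.

```java
import java.util.Arrays;

class RegressionSplit {
    /** Mean squared deviation of y[from..to) about its mean. */
    static double mse(double[] y, int from, int to) {
        int n = to - from;
        double mean = 0.0;
        for (int i = from; i < to; i++) {
            mean += y[i];
        }
        mean /= n;
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += (y[i] - mean) * (y[i] - mean);
        }
        return sum / n;
    }

    /** Mean absolute deviation of y[from..to) about its median (an assumed definition). */
    static double mad(double[] y, int from, int to) {
        double[] sorted = Arrays.copyOfRange(y, from, to);
        Arrays.sort(sorted);
        int n = sorted.length;
        double median = (n % 2 == 1) ? sorted[n / 2] : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
        double sum = 0.0;
        for (double v : sorted) {
            sum += Math.abs(v - median);
        }
        return sum / n;
    }

    /**
     * Returns {threshold, reduction in MSE} for the best rule "x <= threshold".
     * The threshold is NaN when no split reduces the MSE.
     * Recomputing the loss at every cut is quadratic, which is acceptable for a sketch.
     */
    static double[] bestSplit(double[] x, double[] y) {
        int n = x.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(x[a], x[b]));
        double[] xs = new double[n];
        double[] ys = new double[n];
        for (int i = 0; i < n; i++) {
            xs[i] = x[order[i]];
            ys[i] = y[order[i]];
        }
        double parent = mse(ys, 0, n);
        double bestThreshold = Double.NaN;
        double bestReduction = 0.0;
        for (int cut = 1; cut < n; cut++) {
            if (xs[cut] == xs[cut - 1]) {
                continue; // tied predictor values cannot be separated
            }
            double children = (cut * mse(ys, 0, cut) + (n - cut) * mse(ys, cut, n)) / n;
            double reduction = parent - children;
            if (reduction > bestReduction) {
                bestReduction = reduction;
                bestThreshold = 0.5 * (xs[cut - 1] + xs[cut]);
            }
        }
        return new double[] {bestThreshold, bestReduction};
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6};
        double[] y = {1, 1, 1, 5, 5, 5};
        double[] best = bestSplit(x, y);
        System.out.println("threshold = " + best[0] + ", MSE reduction = " + best[1]);
    }
}
```

Swapping `mse` for `mad` in `bestSplit` would give the corresponding search under absolute-error loss, which is less sensitive to outlying responses.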
Any observation or case with a missing response variable is eliminated from
the analysis. If a predictor has a missing value, each algorithm will skip
that case when evaluating the given predictor. When making a prediction for a
new case, if the split variable is missing, the prediction function applies
surrogate split-variables and splitting rules in turn, if they are
estimated with the decision tree. Otherwise, the prediction function returns
the prediction from the most recent non-terminal node. In this
implementation, only ALACART estimates surrogate split variables
when requested.
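The prediction rule for a missing split variable can be sketched as follows. This is a simplified illustration rather than the library's code: `Rule` and `Node` are hypothetical classes (not the JMSL `Tree` classes), missing values are assumed to be encoded as `Double.NaN`, every rule is assumed to be a threshold on a quantitative predictor, and each surrogate is assumed to send cases in the same direction as the primary rule.

```java
class SurrogateFallback {
    /** Hypothetical split rule: go left when x[column] <= threshold. */
    static final class Rule {
        final int column;
        final double threshold;

        Rule(int column, double threshold) {
            this.column = column;
            this.threshold = threshold;
        }
    }

    /** Hypothetical tree node; a node without a primary rule is terminal. */
    static final class Node {
        final double prediction;
        final Rule primary;
        final Rule[] surrogates;
        final Node left;
        final Node right;

        Node(double prediction, Rule primary, Rule[] surrogates, Node left, Node right) {
            this.prediction = prediction;
            this.primary = primary;
            this.surrogates = surrogates;
            this.left = left;
            this.right = right;
        }

        static Node leaf(double prediction) {
            return new Node(prediction, null, new Rule[0], null, null);
        }

        boolean isLeaf() {
            return primary == null;
        }
    }

    /** Returns the primary rule if its variable is present, else the first usable surrogate, else null. */
    static Rule firstUsableRule(Node node, double[] x) {
        if (!Double.isNaN(x[node.primary.column])) {
            return node.primary;
        }
        for (Rule surrogate : node.surrogates) {
            if (!Double.isNaN(x[surrogate.column])) {
                return surrogate;
            }
        }
        return null;
    }

    /** Walks down the tree, stopping at the current non-terminal node when no rule can be applied. */
    static double predict(Node root, double[] x) {
        Node node = root;
        while (!node.isLeaf()) {
            Rule rule = firstUsableRule(node, x);
            if (rule == null) {
                return node.prediction; // split variable and all surrogates are missing
            }
            node = (x[rule.column] <= rule.threshold) ? node.left : node.right;
        }
        return node.prediction;
    }

    public static void main(String[] args) {
        Node root = new Node(0.4, new Rule(0, 5.0), new Rule[] {new Rule(1, 2.0)},
                Node.leaf(0.0), Node.leaf(1.0));
        System.out.println(predict(root, new double[] {3.0, Double.NaN}));        // primary rule used
        System.out.println(predict(root, new double[] {Double.NaN, 9.0}));        // surrogate rule used
        System.out.println(predict(root, new double[] {Double.NaN, Double.NaN})); // falls back to the root
    }
}
```

With no surrogates stored (an empty `surrogates` array), the sketch always returns the prediction of the last non-terminal node reached when the split variable is missing, which mirrors the fallback described above.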
Nested classes/interfaces inherited from class DecisionTreeInfoGain: DecisionTreeInfoGain.GainCriteria
Nested classes/interfaces inherited from class DecisionTree: DecisionTree.MaxTreeSizeExceededException, DecisionTree.PruningFailedToConvergeException, DecisionTree.PureNodeException
Nested classes/interfaces inherited from class PredictiveModel: PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType

| Constructor and Description |
|---|
| ALACART(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType) Constructs an ALACART decision tree for a single response variable and multiple predictor variables. |

| Modifier and Type | Method and Description |
|---|---|
| void | addSurrogates(Tree tree, double[] surrogateInfo) Adds the surrogate information to the tree. |
| int | getNumberOfSurrogateSplits() Returns the number of surrogate splits. |
| double[] | getSurrogateInfo() Returns the surrogate split information. |
| protected int | selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, int[] splitPartition) Selects the split variable for the present node using the CART™ method. |
| void | setNumberOfSurrogateSplits(int nSplits) Sets the number of surrogate splits. |

Methods inherited from class DecisionTreeInfoGain: information, setGainCriteria, setUseRatio, useGainRatio
Methods inherited from class DecisionTree: fitModel, getCostComplexityValues, getDecisionTree, getFittedMeanSquaredError, getMaxDepth, getMaxNodes, getMeanSquaredPredictionError, getMinObsPerChildNode, getMinObsPerNode, getNumberOfComplexityValues, getNumberOfSets, isAutoPruningFlag, predict, predict, predict, printDecisionTree, printDecisionTree, pruneTree, setAutoPruningFlag, setConfiguration, setCostComplexityValues, setMaxDepth, setMaxNodes, setMinCostComplexityValue, setMinObsPerChildNode, setMinObsPerNode
Methods inherited from class PredictiveModel: getClassCounts, getCostMatrix, getMaxNumberOfCategories, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isMustFitModelFlag, isUserFixedNClasses, setClassCounts, setCostMatrix, setMaxNumberOfCategories, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setWeights

public ALACART(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
Constructs an ALACART decision tree for a single response variable and multiple predictor variables.
Parameters:
xy - a double matrix with rows containing the observations on the predictor variables and one response variable.
responseColumnIndex - an int specifying the column index of the response variable.
varType - a PredictiveModel.VariableType array containing the type of each variable.

public void addSurrogates(Tree tree, double[] surrogateInfo)
Adds the surrogate information to the tree.
addSurrogates in interface DecisionTreeSurrogateMethod
Parameters:
tree - a Tree containing the decision tree structure.
surrogateInfo - a double array containing the surrogate split information.

public int getNumberOfSurrogateSplits()
Returns the number of surrogate splits.
getNumberOfSurrogateSplits in interface DecisionTreeSurrogateMethod
Returns:
an int specifying the number of surrogate splits.

public double[] getSurrogateInfo()
Returns the surrogate split information.
getSurrogateInfo in interface DecisionTreeSurrogateMethod
Returns:
a double[] containing the surrogate split information.

protected int selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, int[] splitPartition)
Selects the split variable for the present node using the CART™ method.
selectSplitVariable in class DecisionTreeInfoGain
Parameters:
xy - a double matrix containing the data.
classCounts - a double array containing the counts for each class of the response variable, when it is categorical.
parentFreq - a double array used to determine the subset of the observations that belong to the current node.
splitValue - a double array representing the resulting split point if the selected variable is quantitative.
splitPartition - an int array indicating the resulting split partition if the selected variable is categorical.
Returns:
an int specifying the column index of the split variable in xy.

public void setNumberOfSurrogateSplits(int nSplits)
Sets the number of surrogate splits.
setNumberOfSurrogateSplits in interface DecisionTreeSurrogateMethod
Parameters:
nSplits - an int specifying the number of predictors to consider as surrogate splitting variables.
Default: nSplits = 0.

Copyright © 1970-2015 Rogue Wave Software
Built June 18 2015.