public class CHAID extends DecisionTree implements Serializable, Cloneable
Generates a decision tree using CHAID for categorical or discrete ordered predictor variables. CHAID, due to Kass (1980), is an acronym for chi-square automatic interaction detection. At each node, CHAID looks for the best splitting variable using the following steps. Given a predictor variable X, perform a 2-way chi-squared test of association between each possible pair of categories of X and the categories of Y. The least significant result is noted and, if a threshold is met, the two categories of X are merged. Next, treating this merged category as a single category, CHAID repeats the series of tests to determine whether further merging is possible. If a merged category consists of three or more of the original categories of X, CHAID calls for a step to test whether the merged categories should be split. This is done by forming all binary partitions of the merged category and testing each one against Y in a 2-way test of association. If the most significant result meets a threshold, then the merged category is split accordingly. As long as the threshold in this step is smaller than the threshold in the merge step, the splitting step and the merge step will not cycle back and forth.
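The split check above considers every binary partition of a merged category. As a rough illustration of how many candidates that implies, the sketch below enumerates the partitions of m original categories. The class and method names are hypothetical and are not part of this library, and the sketch assumes the categories are unordered; for ordered predictors an implementation may restrict the admissible partitions.

```java
// Illustrative sketch only; BinaryPartitionSketch is a hypothetical helper,
// not part of the CHAID class.
final class BinaryPartitionSketch {

    /**
     * Enumerates every way to divide m categories (indexed 0..m-1) into two
     * non-empty groups. Each row of the result is a mask: true means the
     * category belongs to the first group. Category 0 is always placed in
     * the first group so that each partition is listed exactly once.
     * Assumes 2 <= m <= 30 so the count fits in an int.
     */
    static boolean[][] binaryPartitions(int m) {
        if (m < 2) {
            return new boolean[0][];
        }
        // Subsets of categories 1..m-1 that join category 0, excluding the
        // full subset (which would leave the second group empty).
        int count = (1 << (m - 1)) - 1;
        boolean[][] result = new boolean[count][m];
        for (int mask = 0; mask < count; mask++) {
            result[mask][0] = true;
            for (int i = 1; i < m; i++) {
                result[mask][i] = ((mask >> (i - 1)) & 1) == 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int m = 2; m <= 5; m++) {
            System.out.println(m + " categories: "
                    + binaryPartitions(m).length + " binary partitions");
        }
    }
}
```

The number of partitions is 2^(m-1) - 1, so the cost of the split check grows quickly with the size of the merged category.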
Once each predictor is processed in this manner, the predictor with the most significant qualifying 2-way test with Y is selected as the splitting variable, and its last state of merged categories defines the split at the given node. If none of the tests qualify (by having an adjusted p-value smaller than a threshold), then the node is not split. This growing procedure continues until one or more stopping conditions are met.
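Each step above relies on a 2-way chi-squared test of association. The following is a minimal sketch of the pairwise test used to find a merge candidate, not the library's implementation. All names are hypothetical. It assumes a table of counts with one row per category of X and one column per class of Y, and the p-value helper assumes exactly three response classes, so that each pairwise test has 2 degrees of freedom and the upper-tail probability reduces to exp(-x/2).

```java
// Illustrative sketch only; ChaidMergeSketch is a hypothetical helper,
// not part of the CHAID class.
final class ChaidMergeSketch {

    /**
     * Pearson chi-squared statistic for the 2 x J table formed by two
     * categories of X (rows) against the J classes of Y (columns).
     */
    static double chiSquared(double[] rowA, double[] rowB) {
        double totalA = 0.0;
        double totalB = 0.0;
        for (int k = 0; k < rowA.length; k++) {
            totalA += rowA[k];
            totalB += rowB[k];
        }
        double n = totalA + totalB;
        double stat = 0.0;
        for (int k = 0; k < rowA.length; k++) {
            double colTotal = rowA[k] + rowB[k];
            double expectedA = totalA * colTotal / n;
            double expectedB = totalB * colTotal / n;
            // Cells with a zero expected count are skipped.
            if (expectedA > 0.0) {
                stat += (rowA[k] - expectedA) * (rowA[k] - expectedA) / expectedA;
            }
            if (expectedB > 0.0) {
                stat += (rowB[k] - expectedB) * (rowB[k] - expectedB) / expectedB;
            }
        }
        return stat;
    }

    /**
     * Returns the indices {i, j} of the pair of categories with the smallest
     * statistic. Every pairwise table has the same shape, so the smallest
     * statistic corresponds to the least significant result.
     */
    static int[] leastSignificantPair(double[][] counts) {
        int[] best = {-1, -1};
        double bestStat = Double.POSITIVE_INFINITY;
        for (int i = 0; i < counts.length; i++) {
            for (int j = i + 1; j < counts.length; j++) {
                double stat = chiSquared(counts[i], counts[j]);
                if (stat < bestStat) {
                    bestStat = stat;
                    best[0] = i;
                    best[1] = j;
                }
            }
        }
        return best;
    }

    /** Upper-tail p-value for 2 degrees of freedom only (assumption: J = 3). */
    static double pValueTwoDf(double stat) {
        return Math.exp(-stat / 2.0);
    }

    public static void main(String[] args) {
        // Made-up counts: 3 categories of X (rows) by 3 classes of Y (columns).
        double[][] counts = {{10, 10, 10}, {11, 9, 10}, {25, 3, 2}};
        int[] pair = leastSignificantPair(counts);
        double p = pValueTwoDf(chiSquared(counts[pair[0]], counts[pair[1]]));
        System.out.println("merge candidate: categories " + pair[0] + " and "
                + pair[1] + ", p-value = " + p);
    }
}
```

A merge rule would then compare this p-value with the merge significance level. Other numbers of response classes need a general chi-squared distribution function, which is omitted here.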
Nested classes/interfaces inherited from class DecisionTree:
DecisionTree.MaxTreeSizeExceededException, DecisionTree.PruningFailedToConvergeException, DecisionTree.PureNodeException
Nested classes/interfaces inherited from class PredictiveModel:
PredictiveModel.CloneNotSupportedException, PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType
Constructor and Description |
---|
CHAID(CHAID chaidModel) Constructs a copy of the input CHAID decision tree. |
CHAID(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType) Constructs a CHAID object for a single response variable and multiple predictor variables. |
Modifier and Type | Method and Description |
---|---|
CHAID | clone() Clones a CHAID decision tree. |
double | getMergeCategoriesSigLevel() Returns the significance level for merging categories. |
double | getSplitMergedCategoriesSigLevel() Returns the significance level for splitting previously merged categories. |
double | getSplitVariableSignificanceLevel() Returns the significance level for split variable selection. |
protected int | selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, double[] splitCriterionValue, int[] splitPartition) Selects the split variable for the current node using CHAID (chi-square automatic interaction detection). |
protected void | setConfiguration(PredictiveModel pm) Sets the configuration of PredictiveModel to that of the input model. |
void | setMergeCategoriesSignificanceLevel(double mergeAlpha) Sets the significance level for merging categories. |
void | setSplitMergedCategoriesSigLevel(double splitMergedAlpha) Sets the significance level for splitting previously merged categories. |
void | setSplitVariableSignificanceLevel(double splitVariableSelectionAlpha) Sets the significance level for split variable selection. |
Methods inherited from class DecisionTree:
fitModel, getCostComplexityValues, getDecisionTree, getFittedMeanSquaredError, getMaxDepth, getMaxNodes, getMeanSquaredPredictionError, getMinCostComplexityValue, getMinObsPerChildNode, getMinObsPerNode, getNodeAssigments, getNumberOfComplexityValues, getNumberOfRandomFeatures, isAutoPruningFlag, isRandomFeatureSelection, predict, predict, predict, printDecisionTree, printDecisionTree, pruneTree, setAutoPruningFlag, setCostComplexityValues, setMaxDepth, setMaxNodes, setMinCostComplexityValue, setMinObsPerChildNode, setMinObsPerNode, setNumberOfRandomFeatures, setRandomFeatureSelection
Methods inherited from class PredictiveModel:
getClassCounts, getClassErrors, getClassLabels, getClassProbabilities, getCostMatrix, getMaxNumberOfCategories, getMaxNumberOfIterations, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getRandomObject, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isConstantSeries, isMustFitModel, isUserFixedNClasses, setClassCounts, setClassLabels, setClassProbabilities, setCostMatrix, setMaxNumberOfCategories, setMaxNumberOfIterations, setMustFitModel, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setRandomObject, setResponseColumnIndex, setTrainingData, setVariableType, setWeights
public CHAID(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
Constructs a CHAID object for a single response variable and multiple predictor variables.
Parameters:
xy - a double matrix containing the training data and associated response values
responseColumnIndex - an int specifying the column index of the response variable
varType - a PredictiveModel.VariableType array containing the type of each variable

public CHAID(CHAID chaidModel)
Constructs a copy of the input CHAID decision tree.
Parameters:
chaidModel - a CHAID decision tree

public CHAID clone()
Clones a CHAID decision tree.
clone in class PredictiveModel
Returns:
a CHAID decision tree

public double getMergeCategoriesSigLevel()
Returns the significance level for merging categories.
Returns:
a double, the significance level for merging categories

public final void setMergeCategoriesSignificanceLevel(double mergeAlpha)
Sets the significance level for merging categories.
Parameters:
mergeAlpha - a double specifying the significance level for merging categories
mergeAlpha must be between 0.0 and 1.0. In addition, if splitMergedAlpha is set to enable splitting of previously merged categories, then mergeAlpha \( \le \) splitMergedAlpha.
Default: mergeAlpha = 0.05.

public double getSplitMergedCategoriesSigLevel()
Returns the significance level for splitting previously merged categories.
Returns:
a double, the significance level for splitting merged categories

public final void setSplitMergedCategoriesSigLevel(double splitMergedAlpha)
Sets the significance level for splitting previously merged categories.
Parameters:
splitMergedAlpha - a double specifying the significance level for splitting merged categories
splitMergedAlpha must be greater than or equal to CHAID.getMergeCategoriesSigLevel() unless splitting is disabled by setting splitMergedAlpha = -1.0.
Default: splitMergedAlpha = -1.0, which disables splitting of merged categories.

public double getSplitVariableSignificanceLevel()
Returns the significance level for split variable selection.
Returns:
a double, the significance level for split variable selection

public final void setSplitVariableSignificanceLevel(double splitVariableSelectionAlpha)
Sets the significance level for split variable selection.
Parameters:
splitVariableSelectionAlpha - a double specifying the significance level for split variable selection
splitVariableSelectionAlpha must be between 0.0 and 1.0.
Default: splitVariableSelectionAlpha = 0.05.

protected final void setConfiguration(PredictiveModel pm)
Sets the configuration of PredictiveModel to that of the input model.
setConfiguration in class DecisionTree
Parameters:
pm - a PredictiveModel object which is to have its attributes duplicated in this instance

protected int selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, double[] splitCriterionValue, int[] splitPartition)
Selects the split variable for the current node using CHAID (chi-square automatic interaction detection).
selectSplitVariable in class DecisionTree
Parameters:
xy - a double matrix containing the data
classCounts - a double array containing the counts for each class of the response variable, when it is categorical
parentFreq - a double array used to indicate which subset of the observations belong in the current node
splitValue - a double array representing the resulting split point if the selected variable is quantitative
splitCriterionValue - a double, the value of the criterion used to determine the splitting variable
splitPartition - an int array indicating the resulting split partition if the selected variable is categorical
Returns:
an int specifying the column index of the split variable in this.getPredictorIndexes()

Copyright © 2020 Rogue Wave Software. All rights reserved.