Class CHAID
- All Implemented Interfaces:
Serializable, Cloneable
Generates a decision tree using CHAID for categorical or discrete ordered
predictor variables. CHAID, introduced by Kass (1980), stands for chi-square
automatic interaction detection. At each node, CHAID looks for
the best splitting variable using the following steps: given a predictor
variable X, perform a 2-way chi-squared test of association between
each possible pair of categories of X and the categories of the response variable Y.
The least significant result is noted and, if it meets the merge threshold, the two
categories of X are merged.
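The pairwise test in the merge step can be sketched in plain Java. This is an illustration only, not the library's implementation: the class `ChaidMergeSketch`, its method names, and the counts are hypothetical. It assumes a nominal predictor (any two categories may be paired), a three-class response in which every class occurs in each pair tested (so each test has 2 degrees of freedom and the p-value reduces to exp(-stat/2)), and the usual CHAID convention that the pair is merged when its p-value exceeds the merge significance level.

```java
import java.util.Arrays;

/** Illustrative sketch of one CHAID merge pass; not the library's code. */
final class ChaidMergeSketch {

    /** Pearson chi-squared statistic for the 2 x J table formed by two rows of class counts. */
    static double chiSquared(double[] rowA, double[] rowB) {
        double totalA = Arrays.stream(rowA).sum();
        double totalB = Arrays.stream(rowB).sum();
        double n = totalA + totalB;
        double stat = 0.0;
        for (int c = 0; c < rowA.length; c++) {
            double colTotal = rowA[c] + rowB[c];
            if (colTotal == 0.0) {
                continue; // a class absent from both rows contributes nothing
            }
            double expectedA = totalA * colTotal / n;
            double expectedB = totalB * colTotal / n;
            stat += Math.pow(rowA[c] - expectedA, 2) / expectedA;
            stat += Math.pow(rowB[c] - expectedB, 2) / expectedB;
        }
        return stat;
    }

    /** Upper-tail p-value of a chi-squared statistic; valid for 2 degrees of freedom only. */
    static double pValueDf2(double stat) {
        return Math.exp(-stat / 2.0);
    }

    /** Returns {i, j}, the pair of categories with the smallest statistic (largest p-value). */
    static int[] leastSignificantPair(double[][] counts) {
        int[] best = {-1, -1};
        double bestStat = Double.POSITIVE_INFINITY;
        for (int i = 0; i < counts.length; i++) {
            for (int j = i + 1; j < counts.length; j++) {
                double stat = chiSquared(counts[i], counts[j]);
                if (stat < bestStat) {
                    bestStat = stat;
                    best = new int[] {i, j};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical counts: rows are categories of X, columns are three classes of Y.
        double[][] counts = {{20, 10, 10}, {18, 12, 10}, {5, 10, 25}};
        double mergeAlpha = 0.05; // the documented default for the merge step
        int[] pair = leastSignificantPair(counts);
        double p = pValueDf2(chiSquared(counts[pair[0]], counts[pair[1]]));
        System.out.printf("least significant pair: (%d, %d), p = %.4f, merge = %b%n",
                pair[0], pair[1], p, p > mergeAlpha);
    }
}
```

Because every pair is tested against the same classes of Y, the degrees of freedom are equal across pairs, so the smallest statistic identifies the least significant pair without computing each p-value.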
Next, treating this merged category as a single category, CHAID repeats the
series of tests to determine whether further merging is possible. If a
merged category consists of three or more of the original categories of
X, CHAID calls for a step to test whether the merged
categories should be split. This is done by forming all binary partitions of
the merged category and testing each one against
Y in a 2-way test of association. If the most significant result meets
a threshold, then the merged category is split accordingly. As long as the
threshold in this step is smaller than the threshold in the merge step, the
splitting step and the merge step will not cycle back and forth.
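The step that re-examines a merged category can be sketched as follows. This is a minimal illustration, not the library's implementation: the class `ChaidSplitSketch`, its method names, and the counts are hypothetical. It enumerates every binary partition of a merged category as a bit mask, pools the class counts on each side, and keeps the partition with the largest chi-squared statistic. Since every partition yields a table of the same shape, the largest statistic corresponds to the most significant result; comparing that result with the split threshold is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of testing binary partitions of a merged category. */
final class ChaidSplitSketch {

    /** All binary partitions of k items as bit masks; bit i set means item i is in the left group. */
    static List<Integer> binaryPartitions(int k) {
        List<Integer> masks = new ArrayList<>();
        int full = (1 << k) - 1;
        // Keep item 0 on the left (odd masks only) so each partition appears once;
        // the full mask is excluded because it would leave the right group empty.
        for (int mask = 1; mask < full; mask += 2) {
            masks.add(mask);
        }
        return masks;
    }

    /** Sums the class counts of the original categories on one side of the partition. */
    static double[] pool(double[][] counts, int mask, boolean left) {
        double[] sum = new double[counts[0].length];
        for (int i = 0; i < counts.length; i++) {
            boolean inLeft = (mask & (1 << i)) != 0;
            if (inLeft == left) {
                for (int c = 0; c < sum.length; c++) {
                    sum[c] += counts[i][c];
                }
            }
        }
        return sum;
    }

    /** Pearson chi-squared statistic for the 2 x J table formed by two rows of class counts. */
    static double chiSquared(double[] rowA, double[] rowB) {
        double totalA = 0.0;
        double totalB = 0.0;
        for (int c = 0; c < rowA.length; c++) {
            totalA += rowA[c];
            totalB += rowB[c];
        }
        double n = totalA + totalB;
        double stat = 0.0;
        for (int c = 0; c < rowA.length; c++) {
            double colTotal = rowA[c] + rowB[c];
            if (colTotal == 0.0) {
                continue; // a class absent from both groups contributes nothing
            }
            double expectedA = totalA * colTotal / n;
            double expectedB = totalB * colTotal / n;
            stat += Math.pow(rowA[c] - expectedA, 2) / expectedA;
            stat += Math.pow(rowB[c] - expectedB, 2) / expectedB;
        }
        return stat;
    }

    /** Returns the bit mask of the partition with the largest statistic. */
    static int mostSignificantPartition(double[][] counts) {
        int bestMask = -1;
        double bestStat = -1.0;
        for (int mask : binaryPartitions(counts.length)) {
            double stat = chiSquared(pool(counts, mask, true), pool(counts, mask, false));
            if (stat > bestStat) {
                bestStat = stat;
                bestMask = mask;
            }
        }
        return bestMask;
    }
}
```

A merged category built from k original categories has 2^(k-1) - 1 binary partitions, so the enumeration grows quickly with k; the sketch is meant for small k only.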
Once each predictor is processed in this manner, the predictor with the most significant qualifying 2-way test with Y is selected as the splitting variable, and its last state of merged categories defines the split at the given node. If none of the tests qualify (by having an adjusted p-value smaller than a threshold), then the node is not split. This growing procedure continues until one or more stopping conditions are met.
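The final selection rule can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: the class `SplitVariableSketch` and its method are hypothetical, and the adjusted p-values are taken as given, one per predictor, after merging has finished.

```java
/** Illustrative sketch of choosing the splitting variable from adjusted p-values. */
final class SplitVariableSketch {

    /**
     * Returns the index of the predictor with the smallest adjusted p-value,
     * or -1 when no p-value is strictly below alpha (the node is not split).
     */
    static int select(double[] adjustedPValues, double alpha) {
        int best = -1;
        double bestP = alpha; // a candidate must beat the threshold first
        for (int i = 0; i < adjustedPValues.length; i++) {
            if (adjustedPValues[i] < bestP) {
                bestP = adjustedPValues[i];
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double alpha = 0.05; // the documented default for split variable selection
        double[] hypotheticalPValues = {0.20, 0.003, 0.04};
        System.out.println("selected predictor: " + select(hypotheticalPValues, alpha));
    }
}
```

Returning -1 for "no qualifying predictor" is a convention of this sketch only; the page does not state how `selectSplitVariable` signals that a node is left unsplit.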
-
Nested Class Summary
Nested classes/interfaces inherited from class com.imsl.datamining.decisionTree.DecisionTree
DecisionTree.MaxTreeSizeExceededException, DecisionTree.PruningFailedToConvergeException, DecisionTree.PureNodeException
Nested classes/interfaces inherited from class com.imsl.datamining.PredictiveModel
PredictiveModel.CloneNotSupportedException, PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType
-
Constructor Summary
Constructors
Constructor - Description
CHAID(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType) - Constructs a CHAID object for a single response variable and multiple predictor variables.
CHAID(CHAID chaidModel) - Constructs a copy of the input CHAID decision tree.
-
Method Summary
Modifier and Type, Method - Description
clone() - Clones a CHAID decision tree.
double getMergeCategoriesSigLevel() - Returns the significance level for merging categories.
double getSplitMergedCategoriesSigLevel() - Returns the significance level for splitting previously merged categories.
double getSplitVariableSignificanceLevel() - Returns the significance level for split variable selection.
protected int selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, double[] splitCriterionValue, int[] splitPartition) - Selects the split variable for the current node using CHAID (chi-square automatic interaction detection).
protected final void setConfiguration(PredictiveModel pm) - Sets the configuration of PredictiveModel to that of the input model.
final void setMergeCategoriesSignificanceLevel(double mergeAlpha) - Sets the significance level for merging categories.
final void setSplitMergedCategoriesSigLevel(double splitMergedAlpha) - Sets the significance level for splitting previously merged categories.
final void setSplitVariableSignificanceLevel(double splitVariableSelectionAlpha) - Sets the significance level for split variable selection.
Methods inherited from class com.imsl.datamining.decisionTree.DecisionTree
fitModel, getCostComplexityValues, getDecisionTree, getFittedMeanSquaredError, getMaxDepth, getMaxNodes, getMeanSquaredPredictionError, getMinCostComplexityValue, getMinObsPerChildNode, getMinObsPerNode, getNodeAssigments, getNumberOfComplexityValues, getNumberOfRandomFeatures, isAutoPruningFlag, isRandomFeatureSelection, predict, predict, predict, printDecisionTree, printDecisionTree, pruneTree, setAutoPruningFlag, setCostComplexityValues, setMaxDepth, setMaxNodes, setMinCostComplexityValue, setMinObsPerChildNode, setMinObsPerNode, setNumberOfRandomFeatures, setRandomFeatureSelection
Methods inherited from class com.imsl.datamining.PredictiveModel
getClassCounts, getClassErrors, getClassErrors, getClassLabels, getClassProbabilities, getCostMatrix, getMaxNumberOfCategories, getMaxNumberOfIterations, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getRandomObject, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isConstantSeries, isMustFitModel, isUserFixedNClasses, setClassCounts, setClassLabels, setClassProbabilities, setCostMatrix, setMaxNumberOfCategories, setMaxNumberOfIterations, setMustFitModel, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setRandomObject, setResponseColumnIndex, setTrainingData, setVariableType, setWeights
-
Constructor Details
-
CHAID
Constructs a CHAID object for a single response variable and multiple predictor variables.
- Parameters:
xy - a double matrix containing the training data and associated response values
responseColumnIndex - an int specifying the column index of the response variable
varType - a PredictiveModel.VariableType array containing the type of each variable
-
CHAID
Constructs a copy of the input CHAID decision tree.
- Parameters:
chaidModel - a CHAID decision tree
-
-
Method Details
-
clone
Clones a CHAID decision tree.
- Specified by:
clone in class PredictiveModel
- Returns:
a clone of the CHAID decision tree
-
getMergeCategoriesSigLevel
public double getMergeCategoriesSigLevel()
Returns the significance level for merging categories.
- Returns:
a double, the significance level for merging categories
-
setMergeCategoriesSignificanceLevel
public final void setMergeCategoriesSignificanceLevel(double mergeAlpha)
Sets the significance level for merging categories.
- Parameters:
mergeAlpha - a double specifying the significance level for merging categories. mergeAlpha must be between 0.0 and 1.0. In addition, if splitMergedAlpha is set to enable splitting of previously merged categories, then mergeAlpha \( \le \) splitMergedAlpha.
Default: mergeAlpha = 0.05.
-
getSplitMergedCategoriesSigLevel
public double getSplitMergedCategoriesSigLevel()
Returns the significance level for splitting previously merged categories.
- Returns:
a double, the significance level for splitting merged categories
-
setSplitMergedCategoriesSigLevel
public final void setSplitMergedCategoriesSigLevel(double splitMergedAlpha)
Sets the significance level for splitting previously merged categories.
- Parameters:
splitMergedAlpha - a double specifying the significance level for splitting merged categories. splitMergedAlpha must be greater than or equal to getMergeCategoriesSigLevel() unless splitting is disabled with splitMergedAlpha = -1.
Default: splitMergedAlpha = -1.0, which disables splitting of merged categories.
-
getSplitVariableSignificanceLevel
public double getSplitVariableSignificanceLevel()
Returns the significance level for split variable selection.
- Returns:
a double, the significance level for split variable selection
-
setSplitVariableSignificanceLevel
public final void setSplitVariableSignificanceLevel(double splitVariableSelectionAlpha)
Sets the significance level for split variable selection.
- Parameters:
splitVariableSelectionAlpha - a double specifying the significance level for split variable selection. splitVariableSelectionAlpha must be between 0.0 and 1.0.
Default: splitVariableSelectionAlpha = 0.05.
-
setConfiguration
Sets the configuration of PredictiveModel to that of the input model.
- Overrides:
setConfiguration in class DecisionTree
- Parameters:
pm - a PredictiveModel object which is to have its attributes duplicated in this instance
-
selectSplitVariable
protected int selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, double[] splitCriterionValue, int[] splitPartition)
Selects the split variable for the current node using CHAID (chi-square automatic interaction detection).
- Specified by:
selectSplitVariable in class DecisionTree
- Parameters:
xy - a double matrix containing the data
classCounts - a double array containing the counts for each class of the response variable, when it is categorical
parentFreq - a double array used to indicate which subset of the observations belong in the current node
splitValue - a double array representing the resulting split point if the selected variable is quantitative
splitCriterionValue - a double, the value of the criterion used to determine the splitting variable
splitPartition - an int array indicating the resulting split partition if the selected variable is categorical
- Returns:
an int specifying the column index of the split variable in this.getPredictorIndexes()
-