com.imsl.datamining.decisionTree.CHAID

All Implemented Interfaces: Serializable, Cloneable

public class CHAID extends DecisionTree implements Serializable, Cloneable

Generates a decision tree using CHAID for categorical or discrete ordered predictor variables. CHAID, introduced by Kass (1980), stands for chi-square automatic interaction detection. At each node, CHAID looks for the best splitting variable using the following steps. Given a predictor variable X, perform a 2-way chi-squared test of association between each possible pair of categories of X and the categories of Y, the response variable. The least significant result is noted and, if it meets a threshold, the two categories of X are merged.
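As a rough illustration of the pairwise test described above, the following sketch computes the Pearson chi-squared statistic for the 2 x k table formed by two categories of X and picks the pair with the smallest statistic. It is not the JMSL implementation: the class and method names are hypothetical, and it assumes every pairwise table has the same degrees of freedom, so that the smallest statistic corresponds to the least significant result. Comparing the result against the merge threshold would additionally require a p-value, which is omitted here.

```java
import java.util.Arrays;

/** Illustrative helper for the pairwise merge test; not part of the JMSL API. */
final class PairwiseChiSquare {

    /** Pearson chi-squared statistic for the 2 x k table built from two categories of X. */
    static double statistic(double[] rowA, double[] rowB) {
        double totalA = Arrays.stream(rowA).sum();
        double totalB = Arrays.stream(rowB).sum();
        if (totalA <= 0.0 || totalB <= 0.0) {
            throw new IllegalArgumentException("each category needs at least one observation");
        }
        double total = totalA + totalB;
        double chi2 = 0.0;
        for (int j = 0; j < rowA.length; j++) {
            double columnTotal = rowA[j] + rowB[j];
            if (columnTotal == 0.0) {
                continue; // response class absent from both categories
            }
            double expectedA = totalA * columnTotal / total;
            double expectedB = totalB * columnTotal / total;
            chi2 += Math.pow(rowA[j] - expectedA, 2) / expectedA;
            chi2 += Math.pow(rowB[j] - expectedB, 2) / expectedB;
        }
        return chi2;
    }

    /**
     * Returns the indices {a, b} of the pair of categories with the smallest statistic,
     * or null if fewer than two categories are given.
     * counts[i][j] is the number of observations with X in category i and Y in class j.
     */
    static int[] leastSignificantPair(double[][] counts) {
        int[] best = null;
        double smallest = Double.POSITIVE_INFINITY;
        for (int a = 0; a < counts.length; a++) {
            for (int b = a + 1; b < counts.length; b++) {
                double chi2 = statistic(counts[a], counts[b]);
                if (chi2 < smallest) {
                    smallest = chi2;
                    best = new int[] {a, b};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up counts: three categories of X, two classes of Y.
        double[][] counts = {{10, 10}, {12, 8}, {2, 18}};
        int[] pair = leastSignificantPair(counts);
        System.out.println("Merge candidates: " + pair[0] + " and " + pair[1]);
    }
}
```

In this made-up table the first two categories have nearly identical class distributions, so they form the candidate pair for merging.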

Next, treating the merged category as a single category, CHAID repeats the series of tests to determine whether further merging is possible. If a merged category consists of three or more of the original categories of X, CHAID calls for a step to test whether the merged category should be split again. This is done by forming all binary partitions of the merged category and testing each one against Y in a 2-way test of association. If the most significant result meets a threshold, then the merged category is split accordingly. As long as the threshold in this step is smaller than the threshold in the merge step, the splitting step and the merge step will not cycle back and forth.
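The phrase "all binary partitions of the merged category" can be made concrete with a small enumeration. The sketch below is illustrative only and uses hypothetical names; it lists every way to divide a merged category into two non-empty groups, which for n original categories gives 2^(n-1) - 1 partitions. It assumes n is small enough for an int bit mask.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative enumeration of binary partitions; not part of the JMSL API. */
final class BinaryPartitions {

    /** Returns every split of the categories into two non-empty groups as {left, right}. */
    static List<int[][]> of(int[] categories) {
        int n = categories.length;
        List<int[][]> partitions = new ArrayList<>();
        // The last category always stays on the right, so mirror-image splits are not repeated.
        for (int mask = 1; mask < (1 << (n - 1)); mask++) {
            List<Integer> left = new ArrayList<>();
            List<Integer> right = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    left.add(categories[i]);
                } else {
                    right.add(categories[i]);
                }
            }
            partitions.add(new int[][] {toArray(left), toArray(right)});
        }
        return partitions;
    }

    private static int[] toArray(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // A merged category made of three original categories labelled 1, 2 and 3.
        for (int[][] partition : of(new int[] {1, 2, 3})) {
            System.out.println(Arrays.toString(partition[0]) + " | " + Arrays.toString(partition[1]));
        }
    }
}
```

Each partition would then be tested against Y, and the most significant one compared with the split threshold.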

Once each predictor is processed in this manner, the predictor with the most significant qualifying 2-way test with Y is selected as the splitting variable, and its last state of merged categories defines the split at the given node. If none of the tests qualify (by having an adjusted p-value smaller than a threshold), then the node is not split. This growing procedure continues until one or more stopping conditions are met.
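The selection rule in the paragraph above can be sketched as follows, assuming the adjusted p-value of each predictor's final merged table has already been computed. The class and method names are hypothetical, and the sketch says nothing about how JMSL computes or adjusts the p-values.

```java
/** Illustrative selection rule; not part of the JMSL API. */
final class SplitVariableChooser {

    /**
     * Returns the index of the predictor with the smallest adjusted p-value
     * that is below alpha, or -1 if no predictor qualifies (the node is not split).
     */
    static int choose(double[] adjustedPValues, double alpha) {
        int best = -1;
        double bestP = alpha; // a predictor qualifies only if its p-value is smaller than alpha
        for (int i = 0; i < adjustedPValues.length; i++) {
            if (adjustedPValues[i] < bestP) {
                bestP = adjustedPValues[i];
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up adjusted p-values for three predictors.
        double[] adjusted = {0.20, 0.01, 0.03};
        System.out.println("Selected predictor: " + choose(adjusted, 0.05));
    }
}
```

Returning -1 when nothing qualifies mirrors the case in which the node becomes a terminal node.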


Nested Class Summary

Nested classes/interfaces inherited from class com.imsl.datamining.decisionTree.DecisionTree
DecisionTree.MaxTreeSizeExceededException, DecisionTree.PruningFailedToConvergeException, DecisionTree.PureNodeException

Nested classes/interfaces inherited from class com.imsl.datamining.PredictiveModel
PredictiveModel.CloneNotSupportedException, PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType

Constructor Summary

| Constructor | Description |
| --- | --- |
| CHAID(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType) | Constructs a CHAID object for a single response variable and multiple predictor variables. |
| CHAID(CHAID chaidModel) | Constructs a copy of the input CHAID decision tree. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| CHAID | clone() | Clones a CHAID decision tree. |
| double | getMergeCategoriesSigLevel() | Returns the significance level for merging categories. |
| double | getSplitMergedCategoriesSigLevel() | Returns the significance level for splitting previously merged categories. |
| double | getSplitVariableSignificanceLevel() | Returns the significance level for split variable selection. |
| protected int | selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, double[] splitCriterionValue, int[] splitPartition) | Selects the split variable for the current node using CHAID (chi-square automatic interaction detection). |
| protected final void | setConfiguration(PredictiveModel pm) | Sets the configuration of PredictiveModel to that of the input model. |
| final void | setMergeCategoriesSignificanceLevel(double mergeAlpha) | Sets the significance level for merging categories. |
| final void | setSplitMergedCategoriesSigLevel(double splitMergedAlpha) | Sets the significance level for splitting previously merged categories. |
| final void | setSplitVariableSignificanceLevel(double splitVariableSelectionAlpha) | Sets the significance level for split variable selection. |

Methods inherited from class com.imsl.datamining.decisionTree.DecisionTree
fitModel, getCostComplexityValues, getDecisionTree, getFittedMeanSquaredError, getMaxDepth, getMaxNodes, getMeanSquaredPredictionError, getMinCostComplexityValue, getMinObsPerChildNode, getMinObsPerNode, getNodeAssigments, getNumberOfComplexityValues, getNumberOfRandomFeatures, isAutoPruningFlag, isRandomFeatureSelection, predict, predict, predict, printDecisionTree, printDecisionTree, pruneTree, setAutoPruningFlag, setCostComplexityValues, setMaxDepth, setMaxNodes, setMinCostComplexityValue, setMinObsPerChildNode, setMinObsPerNode, setNumberOfRandomFeatures, setRandomFeatureSelection

Methods inherited from class com.imsl.datamining.PredictiveModel
getClassCounts, getClassErrors, getClassErrors, getClassLabels, getClassProbabilities, getCostMatrix, getMaxNumberOfCategories, getMaxNumberOfIterations, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getRandomObject, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isConstantSeries, isMustFitModel, isUserFixedNClasses, setClassCounts, setClassLabels, setClassProbabilities, setCostMatrix, setMaxNumberOfCategories, setMaxNumberOfIterations, setMustFitModel, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setRandomObject, setResponseColumnIndex, setTrainingData, setVariableType, setWeights

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- CHAID
  
  public CHAID(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
  
  Constructs a CHAID object for a single response variable and multiple predictor variables.
  
  Parameters:
  
  xy - a double matrix containing the training data and associated response values
  
  responseColumnIndex - an int specifying the column index of the response variable
  
  varType - a PredictiveModel.VariableType array containing the type of each variable
- CHAID
  
  public CHAID(CHAID chaidModel)
  
  Constructs a copy of the input CHAID decision tree.
  
  Parameters:
  
  chaidModel - a CHAID decision tree

Method Details
- clone
  
  public CHAID clone()
  
  Clones a CHAID decision tree.
  
  Specified by:
  
  clone in class PredictiveModel
  
  Returns:
  
  a clone of the CHAID decision tree
- getMergeCategoriesSigLevel
  
  public double getMergeCategoriesSigLevel()
  
  Returns the significance level for merging categories.
  
  Returns:
  
  a double, the significance level for merging categories
- setMergeCategoriesSignificanceLevel
  
  public final void setMergeCategoriesSignificanceLevel(double mergeAlpha)
  
  Sets the significance level for merging categories.
  
  Parameters:
  
  mergeAlpha - a double specifying the significance level for merging categories. mergeAlpha must be between 0.0 and 1.0. In addition, if splitMergedAlpha (see setSplitMergedCategoriesSigLevel) is set to enable splitting of previously merged categories, then mergeAlpha \( \le \) splitMergedAlpha.
  
  Default: mergeAlpha = 0.05.
- getSplitMergedCategoriesSigLevel
  
  public double getSplitMergedCategoriesSigLevel()
  
  Returns the significance level for splitting previously merged categories.
  
  Returns:
  
  a double, the significance level for splitting merged categories
- setSplitMergedCategoriesSigLevel
  
  public final void setSplitMergedCategoriesSigLevel(double splitMergedAlpha)
  
  Sets the significance level for splitting previously merged categories.
  
  Parameters:
  
  splitMergedAlpha - a double specifying the significance level for splitting merged categories. splitMergedAlpha must be greater than or equal to getMergeCategoriesSigLevel(), unless splitting is disabled by setting splitMergedAlpha = -1.0.
  
  Default: splitMergedAlpha = -1.0, which disables splitting of merged categories.
- getSplitVariableSignificanceLevel
  
  public double getSplitVariableSignificanceLevel()
  
  Returns the significance level for split variable selection.
  
  Returns:
  
  a double, the significance level for split variable selection
- setSplitVariableSignificanceLevel
  
  public final void setSplitVariableSignificanceLevel(double splitVariableSelectionAlpha)
  
  Sets the significance level for split variable selection.
  
  Parameters:
  
  splitVariableSelectionAlpha - a double specifying the significance level for split variable selection. splitVariableSelectionAlpha must be between 0.0 and 1.0.
  
  Default: splitVariableSelectionAlpha = 0.05.
- setConfiguration
  
  protected final void setConfiguration(PredictiveModel pm)
  
  Sets the configuration of PredictiveModel to that of the input model.
  
  Overrides:
  
  setConfiguration in class DecisionTree
  
  Parameters:
  
  pm - a PredictiveModel object which is to have its attributes duplicated in this instance
- selectSplitVariable
  
  protected int selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, double[] splitCriterionValue, int[] splitPartition)
  
  Selects the split variable for the current node using CHAID (chi-square automatic interaction detection).
  
  Specified by:
  
  selectSplitVariable in class DecisionTree
  
  Parameters:
  
  xy - a double matrix containing the data
  
  classCounts - a double array containing the counts for each class of the response variable, when it is categorical
  
  parentFreq - a double array used to indicate which subset of the observations belongs in the current node
  
  splitValue - a double array representing the resulting split point if the selected variable is quantitative
  
  splitCriterionValue - a double array containing the value of the criterion used to determine the splitting variable
  
  splitPartition - an int array indicating the resulting split partition if the selected variable is categorical
  
  Returns:
  
  an int specifying the column index of the split variable in this.getPredictorIndexes()
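The constraints documented for setMergeCategoriesSignificanceLevel and setSplitMergedCategoriesSigLevel can be summarized in a small check. This is a sketch of the documented rules only, with hypothetical names; it is not how the library validates its arguments. It assumes the bounds 0.0 and 1.0 are inclusive and that an enabled split level must also not exceed 1.0, neither of which is stated above.

```java
/** Illustrative check of the documented significance-level constraints; not part of the JMSL API. */
final class ChaidAlphaCheck {

    /** Documented value that disables splitting of previously merged categories. */
    static final double SPLIT_DISABLED = -1.0;

    /** Returns true if the pair of levels satisfies the documented constraints. */
    static boolean isValid(double mergeAlpha, double splitMergedAlpha) {
        // mergeAlpha must lie between 0.0 and 1.0 (assumed inclusive); NaN fails this test.
        if (!(mergeAlpha >= 0.0 && mergeAlpha <= 1.0)) {
            return false;
        }
        // Splitting of merged categories is switched off, so no further constraint applies.
        if (splitMergedAlpha == SPLIT_DISABLED) {
            return true;
        }
        // When splitting is enabled, mergeAlpha <= splitMergedAlpha (upper bound 1.0 is an assumption).
        return splitMergedAlpha >= mergeAlpha && splitMergedAlpha <= 1.0;
    }

    public static void main(String[] args) {
        System.out.println(isValid(0.05, SPLIT_DISABLED)); // documented defaults
        System.out.println(isValid(0.05, 0.01));           // split level below merge level
    }
}
```

Because each setter is checked against the other value, the order in which the two levels are set may matter when both are changed from their defaults.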
