Class CHAID

All Implemented Interfaces:
Serializable, Cloneable

public class CHAID extends DecisionTree implements Serializable, Cloneable

Generates a decision tree using CHAID for categorical or discrete ordered predictor variables. CHAID, due to Kass (1980), is an acronym for chi-square automatic interaction detection. At each node, CHAID looks for the best splitting variable using the following steps: given a predictor variable X, perform a 2-way chi-squared test of association between each possible pair of categories of X and the categories of Y. The least significant result is noted and, if a threshold is met, the two categories of X are merged.
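
The pairwise test in the merge step can be illustrated with a minimal sketch. The class ChaidMergeSketch and its methods are hypothetical and are not part of this library. The sketch assumes a nominal predictor (every pair of categories is a merge candidate), at least one observation per category, and that all pairwise tables have the same degrees of freedom, so that the smallest chi-squared statistic corresponds to the least significant pair. A real implementation would compare p-values instead and then check the result against the merge threshold.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating one CHAID merge pass; not part of the library. */
final class ChaidMergeSketch {

    /** Pearson chi-squared statistic for the 2 x k table formed by two categories of X. */
    static double chiSquared(double[] rowA, double[] rowB) {
        double totalA = Arrays.stream(rowA).sum();
        double totalB = Arrays.stream(rowB).sum();
        double total = totalA + totalB;
        double stat = 0.0;
        for (int j = 0; j < rowA.length; j++) {
            double colTotal = rowA[j] + rowB[j];
            double expA = totalA * colTotal / total; // expected count under independence
            double expB = totalB * colTotal / total;
            if (expA > 0.0) {
                stat += (rowA[j] - expA) * (rowA[j] - expA) / expA;
            }
            if (expB > 0.0) {
                stat += (rowB[j] - expB) * (rowB[j] - expB) / expB;
            }
        }
        return stat;
    }

    /**
     * Returns the indices {i, j} of the pair of categories with the smallest statistic.
     * counts[c][y] is the number of observations with X = c and Y = y.
     */
    static int[] leastSignificantPair(double[][] counts) {
        int[] best = {-1, -1};
        double bestStat = Double.POSITIVE_INFINITY;
        for (int i = 0; i < counts.length; i++) {
            for (int j = i + 1; j < counts.length; j++) {
                double stat = chiSquared(counts[i], counts[j]);
                if (stat < bestStat) {
                    bestStat = stat;
                    best = new int[] {i, j};
                }
            }
        }
        return best;
    }
}
```

For example, with counts {{30, 10}, {28, 12}, {5, 35}}, categories 0 and 1 have nearly the same response distribution, so they form the candidate pair for merging.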

Next, treating this merged category as a single category, CHAID repeats the series of tests to determine whether further merging is possible. If a merged category consists of three or more of the original categories of X, CHAID calls for a step to test whether the merged category should be split. This is done by forming all binary partitions of the merged category and testing each one against Y in a 2-way test of association. If the most significant result meets a threshold, then the merged category is split accordingly. As long as the threshold in this step is smaller than the threshold in the merge step, the splitting step and the merge step will not cycle back and forth.
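
Forming all binary partitions of a merged category can be sketched as follows. The class BinaryPartitionSketch is hypothetical and not part of this library, and the sketch assumes a nominal predictor, so any subset of the original categories may be separated from the rest. Fixing the first category in the first group avoids counting each partition twice, which gives 2^(m-1) - 1 partitions for m original categories.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper enumerating binary partitions; not part of the library. */
final class BinaryPartitionSketch {

    /**
     * Enumerates the binary partitions of m categories. In each returned array,
     * element i is true when category i belongs to the second group.
     */
    static List<boolean[]> binaryPartitions(int m) {
        if (m < 2 || m > 30) {
            throw new IllegalArgumentException("m must be between 2 and 30");
        }
        List<boolean[]> result = new ArrayList<>();
        int limit = 1 << (m - 1);
        // mask = 0 is skipped because it would leave the second group empty
        for (int mask = 1; mask < limit; mask++) {
            boolean[] inSecondGroup = new boolean[m]; // category 0 stays in the first group
            for (int i = 1; i < m; i++) {
                inSecondGroup[i] = ((mask >> (i - 1)) & 1) == 1;
            }
            result.add(inSecondGroup);
        }
        return result;
    }
}
```

A merged category built from three original categories therefore yields three candidate splits, and one built from four yields seven. Each candidate would then be tested against Y as described above.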

Once each predictor is processed in this manner, the predictor with the most significant qualifying 2-way test with Y is selected as the splitting variable, and its last state of merged categories defines the split at the given node. A test qualifies when its adjusted p-value is smaller than a threshold; if none of the tests qualify, the node is not split. This growing procedure continues until one or more stopping conditions are met.
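
The final selection step can be sketched as a search for the smallest qualifying adjusted p-value. The class SplitSelectionSketch, the NO_SPLIT value of -1, and the input array of adjusted p-values are assumptions made for illustration; they are not taken from this library.

```java
/** Hypothetical helper illustrating split variable selection; not part of the library. */
final class SplitSelectionSketch {

    /** Illustrative marker meaning that no predictor qualifies and the node is not split. */
    static final int NO_SPLIT = -1;

    /**
     * Returns the index of the predictor with the smallest adjusted p-value,
     * provided that value is smaller than alpha; otherwise returns NO_SPLIT.
     */
    static int selectPredictor(double[] adjustedPValues, double alpha) {
        int best = NO_SPLIT;
        double bestP = alpha; // a candidate must be below the threshold to qualify
        for (int i = 0; i < adjustedPValues.length; i++) {
            if (adjustedPValues[i] < bestP) {
                bestP = adjustedPValues[i];
                best = i;
            }
        }
        return best;
    }
}
```

With adjusted p-values {0.20, 0.01, 0.03} and a threshold of 0.05, the second predictor (index 1) is selected. With {0.20, 0.60} no predictor qualifies and the node stays unsplit.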

  • Constructor Details

    • CHAID

      public CHAID(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
      Constructs a CHAID object for a single response variable and multiple predictor variables.
      Parameters:
      xy - a double matrix containing the training data and associated response values
      responseColumnIndex - an int specifying the column index of the response variable
      varType - a PredictiveModel.VariableType array containing the type of each variable
    • CHAID

      public CHAID(CHAID chaidModel)
      Constructs a copy of the input CHAID decision tree.
      Parameters:
      chaidModel - a CHAID decision tree
  • Method Details

    • clone

      public CHAID clone()
      Clones a CHAID decision tree.
      Specified by:
      clone in class PredictiveModel
      Returns:
      a clone of the CHAID decision tree
    • getMergeCategoriesSigLevel

      public double getMergeCategoriesSigLevel()
      Returns the significance level for merging categories.
      Returns:
      a double, the significance level for merging categories
    • setMergeCategoriesSignificanceLevel

      public final void setMergeCategoriesSignificanceLevel(double mergeAlpha)
      Sets the significance level for merging categories.
      Parameters:
      mergeAlpha - a double, specifying the significance level for merging categories

      mergeAlpha must be between 0.0 and 1.0. In addition, if splitMergedAlpha is set to enable splitting of previously merged categories, then mergeAlpha \( \le \) splitMergedAlpha.

      Default: mergeAlpha = 0.05.

    • getSplitMergedCategoriesSigLevel

      public double getSplitMergedCategoriesSigLevel()
      Returns the significance level for splitting previously merged categories.
      Returns:
      a double, the significance level for splitting merged categories
    • setSplitMergedCategoriesSigLevel

      public final void setSplitMergedCategoriesSigLevel(double splitMergedAlpha)
      Sets the significance level for splitting previously merged categories.
      Parameters:
      splitMergedAlpha - a double specifying the significance level for splitting merged categories

      splitMergedAlpha must be greater than or equal to getMergeCategoriesSigLevel() unless splitting is disabled by setting splitMergedAlpha = -1.0.

      Default: splitMergedAlpha = -1.0, which disables splitting of merged categories.

    • getSplitVariableSignificanceLevel

      public double getSplitVariableSignificanceLevel()
      Returns the significance level for split variable selection.
      Returns:
      a double, the significance level for split variable selection
    • setSplitVariableSignificanceLevel

      public final void setSplitVariableSignificanceLevel(double splitVariableSelectionAlpha)
      Sets the significance level for split variable selection.
      Parameters:
      splitVariableSelectionAlpha - a double specifying the significance level for split variable selection

      splitVariableSelectionAlpha must be between 0.0 and 1.0.

      Default: splitVariableSelectionAlpha = 0.05.

    • setConfiguration

      protected final void setConfiguration(PredictiveModel pm)
      Sets the configuration of PredictiveModel to that of the input model.
      Overrides:
      setConfiguration in class DecisionTree
      Parameters:
      pm - a PredictiveModel object which is to have its attributes duplicated in this instance
    • selectSplitVariable

      protected int selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, double[] splitCriterionValue, int[] splitPartition)
      Selects the split variable for the current node using CHAID (chi-square automatic interaction detection).
      Specified by:
      selectSplitVariable in class DecisionTree
      Parameters:
      xy - a double matrix containing the data
      classCounts - a double array containing the counts for each class of the response variable, when it is categorical
      parentFreq - a double array used to indicate which subset of the observations belongs to the current node
      splitValue - a double array representing the resulting split point if the selected variable is quantitative
      splitCriterionValue - a double array containing the value of the criterion used to determine the splitting variable
      splitPartition - an int array indicating the resulting split partition if the selected variable is categorical
      Returns:
      an int specifying the column index of the split variable in this.getPredictorIndexes()