CHAID (JMSL Numerical Library (jmsl) 2021.0.0 API)

java.lang.Object
- com.imsl.datamining.PredictiveModel
  - com.imsl.datamining.decisionTree.DecisionTree
    - com.imsl.datamining.decisionTree.CHAID

All Implemented Interfaces:

Serializable, Cloneable
```
public class CHAID
extends DecisionTree
implements Serializable, Cloneable
```
Generates a decision tree using CHAID for categorical or discrete ordered predictor variables. CHAID, an acronym for chi-square automatic interaction detection, is due to Kass (1980). At each node, CHAID looks for the best splitting variable using the following steps. Given a predictor variable X, perform a 2-way chi-squared test of association between each possible pair of categories of X and the categories of the response variable Y. The least significant result is noted and, if that pair is not significantly different at the merge significance level (its p-value exceeds mergeAlpha), the two categories of X are merged.
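The pairwise merge step can be sketched in plain Java. This is an illustration under stated assumptions, not the JMSL implementation. It assumes X has already been cross-tabulated against Y (one row per category of X, one column per class of Y, every row non-empty), uses Pearson's chi-squared statistic on each 2 x c subtable, and merges the pair with the largest p-value when that p-value exceeds `mergeAlpha`. The class and method names (`ChaidMergeStep`, `pairPValue`, `mergeOnce`, `chiSquareSurvival`) are hypothetical. The p-value uses the regularized upper incomplete gamma function, evaluated with the standard series and continued-fraction expansions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of one CHAID merge step; not the JMSL implementation. */
final class ChaidMergeStep {

    /** ln Gamma(a) for a positive multiple of 1/2, which covers chi-squared with integer df. */
    static double logGammaHalf(double a) {
        boolean integer = Math.rint(a) == a;
        double r = integer ? 0.0 : 0.5 * Math.log(Math.PI);   // ln Gamma(1) = 0, ln Gamma(1/2) = ln sqrt(pi)
        for (double z = integer ? 1.0 : 0.5; z < a; z += 1.0) r += Math.log(z); // Gamma(z + 1) = z Gamma(z)
        return r;
    }

    /** Upper-tail probability P(X > x) of a chi-squared variable with df degrees of freedom. */
    static double chiSquareSurvival(double x, int df) {
        if (x <= 0) return 1.0;
        double a = df / 2.0, z = x / 2.0;
        double logPrefix = a * Math.log(z) - z - logGammaHalf(a);
        if (z < a + 1) {                        // series for the lower regularized gamma P(a, z)
            double ap = a, del = 1.0 / a, sum = del;
            for (int n = 0; n < 1000; n++) {
                ap += 1; del *= z / ap; sum += del;
                if (Math.abs(del) < Math.abs(sum) * 1e-15) break;
            }
            return Math.max(0.0, 1.0 - sum * Math.exp(logPrefix));
        }
        final double tiny = 1e-300;             // continued fraction (modified Lentz) for Q(a, z)
        double b = z + 1 - a, c = 1 / tiny, d = 1 / b, h = d;
        for (int i = 1; i < 1000; i++) {
            double an = -i * (i - a);
            b += 2;
            d = an * d + b; if (Math.abs(d) < tiny) d = tiny;
            c = b + an / c; if (Math.abs(c) < tiny) c = tiny;
            d = 1 / d;
            double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1) < 1e-14) break;
        }
        return Math.exp(logPrefix) * h;
    }

    /** p-value of Pearson's chi-squared test on the 2 x c table formed by two non-empty rows. */
    static double pairPValue(double[] rowA, double[] rowB) {
        double nA = Arrays.stream(rowA).sum(), nB = Arrays.stream(rowB).sum(), n = nA + nB;
        double stat = 0;
        int df = -1;                            // (number of Y classes seen in either row) - 1
        for (int k = 0; k < rowA.length; k++) {
            double col = rowA[k] + rowB[k];
            if (col == 0) continue;             // a Y class absent from both rows adds nothing
            df++;
            double eA = nA * col / n, eB = nB * col / n;
            stat += (rowA[k] - eA) * (rowA[k] - eA) / eA + (rowB[k] - eB) * (rowB[k] - eB) / eB;
        }
        return df < 1 ? 1.0 : chiSquareSurvival(stat, df);
    }

    /** Merges the least significant pair of rows if its p-value exceeds mergeAlpha; returns true if merged. */
    static boolean mergeOnce(List<double[]> rows, double mergeAlpha) {
        int bestI = -1, bestJ = -1;
        double bestP = -1.0;
        for (int i = 0; i < rows.size(); i++) {
            for (int j = i + 1; j < rows.size(); j++) {
                double p = pairPValue(rows.get(i), rows.get(j));
                if (p > bestP) { bestP = p; bestI = i; bestJ = j; }
            }
        }
        if (bestI < 0 || bestP <= mergeAlpha) return false;
        double[] a = rows.get(bestI), b = rows.get(bestJ), merged = new double[a.length];
        for (int k = 0; k < a.length; k++) merged[k] = a[k] + b[k];
        rows.remove(bestJ);                     // bestJ > bestI, so index bestI is unaffected
        rows.set(bestI, merged);
        return true;
    }

    public static void main(String[] args) {
        // Rows: categories A, B, C of X; columns: counts of the two classes of Y.
        List<double[]> rows = new ArrayList<>(List.of(
                new double[] {30, 10}, new double[] {28, 12}, new double[] {5, 35}));
        while (mergeOnce(rows, 0.05)) { }       // repeat until no pair qualifies for merging
        rows.forEach(r -> System.out.println(Arrays.toString(r)));
    }
}
```

In this toy table, A and B have similar class distributions, so they are merged; the merged row is then clearly different from C, and the loop stops with two groups.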

Next, treating this merged category as a single category, CHAID repeats the series of tests to determine whether further merging is possible. If a merged category consists of three or more of the original categories of X, CHAID tests whether the merged category should be split. This is done by forming all binary partitions of the merged category and testing each one against Y in a 2-way test of association. If the most significant result meets the splitting threshold, the merged category is split accordingly. As long as the threshold in this step is smaller than the threshold in the merge step, the splitting step and the merge step will not cycle back and forth.
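The split step for a merged category can be sketched as follows. The names (`ChaidSplitStep`, `binaryPartitions`, `bestSplit`) are hypothetical, the 2-way test of association is left to a caller-supplied function, and the rule "split if the smallest p-value is below `splitAlpha`" is an assumption about what meeting the threshold means here. For k original categories there are 2^(k-1) - 1 binary partitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of the CHAID split step for a merged category; not part of JMSL. */
final class ChaidSplitStep {

    /**
     * Lists every way to split the categories into two non-empty groups.
     * The last category always goes to the second group, so each unordered
     * partition appears exactly once: 2^(k-1) - 1 partitions for k categories.
     */
    static List<List<List<Integer>>> binaryPartitions(List<Integer> categories) {
        int k = categories.size();
        List<List<List<Integer>>> result = new ArrayList<>();
        for (int mask = 1; mask < (1 << (k - 1)); mask++) {
            List<Integer> first = new ArrayList<>(), second = new ArrayList<>();
            for (int i = 0; i < k; i++) {
                if (i < k - 1 && (mask & (1 << i)) != 0) first.add(categories.get(i));
                else second.add(categories.get(i));
            }
            List<List<Integer>> partition = new ArrayList<>();
            partition.add(first);
            partition.add(second);
            result.add(partition);
        }
        return result;
    }

    /**
     * Returns the most significant binary partition if its p-value is below
     * splitAlpha, or null if the merged category should stay intact.
     * pValue is a caller-supplied test of association between a partition and Y.
     */
    static List<List<Integer>> bestSplit(List<Integer> merged,
                                         ToDoubleFunction<List<List<Integer>>> pValue,
                                         double splitAlpha) {
        if (merged.size() < 3) return null;     // the split step applies only to 3+ original categories
        List<List<Integer>> best = null;
        double bestP = Double.POSITIVE_INFINITY;
        for (List<List<Integer>> partition : binaryPartitions(merged)) {
            double p = pValue.applyAsDouble(partition);
            if (p < bestP) { bestP = p; best = partition; }
        }
        return bestP < splitAlpha ? best : null;
    }

    public static void main(String[] args) {
        List<Integer> merged = List.of(0, 1, 2);
        System.out.println(binaryPartitions(merged));
        // Stub test: pretend only separating category 2 from {0, 1} is significant.
        List<List<Integer>> split = bestSplit(merged,
                part -> part.get(1).equals(List.of(2)) ? 0.001 : 0.4, 0.01);
        System.out.println(split);
    }
}
```

Fixing the last category's group avoids listing both {A}|{B, C} and {B, C}|{A}, which are the same partition.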

Once each predictor is processed in this manner, the predictor with the most significant qualifying 2-way test with Y is selected as the splitting variable, and its final set of merged categories defines the split at the given node. A test qualifies if its adjusted p-value is smaller than the significance level for split variable selection. If no test qualifies, the node is not split. This growing procedure continues until one or more stopping conditions are met.
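The adjustment applied to the p-values is not specified above. The sketch below assumes Kass's (1980) Bonferroni multipliers: for a nominal predictor whose c original categories were merged into r groups, B = sum over i = 0..r-1 of (-1)^i (r - i)^c / (i! (r - i)!); for an ordered predictor, B = C(c - 1, r - 1). The adjusted p-value is min(1, B p). The names (`ChaidPredictorChoice`, `selectPredictor`) and the -1 "do not split" return value are hypothetical.

```java
/** Hypothetical sketch of Bonferroni-adjusted predictor selection in CHAID; not part of JMSL. */
final class ChaidPredictorChoice {

    static double factorial(int n) {
        double f = 1;
        for (int i = 2; i <= n; i++) f *= i;
        return f;
    }

    /** Kass's multiplier for a nominal predictor: ways to group c categories into r groups. */
    static double nominalMultiplier(int c, int r) {
        double b = 0;
        for (int i = 0; i < r; i++) {
            double term = Math.pow(r - i, c) / (factorial(i) * factorial(r - i));
            b += (i % 2 == 0) ? term : -term;
        }
        return b;
    }

    /** Kass's multiplier for an ordered predictor: C(c - 1, r - 1) contiguous groupings. */
    static double orderedMultiplier(int c, int r) {
        return factorial(c - 1) / (factorial(r - 1) * factorial(c - r));
    }

    /** Bonferroni-adjusted p-value, capped at 1. */
    static double adjust(double p, double multiplier) {
        return Math.min(1.0, p * multiplier);
    }

    /**
     * Returns the index of the predictor with the smallest adjusted p-value,
     * or -1 (a hypothetical "do not split" code) if none is below alpha.
     */
    static int selectPredictor(double[] adjustedP, double alpha) {
        int best = -1;
        for (int v = 0; v < adjustedP.length; v++) {
            if (adjustedP[v] < alpha && (best < 0 || adjustedP[v] < adjustedP[best])) best = v;
        }
        return best;
    }

    public static void main(String[] args) {
        // Predictor 0: nominal, 4 categories merged into 2 groups, raw p = 0.004.
        // Predictor 1: ordered, 4 categories merged into 2 groups, raw p = 0.003.
        double[] adjusted = {
            adjust(0.004, nominalMultiplier(4, 2)),   // 0.004 * 7 = 0.028
            adjust(0.003, orderedMultiplier(4, 2))    // 0.003 * 3 = 0.009
        };
        System.out.println(selectPredictor(adjusted, 0.05));
    }
}
```

The nominal multiplier counts the ways to partition c categories into r non-empty groups, which is why it penalizes a nominal predictor more heavily than an ordered one with the same c and r.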

See Also:

Example, Serialized Form

Nested Class Summary
- Nested classes/interfaces inherited from class com.imsl.datamining.decisionTree.DecisionTree
  DecisionTree.MaxTreeSizeExceededException, DecisionTree.PruningFailedToConvergeException, DecisionTree.PureNodeException
- Nested classes/interfaces inherited from class com.imsl.datamining.PredictiveModel
  PredictiveModel.CloneNotSupportedException, PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType

Constructor Summary

Constructors
Constructor and Description
`CHAID(CHAID chaidModel)` Constructs a copy of the input `CHAID` decision tree.
`CHAID(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)` Constructs a `CHAID` object for a single response variable and multiple predictor variables.

Method Summary

Modifier and Type	Method and Description
`CHAID`	`clone()` Clones a `CHAID` decision tree.
`double`	`getMergeCategoriesSigLevel()` Returns the significance level for merging categories.
`double`	`getSplitMergedCategoriesSigLevel()` Returns the significance level for splitting previously merged categories.
`double`	`getSplitVariableSignificanceLevel()` Returns the significance level for split variable selection.
`protected int`	`selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, double[] splitCriterionValue, int[] splitPartition)` Selects the split variable for the current node using CHAID (chi-square automatic interaction detection).
`protected void`	`setConfiguration(PredictiveModel pm)` Sets the configuration of `PredictiveModel` to that of the input model.
`void`	`setMergeCategoriesSignificanceLevel(double mergeAlpha)` Sets the significance level for merging categories.
`void`	`setSplitMergedCategoriesSigLevel(double splitMergedAlpha)` Sets the significance level for splitting previously merged categories.
`void`	`setSplitVariableSignificanceLevel(double splitVariableSelectionAlpha)` Sets the significance level for split variable selection.

Methods inherited from class com.imsl.datamining.PredictiveModel
getClassCounts, getClassErrors, getClassLabels, getClassProbabilities, getCostMatrix, getMaxNumberOfCategories, getMaxNumberOfIterations, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getRandomObject, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isConstantSeries, isMustFitModel, isUserFixedNClasses, setClassCounts, setClassLabels, setClassProbabilities, setCostMatrix, setMaxNumberOfCategories, setMaxNumberOfIterations, setMustFitModel, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setRandomObject, setResponseColumnIndex, setTrainingData, setVariableType, setWeights

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - CHAID
```
public CHAID(double[][] xy,
             int responseColumnIndex,
             PredictiveModel.VariableType[] varType)
```
    Constructs a CHAID object for a single response variable and multiple predictor variables.
    
    Parameters:
    
    xy - a double matrix containing the training data and associated response values
    
    responseColumnIndex - an int specifying the column index of the response variable
    
    varType - a PredictiveModel.VariableType array containing the type of each variable
  - CHAID
```
public CHAID(CHAID chaidModel)
```
    Constructs a copy of the input CHAID decision tree.
    
    Parameters:
    
    chaidModel - a CHAID decision tree
- Method Detail
  - clone
```
public CHAID clone()
```
    Clones a CHAID decision tree.
    
    Specified by:
    
    clone in class PredictiveModel
    
    Returns:
    
    a clone of the CHAID decision tree
  - getMergeCategoriesSigLevel
```
public double getMergeCategoriesSigLevel()
```
    Returns the significance level for merging categories.
    
    Returns:
    
    a double, the significance level for merging categories
  - setMergeCategoriesSignificanceLevel
```
public final void setMergeCategoriesSignificanceLevel(double mergeAlpha)
```
    Sets the significance level for merging categories.
    
    Parameters:
    
    mergeAlpha - a double specifying the significance level for merging categories
    mergeAlpha must be between 0.0 and 1.0. In addition, if splitMergedAlpha is set to enable splitting of previously merged categories, then mergeAlpha must be less than or equal to splitMergedAlpha.
    
    Default: mergeAlpha = 0.05.
  - getSplitMergedCategoriesSigLevel
```
public double getSplitMergedCategoriesSigLevel()
```
    Returns the significance level for splitting previously merged categories.
    
    Returns:
    
    a double, the significance level for splitting merged categories
  - setSplitMergedCategoriesSigLevel
```
public final void setSplitMergedCategoriesSigLevel(double splitMergedAlpha)
```
    Sets the significance level for splitting previously merged categories.
    
    Parameters:
    
    splitMergedAlpha - a double specifying the significance level for splitting merged categories
    splitMergedAlpha must be greater than or equal to CHAID.getMergeCategoriesSigLevel() unless splitting is disabled by setting splitMergedAlpha = -1.0.
    
    Default: splitMergedAlpha = -1.0, which disables splitting of merged categories.
  - getSplitVariableSignificanceLevel
```
public double getSplitVariableSignificanceLevel()
```
    Returns the significance level for split variable selection.
    
    Returns:
    
    a double, the significance level for split variable selection
  - setSplitVariableSignificanceLevel
```
public final void setSplitVariableSignificanceLevel(double splitVariableSelectionAlpha)
```
    Sets the significance level for split variable selection.
    
    Parameters:
    
    splitVariableSelectionAlpha - a double specifying the significance level for split variable selection
    splitVariableSelectionAlpha must be between 0.0 and 1.0.
    
    Default: splitVariableSelectionAlpha = 0.05.
  - setConfiguration
```
protected final void setConfiguration(PredictiveModel pm)
```
    Sets the configuration of PredictiveModel to that of the input model.
    
    Overrides:
    
    setConfiguration in class DecisionTree
    
    Parameters:
    
    pm - a PredictiveModel object which is to have its attributes duplicated in this instance
  - selectSplitVariable
```
protected int selectSplitVariable(double[][] xy,
                                  double[] classCounts,
                                  double[] parentFreq,
                                  double[] splitValue,
                                  double[] splitCriterionValue,
                                  int[] splitPartition)
```
    Selects the split variable for the current node using CHAID (chi-square automatic interaction detection).
    
    Specified by:
    
    selectSplitVariable in class DecisionTree
    
    Parameters:
    
    xy - a double matrix containing the data
    
    classCounts - a double array containing the counts for each class of the response variable, when it is categorical
    
    parentFreq - a double array used to indicate which subset of the observations belong in the current node
    
    splitValue - a double array representing the resulting split point if the selected variable is quantitative
    
    splitCriterionValue - a double array containing the value of the criterion used to determine the splitting variable
    
    splitPartition - an int array indicating the resulting split partition if the selected variable is categorical
    
    Returns:
    
    an int specifying the column index of the split variable in this.getPredictorIndexes()
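The significance-level setters above document three constraints that interact: the merge and split-variable levels lie between 0.0 and 1.0, and the split-merged level is either the sentinel -1.0 (disabled) or at least the merge level. The sketch below is a hypothetical helper that restates those documented rules; it is not JMSL code and does not reproduce how the library itself reacts to invalid values. It also assumes "between 0.0 and 1.0" means the open interval.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical checker for the significance-level rules documented for CHAID; not part of JMSL. */
final class ChaidAlphaRules {

    static final double SPLIT_MERGED_DISABLED = -1.0;   // documented sentinel value

    /** Returns the rule violations; an empty list means the settings are consistent. */
    static List<String> check(double mergeAlpha, double splitMergedAlpha, double splitVariableAlpha) {
        List<String> problems = new ArrayList<>();
        if (!(mergeAlpha > 0.0 && mergeAlpha < 1.0)) {
            problems.add("mergeAlpha must be between 0.0 and 1.0");
        }
        if (splitMergedAlpha != SPLIT_MERGED_DISABLED && splitMergedAlpha < mergeAlpha) {
            problems.add("splitMergedAlpha must be >= mergeAlpha unless it is -1 (disabled)");
        }
        if (!(splitVariableAlpha > 0.0 && splitVariableAlpha < 1.0)) {
            problems.add("splitVariableSelectionAlpha must be between 0.0 and 1.0");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check(0.05, -1.0, 0.05));   // the documented defaults
        System.out.println(check(0.05, 0.01, 0.05));   // splitMergedAlpha below mergeAlpha
    }
}
```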
