Class QUEST
- All Implemented Interfaces:
Serializable, Cloneable
Generates a decision tree using the QUEST algorithm for a categorical
response variable and categorical or quantitative predictor variables. The
procedure (Loh and Shih, 1997) is as follows: For each categorical predictor,
QUEST performs a multi-way chi-square test of association
between the predictor and Y. For every continuous predictor,
QUEST performs an ANOVA test to see if the means of the
predictor vary among the groups of Y. Among these tests, the variable
with the most significant result is selected as a potential splitting
variable, say, Xj. If the p-value (adjusted for multiple
tests) is less than the specified splitting threshold, then
Xj is the splitting variable for the current node. If not,
QUEST performs Levene's test of homogeneity of variance for each
continuous variable X to see if the variance of X differs across the
groups of Y. Among these tests, we again find the predictor
with the most significant result, say Xi. If its p-value
(adjusted for multiple tests) is less than the splitting threshold,
Xi is the splitting variable. Otherwise, the node is not
split.
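The ANOVA test applied to each continuous predictor can be sketched as follows. This is a minimal, self-contained illustration of the one-way ANOVA F statistic, not the IMSL implementation; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Minimal sketch (not the IMSL implementation) of the one-way ANOVA
// F statistic QUEST computes for a continuous predictor: do the group
// means of the predictor differ across the classes of Y?
public class AnovaSketch {

    // groups[g][i] holds the predictor values for class g of Y.
    public static double fStatistic(double[][] groups) {
        int k = groups.length;            // number of classes of Y
        int n = 0;
        double grandSum = 0.0;
        for (double[] g : groups) {
            n += g.length;
            for (double v : g) grandSum += v;
        }
        double grandMean = grandSum / n;

        double ssBetween = 0.0;           // variation of the group means
        double ssWithin = 0.0;            // variation inside each group
        for (double[] g : groups) {
            double mean = Arrays.stream(g).average().orElse(0.0);
            ssBetween += g.length * (mean - grandMean) * (mean - grandMean);
            for (double v : g) ssWithin += (v - mean) * (v - mean);
        }
        // F = (SSB / (k - 1)) / (SSW / (n - k)); a large F is significant.
        return (ssBetween / (k - 1)) / (ssWithin / (n - k));
    }

    public static void main(String[] args) {
        double[][] groups = {
            {1.0, 1.2, 0.8},              // class 0 of Y
            {5.0, 5.1, 4.9}               // class 1: clearly separated mean
        };
        System.out.println(fStatistic(groups));
    }
}
```

In the actual procedure the F statistic is converted to a p-value and adjusted for multiple tests before being compared with the splitting threshold.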
Assuming a splitting variable is found, the next step is to determine how the variable should be split. If the selected variable Xj is continuous, a split point d is determined by quadratic discriminant analysis (QDA) of Xj into two populations determined by a binary partition of the response Y. The goal of this step is to group the classes of Y into two subsets or super classes, A and B. If there are only two classes in the response Y, the super classes are obvious. Otherwise, calculate the means and variances of Xj in each of the classes of Y. If the means are all equal, put the largest-sized class into group A and combine the rest to form group B. If they are not all equal, use a k-means clustering method (k = 2) on the class means to determine A and B.
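The grouping of response classes into the two super classes can be sketched as below. This is a simplified illustration of the rule just described, not the IMSL code; the 1-D 2-means loop and all names are assumptions for the sketch.

```java
// Minimal sketch (not the IMSL implementation) of grouping the classes
// of Y into super classes A and B from the class means of Xj: if all
// means are equal, the largest class alone forms A; otherwise a k-means
// (k = 2) clustering of the class means decides membership.
public class SuperClassSketch {

    // Returns true at position c if class c is placed in super class A.
    public static boolean[] groupClasses(double[] classMeans, int[] classSizes) {
        int k = classMeans.length;
        boolean[] inA = new boolean[k];

        boolean allEqual = true;
        for (int c = 1; c < k; c++)
            if (classMeans[c] != classMeans[0]) { allEqual = false; break; }

        if (allEqual) {
            // Largest class alone forms A; the rest form B.
            int largest = 0;
            for (int c = 1; c < k; c++)
                if (classSizes[c] > classSizes[largest]) largest = c;
            inA[largest] = true;
            return inA;
        }

        // 1-D 2-means on the class means, seeded with the min and max mean.
        double cA = classMeans[0], cB = classMeans[0];
        for (double m : classMeans) { cA = Math.min(cA, m); cB = Math.max(cB, m); }
        for (int iter = 0; iter < 100; iter++) {
            double sumA = 0, sumB = 0;
            int nA = 0, nB = 0;
            for (int c = 0; c < k; c++) {
                inA[c] = Math.abs(classMeans[c] - cA) <= Math.abs(classMeans[c] - cB);
                if (inA[c]) { sumA += classMeans[c]; nA++; }
                else        { sumB += classMeans[c]; nB++; }
            }
            double newA = nA > 0 ? sumA / nA : cA;   // updated centers
            double newB = nB > 0 ? sumB / nB : cB;
            if (newA == cA && newB == cB) break;      // converged
            cA = newA; cB = newB;
        }
        return inA;
    }

    public static void main(String[] args) {
        // Three classes with means 1.0, 1.2, 8.0: the first two cluster together.
        boolean[] inA = groupClasses(new double[]{1.0, 1.2, 8.0}, new int[]{10, 10, 10});
        System.out.println(inA[0] + " " + inA[1] + " " + inA[2]);
    }
}
```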
Xj in A and in B is assumed to be normally distributed with estimated means \(\bar{x}_{j|A}\), \(\bar{x}_{j|B}\) and variances \(s^2_{j|A}\), \(s^2_{j|B}\), respectively. The quadratic discriminant is the partition \(X_j\le d\) and \(X_j\gt d\) such that \(\mbox{Pr}\left(X_j,A\right)=\mbox{Pr}\left(X_j,B\right)\). The discriminant rule assigns an observation to A if \(x_{ij}\le d\) and to B if \(x_{ij}\gt d\). For d to maximally discriminate, the probabilities must be equal.
If the selected variable Xj is categorical, it is first transformed using the method outlined in Loh and Shih (1997) and then QDA is performed as above. The transformation is related to the discriminant coordinate (CRIMCOORD) approach due to Gnanadesikan (1977).
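Equating the two prior-weighted normal densities and taking logarithms gives a quadratic equation in d, which the following self-contained sketch solves. It illustrates the QDA step described above; it is not the IMSL code, and the class and method names are assumptions.

```java
// Hedged sketch (not the IMSL code) of the QDA step: the split point d
// is where the prior-weighted normal densities of super classes A and B
// are equal, i.e. pA*N(d; mA, sA^2) = pB*N(d; mB, sB^2).
public class QdaSplitSketch {

    // Solves for d, preferring the root lying between the two means.
    public static double splitPoint(double mA, double sA, double pA,
                                    double mB, double sB, double pB) {
        // Taking logs of both densities yields a*d^2 + b*d + c = 0.
        double a = 1.0 / (2 * sB * sB) - 1.0 / (2 * sA * sA);
        double b = mA / (sA * sA) - mB / (sB * sB);
        double c = (mB * mB) / (2 * sB * sB) - (mA * mA) / (2 * sA * sA)
                 + Math.log((pA * sB) / (pB * sA));
        if (Math.abs(a) < 1e-12) {
            return -c / b;                       // equal variances: linear case
        }
        double disc = Math.sqrt(b * b - 4 * a * c);
        double r1 = (-b + disc) / (2 * a);
        double r2 = (-b - disc) / (2 * a);
        // Prefer the root that lies between the two class means.
        double lo = Math.min(mA, mB), hi = Math.max(mA, mB);
        return (r1 >= lo && r1 <= hi) ? r1 : r2;
    }

    public static void main(String[] args) {
        // Equal variances and priors: d is the midpoint of the two means.
        System.out.println(splitPoint(0.0, 1.0, 0.5, 4.0, 1.0, 0.5)); // prints 2.0
    }
}
```

With equal variances the quadratic term vanishes and d reduces to the midpoint shifted by the log prior ratio; with unequal variances the root between the means is taken.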
-
Nested Class Summary
Nested classes/interfaces inherited from class com.imsl.datamining.decisionTree.DecisionTree:
DecisionTree.MaxTreeSizeExceededException, DecisionTree.PruningFailedToConvergeException, DecisionTree.PureNodeException
Nested classes/interfaces inherited from class com.imsl.datamining.PredictiveModel:
PredictiveModel.CloneNotSupportedException, PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType
-
Constructor Summary
QUEST(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
Constructs a QUEST object for a single response variable and multiple predictor variables.
QUEST(QUEST questModel)
Constructs a copy of the input QUEST decision tree.
-
Method Summary
clone()
Clones a QUEST decision tree.
double getSplitVariableSelectionCriterion()
Returns the significance level for split variable selection.
protected int selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, double[] splitCriterionValue, int[] splitPartition)
Selects the split variable for the present node using the QUEST method.
protected final void setConfiguration(PredictiveModel pm)
Sets the configuration of PredictiveModel to that of the input model.
final void setSplitVariableSelectionCriterion(double criterion)
Sets the significance level for split variable selection.
Methods inherited from class com.imsl.datamining.decisionTree.DecisionTree:
fitModel, getCostComplexityValues, getDecisionTree, getFittedMeanSquaredError, getMaxDepth, getMaxNodes, getMeanSquaredPredictionError, getMinCostComplexityValue, getMinObsPerChildNode, getMinObsPerNode, getNodeAssigments, getNumberOfComplexityValues, getNumberOfRandomFeatures, isAutoPruningFlag, isRandomFeatureSelection, predict, predict, predict, printDecisionTree, printDecisionTree, pruneTree, setAutoPruningFlag, setCostComplexityValues, setMaxDepth, setMaxNodes, setMinCostComplexityValue, setMinObsPerChildNode, setMinObsPerNode, setNumberOfRandomFeatures, setRandomFeatureSelection
Methods inherited from class com.imsl.datamining.PredictiveModel:
getClassCounts, getClassErrors, getClassErrors, getClassLabels, getClassProbabilities, getCostMatrix, getMaxNumberOfCategories, getMaxNumberOfIterations, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getRandomObject, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isConstantSeries, isMustFitModel, isUserFixedNClasses, setClassCounts, setClassLabels, setClassProbabilities, setCostMatrix, setMaxNumberOfCategories, setMaxNumberOfIterations, setMustFitModel, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setRandomObject, setResponseColumnIndex, setTrainingData, setVariableType, setWeights
-
Constructor Details
-
QUEST
Constructs a QUEST object for a single response variable and multiple predictor variables.
- Parameters:
xy - a double matrix with rows containing the observations on the predictor variables and one response variable
responseColumnIndex - an int specifying the column index of the response variable
varType - a PredictiveModel.VariableType array containing the type of each variable
-
QUEST
Constructs a copy of the input QUEST decision tree.
- Parameters:
questModel - a QUEST decision tree
-
-
Method Details
-
clone
Clones a QUEST decision tree.
- Specified by:
clone in class PredictiveModel
- Returns:
a clone of the QUEST decision tree
-
getSplitVariableSelectionCriterion
public double getSplitVariableSelectionCriterion()
Returns the significance level for split variable selection.
- Returns:
a double, the significance criterion for split variable selection
-
setSplitVariableSelectionCriterion
public final void setSplitVariableSelectionCriterion(double criterion)
Sets the significance level for split variable selection.
- Parameters:
criterion - a double specifying the criterion for split variable selection. criterion must be between 0.0 and 1.0.
Default: criterion = 0.05
-
setConfiguration
Sets the configuration of PredictiveModel to that of the input model.
- Overrides:
setConfiguration in class DecisionTree
- Parameters:
pm - a PredictiveModel object which is to have its attributes duplicated in this instance
-
selectSplitVariable
protected int selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, double[] splitCriterionValue, int[] splitPartition)
Selects the split variable for the present node using the QUEST method.
- Specified by:
selectSplitVariable in class DecisionTree
- Parameters:
xy - a double matrix containing the data
classCounts - a double array containing the counts for each class of the response variable, when it is categorical
parentFreq - a double array used to determine which subset of the observations belong in the current node
splitValue - a double array representing the resulting split point if the selected variable is quantitative
splitCriterionValue - a double, the value of the criterion used to determine the splitting variable
splitPartition - an int array indicating the resulting split partition if the selected variable is categorical
- Returns:
an int specifying the column index of the split variable in xy
-