Class QUEST
- All Implemented Interfaces:
Serializable, Cloneable
Generates a decision tree using the QUEST algorithm for a categorical
response variable and categorical or quantitative predictor variables. The
procedure (Loh and Shih, 1997) is as follows: For each categorical predictor,
QUEST performs a multi-way chi-square test of association
between the predictor and Y. For every continuous predictor,
QUEST performs an ANOVA test to see if the means of the
predictor vary among the groups of Y. Among these tests, the variable
with the most significant result is selected as a potential splitting
variable, say, Xj. If the p-value (adjusted for multiple
tests) is less than the specified splitting threshold, then
Xj is the splitting variable for the current node. If not,
QUEST performs Levene's test of homogeneity of variance for each
continuous variable X to see if the variance of X differs across the
groups of Y. Among these tests, we again find the predictor
with the most significant result, say Xi. If its p-value
(adjusted for multiple tests) is less than the splitting threshold,
Xi is the splitting variable. Otherwise, the node is not
split.
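The ANOVA test applied to each continuous predictor can be sketched as follows. This is a minimal, self-contained illustration of the one-way ANOVA F statistic, not the IMSL implementation; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Minimal sketch (not the IMSL implementation) of the one-way ANOVA
// F statistic QUEST computes for a continuous predictor: do the group
// means of the predictor differ across the classes of Y?
public class AnovaSketch {

    // groups[g][i] holds the predictor values for class g of Y.
    public static double fStatistic(double[][] groups) {
        int k = groups.length;            // number of classes of Y
        int n = 0;
        double grandSum = 0.0;
        for (double[] g : groups) {
            n += g.length;
            for (double v : g) grandSum += v;
        }
        double grandMean = grandSum / n;

        double ssBetween = 0.0;           // variation of the group means
        double ssWithin = 0.0;            // variation inside each group
        for (double[] g : groups) {
            double mean = Arrays.stream(g).average().orElse(0.0);
            ssBetween += g.length * (mean - grandMean) * (mean - grandMean);
            for (double v : g) ssWithin += (v - mean) * (v - mean);
        }
        // F = (SSB / (k - 1)) / (SSW / (n - k)); a large F is significant.
        return (ssBetween / (k - 1)) / (ssWithin / (n - k));
    }

    public static void main(String[] args) {
        double[][] groups = {
            {1.0, 1.2, 0.8},              // class 0 of Y
            {5.0, 5.1, 4.9}               // class 1: clearly separated mean
        };
        System.out.println(fStatistic(groups));
    }
}
```

In the actual procedure the F statistic is converted to a p-value and adjusted for multiple tests before being compared with the splitting threshold.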
Assuming a splitting variable is found, the next step is to determine how the variable should be split. If the selected variable Xj is continuous, a split point d is determined by quadratic discriminant analysis (QDA) of Xj into two populations determined by a binary partition of the response Y. The goal of this step is to group the classes of Y into two subsets or super classes, A and B. If there are only two classes in the response Y, the super classes are obvious. Otherwise, calculate the means and variances of Xj in each of the classes of Y. If the means are all equal, put the largest-sized class into group A and combine the rest to form group B. If they are not all equal, use a k-means clustering method (k = 2) on the class means to determine A and B.
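The grouping of response classes into the two super classes can be sketched as below. This is a simplified illustration of the rule just described, not the IMSL code; the 1-D 2-means loop and all names are assumptions for the sketch.

```java
// Minimal sketch (not the IMSL implementation) of grouping the classes
// of Y into super classes A and B from the class means of Xj: if all
// means are equal, the largest class alone forms A; otherwise a k-means
// (k = 2) clustering of the class means decides membership.
public class SuperClassSketch {

    // Returns true at position c if class c is placed in super class A.
    public static boolean[] groupClasses(double[] classMeans, int[] classSizes) {
        int k = classMeans.length;
        boolean[] inA = new boolean[k];

        boolean allEqual = true;
        for (int c = 1; c < k; c++)
            if (classMeans[c] != classMeans[0]) { allEqual = false; break; }

        if (allEqual) {
            // Largest class alone forms A; the rest form B.
            int largest = 0;
            for (int c = 1; c < k; c++)
                if (classSizes[c] > classSizes[largest]) largest = c;
            inA[largest] = true;
            return inA;
        }

        // 1-D 2-means on the class means, seeded with the min and max mean.
        double cA = classMeans[0], cB = classMeans[0];
        for (double m : classMeans) { cA = Math.min(cA, m); cB = Math.max(cB, m); }
        for (int iter = 0; iter < 100; iter++) {
            double sumA = 0, sumB = 0;
            int nA = 0, nB = 0;
            for (int c = 0; c < k; c++) {
                inA[c] = Math.abs(classMeans[c] - cA) <= Math.abs(classMeans[c] - cB);
                if (inA[c]) { sumA += classMeans[c]; nA++; }
                else        { sumB += classMeans[c]; nB++; }
            }
            double newA = nA > 0 ? sumA / nA : cA;   // updated centers
            double newB = nB > 0 ? sumB / nB : cB;
            if (newA == cA && newB == cB) break;      // converged
            cA = newA; cB = newB;
        }
        return inA;
    }

    public static void main(String[] args) {
        // Three classes with means 1.0, 1.2, 8.0: the first two cluster together.
        boolean[] inA = groupClasses(new double[]{1.0, 1.2, 8.0}, new int[]{10, 10, 10});
        System.out.println(inA[0] + " " + inA[1] + " " + inA[2]);
    }
}
```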
Xj in A and in B is assumed to be normally distributed with estimated means \(\bar{x}_{j|A}\), \(\bar{x}_{j|B}\) and variances \(s^2_{j|A}\), \(s^2_{j|B}\), respectively. The quadratic discriminant is the partition \(X_j\le d\) and \(X_j\gt d\) such that \(\mbox{Pr}\left(X_j,A\right)=\mbox{Pr}\left(X_j,B\right)\). The discriminant rule assigns an observation to A if \(x_{ij}\le d\) and to B if \(x_{ij}\gt d\). For d to maximally discriminate, the probabilities must be equal.
If the selected variable Xj is categorical, it is first transformed using the method outlined in Loh and Shih (1997) and then QDA is performed as above. The transformation is related to the discriminant coordinate (CRIMCOORD) approach due to Gnanadesikan (1977).
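Equating the two prior-weighted normal densities and taking logarithms gives a quadratic equation in d, which the following self-contained sketch solves. It illustrates the QDA step described above; it is not the IMSL code, and the class and method names are assumptions.

```java
// Hedged sketch (not the IMSL code) of the QDA step: the split point d
// is where the prior-weighted normal densities of super classes A and B
// are equal, i.e. pA*N(d; mA, sA^2) = pB*N(d; mB, sB^2).
public class QdaSplitSketch {

    // Solves for d, preferring the root lying between the two means.
    public static double splitPoint(double mA, double sA, double pA,
                                    double mB, double sB, double pB) {
        // Taking logs of both densities yields a*d^2 + b*d + c = 0.
        double a = 1.0 / (2 * sB * sB) - 1.0 / (2 * sA * sA);
        double b = mA / (sA * sA) - mB / (sB * sB);
        double c = (mB * mB) / (2 * sB * sB) - (mA * mA) / (2 * sA * sA)
                 + Math.log((pA * sB) / (pB * sA));
        if (Math.abs(a) < 1e-12) {
            return -c / b;                       // equal variances: linear case
        }
        double disc = Math.sqrt(b * b - 4 * a * c);
        double r1 = (-b + disc) / (2 * a);
        double r2 = (-b - disc) / (2 * a);
        // Prefer the root that lies between the two class means.
        double lo = Math.min(mA, mB), hi = Math.max(mA, mB);
        return (r1 >= lo && r1 <= hi) ? r1 : r2;
    }

    public static void main(String[] args) {
        // Equal variances and priors: d is the midpoint of the two means.
        System.out.println(splitPoint(0.0, 1.0, 0.5, 4.0, 1.0, 0.5)); // prints 2.0
    }
}
```

With equal variances the quadratic term vanishes and d reduces to the midpoint shifted by the log prior ratio; with unequal variances the root between the means is taken.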
-
Nested Class Summary
Nested classes/interfaces inherited from class com.imsl.datamining.decisionTree.DecisionTree:
DecisionTree.MaxTreeSizeExceededException, DecisionTree.PruningFailedToConvergeException, DecisionTree.PureNodeException
Nested classes/interfaces inherited from class com.imsl.datamining.PredictiveModel:
PredictiveModel.CloneNotSupportedException, PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType
-
Constructor Summary
QUEST(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
Constructs a QUEST object for a single response variable and multiple predictor variables.
QUEST(QUEST questModel)
Constructs a copy of the input QUEST decision tree.
-
Method Summary
clone()
Clones a QUEST decision tree.
double getSplitVariableSelectionCriterion()
Returns the significance level for split variable selection.
protected int selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, double[] splitCriterionValue, int[] splitPartition)
Selects the split variable for the present node using the QUEST method.
protected final void setConfiguration(PredictiveModel pm)
Sets the configuration of PredictiveModel to that of the input model.
final void setSplitVariableSelectionCriterion(double criterion)
Sets the significance level for split variable selection.
Methods inherited from class com.imsl.datamining.decisionTree.DecisionTree:
fitModel, getCostComplexityValues, getDecisionTree, getFittedMeanSquaredError, getMaxDepth, getMaxNodes, getMeanSquaredPredictionError, getMinCostComplexityValue, getMinObsPerChildNode, getMinObsPerNode, getNodeAssigments, getNumberOfComplexityValues, getNumberOfRandomFeatures, isAutoPruningFlag, isRandomFeatureSelection, predict, predict, predict, printDecisionTree, printDecisionTree, pruneTree, setAutoPruningFlag, setCostComplexityValues, setMaxDepth, setMaxNodes, setMinCostComplexityValue, setMinObsPerChildNode, setMinObsPerNode, setNumberOfRandomFeatures, setRandomFeatureSelection
Methods inherited from class com.imsl.datamining.PredictiveModel:
getClassCounts, getClassErrors, getClassErrors, getClassLabels, getClassProbabilities, getCostMatrix, getMaxNumberOfCategories, getMaxNumberOfIterations, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getRandomObject, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isConstantSeries, isMustFitModel, isUserFixedNClasses, setClassCounts, setClassLabels, setClassProbabilities, setCostMatrix, setMaxNumberOfCategories, setMaxNumberOfIterations, setMustFitModel, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setRandomObject, setResponseColumnIndex, setTrainingData, setVariableType, setWeights
-
Constructor Details
-
QUEST
Constructs a QUEST object for a single response variable and multiple predictor variables.
- Parameters:
xy - a double matrix with rows containing the observations on the predictor variables and one response variable
responseColumnIndex - an int specifying the column index of the response variable
varType - a PredictiveModel.VariableType array containing the type of each variable
-
QUEST
Constructs a copy of the input QUEST decision tree.
- Parameters:
questModel - a QUEST decision tree
-
-
Method Details
-
clone
Clones a QUEST decision tree.
- Specified by:
clone in class PredictiveModel
- Returns:
a clone of the QUEST decision tree
-
getSplitVariableSelectionCriterion
public double getSplitVariableSelectionCriterion()
Returns the significance level for split variable selection.
- Returns:
a double, the significance criterion for split variable selection
-
setSplitVariableSelectionCriterion
public final void setSplitVariableSelectionCriterion(double criterion)
Sets the significance level for split variable selection.
- Parameters:
criterion - a double specifying the criterion for split variable selection. criterion must be between 0.0 and 1.0.
Default: criterion = 0.05
-
setConfiguration
Sets the configuration of PredictiveModel to that of the input model.
- Overrides:
setConfiguration in class DecisionTree
- Parameters:
pm - a PredictiveModel object which is to have its attributes duplicated in this instance
-
selectSplitVariable
protected int selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, double[] splitCriterionValue, int[] splitPartition)
Selects the split variable for the present node using the QUEST method.
- Specified by:
selectSplitVariable in class DecisionTree
- Parameters:
xy - a double matrix containing the data
classCounts - a double array containing the counts for each class of the response variable, when it is categorical
parentFreq - a double array used to determine which subset of the observations belong in the current node
splitValue - a double array representing the resulting split point if the selected variable is quantitative
splitCriterionValue - a double, the value of the criterion used to determine the splitting variable
splitPartition - an int array indicating the resulting split partition if the selected variable is categorical
- Returns:
an int specifying the column index of the split variable in xy
-