public class QUEST extends DecisionTree implements Serializable, Cloneable
Generates a decision tree using the QUEST algorithm for a categorical response variable and categorical or quantitative predictor variables. The procedure (Loh and Shih, 1997) is as follows. For each categorical predictor, QUEST performs a multi-way chi-square test of association between the predictor and Y. For every continuous predictor, QUEST performs an ANOVA test to see if the means of the predictor vary among the groups of Y. Among these tests, the variable with the most significant result is selected as a potential splitting variable, say \(X_j\). If its p-value (adjusted for multiple tests) is less than the specified splitting threshold, then \(X_j\) is the splitting variable for the current node. If not, QUEST performs, for each continuous variable X, Levene's test of homogeneity to see if the variance of X differs among the groups of Y. Among these tests, we again find the predictor with the most significant result, say \(X_i\). If its p-value (adjusted for multiple tests) is less than the splitting threshold, \(X_i\) is the splitting variable. Otherwise, the node is not split.
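The two-stage decision described above can be sketched in Java as follows. This is a minimal illustration, not the library's implementation: it assumes the p-values of the chi-square/ANOVA tests and of the Levene tests have already been computed, and it assumes a Bonferroni-style adjustment, since the text only says the p-values are "adjusted for multiple tests". The class and method names (`QuestSelectionSketch`, `select`, `adjust`) are hypothetical.

```java
/** Sketch of the two-stage split-variable decision, given precomputed p-values. */
final class QuestSelectionSketch {

    /** Bonferroni-style adjustment (an assumption): scale by the number of tests, cap at 1. */
    static double adjust(double pValue, int numTests) {
        return Math.min(1.0, pValue * numTests);
    }

    /** Index of the smallest non-NaN entry, or -1 if there is none. */
    static int argMin(double[] p) {
        int best = -1;
        for (int i = 0; i < p.length; i++) {
            if (!Double.isNaN(p[i]) && (best < 0 || p[i] < p[best])) {
                best = i;
            }
        }
        return best;
    }

    /** Number of non-NaN entries, i.e. the number of tests that were actually run. */
    static int countTests(double[] p) {
        int n = 0;
        for (double v : p) {
            if (!Double.isNaN(v)) {
                n++;
            }
        }
        return n;
    }

    /**
     * @param associationP chi-square or ANOVA p-value for each predictor
     * @param leveneP      Levene p-value for each predictor, NaN for categorical predictors
     * @param alpha        the splitting threshold
     * @return the index of the selected predictor, or -1 if the node is not split
     */
    static int select(double[] associationP, double[] leveneP, double alpha) {
        // Stage 1: most significant association test.
        int j = argMin(associationP);
        if (j >= 0 && adjust(associationP[j], countTests(associationP)) < alpha) {
            return j;
        }
        // Stage 2: most significant Levene test among continuous predictors.
        int i = argMin(leveneP);
        if (i >= 0 && adjust(leveneP[i], countTests(leveneP)) < alpha) {
            return i;
        }
        return -1; // neither stage is significant: do not split
    }

    public static void main(String[] args) {
        double[] association = {0.20, 0.001, 0.50};
        double[] levene = {Double.NaN, 0.30, 0.40};
        System.out.println("selected predictor index: " + select(association, levene, 0.05));
    }
}
```

In this sketch a return value of -1 corresponds to the "node is not split" outcome.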
Assuming a splitting variable is found, the next step is to determine how the variable should be split. If the selected variable \(X_j\) is continuous, a split point d is determined by quadratic discriminant analysis (QDA) of \(X_j\) into two populations determined by a binary partition of the response Y. The goal of this step is to group the classes of Y into two subsets, or superclasses, A and B. If there are only two classes in the response Y, the superclasses are obvious. Otherwise, calculate the means and variances of \(X_j\) in each of the classes of Y. If the means are all equal, put the largest class into group A and combine the rest to form group B. If they are not all equal, use a k-means clustering method (k = 2) on the class means to determine A and B.
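The grouping of classes into A and B can be sketched as a one-dimensional 2-means clustering of the class means. This is an illustrative sketch only: the initial cluster centers (the smallest and largest class means) and the iteration cap are assumptions, and the names `SuperClassSketch` and `group` are hypothetical.

```java
import java.util.Arrays;

/** Sketch: partition the response classes into superclasses A and B. */
final class SuperClassSketch {

    /**
     * @param classMeans mean of the selected predictor within each response class
     * @param classSizes number of observations in each response class
     * @return inA[k] is true if class k belongs to A, false if it belongs to B
     */
    static boolean[] group(double[] classMeans, int[] classSizes) {
        int k = classMeans.length;
        boolean[] inA = new boolean[k];
        double min = Arrays.stream(classMeans).min().getAsDouble();
        double max = Arrays.stream(classMeans).max().getAsDouble();

        if (min == max) {
            // All class means are equal: the largest class alone forms A.
            int largest = 0;
            for (int i = 1; i < k; i++) {
                if (classSizes[i] > classSizes[largest]) {
                    largest = i;
                }
            }
            inA[largest] = true;
            return inA;
        }

        // 2-means on the class means; starting from the two extreme means is an assumption.
        double centerA = min;
        double centerB = max;
        for (int iter = 0; iter < 100; iter++) {
            boolean changed = false;
            double sumA = 0.0, sumB = 0.0;
            int nA = 0, nB = 0;
            for (int i = 0; i < k; i++) {
                boolean toA = Math.abs(classMeans[i] - centerA) <= Math.abs(classMeans[i] - centerB);
                if (toA != inA[i]) {
                    changed = true;
                }
                inA[i] = toA;
                if (toA) { sumA += classMeans[i]; nA++; } else { sumB += classMeans[i]; nB++; }
            }
            if (!changed) {
                break; // assignments are stable
            }
            // Both clusters are non-empty here: the smallest mean stays in A, the largest in B.
            centerA = sumA / nA;
            centerB = sumB / nB;
        }
        return inA;
    }

    public static void main(String[] args) {
        boolean[] inA = group(new double[]{1.0, 1.2, 5.0, 5.3}, new int[]{10, 10, 10, 10});
        System.out.println(Arrays.toString(inA));
    }
}
```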
\(X_j\) in A and in B is assumed to be normally distributed with estimated means \(\bar{x}_{j|A}\), \(\bar{x}_{j|B}\), and variances \(S^2_{j|A}\), \(S^2_{j|B}\), respectively. The quadratic discriminant is the partition \(X_j\le d\) and \(X_j\gt d\) such that \(\mbox{Pr}\left(X_j,A\right)=\mbox{Pr}\left(X_j,B\right) \). The discriminant rule assigns an observation to A if \(x_{ij}\le d\) and to B if \(x_{ij}\gt d \). For d to maximally discriminate, the probabilities must be equal.
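Equating the two prior-weighted normal densities and taking logarithms gives a quadratic equation in d, which the following sketch solves. It is an illustration under stated assumptions, not the library's code: the superclass priors are passed in explicitly, the root nearer to the mean of A is chosen when two real roots exist, and the midpoint of the means is used as a fallback when there is no real root. The names `QdaSplitSketch`, `splitPoint`, and `weightedDensity` are hypothetical.

```java
/** Sketch: split point d at which the prior-weighted normal densities of A and B are equal. */
final class QdaSplitSketch {

    /** Prior times the normal density with the given mean and variance, evaluated at x. */
    static double weightedDensity(double x, double mean, double var, double prior) {
        double z = x - mean;
        return prior * Math.exp(-z * z / (2.0 * var)) / Math.sqrt(2.0 * Math.PI * var);
    }

    static double splitPoint(double meanA, double varA, double priorA,
                             double meanB, double varB, double priorB) {
        // Coefficients of a*d^2 + b*d + c = 0, from equating the log weighted densities.
        double a = varA - varB;
        double b = 2.0 * (meanA * varB - meanB * varA);
        double c = varA * meanB * meanB - varB * meanA * meanA
                + 2.0 * varA * varB
                * Math.log((priorA * Math.sqrt(varB)) / (priorB * Math.sqrt(varA)));
        double midpoint = 0.5 * (meanA + meanB);

        if (Math.abs(a) < 1e-12) {
            // Equal variances: the equation is linear (or degenerate if the means also agree).
            return Math.abs(b) < 1e-12 ? midpoint : -c / b;
        }
        double disc = b * b - 4.0 * a * c;
        if (disc < 0.0) {
            return midpoint; // no real root: fall back to the midpoint (assumption)
        }
        double sqrtDisc = Math.sqrt(disc);
        double r1 = (-b + sqrtDisc) / (2.0 * a);
        double r2 = (-b - sqrtDisc) / (2.0 * a);
        // Choose the root closer to the mean of A (assumption).
        return Math.abs(r1 - meanA) <= Math.abs(r2 - meanA) ? r1 : r2;
    }

    public static void main(String[] args) {
        double d = splitPoint(0.0, 1.0, 0.5, 4.0, 4.0, 0.5);
        System.out.println("d = " + d);
        System.out.println("weighted density in A at d: " + weightedDensity(d, 0.0, 1.0, 0.5));
        System.out.println("weighted density in B at d: " + weightedDensity(d, 4.0, 4.0, 0.5));
    }
}
```

With equal variances and equal priors the quadratic term vanishes and the sketch returns the midpoint of the two means, as expected for a linear discriminant.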
If the selected variable \(X_j\) is categorical, it is first transformed using the method outlined in Loh and Shih (1997), and then QDA is performed as above. The transformation is related to the discriminant coordinate (CRIMCOORD) approach due to Gnanadesikan (1977).
Nested classes/interfaces inherited from class DecisionTree: DecisionTree.MaxTreeSizeExceededException, DecisionTree.PruningFailedToConvergeException, DecisionTree.PureNodeException
Nested classes/interfaces inherited from class PredictiveModel: PredictiveModel.CloneNotSupportedException, PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType
Constructor | Description |
---|---|
QUEST(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType) | Constructs a QUEST object for a single response variable and multiple predictor variables. |
QUEST(QUEST questModel) | Constructs a copy of the input QUEST decision tree. |
Modifier and Type | Method | Description |
---|---|---|
QUEST | clone() | Clones a QUEST decision tree. |
double | getSplitVariableSelectionCriterion() | Returns the significance level for split variable selection. |
protected int | selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, double[] splitCriterionValue, int[] splitPartition) | Selects the split variable for the present node using the QUEST method. |
protected void | setConfiguration(PredictiveModel pm) | Sets the configuration of PredictiveModel to that of the input model. |
void | setSplitVariableSelectionCriterion(double criterion) | Sets the significance level for split variable selection. |
Methods inherited from class DecisionTree: fitModel, getCostComplexityValues, getDecisionTree, getFittedMeanSquaredError, getMaxDepth, getMaxNodes, getMeanSquaredPredictionError, getMinCostComplexityValue, getMinObsPerChildNode, getMinObsPerNode, getNodeAssigments, getNumberOfComplexityValues, getNumberOfRandomFeatures, isAutoPruningFlag, isRandomFeatureSelection, predict, predict, predict, printDecisionTree, printDecisionTree, pruneTree, setAutoPruningFlag, setCostComplexityValues, setMaxDepth, setMaxNodes, setMinCostComplexityValue, setMinObsPerChildNode, setMinObsPerNode, setNumberOfRandomFeatures, setRandomFeatureSelection
Methods inherited from class PredictiveModel: getClassCounts, getClassErrors, getClassLabels, getClassProbabilities, getCostMatrix, getMaxNumberOfCategories, getMaxNumberOfIterations, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getRandomObject, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isConstantSeries, isMustFitModel, isUserFixedNClasses, setClassCounts, setClassLabels, setClassProbabilities, setCostMatrix, setMaxNumberOfCategories, setMaxNumberOfIterations, setMustFitModel, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setRandomObject, setResponseColumnIndex, setTrainingData, setVariableType, setWeights
public QUEST(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
Constructs a QUEST object for a single response variable and multiple predictor variables.
Parameters:
xy - a double matrix with rows containing the observations on the predictor variables and one response variable
responseColumnIndex - an int specifying the column index of the response variable
varType - a PredictiveModel.VariableType array containing the type of each variable

public QUEST(QUEST questModel)
Constructs a copy of the input QUEST decision tree.
Parameters:
questModel - a QUEST decision tree

public QUEST clone()
Clones a QUEST decision tree.
clone in class PredictiveModel
Returns:
a QUEST decision tree

public double getSplitVariableSelectionCriterion()
Returns the significance level for split variable selection.
Returns:
a double, the significance criterion for split variable selection

public final void setSplitVariableSelectionCriterion(double criterion)
Sets the significance level for split variable selection.
Parameters:
criterion - a double specifying the criterion for split variable selection. criterion must be between 0.0 and 1.0.
Default: criterion = 0.05

protected final void setConfiguration(PredictiveModel pm)
Sets the configuration of PredictiveModel to that of the input model.
setConfiguration in class DecisionTree
Parameters:
pm - a PredictiveModel object which is to have its attributes duplicated in this instance

protected int selectSplitVariable(double[][] xy, double[] classCounts, double[] parentFreq, double[] splitValue, double[] splitCriterionValue, int[] splitPartition)
Selects the split variable for the present node using the QUEST method.
selectSplitVariable in class DecisionTree
Parameters:
xy - a double matrix containing the data
classCounts - a double array containing the counts for each class of the response variable, when it is categorical
parentFreq - a double array used to determine which subset of the observations belong in the current node
splitValue - a double array representing the resulting split point if the selected variable is quantitative
splitCriterionValue - a double, the value of the criterion used to determine the splitting variable
splitPartition - an int array indicating the resulting split partition if the selected variable is categorical
Returns:
an int specifying the column index of the split variable in xy
Copyright © 2020 Rogue Wave Software. All rights reserved.