Class RandomTrees
- All Implemented Interfaces:
Serializable, Cloneable
A random forest is an ensemble of decision trees. As in bootstrap aggregation, a tree is fit to each of \(M\) bootstrap samples drawn from the training data, and each tree is then used to generate predictions. For a regression problem (continuous response variable), the \(M\) predictions are combined into a single predicted value by averaging. For classification (categorical response variable), majority vote is used. A random forest also randomizes the predictors: in every tree, the splitting variable at each node is selected from a random subset of the predictors. Randomizing the predictors reduces the correlation among individual trees. The random forest was introduced by Leo Breiman (Breiman, 2001); Random Forests™ is the trademarked term for this approach. See also Hastie, Tibshirani, and Friedman (2008) for further discussion.
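The following is a minimal usage sketch based only on the constructors and methods documented on this page; the data values, the choice of response column, and the specific PredictiveModel.VariableType constant names are illustrative assumptions rather than part of this class's documentation.

    import com.imsl.datamining.PredictiveModel;
    import com.imsl.datamining.RandomTrees;

    public class RandomTreesExample {
        public static void main(String[] args) throws Exception {
            // Toy training data: two predictors followed by a continuous response in column 2.
            double[][] xy = {
                {1.0, 0.0, 2.3},
                {2.0, 1.0, 3.1},
                {3.0, 0.0, 4.8},
                {4.0, 1.0, 6.2},
                {5.0, 0.0, 7.0},
                {6.0, 1.0, 8.9}
            };

            // Type of each column; these enum constant names are assumptions.
            PredictiveModel.VariableType[] varType = {
                PredictiveModel.VariableType.QUANTITATIVE_CONTINUOUS,
                PredictiveModel.VariableType.CATEGORICAL,
                PredictiveModel.VariableType.QUANTITATIVE_CONTINUOUS
            };

            RandomTrees rt = new RandomTrees(xy, 2, varType);
            rt.setNumberOfTrees(100);   // default is 50 trees (bootstrap samples)
            rt.fitModel();

            double[] fitted = rt.predict();                     // predictions on the training data
            double oobError = rt.getOutOfBagPredictionError();  // out-of-bag error estimate
            System.out.println("Out-of-bag prediction error: " + oobError);
        }
    }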
-
Nested Class Summary
Nested Classes
- static class: Class that wraps exceptions thrown by reflective operations in core reflection.

Nested classes/interfaces inherited from class com.imsl.datamining.PredictiveModel:
PredictiveModel.CloneNotSupportedException, PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType
-
Constructor Summary
Constructors
- RandomTrees(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType): Constructs a RandomTrees random forest of ALACART decision trees.
- RandomTrees(DecisionTree dt): Constructs a RandomTrees random forest of the input decision tree.
- RandomTrees(RandomTrees rtModel): Constructs a copy of the input RandomTrees predictive model.
-
Method Summary
Methods
- clone(): Clones a RandomTrees predictive model.
- void fitModel(): Fits the random forest to the training data.
- int getNumberOfRandomFeatures(): Returns the number of random features used in the splitting rules.
- int getNumberOfTrees(): Returns the number of trees.
- double getOutOfBagPredictionError(): Returns the out-of-bag prediction error.
- double[] getOutOfBagPredictions(): Returns the out-of-bag predicted values for the examples in the training data.
- double[] getVariableImportance(): Returns the variable importance measure based on the out-of-bag prediction error.
- boolean isCalculateVariableImportance(): Returns the current setting of the boolean to calculate variable importance.
- double[] predict(): Returns the predicted values generated by the random forest on the training data.
- double[] predict(double[][] testData): Returns the predicted values on the input test data.
- double[] predict(double[][] testData, double[] testDataWeights): Returns the predicted values on the input test data and the test data weights.
- void setCalculateVariableImportance(boolean calculate): Sets the boolean to calculate variable importance.
- protected void setConfiguration(PredictiveModel pm): Sets the configuration of RandomTrees to that of the input model.
- void setNumberOfRandomFeatures(int numberOfRandomFeatures): Sets the number of random features used in the splitting rules.
- void setNumberOfThreads(int numberOfThreads): Sets the maximum number of java.lang.Thread instances that may be used for parallel processing.
- void setNumberOfTrees(int numberOfTrees): Sets the number of trees to generate in the random forest.

Methods inherited from class com.imsl.datamining.PredictiveModel:
getClassCounts, getClassErrors, getClassErrors, getClassLabels, getClassProbabilities, getCostMatrix, getMaxNumberOfCategories, getMaxNumberOfIterations, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getRandomObject, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isConstantSeries, isMustFitModel, isUserFixedNClasses, setClassCounts, setClassLabels, setClassProbabilities, setCostMatrix, setMaxNumberOfCategories, setMaxNumberOfIterations, setMustFitModel, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setRandomObject, setResponseColumnIndex, setTrainingData, setVariableType, setWeights
-
Constructor Details
-
RandomTrees
public RandomTrees(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
Constructs a RandomTrees random forest of ALACART decision trees.
- Parameters:
xy - a double matrix containing the training data
responseColumnIndex - an int, the column index for the response variable
varType - a PredictiveModel.VariableType array containing the type of each variable
-
RandomTrees
public RandomTrees(DecisionTree dt)
Constructs a RandomTrees random forest of the input decision tree.
- Parameters:
dt - a DecisionTree object
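The forest can thus be configured from a separately constructed decision tree. A brief sketch, assuming the concrete tree classes live in com.imsl.datamining.decisionTree and follow the same (xy, responseColumnIndex, varType) constructor pattern; treat the ALACART constructor shown here as an assumption:

    // 'xy' and 'varType' as in the class-level example above.
    // Assumed: com.imsl.datamining.decisionTree.ALACART with this constructor signature.
    ALACART tree = new ALACART(xy, 2, varType);
    RandomTrees rt = new RandomTrees(tree);   // forest whose trees are configured like 'tree'
    rt.fitModel();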
-
RandomTrees
public RandomTrees(RandomTrees rtModel)
Constructs a copy of the input RandomTrees predictive model.
- Parameters:
rtModel - a RandomTrees predictive model
-
-
Method Details
-
clone
Clones a RandomTrees predictive model.
- Specified by:
clone in class PredictiveModel
- Returns:
a clone of the RandomTrees predictive model
-
setNumberOfTrees
public void setNumberOfTrees(int numberOfTrees)
Sets the number of trees to generate in the random forest. The number of trees is equivalent to the number of bootstrap samples.
- Parameters:
numberOfTrees - an int, the number of trees to generate
Default: numberOfTrees = 50
-
setNumberOfRandomFeatures
public void setNumberOfRandomFeatures(int numberOfRandomFeatures)
Sets the number of random features used in the splitting rules.
- Parameters:
numberOfRandomFeatures - an int, the number of predictors in the random subset
Default: numberOfRandomFeatures = \(\sqrt{p}\) for classification problems and \(\frac{p}{3}\) for regression problems, where \(p\) is the number of predictors in the training data.
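A short sketch of these two tuning calls, continuing from a constructed RandomTrees named rt (hypothetical); setting the classification default \(\sqrt{p}\) explicitly is shown only for illustration.

    int p = rt.getNumberOfPredictors();                // inherited from PredictiveModel
    rt.setNumberOfTrees(200);                          // more bootstrap samples than the default 50
    rt.setNumberOfRandomFeatures((int) Math.sqrt(p));  // classification default; use p / 3 for regression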
-
getNumberOfRandomFeatures
public int getNumberOfRandomFeatures()
Returns the number of random features used in the splitting rules.
- Returns:
an int, the number of random features
-
setCalculateVariableImportance
public void setCalculateVariableImportance(boolean calculate)
Sets the boolean to calculate variable importance. When true, a permutation-type variable importance measure is calculated during bootstrap aggregation.
- Parameters:
calculate - a boolean indicating whether or not to calculate variable importance
Default: calculate = false
-
isCalculateVariableImportance
public boolean isCalculateVariableImportance()
Returns the current setting of the boolean to calculate variable importance.
- Returns:
a boolean, the current setting of the flag
-
getNumberOfTrees
public int getNumberOfTrees()
Returns the number of trees.
- Returns:
an int, the number of trees
-
setNumberOfThreads
public void setNumberOfThreads(int numberOfThreads)
Sets the maximum number of java.lang.Thread instances that may be used for parallel processing.
- Parameters:
numberOfThreads - an int specifying the maximum number of java.lang.Thread instances that may be used for parallel processing. The actual number of threads used is the lesser of numberOfThreads and numberOfTrees, the number of trees in the random forest, so that resources are not wasted on idle threads.
Default: numberOfThreads = 1
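A brief sketch of the thread cap described above, with arbitrary values and the hypothetical model rt from the earlier example:

    rt.setNumberOfTrees(10);
    rt.setNumberOfThreads(16);   // at most min(16, 10) = 10 threads are actually used
    rt.fitModel();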
-
fitModel
public void fitModel() throws PredictiveModel.PredictiveModelException
Fits the random forest to the training data.
- Overrides:
fitModel in class PredictiveModel
- Throws:
PredictiveModel.PredictiveModelException - thrown when an exception occurs in com.imsl.datamining.PredictiveModel; exceptions defined in the superclass, such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException, should also be considered.
-
setConfiguration
Sets the configuration of RandomTrees to that of the input model.
- Specified by:
setConfiguration in class PredictiveModel
- Parameters:
pm - a RandomTrees object
- Throws:
PredictiveModel.PredictiveModelException - thrown when an exception occurs in com.imsl.datamining.PredictiveModel; exceptions defined in the superclass, such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException, should also be considered.
-
predict
public double[] predict() throws PredictiveModel.PredictiveModelException
Returns the predicted values generated by the random forest on the training data.
- Specified by:
predict in class PredictiveModel
- Returns:
a double array containing the fitted values
- Throws:
PredictiveModel.PredictiveModelException - thrown when an exception occurs in com.imsl.datamining.PredictiveModel; exceptions defined in the superclass, such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException, should also be considered.
-
predict
public double[] predict(double[][] testData) throws PredictiveModel.PredictiveModelException
Returns the predicted values on the input test data.
- Specified by:
predict in class PredictiveModel
- Parameters:
testData - a double matrix containing test data. Note: testData must have the same number of columns as xy, and the columns must be in the same arrangement as in xy.
- Returns:
a double array containing the predicted values
- Throws:
PredictiveModel.PredictiveModelException - thrown when an exception occurs in com.imsl.datamining.PredictiveModel; exceptions defined in the superclass, such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException, should also be considered.
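A sketch of scoring held-out rows, continuing from the fitted model rt in the class-level example above; the rows keep the same column arrangement as xy.

    // Same column arrangement as xy: predictor, predictor, response.
    double[][] testData = {
        {2.5, 1.0, 3.4},
        {5.5, 0.0, 7.4}
    };
    double[] yHat = rt.predict(testData);   // one predicted value per row of testData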
-
predict
public double[] predict(double[][] testData, double[] testDataWeights) throws PredictiveModel.PredictiveModelException
Returns the predicted values on the input test data, using the test data weights.
- Specified by:
predict in class PredictiveModel
- Parameters:
testData - a double matrix containing test data. Note: testData must have the same number of columns as xy, and the columns must be in the same arrangement as in xy.
testDataWeights - a double array containing weight values for each row of testData
- Returns:
a double array containing the predicted values
- Throws:
PredictiveModel.PredictiveModelException - thrown when an exception occurs in com.imsl.datamining.PredictiveModel; exceptions defined in the superclass, such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException, should also be considered.
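The weighted variant is called the same way; here uniform weights of 1.0 are supplied purely for illustration, using the testData from the previous sketch.

    double[] testDataWeights = {1.0, 1.0};   // one weight per row of testData
    double[] yHatWeighted = rt.predict(testData, testDataWeights);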
-
getOutOfBagPredictions
public double[] getOutOfBagPredictions()
Returns the out-of-bag predicted values for the examples in the training data.
- Returns:
a double array containing the out-of-bag predictions
-
getOutOfBagPredictionError
public double getOutOfBagPredictionError()
Returns the out-of-bag prediction error.
- Returns:
a double, the out-of-bag prediction error
-
getVariableImportance
public double[] getVariableImportance()
Returns the variable importance measure based on the out-of-bag prediction error. Variable importance for a predictor is obtained by randomly permuting the out-of-bag values of that predictor and calculating the difference in predictive accuracy before and after the permutation. The measure is averaged over all trees.
- Returns:
a double array containing the variable importance for each predictor
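A sketch tying the variable-importance calls together, again using the hypothetical model rt; the flag is enabled before fitModel() so the permutation measure can be accumulated during bootstrap aggregation.

    rt.setCalculateVariableImportance(true);   // off by default
    rt.fitModel();

    double[] importance = rt.getVariableImportance();   // one entry per predictor
    for (int j = 0; j < importance.length; j++) {
        System.out.println("predictor " + j + " importance: " + importance[j]);
    }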
-