com.imsl.datamining.decisionTree.RandomTrees

All Implemented Interfaces: Serializable, Cloneable

public class RandomTrees extends PredictiveModel implements Serializable, Cloneable

Generates predictions using a random forest of decision trees.

A random forest is an ensemble of decision trees. As in bootstrap aggregation (bagging), a tree is fit to each of M bootstrap samples drawn from the training data, and each tree is then used to generate predictions. For a regression problem (continuous response variable), the M predictions are combined into a single predicted value by averaging. For a classification problem (categorical response variable), a majority vote is used.

A random forest also randomizes the predictors: in every tree, the splitting variable at every node is selected from a random subset of the predictors. Randomizing the predictors reduces the correlation among individual trees.

The random forest was introduced by Leo Breiman (Breiman, 2001). Random Forests™ is the trademarked term for this approach. See also Hastie, Tibshirani, and Friedman (2008) for further discussion.
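The two combination rules described above can be sketched in plain Java. This is an illustration only and is not the library's implementation: the class name `EnsembleCombineSketch`, its method names, the use of `int` class labels, and the tie-breaking rule are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of com.imsl: combines the M per-tree predictions.
final class EnsembleCombineSketch {

    // Regression: the ensemble prediction is the mean of the M tree predictions.
    static double average(double[] treePredictions) {
        double sum = 0.0;
        for (double p : treePredictions) {
            sum += p;
        }
        return sum / treePredictions.length;
    }

    // Classification: the ensemble prediction is the label chosen by the most trees.
    // Assumption: ties are broken in favor of the smallest label.
    static int majorityVote(int[] treePredictions) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int label : treePredictions) {
            counts.merge(label, 1, Integer::sum);
        }
        int best = -1;
        int bestCount = -1;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            int count = e.getValue();
            if (count > bestCount || (count == bestCount && e.getKey() < best)) {
                best = e.getKey();
                bestCount = count;
            }
        }
        return best;
    }
}
```

For example, `EnsembleCombineSketch.average(new double[] {1.0, 2.0, 6.0})` averages three tree outputs, and `EnsembleCombineSketch.majorityVote(new int[] {0, 1, 1, 2, 1})` returns the label that three of the five trees agree on.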

Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static class | RandomTrees.ReflectiveOperationException | Class that wraps exceptions thrown by reflective operations in core reflection. |

Nested classes/interfaces inherited from class com.imsl.datamining.PredictiveModel
PredictiveModel.CloneNotSupportedException, PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType
Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| RandomTrees(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType) | Constructs a RandomTrees random forest of ALACART decision trees. |
| RandomTrees(DecisionTree dt) | Constructs a RandomTrees random forest of the input decision tree. |
| RandomTrees(RandomTrees rtModel) | Constructs a copy of the input RandomTrees predictive model. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| RandomTrees | clone() | Clones a RandomTrees predictive model. |
| void | fitModel() | Fits the random forest to the training data. |
| int | getNumberOfRandomFeatures() | Returns the number of random features used in the splitting rules. |
| int | getNumberOfTrees() | Returns the number of trees. |
| double | getOutOfBagPredictionError() | Returns the out-of-bag prediction error. |
| double[] | getOutOfBagPredictions() | Returns the out-of-bag predicted values for the examples in the training data. |
| double[] | getVariableImportance() | Returns the variable importance measure based on the out-of-bag prediction error. |
| boolean | isCalculateVariableImportance() | Returns the current setting of the boolean to calculate variable importance. |
| double[] | predict() | Returns the predicted values generated by the random forest on the training data. |
| double[] | predict(double[][] testData) | Returns the predicted values on the input test data. |
| double[] | predict(double[][] testData, double[] testDataWeights) | Returns the predicted values on the input test data, using the supplied test data weights. |
| void | setCalculateVariableImportance(boolean calculate) | Sets the boolean to calculate variable importance. |
| protected void | setConfiguration(PredictiveModel pm) | Sets the configuration of RandomTrees to that of the input model. |
| void | setNumberOfRandomFeatures(int numberOfRandomFeatures) | Sets the number of random features used in the splitting rules. |
| void | setNumberOfThreads(int numberOfThreads) | Sets the maximum number of java.lang.Thread instances that may be used for parallel processing. |
| void | setNumberOfTrees(int numberOfTrees) | Sets the number of trees to generate in the random forest. |

Methods inherited from class com.imsl.datamining.PredictiveModel
getClassCounts, getClassErrors, getClassErrors, getClassLabels, getClassProbabilities, getCostMatrix, getMaxNumberOfCategories, getMaxNumberOfIterations, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getRandomObject, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isConstantSeries, isMustFitModel, isUserFixedNClasses, setClassCounts, setClassLabels, setClassProbabilities, setCostMatrix, setMaxNumberOfCategories, setMaxNumberOfIterations, setMustFitModel, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setRandomObject, setResponseColumnIndex, setTrainingData, setVariableType, setWeights

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- RandomTrees
  
  public RandomTrees(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
  
  Constructs a RandomTrees random forest of ALACART decision trees.
  
  Parameters:
  
  xy - a double matrix containing the training data
  
  responseColumnIndex - an int, the column index for the response variable
  
  varType - a PredictiveModel.VariableType array containing the type of each variable
- RandomTrees
  
  public RandomTrees(DecisionTree dt)
  
  Constructs a RandomTrees random forest of the input decision tree.
  
  Parameters:
  
  dt - a DecisionTree object
- RandomTrees
  
  public RandomTrees(RandomTrees rtModel)
  
  Constructs a copy of the input RandomTrees predictive model.
  
  Parameters:
  
  rtModel - a RandomTrees predictive model
Method Details
- clone
  
  public RandomTrees clone()
  
  Clones a RandomTrees predictive model.
  
  Specified by:
  
  clone in class PredictiveModel
  
  Returns:
  
  a clone of the RandomTrees predictive model
- setNumberOfTrees
  
  public void setNumberOfTrees(int numberOfTrees)
  
  Sets the number of trees to generate in the random forest.
  The number of trees is equivalent to the number of bootstrap samples.
  
  Parameters:
  
  numberOfTrees - an int, the number of trees to generate
  Default: numberOfTrees=50
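Because each tree is fit to its own bootstrap sample, the number of trees equals the number of samples drawn. The following is a minimal sketch of that sampling step, assuming rows are drawn uniformly with replacement. The class `BootstrapSketch` and its methods are hypothetical names for illustration, not part of the library.

```java
import java.util.Random;

// Hypothetical helper, not part of com.imsl: draws bootstrap row indexes.
final class BootstrapSketch {

    // One bootstrap sample: n row indexes drawn with replacement from 0..n-1.
    static int[] sample(int n, Random rng) {
        int[] rows = new int[n];
        for (int i = 0; i < n; i++) {
            rows[i] = rng.nextInt(n);
        }
        return rows;
    }

    // One bootstrap sample per tree, so the result has numberOfTrees rows.
    static int[][] samples(int n, int numberOfTrees, Random rng) {
        int[][] all = new int[numberOfTrees][];
        for (int t = 0; t < numberOfTrees; t++) {
            all[t] = sample(n, rng);
        }
        return all;
    }
}
```

With the default of 50 trees, `BootstrapSketch.samples(n, 50, new Random(seed))` would yield 50 index arrays of length `n`, one for each tree to be fit.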
- setNumberOfRandomFeatures
  
  public void setNumberOfRandomFeatures(int numberOfRandomFeatures)
  
  Sets the number of random features used in the splitting rules.
  
  Parameters:
  
  numberOfRandomFeatures - an int, the number of predictors in the random subset
  Default: numberOfRandomFeatures=\(\sqrt{p}\) for classification problems, \(\frac{p}{3}\) for regression problems, where \(p\) is the number of predictors in the training data.
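The default subset size and the random selection of predictors at a node can be sketched as follows. The rounding of \(\sqrt{p}\) and \(\frac{p}{3}\) to an integer is not specified above, so rounding down with a minimum of 1 is an assumption here, as are the names `RandomFeatureSketch`, `defaultSubsetSize`, and `randomSubset`.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper, not part of com.imsl: picks a random subset of predictors.
final class RandomFeatureSketch {

    // Default subset size: sqrt(p) for classification, p/3 for regression.
    // Assumption: the value is rounded down and never smaller than 1.
    static int defaultSubsetSize(int p, boolean classification) {
        int m = classification ? (int) Math.floor(Math.sqrt(p)) : p / 3;
        return Math.max(1, m);
    }

    // Draws m distinct predictor indexes from 0..p-1 using a partial
    // Fisher-Yates shuffle; a split would then be searched only among these.
    static int[] randomSubset(int p, int m, Random rng) {
        int[] idx = new int[p];
        for (int i = 0; i < p; i++) {
            idx[i] = i;
        }
        for (int i = 0; i < m; i++) {
            int j = i + rng.nextInt(p - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return Arrays.copyOf(idx, m);
    }
}
```

Under these assumptions, 16 predictors give a subset of 4 for classification, and 9 predictors give a subset of 3 for regression.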
- getNumberOfRandomFeatures
  
  public int getNumberOfRandomFeatures()
  
  Returns the number of random features used in the splitting rules.
  
  Returns:
  
  an int, the number of random features
- setCalculateVariableImportance
  
  public void setCalculateVariableImportance(boolean calculate)
  
  Sets the boolean to calculate variable importance.
  When true, a permutation-type variable importance measure is calculated during bootstrap aggregation.
  
  Parameters:
  
  calculate - a boolean indicating whether or not to calculate variable importance
  Default: calculate = false
- isCalculateVariableImportance
  
  public boolean isCalculateVariableImportance()
  
  Returns the current setting of the boolean to calculate variable importance.
  
  Returns:
  
  a boolean, the current setting of the flag
- getNumberOfTrees
  
  public int getNumberOfTrees()
  
  Returns the number of trees.
  
  Returns:
  
  an int, the number of trees
- setNumberOfThreads
  
  public void setNumberOfThreads(int numberOfThreads)
  
  Sets the maximum number of java.lang.Thread instances that may be used for parallel processing.
  
  Parameters:
  
  numberOfThreads - an int specifying the maximum number of java.lang.Thread instances that may be used for parallel processing.
  The actual number of threads used in parallel processing will be the lesser of numberOfThreads and numberOfTrees, the number of trees in the random forest. This assessment is made to optimize use of resources.
  
  Default: numberOfThreads = 1.
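The thread-count rule above amounts to taking the minimum of the two settings. The sketch below shows that rule with a standard `java.util.concurrent` fixed thread pool, with one placeholder task standing in for each tree fit. How the library schedules its work internally is not documented here, so the pool and the class `ThreadCountSketch` are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of com.imsl.
final class ThreadCountSketch {

    // The lesser of the requested thread count and the number of trees.
    static int effectiveThreads(int numberOfThreads, int numberOfTrees) {
        return Math.min(numberOfThreads, numberOfTrees);
    }

    // Runs one placeholder task per tree and returns how many completed.
    // Both arguments must be positive.
    static int runTasks(int numberOfThreads, int numberOfTrees) {
        ExecutorService pool =
                Executors.newFixedThreadPool(effectiveThreads(numberOfThreads, numberOfTrees));
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int t = 0; t < numberOfTrees; t++) {
                final int treeIndex = t;
                tasks.add(() -> treeIndex); // placeholder for "fit tree t"
            }
            int done = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                f.get();
                done++;
            }
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Requesting 8 threads for a 3-tree forest would therefore use at most 3 threads, since additional threads would have no tree to fit.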
- fitModel
  
  public void fitModel() throws PredictiveModel.PredictiveModelException
  
  Fits the random forest to the training data.
  
  Overrides:
  
  fitModel in class PredictiveModel
  
  Throws:
  
  PredictiveModel.PredictiveModelException - thrown when an exception occurs in com.imsl.datamining.PredictiveModel. Exceptions defined in the superclass, such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException, should also be considered.
- setConfiguration
  
  protected void setConfiguration(PredictiveModel pm) throws PredictiveModel.PredictiveModelException
  
  Sets the configuration of RandomTrees to that of the input model.
  
  Specified by:
  
  setConfiguration in class PredictiveModel
  
  Parameters:
  
  pm - a RandomTrees object
  
  Throws:
  
  PredictiveModel.PredictiveModelException - thrown when an exception occurs in com.imsl.datamining.PredictiveModel. Exceptions defined in the superclass, such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException, should also be considered.
- predict
  
  public double[] predict() throws PredictiveModel.PredictiveModelException
  
  Returns the predicted values generated by the random forest on the training data.
  
  Specified by:
  
  predict in class PredictiveModel
  
  Returns:
  
  a double array containing the fitted values
  
  Throws:
  
  PredictiveModel.PredictiveModelException - thrown when an exception occurs in com.imsl.datamining.PredictiveModel. Exceptions defined in the superclass, such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException, should also be considered.
- predict
  
  public double[] predict(double[][] testData) throws PredictiveModel.PredictiveModelException
  
  Returns the predicted values on the input test data.
  
  Specified by:
  
  predict in class PredictiveModel
  
  Parameters:
  
  testData - a double matrix containing test data
  Note: testData must have the same number of columns as xy and the columns must be in the same arrangement as in xy.
  
  Returns:
  
  a double array containing the predicted values
  
  Throws:
  
  PredictiveModel.PredictiveModelException - thrown when an exception occurs in com.imsl.datamining.PredictiveModel. Exceptions defined in the superclass, such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException, should also be considered.
- predict
  
  public double[] predict(double[][] testData, double[] testDataWeights) throws PredictiveModel.PredictiveModelException
  
  Returns the predicted values on the input test data, using the supplied test data weights.
  
  Specified by:
  
  predict in class PredictiveModel
  
  Parameters:
  
  testData - a double matrix containing test data
  Note: testData must have the same number of columns as xy and the columns must be in the same arrangement as in xy.
  
  testDataWeights - a double array containing weight values for each row of testData
  
  Returns:
  
  a double array containing the predicted values
  
  Throws:
  
  PredictiveModel.PredictiveModelException - thrown when an exception occurs in com.imsl.datamining.PredictiveModel. Exceptions defined in the superclass, such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException, should also be considered.
- getOutOfBagPredictions
  
  public double[] getOutOfBagPredictions()
  
  Returns the out-of-bag predicted values for the examples in the training data.
  
  Returns:
  
  a double array containing the out-of-bag predictions
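A training row is out of bag for a tree when that tree's bootstrap sample did not include it. One common way to form an out-of-bag prediction, sketched here for regression, is to average only the trees that never saw the row. The class `OutOfBagSketch`, its method names, and the use of NaN for rows that are in every bootstrap sample are assumptions, not documented library behavior.

```java
// Hypothetical helper, not part of com.imsl.
final class OutOfBagSketch {

    // Marks which of the n training rows appear in one bootstrap sample.
    static boolean[] inBagMask(int[] bootstrapRows, int n) {
        boolean[] inBag = new boolean[n];
        for (int row : bootstrapRows) {
            inBag[row] = true;
        }
        return inBag;
    }

    // treePredictions[t][i] is tree t's prediction for training row i and
    // inBag[t][i] is true when row i was in tree t's bootstrap sample.
    // Each row is averaged over the trees for which it was out of bag.
    // Assumption: a row that is in every sample gets NaN.
    static double[] oobPredictions(double[][] treePredictions, boolean[][] inBag) {
        int n = treePredictions[0].length;
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            int count = 0;
            for (int t = 0; t < treePredictions.length; t++) {
                if (!inBag[t][i]) {
                    sum += treePredictions[t][i];
                    count++;
                }
            }
            out[i] = count == 0 ? Double.NaN : sum / count;
        }
        return out;
    }
}
```

For classification, the same masking would apply with a majority vote in place of the average.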
- getOutOfBagPredictionError
  
  public double getOutOfBagPredictionError()
  
  Returns the out-of-bag prediction error.
  
  Returns:
  
  a double, the out-of-bag prediction error
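The error measure itself is not defined above. As a sketch under stated assumptions, the example below uses two conventional choices: mean squared error for regression and misclassification rate for classification, each computed from out-of-bag predictions and skipping rows with no out-of-bag prediction (marked NaN). The class `OobErrorSketch` and both method names are hypothetical, and the library may use a different definition.

```java
// Hypothetical helper, not part of com.imsl.
final class OobErrorSketch {

    // Assumed regression measure: mean squared error over rows that have
    // an out-of-bag prediction (NaN entries are skipped).
    static double meanSquaredError(double[] y, double[] oobPredictions) {
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i < y.length; i++) {
            if (!Double.isNaN(oobPredictions[i])) {
                double d = y[i] - oobPredictions[i];
                sum += d * d;
                count++;
            }
        }
        return sum / count; // NaN when no row has an out-of-bag prediction
    }

    // Assumed classification measure: fraction of rows whose out-of-bag
    // predicted class differs from the observed class.
    static double misclassificationRate(double[] y, double[] oobPredictions) {
        double wrong = 0.0;
        int count = 0;
        for (int i = 0; i < y.length; i++) {
            if (!Double.isNaN(oobPredictions[i])) {
                if (y[i] != oobPredictions[i]) {
                    wrong += 1.0;
                }
                count++;
            }
        }
        return wrong / count; // NaN when no row has an out-of-bag prediction
    }
}
```

Because each row is scored only by trees that did not train on it, either measure acts as an internal estimate of error on unseen data without a separate test set.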
- getVariableImportance
  
  public double[] getVariableImportance()
  
  Returns the variable importance measure based on the out-of-bag prediction error.
  Variable importance for a predictor is obtained by randomly permuting the out-of-bag values of the predictor and calculating the difference in predictive accuracy, before and after the permutation. The measure is averaged over all the trees.
  
  Returns:
  
  a double array containing variable importance for each predictor
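The permutation procedure described above can be sketched for a single tree as follows. A tree is represented here by a plain function from a row to a prediction, error is measured as mean squared error, and importance is reported as the error after permutation minus the error before. Those three choices, and the class `PermutationImportanceSketch`, are assumptions for illustration rather than the library's implementation.

```java
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Hypothetical helper, not part of com.imsl.
final class PermutationImportanceSketch {

    // Mean squared error of one tree on the given rows.
    static double mse(ToDoubleFunction<double[]> tree, double[][] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = tree.applyAsDouble(x[i]) - y[i];
            sum += d * d;
        }
        return sum / x.length;
    }

    // Importance of predictor j for one tree: error on the tree's out-of-bag
    // rows after shuffling column j, minus the error before shuffling.
    static double importance(ToDoubleFunction<double[]> tree, double[][] oobX,
                             double[] oobY, int j, Random rng) {
        double before = mse(tree, oobX, oobY);
        double[][] permuted = new double[oobX.length][];
        for (int i = 0; i < oobX.length; i++) {
            permuted[i] = oobX[i].clone(); // leave the caller's data untouched
        }
        for (int i = permuted.length - 1; i > 0; i--) { // Fisher-Yates on column j
            int k = rng.nextInt(i + 1);
            double tmp = permuted[i][j];
            permuted[i][j] = permuted[k][j];
            permuted[k][j] = tmp;
        }
        return mse(tree, permuted, oobY) - before;
    }

    // Averages the per-tree importance of predictor j over all trees, where
    // oobX.get(t) and oobY.get(t) hold the out-of-bag rows of trees.get(t).
    static double averageImportance(List<ToDoubleFunction<double[]>> trees,
                                    List<double[][]> oobX, List<double[]> oobY,
                                    int j, Random rng) {
        double sum = 0.0;
        for (int t = 0; t < trees.size(); t++) {
            sum += importance(trees.get(t), oobX.get(t), oobY.get(t), j, rng);
        }
        return sum / trees.size();
    }
}
```

A predictor that a tree never uses produces identical predictions before and after shuffling, so its importance under this sketch is exactly zero, while shuffling a predictor the tree relies on can only increase or preserve the error of a tree that fit the rows exactly.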
