Class RandomTrees

java.lang.Object
com.imsl.datamining.PredictiveModel
com.imsl.datamining.decisionTree.RandomTrees
All Implemented Interfaces:
Serializable, Cloneable

public class RandomTrees extends PredictiveModel implements Serializable, Cloneable
Generates predictions using a random forest of decision trees.

A random forest is an ensemble of decision trees. As in bootstrap aggregation (bagging), a tree is fit to each of M bootstrap samples from the training data. Each tree is then used to generate predictions. For a regression problem (continuous response variable), the M predictions are combined into a single predicted value by averaging. For classification (categorical response variable), majority vote is used. A random forest also randomizes the predictors. That is, in every tree, the splitting variable at every node is selected from a random subset of the predictors. Randomization of the predictors reduces correlation among individual trees. The random forest was invented by Leo Breiman in 2001 (Breiman, 2001). Random Forests™ is the trademark term for this approach. Also see Hastie, Tibshirani, and Friedman, 2008, for further discussion.
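
The combination step described above can be sketched in plain Java. This is only an illustration of averaging and majority voting, not the library's implementation; the class name EnsembleCombiner and its methods are hypothetical, and the tie-breaking rule is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of com.imsl): combines the M per-tree predictions. */
final class EnsembleCombiner {

    /** Regression: the ensemble prediction is the mean of the per-tree predictions. */
    static double average(double[] treePredictions) {
        double sum = 0.0;
        for (double p : treePredictions) {
            sum += p;
        }
        return sum / treePredictions.length;
    }

    /** Classification: the ensemble prediction is the label chosen by the most trees. */
    static int majorityVote(int[] treePredictions) {
        Map<Integer, Integer> counts = new HashMap<>();
        int best = treePredictions[0];
        int bestCount = 0;
        for (int label : treePredictions) {
            int c = counts.merge(label, 1, Integer::sum);
            // Assumption: on a tie, keep the label that reached the count first.
            if (c > bestCount) {
                best = label;
                bestCount = c;
            }
        }
        return best;
    }
}
```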

  • Constructor Details

    • RandomTrees

      public RandomTrees(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
      Constructs a RandomTrees random forest of ALACART decision trees.
      Parameters:
      xy - a double matrix containing the training data
      responseColumnIndex - an int, the column index for the response variable
      varType - a PredictiveModel.VariableType array containing the type of each variable
    • RandomTrees

      public RandomTrees(DecisionTree dt)
      Constructs a RandomTrees random forest of the input decision tree.
      Parameters:
      dt - a DecisionTree object
    • RandomTrees

      public RandomTrees(RandomTrees rtModel)
      Constructs a copy of the input RandomTrees predictive model.
      Parameters:
      rtModel - a RandomTrees predictive model
  • Method Details

    • clone

      public RandomTrees clone()
      Clones a RandomTrees predictive model.
      Specified by:
      clone in class PredictiveModel
      Returns:
      a clone of the RandomTrees predictive model
    • setNumberOfTrees

      public void setNumberOfTrees(int numberOfTrees)
      Sets the number of trees to generate in the random forest.

      The number of trees is equivalent to the number of bootstrap samples.

      Parameters:
      numberOfTrees - an int, the number of trees to generate

      Default: numberOfTrees=50
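
Because each tree is fit to its own bootstrap sample, the number of trees fixes how many samples are drawn. A minimal sketch of drawing those samples follows; the class BootstrapSampler and the seed parameter are hypothetical, and the library's actual sampling code may differ.

```java
import java.util.Random;

/** Hypothetical helper (not part of com.imsl): draws one bootstrap sample per tree. */
final class BootstrapSampler {

    /**
     * Returns numberOfTrees samples. Each sample holds n row indices drawn
     * uniformly with replacement from 0..n-1, so rows can repeat or be absent.
     */
    static int[][] drawSamples(int n, int numberOfTrees, long seed) {
        Random rng = new Random(seed);
        int[][] samples = new int[numberOfTrees][n];
        for (int t = 0; t < numberOfTrees; t++) {
            for (int i = 0; i < n; i++) {
                samples[t][i] = rng.nextInt(n);
            }
        }
        return samples;
    }
}
```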

    • setNumberOfRandomFeatures

      public void setNumberOfRandomFeatures(int numberOfRandomFeatures)
      Sets the number of random features used in the splitting rules.
      Parameters:
      numberOfRandomFeatures - an int, the number of predictors in the random subset

      Default: numberOfRandomFeatures=\(\sqrt{p}\) for classification problems, \(\frac{p}{3}\) for regression problems, where \(p\) is the number of predictors in the training data.
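
The defaults and the random subset can be illustrated as follows. The source does not say how the library rounds \(\sqrt{p}\) or \(\frac{p}{3}\) to an integer, so rounding down with a minimum of 1 is an assumption here, and the class RandomFeatureSubset is hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper (not part of com.imsl): default subset size and subset draw. */
final class RandomFeatureSubset {

    /** Assumed rounding: floor of sqrt(p), but never less than 1. */
    static int defaultForClassification(int p) {
        return Math.max(1, (int) Math.floor(Math.sqrt(p)));
    }

    /** Assumed rounding: integer division p / 3, but never less than 1. */
    static int defaultForRegression(int p) {
        return Math.max(1, p / 3);
    }

    /** Picks m distinct predictor indices out of 0..p-1 with a partial Fisher-Yates shuffle. */
    static int[] randomSubset(int p, int m, Random rng) {
        int[] idx = new int[p];
        for (int i = 0; i < p; i++) {
            idx[i] = i;
        }
        for (int i = 0; i < m; i++) {
            int j = i + rng.nextInt(p - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return Arrays.copyOf(idx, m);
    }
}
```

In a forest, a fresh subset of this kind would be drawn at every node before choosing the splitting variable.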

    • getNumberOfRandomFeatures

      public int getNumberOfRandomFeatures()
      Returns the number of random features used in the splitting rules.
      Returns:
      an int, the number of random features
    • setCalculateVariableImportance

      public void setCalculateVariableImportance(boolean calculate)
      Sets whether variable importance is calculated.

      When true, a permutation-type variable importance measure is calculated during bootstrap aggregation.

      Parameters:
      calculate - a boolean indicating whether or not to calculate variable importance

      Default: calculate = false

    • isCalculateVariableImportance

      public boolean isCalculateVariableImportance()
      Returns whether variable importance is calculated.
      Returns:
      a boolean, the current setting of the flag
    • getNumberOfTrees

      public int getNumberOfTrees()
      Returns the number of trees.
      Returns:
      an int, the number of trees
    • setNumberOfThreads

      public void setNumberOfThreads(int numberOfThreads)
      Sets the maximum number of java.lang.Thread instances that may be used for parallel processing.
      Parameters:
      numberOfThreads - an int specifying the maximum number of java.lang.Thread instances that may be used for parallel processing.

      The actual number of threads used in parallel processing will be the lesser of numberOfThreads and numberOfTrees, the number of trees in the random forest. This assessment is made to optimize use of resources.

      Default: numberOfThreads = 1.
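
The thread cap can be pictured with a standard java.util.concurrent pool. This is a sketch of the rule stated above, not the library's scheduler; the class ThreadBudget and the placeholder per-tree task are hypothetical, and both arguments are assumed to be positive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper (not part of com.imsl): one task per tree on a capped pool. */
final class ThreadBudget {

    /** The lesser of the requested thread count and the number of trees. */
    static int effectiveThreads(int numberOfThreads, int numberOfTrees) {
        return Math.min(numberOfThreads, numberOfTrees);
    }

    /** Runs a placeholder task for each tree and returns how many tasks completed. */
    static int runPerTree(int numberOfThreads, int numberOfTrees) {
        ExecutorService pool =
                Executors.newFixedThreadPool(effectiveThreads(numberOfThreads, numberOfTrees));
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int t = 0; t < numberOfTrees; t++) {
                final int treeIndex = t;
                tasks.add(() -> treeIndex); // stands in for fitting tree number treeIndex
            }
            int done = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                f.get();
                done++;
            }
            return done;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```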

    • fitModel

      public void fitModel() throws PredictiveModel.PredictiveModelException
      Fits the random forest to the training data.
      Overrides:
      fitModel in class PredictiveModel
      Throws:
      PredictiveModel.PredictiveModelException - thrown when an exception occurs in com.imsl.datamining.PredictiveModel. Superclass exceptions such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException should also be considered.
    • setConfiguration

      protected void setConfiguration(PredictiveModel pm) throws PredictiveModel.PredictiveModelException
      Sets the configuration of RandomTrees to that of the input model.
      Specified by:
      setConfiguration in class PredictiveModel
      Parameters:
      pm - a RandomTrees object
      Throws:
      PredictiveModel.PredictiveModelException - thrown when an exception occurs in com.imsl.datamining.PredictiveModel. Superclass exceptions such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException should also be considered.
    • predict

      public double[] predict() throws PredictiveModel.PredictiveModelException
      Returns the predicted values generated by the random forest on the training data.
      Specified by:
      predict in class PredictiveModel
      Returns:
      a double array containing the fitted values
      Throws:
      PredictiveModel.PredictiveModelException - thrown when an exception occurs in com.imsl.datamining.PredictiveModel. Superclass exceptions such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException should also be considered.
    • predict

      public double[] predict(double[][] testData) throws PredictiveModel.PredictiveModelException
      Returns the predicted values on the input test data.
      Specified by:
      predict in class PredictiveModel
      Parameters:
      testData - a double matrix containing test data

      Note: testData must have the same number of columns as xy and the columns must be in the same arrangement as in xy.

      Returns:
      a double array containing the predicted values
      Throws:
      PredictiveModel.PredictiveModelException - thrown when an exception occurs in com.imsl.datamining.PredictiveModel. Superclass exceptions such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException should also be considered.
    • predict

      public double[] predict(double[][] testData, double[] testDataWeights) throws PredictiveModel.PredictiveModelException
      Returns the predicted values on the input test data, using the given test data weights.
      Specified by:
      predict in class PredictiveModel
      Parameters:
      testData - a double matrix containing test data
      testDataWeights - a double array containing weight values for each row of testData

      Note: testData must have the same number of columns as xy and the columns must be in the same arrangement as in xy.

      Returns:
      a double array containing the predicted values
      Throws:
      PredictiveModel.PredictiveModelException - thrown when an exception occurs in com.imsl.datamining.PredictiveModel. Superclass exceptions such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException should also be considered.
    • getOutOfBagPredictions

      public double[] getOutOfBagPredictions()
      Returns the out-of-bag predicted values for the examples in the training data.
      Returns:
      a double array containing the out-of-bag predictions
    • getOutOfBagPredictionError

      public double getOutOfBagPredictionError()
      Returns the out-of-bag prediction error.
      Returns:
      a double, the out-of-bag prediction error
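
In the usual formulation, a training row is out-of-bag for a tree when that tree's bootstrap sample did not include it, and the row's out-of-bag prediction aggregates only those trees. The sketch below follows that formulation for regression. The class OutOfBagSketch is hypothetical, and the source does not state which error measure the library reports, so mean squared error is an assumption.

```java
/** Hypothetical helper (not part of com.imsl): out-of-bag predictions for regression. */
final class OutOfBagSketch {

    /** inBag[t][i] is true when row i appears in the bootstrap sample of tree t. */
    static boolean[][] inBagMask(int[][] samples, int n) {
        boolean[][] inBag = new boolean[samples.length][n];
        for (int t = 0; t < samples.length; t++) {
            for (int row : samples[t]) {
                inBag[t][row] = true;
            }
        }
        return inBag;
    }

    /**
     * Averages, for each row, the predictions of the trees that did not train on it.
     * treePredictions[t][i] is tree t's prediction for row i. A row that was in
     * every bootstrap sample has no out-of-bag prediction and gets NaN.
     */
    static double[] oobPredictions(double[][] treePredictions, boolean[][] inBag) {
        int n = inBag[0].length;
        double[] oob = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            int count = 0;
            for (int t = 0; t < inBag.length; t++) {
                if (!inBag[t][i]) {
                    sum += treePredictions[t][i];
                    count++;
                }
            }
            oob[i] = (count == 0) ? Double.NaN : sum / count;
        }
        return oob;
    }

    /** Assumed error measure: mean squared error over rows that have an out-of-bag prediction. */
    static double meanSquaredError(double[] y, double[] oob) {
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i < y.length; i++) {
            if (Double.isNaN(oob[i])) {
                continue;
            }
            double e = oob[i] - y[i];
            sum += e * e;
            count++;
        }
        return (count == 0) ? Double.NaN : sum / count;
    }
}
```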
    • getVariableImportance

      public double[] getVariableImportance()
      Returns the variable importance measure based on the out-of-bag prediction error.

      Variable importance for a predictor is obtained by randomly permuting the out-of-bag values of the predictor and calculating the difference in predictive accuracy, before and after the permutation. The measure is averaged over all the trees.

      Returns:
      a double array containing variable importance for each predictor
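
The permutation idea can be sketched for a single tree as follows. The class PermutationImportance is hypothetical, the model is passed in as a plain function, and mean squared error stands in for "predictive accuracy", which the source does not define. In a forest, the value would be computed on each tree's out-of-bag rows and then averaged over the trees.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper (not part of com.imsl): permutation importance of one predictor. */
final class PermutationImportance {

    /** Mean squared error of the model on rows x with responses y. */
    static double mse(ToDoubleFunction<double[]> model, double[][] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double e = model.applyAsDouble(x[i]) - y[i];
            sum += e * e;
        }
        return sum / x.length;
    }

    /**
     * Error after shuffling one predictor column minus error before.
     * A larger increase means the model relies more on that predictor.
     */
    static double importance(ToDoubleFunction<double[]> model, double[][] x, double[] y,
                             int column, Random rng) {
        double before = mse(model, x, y);
        double[][] permuted = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            permuted[i] = x[i].clone(); // leave the caller's data untouched
        }
        // Fisher-Yates shuffle of the chosen column only.
        for (int i = permuted.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            double tmp = permuted[i][column];
            permuted[i][column] = permuted[j][column];
            permuted[j][column] = tmp;
        }
        return mse(model, permuted, y) - before;
    }
}
```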