Class BootstrapAggregation

java.lang.Object
com.imsl.datamining.BootstrapAggregation
All Implemented Interfaces:
Serializable, Cloneable

public class BootstrapAggregation extends Object implements Serializable, Cloneable
Performs bootstrap aggregation to generate predictions using predictive models.

Bootstrap aggregation, also known as bagging, combines the predictions of predictive models fitted to resampled training data. In the procedure, M bootstrap samples of size N are drawn with replacement from an original training set of size N. Sampling with replacement means that each randomly selected example is returned to the training set before the next draw, so a bootstrap sample can contain repeated examples (observations). Using each bootstrap sample as a separate training data set, the procedure fits a predictive model and then generates predictions. For a regression problem (continuous response variable), the M predictions are combined into a single predicted value by averaging. For a classification problem (categorical response variable), the majority vote is used.
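
The procedure described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the library's implementation: the class BaggingSketch, its helper methods, and the toy "model" (the mean response of each bootstrap sample) are hypothetical, and java.util.Random stands in for the sampling scheme.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Illustrative bagging helpers; hypothetical, not part of com.imsl.datamining. */
class BaggingSketch {

    /** Draws n indices from 0..n-1 with replacement (one bootstrap sample). */
    static int[] bootstrapIndices(int n, Random rng) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = rng.nextInt(n); // the same index may be drawn more than once
        }
        return idx;
    }

    /** Regression: combines the M predictions by averaging. */
    static double average(double[] predictions) {
        double sum = 0.0;
        for (double p : predictions) {
            sum += p;
        }
        return sum / predictions.length;
    }

    /** Classification: combines the M predicted labels by majority vote. */
    static int majorityVote(int[] labels) {
        Map<Integer, Integer> counts = new HashMap<>();
        int best = labels[0];
        for (int label : labels) {
            int c = counts.merge(label, 1, Integer::sum);
            if (c > counts.get(best)) {
                best = label; // ties keep the label that reached the count first
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] y = {1.0, 2.0, 3.0, 4.0, 5.0}; // toy training responses
        int m = 50;                             // number of bootstrap samples (M)
        Random rng = new Random(42);
        double[] predictions = new double[m];
        for (int b = 0; b < m; b++) {
            int[] idx = bootstrapIndices(y.length, rng);
            double sum = 0.0;
            for (int i : idx) {
                sum += y[i];
            }
            predictions[b] = sum / idx.length;  // toy model: mean of the sample
        }
        System.out.println("Bagged regression prediction: " + average(predictions));
        System.out.println("Majority vote: " + majorityVote(new int[] {1, 2, 2, 1, 2}));
    }
}
```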

Originally proposed for decision trees, bagging leads to "improvements for unstable procedures," such as neural networks, classification and regression trees, and subset selection in linear regression. On the other hand, it can mildly degrade the performance of stable methods such as K-nearest neighbors (Breiman, 1996).

  • Constructor Summary

    Constructors
    Constructor
    Description
    BootstrapAggregation(PredictiveModel pm)
    Constructs a BootstrapAggregation class in order to generate predictions of a PredictiveModel using bootstrap aggregation.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    aggregate()
    Performs the bootstrap aggregation.
    double
    getMeanSquaredPredictionError()
    Deprecated.
    int
    getNumberOfSamples()
    Returns the number of bootstrap samples.
    int
    getNumberOfThreads()
    Returns the maximum number of java.lang.Thread instances that may be used for parallel processing.
    double
    getOutOfBagPredictionError()
    Returns the out-of-bag mean squared prediction error for regression problems, or the out-of-bag classification percentage error for classification problems.
    double[]
    getOutOfBagPredictions()
    Returns the out-of-bag predicted values.
    double
    getPredictionError()
    Returns the mean squared prediction error for regression problems, or the classification percentage error for classification problems.
    double[]
    getPredictions()
    Returns the predicted values.
    int
    getPrintLevel()
    Returns the current print level.
    double[]
    getVariableImportance()
    Returns the variable importance measure based on the out-of-bag prediction error.
    boolean
    isCalculateVariableImportance()
    Returns the boolean indicating whether or not to calculate variable importance during bootstrap aggregation.
    void
    setCalculateVariableImportance(boolean calculate)
    Sets the boolean to calculate variable importance.
    void
    setNumberOfSamples(int nSamples)
    Sets the number of bootstrap samples.
    void
    setNumberOfThreads(int numberOfThreads)
    Sets the maximum number of threads for multithreaded runs.
    void
    setPrintLevel(int printLevel)
    Sets the print level for the predictive model.
    void
    setRandomObject(Random r)
    Sets a random object for the bootstrap random sampling scheme.
    void
    setTestData(double[][] testData)
    Sets the test data to be predicted.
    void
    setTestData(double[][] testData, double[] testDataWeights)
    Sets the test data to be predicted using bootstrap aggregation along with weights for each row in the test data.
    void
    setTestData(double[][] testX, double[][] testY)
    Sets the test data to be predicted using bootstrap aggregation.
    void
    setTestData(double[][] testX, double[][] testY, double[] testWts)
    Sets the test data to be predicted using bootstrap aggregation along with weights for each row in the test data.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • BootstrapAggregation

      public BootstrapAggregation(PredictiveModel pm)
      Constructs a BootstrapAggregation class in order to generate predictions of a PredictiveModel using bootstrap aggregation.
      Parameters:
      pm - a PredictiveModel for which the predictions are to be generated
  • Method Details

    • aggregate

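      public void aggregate() throws NoSuchMethodException, InstantiationException, IllegalAccessException, InvocationTargetException, PredictiveModel.PredictiveModelException
      Performs the bootstrap aggregation.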
      Throws:
      NoSuchMethodException - is thrown when the PredictiveModel subclass is missing a constructor with the expected signature (see PredictiveModel(double[][], int, com.imsl.datamining.PredictiveModel.VariableType[])).
      InstantiationException - is thrown when an application tries to create an instance of a class using the newInstance method in class Class, but the specified class object cannot be instantiated.
      IllegalAccessException - is thrown when an application tries to reflectively create an instance (other than an array), set or get a field, or invoke a method, but the currently executing method does not have access to the definition of the specified class, field, method or constructor.
      InvocationTargetException - is thrown when a wrapped exception is thrown by an invoked method or constructor.
      PredictiveModel.PredictiveModelException - is thrown when an exception has occurred in com.imsl.datamining.PredictiveModel. Related exceptions such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException should also be considered.
    • getOutOfBagPredictionError

      public double getOutOfBagPredictionError()
      Returns the out-of-bag mean squared prediction error for regression problems, or the out-of-bag classification percentage error for classification problems.
      Returns:
      a double, the out-of-bag prediction error

      Note: An out-of-bag prediction for a particular example (observation or row) is generated only from the bootstrap training sets that exclude the example. The out-of-bag predictions are made on the training data.
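
The note above can be illustrated with a minimal sketch. It assumes a toy "model" that predicts the mean response of its bootstrap sample; the class OutOfBagSketch and its method are hypothetical and do not reflect how the library computes its values. An example that happens to appear in every bootstrap sample has no out-of-bag prediction, which the sketch marks with Double.NaN.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative out-of-bag computation; hypothetical, not part of com.imsl.datamining. */
class OutOfBagSketch {

    /** Averages, per example, the predictions of the models whose sample excluded it. */
    static double[] outOfBagPredictions(double[] y, int nSamples, Random rng) {
        int n = y.length;
        double[] sum = new double[n];
        int[] count = new int[n];
        for (int m = 0; m < nSamples; m++) {
            boolean[] inBag = new boolean[n];
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                int j = rng.nextInt(n); // draw with replacement
                inBag[j] = true;
                total += y[j];
            }
            double prediction = total / n; // toy model fitted on this sample
            for (int i = 0; i < n; i++) {
                if (!inBag[i]) {           // only examples left out of the sample
                    sum[i] += prediction;
                    count[i]++;
                }
            }
        }
        double[] oob = new double[n];
        for (int i = 0; i < n; i++) {
            oob[i] = count[i] > 0 ? sum[i] / count[i] : Double.NaN;
        }
        return oob;
    }

    public static void main(String[] args) {
        double[] y = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
        double[] oob = outOfBagPredictions(y, 50, new Random(7));
        System.out.println(Arrays.toString(oob));
    }
}
```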

    • getMeanSquaredPredictionError

      public double getMeanSquaredPredictionError()
      Deprecated.
      Returns the mean squared prediction error for regression problems, or the classification percentage error for classification problems.
      Returns:
      a double, the prediction error

      Note: The error is the in-sample fitted error unless the user specifies the test data using setTestData().

    • getPredictionError

      public double getPredictionError()
      Returns the mean squared prediction error for regression problems, or the classification percentage error for classification problems.
      Returns:
      a double, the prediction error

      Note: The error is the in-sample fitted error unless the user specifies the test data using setTestData().
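
The two error measures named above can be sketched as follows. The helper names are hypothetical, and the sketch assumes the classification error is reported on a 0 to 100 scale; this page does not state which scale the library uses.

```java
/** Illustrative error measures; hypothetical, not part of com.imsl.datamining. */
class PredictionErrorSketch {

    /** Regression: mean of the squared differences between actual and predicted values. */
    static double meanSquaredError(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return sum / actual.length;
    }

    /** Classification: percentage of examples whose predicted class differs from the actual class. */
    static double percentMisclassified(int[] actual, int[] predicted) {
        int wrong = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != predicted[i]) {
                wrong++;
            }
        }
        return 100.0 * wrong / actual.length; // assumed 0-100 scale
    }

    public static void main(String[] args) {
        System.out.println(meanSquaredError(
                new double[] {1.0, 2.0, 3.0}, new double[] {1.0, 2.0, 5.0}));
        System.out.println(percentMisclassified(
                new int[] {0, 1, 1, 0}, new int[] {0, 1, 0, 0}));
    }
}
```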

    • getVariableImportance

      public double[] getVariableImportance()
      Returns the variable importance measure based on the out-of-bag prediction error.

      Variable importance for a predictor is obtained by randomly permuting the out-of-bag values of the predictor and calculating the difference in predictive accuracy, before and after the permutation. The measure is averaged over all the bootstrap samples.

      Returns:
      a double array containing variable importance for each predictor
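
The permutation idea can be sketched for a single fitted model and a single data set. The class, the method names, and the use of mean squared error as the accuracy measure are assumptions for illustration; the library applies the permutation to out-of-bag values and averages over all bootstrap samples, which this sketch does not do.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

/** Illustrative permutation importance; hypothetical, not part of com.imsl.datamining. */
class PermutationImportanceSketch {

    /** Mean squared error of the model on the rows of x. */
    static double mse(double[][] x, double[] y, ToDoubleFunction<double[]> model) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = y[i] - model.applyAsDouble(x[i]);
            sum += diff * diff;
        }
        return sum / x.length;
    }

    /** Increase in error after randomly permuting one predictor column. */
    static double importance(double[][] x, double[] y, int column,
                             ToDoubleFunction<double[]> model, Random rng) {
        double before = mse(x, y, model);
        double[][] permuted = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            permuted[i] = x[i].clone(); // leave the caller's data untouched
        }
        // Fisher-Yates shuffle of the chosen column only
        for (int i = permuted.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            double tmp = permuted[i][column];
            permuted[i][column] = permuted[j][column];
            permuted[j][column] = tmp;
        }
        return mse(permuted, y, model) - before;
    }

    public static void main(String[] args) {
        double[][] x = {{1.0, 10.0}, {2.0, 20.0}, {3.0, 30.0}, {4.0, 40.0}};
        double[] y = {2.0, 4.0, 6.0, 8.0};
        ToDoubleFunction<double[]> model = row -> 2.0 * row[0]; // ignores column 1
        Random rng = new Random(11);
        System.out.println("Column 0: " + importance(x, y, 0, model, rng));
        System.out.println("Column 1: " + importance(x, y, 1, model, rng));
    }
}
```

A predictor the model never uses (column 1 in the toy model) yields an importance of zero, because permuting it cannot change any prediction.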
    • getNumberOfThreads

      public int getNumberOfThreads()
      Returns the maximum number of java.lang.Thread instances that may be used for parallel processing.
      Returns:
      an int containing the maximum number of java.lang.Thread instances that may be used for parallel processing

      The actual number of threads used in parallel processing will be the lesser of numberOfThreads and nSamples, the number of bootstrap samples set for bootstrap aggregation. This assessment is made to optimize use of resources.

    • setNumberOfThreads

      public void setNumberOfThreads(int numberOfThreads)
      Sets the maximum number of threads for multithreaded runs.
      Parameters:
      numberOfThreads - an int specifying the maximum number of java.lang.Thread instances

      The actual number of threads used will be the lesser of numberOfThreads and nSamples, the number of bootstrap samples set for bootstrap aggregation. This assessment is made to optimize use of resources.

      Default: numberOfThreads = 1.
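
The rule for the actual thread count can be sketched with the standard java.util.concurrent classes. The class ThreadCountSketch and the placeholder task are hypothetical; how the library schedules its work internally is not described on this page. The sketch assumes both arguments are at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative thread-count rule; hypothetical, not part of com.imsl.datamining. */
class ThreadCountSketch {

    /** More threads than bootstrap samples would sit idle, so take the lesser value. */
    static int effectiveThreads(int numberOfThreads, int nSamples) {
        return Math.min(numberOfThreads, nSamples);
    }

    /** Runs one placeholder task per bootstrap sample and returns how many completed. */
    static int runSamples(int numberOfThreads, int nSamples)
            throws InterruptedException, ExecutionException {
        ExecutorService pool =
                Executors.newFixedThreadPool(effectiveThreads(numberOfThreads, nSamples));
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int m = 0; m < nSamples; m++) {
                final int sample = m;
                Callable<Integer> task = () -> sample; // stands in for fitting one model
                results.add(pool.submit(task));
            }
            int completed = 0;
            for (Future<Integer> result : results) {
                result.get(); // wait for the task to finish
                completed++;
            }
            return completed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Threads used: " + effectiveThreads(8, 3));
        System.out.println("Samples completed: " + runSamples(8, 3));
    }
}
```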

    • getPrintLevel

      public int getPrintLevel()
      Returns the current print level.
      Returns:
      an int, the current print level

      printLevel  Action
      0           No printing.
      1           Prints final results only.
      2           Prints intermediate and final results.

    • setPrintLevel

      public void setPrintLevel(int printLevel)
      Sets the print level for the predictive model.
      Parameters:
      printLevel - an int specifying the level of printing to perform

      printLevel  Action
      0           No printing.
      1           Prints final results only.
      2           Prints intermediate and final results.

      Default: printLevel = 0.

    • getPredictions

      public double[] getPredictions()
      Returns the predicted values.
      Returns:
      a double array of predicted values of the response variable for the examples in the test data

      To generate the predicted values, call the aggregate method before calling this method. If no test data has been specified with setTestData, in-sample predictions are produced.

    • getOutOfBagPredictions

      public double[] getOutOfBagPredictions()
      Returns the out-of-bag predicted values.
      Returns:
      a double array containing the out-of-bag predicted values of the response variable for the examples in the training data
    • getNumberOfSamples

      public int getNumberOfSamples()
      Returns the number of bootstrap samples.
      Returns:
      an int, the number of bootstrap samples
    • setNumberOfSamples

      public void setNumberOfSamples(int nSamples)
      Sets the number of bootstrap samples.
      Parameters:
      nSamples - an int specifying the number of bootstrap samples

      Default: nSamples = 50.

    • setRandomObject

      public void setRandomObject(Random r)
      Sets a random object for the bootstrap random sampling scheme.
      Parameters:
      r - a Random object

      Default: a Random object is created internally and its seed is set from the computer clock.

      To obtain repeatable results, set the seed of the input r before calling this method. See Random for other options.
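
Why a fixed seed gives repeatable results can be seen in a small sketch. It uses java.util.Random as a stand-in, which is an assumption: this page does not say which Random class setRandomObject accepts. The class RepeatableSamplingSketch and its method are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative seeded sampling; hypothetical, not part of com.imsl.datamining. */
class RepeatableSamplingSketch {

    /** Draws n indices from 0..n-1 with replacement. */
    static int[] bootstrapIndices(int n, Random rng) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = rng.nextInt(n);
        }
        return idx;
    }

    public static void main(String[] args) {
        // Two generators with the same seed produce the same bootstrap sample.
        int[] first = bootstrapIndices(8, new Random(123457L));
        int[] second = bootstrapIndices(8, new Random(123457L));
        System.out.println(Arrays.toString(first));
        System.out.println("Repeatable: " + Arrays.equals(first, second));
    }
}
```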

    • setTestData

      public void setTestData(double[][] testData, double[] testDataWeights)
      Sets the test data to be predicted using bootstrap aggregation along with weights for each row in the test data.
      Parameters:
      testData - a double matrix containing the test data. testData must have the same number of columns in the same arrangement as xy, the training data of the PredictiveModel. Missing response variable values should be indicated with Double.NaN.

      testDataWeights - a double array containing observation weights for the test data

      Default: If testData is not specified using this method or other setTestData methods, in-sample predictions are produced (i.e., the original training set serves as the test data).

    • setTestData

      public void setTestData(double[][] testX, double[][] testY)
      Sets the test data to be predicted using bootstrap aggregation.
      Parameters:
      testX - a double matrix containing the test data predictors. testX must have the same number of columns in the same arrangement as the predictors in xy.
      testY - a double matrix containing the test data response variable. Missing response variable values should be indicated with Double.NaN.

      Default: If test data is not specified using this method or other setTestData methods, in-sample predictions are produced (i.e., the original training set serves as the test data).

    • setTestData

      public void setTestData(double[][] testX, double[][] testY, double[] testWts)
      Sets the test data to be predicted using bootstrap aggregation along with weights for each row in the test data.
      Parameters:
      testX - a double matrix containing the test data predictors. testX must have the same number of columns in the same arrangement as the predictors in xy.
      testY - a double matrix containing the test data response variable. Missing response variable values should be indicated with Double.NaN.
      testWts - a double array containing observation weights for the test data

      Default: If test data is not specified using this method or other setTestData methods, in-sample predictions are produced (i.e., the original training set serves as the test data).

    • setTestData

      public void setTestData(double[][] testData)
      Sets the test data to be predicted.
      Parameters:
      testData - a double matrix containing test data for which predictions are to be made using bagging. testData must have the same number of columns in the same arrangement as xy, the training data of the PredictiveModel. Missing response variable values should be indicated with Double.NaN.

      Default: If testData is not specified using this method or other setTestData methods, in-sample predictions are produced (i.e., the original training set serves as the test data).
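
A test matrix with missing response values might be prepared as in the sketch below. The class, the method, and the position of the response column are hypothetical; the actual column layout must match the training data. Note that Double.NaN is a constant and must be detected with Double.isNaN, because NaN never compares equal to itself.

```java
/** Illustrative handling of missing responses; hypothetical, not part of com.imsl.datamining. */
class MissingResponseSketch {

    /** Counts the rows whose response value is present (not NaN). */
    static int countKnownResponses(double[][] testData, int responseColumn) {
        int known = 0;
        for (double[] row : testData) {
            // row[responseColumn] == Double.NaN would always be false
            if (!Double.isNaN(row[responseColumn])) {
                known++;
            }
        }
        return known;
    }

    public static void main(String[] args) {
        // Two predictors followed by the response in the last column (assumed layout).
        double[][] testData = {
            {1.0, 2.0, 3.5},
            {4.0, 5.0, Double.NaN}, // response unknown for this row
        };
        System.out.println("Rows with a known response: "
                + countKnownResponses(testData, 2));
    }
}
```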

    • setCalculateVariableImportance

      public void setCalculateVariableImportance(boolean calculate)
      Sets the boolean to calculate variable importance.

      When true, a permutation type variable importance measure is calculated during bootstrap aggregation.

      Parameters:
      calculate - a boolean indicating whether or not to calculate variable importance

      Default: calculate = false.

    • isCalculateVariableImportance

      public boolean isCalculateVariableImportance()
      Returns the boolean indicating whether or not to calculate variable importance during bootstrap aggregation.
      Returns:
      a boolean, the flag indicating whether or not to calculate variable importance