Class BootstrapAggregation
- All Implemented Interfaces:
Serializable, Cloneable
Bootstrap aggregation, also known as bagging, generates predictions using predictive models. In the procedure, M bootstrap samples of size N are drawn with replacement from an original training set of size N. Sampling with replacement means that when an example is randomly selected, it is replaced back into the training set before the next draw. Thus a bootstrap sample can have repeated examples or observations. Using each bootstrap sample as a separate training data set, the procedure fits a predictive model and then generates predictions. For a regression problem (continuous response variable), the M predictions are combined into a single predicted value by averaging. For classification (categorical response variable), majority vote is used.
Originally proposed for decision trees, bagging leads to "improvements for unstable procedures," such as neural networks, classification and regression trees, and subset selection in linear regression. On the other hand, it can mildly degrade the performance of stable methods such as K-nearest neighbors (Breiman, 1996).
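The sampling and combining steps described above can be sketched in plain Java. This is a minimal illustration, not the IMSL implementation. The class `BaggingSketch` and its methods are hypothetical, the per-sample "model" is a stand-in (the mean of the sampled responses), and breaking majority-vote ties in favor of the smallest label is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Sketch of the bagging procedure; not the IMSL implementation.
// The per-sample "model" is a hypothetical stand-in: the mean of the sampled responses.
class BaggingSketch {

    // Draws one bootstrap sample of size n: n row indices chosen uniformly with replacement.
    static int[] bootstrapIndices(int n, Random rng) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = rng.nextInt(n); // a drawn row remains eligible for later draws
        }
        return idx;
    }

    // Regression: combine the M predictions by averaging.
    static double average(double[] predictions) {
        double sum = 0.0;
        for (double p : predictions) {
            sum += p;
        }
        return sum / predictions.length;
    }

    // Classification: combine the M predicted class labels by majority vote.
    // Ties go to the smallest label (an assumption of this sketch).
    static double majorityVote(double[] labels) {
        Map<Double, Integer> counts = new HashMap<>();
        for (double c : labels) {
            counts.merge(c, 1, Integer::sum);
        }
        double best = Double.NaN;
        int bestCount = 0;
        for (Map.Entry<Double, Integer> e : counts.entrySet()) {
            int count = e.getValue();
            double label = e.getKey();
            if (count > bestCount || (count == bestCount && label < best)) {
                best = label;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] y = {2.0, 4.0, 6.0, 8.0, 10.0}; // training responses, N = 5
        int m = 50;                                 // number of bootstrap samples, M
        Random rng = new Random(12345L);
        double[] predictions = new double[m];
        for (int b = 0; b < m; b++) {
            int[] idx = bootstrapIndices(y.length, rng);
            double sum = 0.0;
            for (int i : idx) {
                sum += y[i];
            }
            predictions[b] = sum / idx.length; // "fit" and "predict" with the stand-in model
        }
        System.out.println("bagged prediction: " + average(predictions));
        System.out.println("majority vote: " + majorityVote(new double[] {1, 0, 1, 2, 1}));
    }
}
```

Because each bootstrap sample is drawn with replacement, some rows appear several times and others not at all; that second group is what makes out-of-bag estimates possible.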
-
Constructor Summary
Constructor | Description
BootstrapAggregation(PredictiveModel pm) | Constructs a BootstrapAggregation object to generate predictions of a PredictiveModel using bootstrap aggregation.
Method Summary
Modifier and Type | Method | Description
void | aggregate() | Performs the bootstrap aggregation.
double | getMeanSquaredPredictionError() | Deprecated. Renamed to getPredictionError().
int | getNumberOfSamples() | Returns the number of bootstrap samples.
int | getNumberOfThreads() | Returns the maximum number of java.lang.Thread instances that may be used for parallel processing.
double | getOutOfBagPredictionError() | Returns the out-of-bag mean squared prediction error for regression problems, or the out-of-bag classification percentage error for classification problems.
double[] | getOutOfBagPredictions() | Returns the out-of-bag predicted values.
double | getPredictionError() | Returns the mean squared prediction error for regression problems, or the classification percentage error for classification problems.
double[] | getPredictions() | Returns the predicted values.
int | getPrintLevel() | Returns the current print level.
double[] | getVariableImportance() | Returns the variable importance measure based on the out-of-bag prediction error.
boolean | isCalculateVariableImportance() | Returns the boolean indicating whether or not to calculate variable importance during bootstrap aggregation.
void | setCalculateVariableImportance(boolean calculate) | Sets the boolean to calculate variable importance.
void | setNumberOfSamples(int nSamples) | Sets the number of bootstrap samples.
void | setNumberOfThreads(int numberOfThreads) | Sets the maximum number of threads for multithreaded runs.
void | setPrintLevel(int printLevel) | Sets the print level for the predictive model.
void | setRandomObject(Random r) | Sets a random object for the bootstrap random sampling scheme.
void | setTestData(double[][] testData) | Sets the test data to be predicted.
void | setTestData(double[][] testData, double[] testDataWeights) | Sets the test data to be predicted using bootstrap aggregation along with weights for each row in the test data.
void | setTestData(double[][] testX, double[][] testY) | Sets the test data to be predicted using bootstrap aggregation.
void | setTestData(double[][] testX, double[][] testY, double[] testWts) | Sets the test data to be predicted using bootstrap aggregation along with weights for each row in the test data.
-
Constructor Details
-
BootstrapAggregation
public BootstrapAggregation(PredictiveModel pm)
Constructs a BootstrapAggregation object to generate predictions of a PredictiveModel using bootstrap aggregation.
- Parameters:
pm - a PredictiveModel for which the predictions are to be generated
-
-
Method Details
-
aggregate
public void aggregate() throws PredictiveModel.PredictiveModelException, NoSuchMethodException, InstantiationException, IllegalAccessException, InvocationTargetException
Performs the bootstrap aggregation.
- Throws:
NoSuchMethodException - thrown when the PredictiveModel subclass is missing a constructor with the expected signature (see PredictiveModel(double[][], int, com.imsl.datamining.PredictiveModel.VariableType[])).
InstantiationException - thrown when an application tries to create an instance of a class using the newInstance method in class Class, but the specified class object cannot be instantiated.
IllegalAccessException - thrown when an application tries to reflectively create an instance (other than an array), set or get a field, or invoke a method, but the currently executing method does not have access to the definition of the specified class, field, method, or constructor.
InvocationTargetException - thrown to wrap an exception that was thrown by an invoked method or constructor.
PredictiveModel.PredictiveModelException - thrown when an exception has occurred in com.imsl.datamining.PredictiveModel. Its subclasses, such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException, should also be considered.
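The reflection exceptions listed above suggest that aggregate() creates a fresh model of the same class for each bootstrap sample through the (double[][], int, VariableType[]) constructor. The sketch below illustrates that mechanism with standard java.lang.reflect calls. This reading of the library's internals is an assumption, and ToyModel, the local VariableType enum, newInstanceLike, and hasExpectedConstructor are hypothetical.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

// Sketch (an assumption, not IMSL source): build a new model of the prototype's class for
// each bootstrap sample through a (double[][], int, VariableType[]) constructor.
class ReflectiveCopySketch {

    // Hypothetical stand-in for PredictiveModel.VariableType.
    enum VariableType { CATEGORICAL, QUANTITATIVE_CONTINUOUS }

    // Hypothetical model class that provides the expected constructor.
    static class ToyModel {
        final double[][] xy;
        final int responseColumnIndex;
        final VariableType[] varType;

        public ToyModel(double[][] xy, int responseColumnIndex, VariableType[] varType) {
            this.xy = xy;
            this.responseColumnIndex = responseColumnIndex;
            this.varType = varType;
        }
    }

    // Builds a new instance of the prototype's class, trained on one bootstrap sample.
    static <T> T newInstanceLike(T prototype, double[][] bootstrapXy, int responseColumnIndex,
                                 VariableType[] varType)
            throws NoSuchMethodException, InstantiationException,
                   IllegalAccessException, InvocationTargetException {
        @SuppressWarnings("unchecked")
        Class<T> cls = (Class<T>) prototype.getClass();
        // NoSuchMethodException: no public constructor with this exact signature
        Constructor<T> ctor = cls.getConstructor(double[][].class, int.class, VariableType[].class);
        // InstantiationException: the class is abstract
        // IllegalAccessException: the constructor is not accessible to this caller
        // InvocationTargetException: wraps any exception thrown by the constructor itself
        return ctor.newInstance(bootstrapXy, responseColumnIndex, varType);
    }

    // Reports whether a class offers the constructor that newInstanceLike needs.
    static boolean hasExpectedConstructor(Class<?> cls) {
        try {
            cls.getConstructor(double[][].class, int.class, VariableType[].class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        VariableType[] types = {VariableType.QUANTITATIVE_CONTINUOUS, VariableType.QUANTITATIVE_CONTINUOUS};
        ToyModel prototype = new ToyModel(new double[][] {{1.0, 2.0}}, 1, types);
        ToyModel copy = newInstanceLike(prototype, new double[][] {{3.0, 4.0}}, 1, types);
        System.out.println("copy trained on " + copy.xy.length + " row(s)");
        System.out.println("String has expected constructor: " + hasExpectedConstructor(String.class));
    }
}
```

All four reflection exceptions extend ReflectiveOperationException, so a caller that only needs to report failure can catch that single supertype.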
-
getOutOfBagPredictionError
public double getOutOfBagPredictionError()
Returns the out-of-bag mean squared prediction error for regression problems, or the out-of-bag classification percentage error for classification problems.
- Returns:
- a double, the out-of-bag prediction error
Note: An out-of-bag prediction for a particular example (observation or row) is generated only from those bootstrap training sets that exclude the example. The out-of-bag predictions are made on the training data.
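A minimal sketch of the out-of-bag rule for a regression problem: each training row's prediction is averaged over only the bootstrap samples that did not draw it. The class and method names are hypothetical, returning NaN for a row that was drawn into every sample is an assumption of the sketch, and a classification problem would take a majority vote instead of an average.

```java
import java.util.Arrays;

// Sketch of out-of-bag (OOB) predictions for regression; names are hypothetical.
class OutOfBagSketch {

    // inBag[b][i] is true when row i was drawn at least once into bootstrap sample b.
    static boolean[][] inBagMatrix(int[][] samples, int n) {
        boolean[][] inBag = new boolean[samples.length][n];
        for (int b = 0; b < samples.length; b++) {
            for (int i : samples[b]) {
                inBag[b][i] = true;
            }
        }
        return inBag;
    }

    // preds[b][i] is the prediction for training row i from the model fitted to sample b.
    // The OOB prediction for row i averages only the models whose sample excluded row i;
    // it is NaN when every sample contained the row (an assumption of this sketch).
    static double[] outOfBagPredictions(double[][] preds, boolean[][] inBag) {
        int n = preds[0].length;
        double[] oob = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            int count = 0;
            for (int b = 0; b < preds.length; b++) {
                if (!inBag[b][i]) {
                    sum += preds[b][i];
                    count++;
                }
            }
            oob[i] = count > 0 ? sum / count : Double.NaN;
        }
        return oob;
    }

    public static void main(String[] args) {
        int[][] samples = {{0, 0, 1}, {1, 2, 2}};           // two bootstrap samples of 3 rows
        double[][] preds = {{1.0, 2.0, 3.0}, {5.0, 6.0, 7.0}}; // per-sample predictions
        double[] oob = outOfBagPredictions(preds, inBagMatrix(samples, 3));
        System.out.println(Arrays.toString(oob)); // prints [5.0, NaN, 3.0]
    }
}
```

Row 1 appears in both samples, so no model is available to predict it out of bag; with the default 50 samples such rows become very rare.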
-
getMeanSquaredPredictionError
public double getMeanSquaredPredictionError()
Deprecated. Renamed to getPredictionError().
Returns the mean squared prediction error for regression problems, or the classification percentage error for classification problems.
- Returns:
- a double, the prediction error
Note: The error is the in-sample fitted error unless the user specifies the test data using setTestData().
-
getPredictionError
public double getPredictionError()
Returns the mean squared prediction error for regression problems, or the classification percentage error for classification problems.
- Returns:
- a double, the prediction error
Note: The error is the in-sample fitted error unless the user specifies the test data using setTestData().
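The two error measures named above can be sketched from observed and predicted responses as follows. The class and method names are hypothetical. Skipping rows whose observed or predicted value is NaN (for example, a missing test response) and reporting the classification error on a 0 to 100 scale are assumptions of this sketch, not documented library behavior.

```java
// Sketch of the regression and classification error measures; names are hypothetical.
class PredictionErrorSketch {

    // Regression: mean of the squared residuals over rows with both values present.
    static double meanSquaredError(double[] observed, double[] predicted) {
        double sum = 0.0;
        int n = 0;
        for (int i = 0; i < observed.length; i++) {
            if (Double.isNaN(observed[i]) || Double.isNaN(predicted[i])) {
                continue; // missing response or no prediction available
            }
            double residual = observed[i] - predicted[i];
            sum += residual * residual;
            n++;
        }
        return n > 0 ? sum / n : Double.NaN;
    }

    // Classification: percentage of rows whose predicted class differs from the observed class.
    static double classificationPercentError(double[] observed, double[] predicted) {
        int wrong = 0;
        int n = 0;
        for (int i = 0; i < observed.length; i++) {
            if (Double.isNaN(observed[i]) || Double.isNaN(predicted[i])) {
                continue;
            }
            if (observed[i] != predicted[i]) {
                wrong++;
            }
            n++;
        }
        return n > 0 ? 100.0 * wrong / n : Double.NaN;
    }

    public static void main(String[] args) {
        System.out.println(meanSquaredError(new double[] {1, 2, 3, 4}, new double[] {1, 2, 3, 6})); // 1.0
        System.out.println(classificationPercentError(new double[] {0, 1, 1, 0}, new double[] {0, 1, 0, 0})); // 25.0
    }
}
```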
-
getVariableImportance
public double[] getVariableImportance()
Returns the variable importance measure based on the out-of-bag prediction error. Variable importance for a predictor is obtained by randomly permuting the out-of-bag values of the predictor and calculating the difference in predictive accuracy before and after the permutation. The measure is averaged over all the bootstrap samples.
- Returns:
- a double array containing variable importance for each predictor
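The permutation measure can be sketched for a single bootstrap sample as follows; per the description above, the library averages the result over all samples. Measuring the change in accuracy as the increase in out-of-bag mean squared error is an assumption (a classification model would use its misclassification rate), and the lambda `model` is a hypothetical stand-in for a fitted predictor.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Sketch of permutation variable importance for one bootstrap sample; names are hypothetical.
class PermutationImportanceSketch {

    // Mean squared error of a fitted model over the given rows.
    static double mse(ToDoubleFunction<double[]> model, double[][] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double residual = y[i] - model.applyAsDouble(x[i]);
            sum += residual * residual;
        }
        return sum / x.length;
    }

    // Increase in out-of-bag error after randomly permuting predictor column j
    // among the out-of-bag rows.
    static double importance(ToDoubleFunction<double[]> model, double[][] oobX, double[] oobY,
                             int j, Random rng) {
        double before = mse(model, oobX, oobY);
        double[][] permuted = new double[oobX.length][];
        for (int i = 0; i < oobX.length; i++) {
            permuted[i] = oobX[i].clone(); // copy so the caller's data is untouched
        }
        for (int i = permuted.length - 1; i > 0; i--) { // Fisher-Yates shuffle of column j only
            int k = rng.nextInt(i + 1);
            double tmp = permuted[i][j];
            permuted[i][j] = permuted[k][j];
            permuted[k][j] = tmp;
        }
        return mse(model, permuted, oobY) - before;
    }

    public static void main(String[] args) {
        ToDoubleFunction<double[]> model = row -> 2.0 * row[0]; // hypothetical fit: uses x0, ignores x1
        double[][] x = new double[10][];
        double[] y = new double[10];
        for (int i = 0; i < 10; i++) {
            x[i] = new double[] {i, 10 - i};
            y[i] = 2.0 * i;
        }
        Random rng = new Random(7L);
        System.out.println("importance of x0: " + importance(model, x, y, 0, rng));
        System.out.println("importance of x1: " + importance(model, x, y, 1, rng)); // 0.0
    }
}
```

A predictor the model ignores scores exactly zero, because permuting it cannot change any prediction; a predictor the model relies on scores higher as the shuffle breaks its link to the response.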
-
getNumberOfThreads
public int getNumberOfThreads()
Returns the maximum number of java.lang.Thread instances that may be used for parallel processing.
- Returns:
- an int containing the maximum number of java.lang.Thread instances that may be used for parallel processing
The actual number of threads used in parallel processing will be the lesser of numberOfThreads and nSamples, the number of bootstrap samples set for bootstrap aggregation. This limit ensures that no more threads are created than there are bootstrap samples to process.
-
setNumberOfThreads
public void setNumberOfThreads(int numberOfThreads)
Sets the maximum number of threads for multithreaded runs.
- Parameters:
numberOfThreads - an int specifying the maximum number of java.lang.Thread instances
The actual number of threads used will be the lesser of numberOfThreads and nSamples, the number of bootstrap samples set for bootstrap aggregation. This limit ensures that no more threads are created than there are bootstrap samples to process.
Default: numberOfThreads = 1.
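The thread rule above (use the lesser of numberOfThreads and nSamples) can be sketched with a fixed-size java.util.concurrent thread pool. This is not the library's code: the class, its methods, and the stand-in model are hypothetical, and giving each bootstrap sample its own seeded Random so that results do not depend on task scheduling is a design choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of multithreaded bagging; names and the stand-in model are hypothetical.
class ParallelBaggingSketch {

    // Threads actually used: the lesser of the requested maximum and the number of samples.
    static int effectiveThreads(int numberOfThreads, int nSamples) {
        return Math.max(1, Math.min(numberOfThreads, nSamples));
    }

    // Fits the stand-in model (mean of the sampled responses) to nSamples bootstrap samples
    // on a fixed-size pool and returns the averaged prediction. Each sample uses its own
    // seeded Random, so the result does not depend on how tasks are scheduled.
    static double baggedMean(double[] y, int nSamples, int numberOfThreads, long seed)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(effectiveThreads(numberOfThreads, nSamples));
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (int b = 0; b < nSamples; b++) {
                final long sampleSeed = seed + b;
                Callable<Double> task = () -> {
                    Random rng = new Random(sampleSeed);
                    double sum = 0.0;
                    for (int i = 0; i < y.length; i++) {
                        sum += y[rng.nextInt(y.length)]; // draw with replacement
                    }
                    return sum / y.length;
                };
                futures.add(pool.submit(task));
            }
            double total = 0.0;
            for (Future<Double> f : futures) {
                total += f.get(); // combine in sample order
            }
            return total / nSamples;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        double[] y = {1.0, 5.0, 2.0, 8.0, 4.0};
        System.out.println("threads used: " + effectiveThreads(8, 3)); // 3
        System.out.println("4 threads: " + baggedMean(y, 20, 4, 1L));
        System.out.println("1 thread:  " + baggedMean(y, 20, 1, 1L));
    }
}
```

Because the partial results are combined in sample order, the 4-thread and 1-thread runs print the same value.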
-
getPrintLevel
public int getPrintLevel()
Returns the current print level.
- Returns:
- an int, the current print level
printLevel | Action
0 | No printing.
1 | Prints final results only.
2 | Prints intermediate and final results.
-
setPrintLevel
public void setPrintLevel(int printLevel)
Sets the print level for the predictive model.
- Parameters:
printLevel - an int specifying the level of printing to perform
printLevel | Action
0 | No printing.
1 | Prints final results only.
2 | Prints intermediate and final results.
Default: printLevel = 0.
-
getPredictions
public double[] getPredictions()
Returns the predicted values.
- Returns:
- a double array of predicted values of the response variable for the examples in the test data
The predicted values are generated by the aggregate method, so call aggregate() before this method. If testData is not specified, in-sample predictions are produced.
-
getOutOfBagPredictions
public double[] getOutOfBagPredictions()
Returns the out-of-bag predicted values.
- Returns:
- a double array containing the out-of-bag predicted values of the response variable for the examples in the training data
-
getNumberOfSamples
public int getNumberOfSamples()
Returns the number of bootstrap samples.
- Returns:
- an int, the number of bootstrap samples
-
setNumberOfSamples
public void setNumberOfSamples(int nSamples)
Sets the number of bootstrap samples.
- Parameters:
nSamples - an int specifying the number of bootstrap samples
Default: nSamples = 50.
-
setRandomObject
public void setRandomObject(Random r)
Sets a random object for the bootstrap random sampling scheme.
- Parameters:
r - a Random object
Default: r is created inside the code, and its seed is set from the computer clock. To obtain repeatable results, set the seed of the input r before calling this method. See Random for other options.
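The effect of seeding can be seen in a short sketch: two random objects created with the same seed produce identical bootstrap draws. Here java.util.Random stands in for whatever Random type setRandomObject accepts, and the class and method names are hypothetical; in actual use, the seeded object would be passed to setRandomObject before calling aggregate().

```java
import java.util.Arrays;
import java.util.Random;

// Sketch: a fixed seed fixes every bootstrap draw, so repeated runs agree.
// java.util.Random is a stand-in; names are hypothetical.
class RepeatableSamplingSketch {

    // Draws nSamples bootstrap samples of size n (row indices with replacement).
    static int[][] drawSamples(Random r, int nSamples, int n) {
        int[][] samples = new int[nSamples][n];
        for (int b = 0; b < nSamples; b++) {
            for (int i = 0; i < n; i++) {
                samples[b][i] = r.nextInt(n);
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        int[][] run1 = drawSamples(new Random(123L), 5, 8);
        int[][] run2 = drawSamples(new Random(123L), 5, 8);
        System.out.println("identical draws: " + Arrays.deepEquals(run1, run2)); // true
    }
}
```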
-
setTestData
public void setTestData(double[][] testData, double[] testDataWeights)
Sets the test data to be predicted using bootstrap aggregation along with weights for each row in the test data.
- Parameters:
testData - a double matrix containing the test data. testData must have the same number of columns in the same arrangement as xy. Missing response variable values should be indicated with Double.NaN.
testDataWeights - a double array containing observation weights for the test data
Default: If testData is not specified using this method or other setTestData methods, in-sample predictions are produced (i.e., the original training set serves as the test data).
-
setTestData
public void setTestData(double[][] testX, double[][] testY)
Sets the test data to be predicted using bootstrap aggregation.
- Parameters:
testX - a double matrix containing the test data predictors. testX must have the same number of columns in the same arrangement as the predictors in xy.
testY - a double matrix containing the test data response variable. Missing response variable values should be indicated with Double.NaN.
Default: If test data is not specified using this method or other setTestData methods, in-sample predictions are produced (i.e., the original training set serves as the test data).
-
setTestData
public void setTestData(double[][] testX, double[][] testY, double[] testWts)
Sets the test data to be predicted using bootstrap aggregation along with weights for each row in the test data.
- Parameters:
testX - a double matrix containing the test data predictors. testX must have the same number of columns in the same arrangement as the predictors in xy.
testY - a double matrix containing the test data response variable. Missing response variable values should be indicated with Double.NaN.
testWts - a double array containing observation weights for the test data
Default: If test data is not specified using this method or other setTestData methods, in-sample predictions are produced (i.e., the original training set serves as the test data).
-
setTestData
public void setTestData(double[][] testData)
Sets the test data to be predicted.
- Parameters:
testData - a double matrix containing test data for which predictions are to be made using bagging. testData must have the same number of columns in the same arrangement as xy. Missing response variable values should be indicated with Double.NaN.
Default: If testData is not specified using this method or other setTestData methods, in-sample predictions are produced (i.e., the original training set serves as the test data).
-
setCalculateVariableImportance
public void setCalculateVariableImportance(boolean calculate)
Sets the boolean to calculate variable importance. When true, a permutation-type variable importance measure is calculated during bootstrap aggregation.
- Parameters:
calculate - a boolean indicating whether or not to calculate variable importance
Default: calculate = false
-
isCalculateVariableImportance
public boolean isCalculateVariableImportance()
Returns the boolean indicating whether or not to calculate variable importance during bootstrap aggregation.
- Returns:
- a boolean, the flag indicating whether or not to calculate variable importance