public class BootstrapAggregation extends Object implements Serializable, Cloneable
Bootstrap aggregation, also known as bagging, generates predictions using predictive models. In the procedure, M bootstrap samples of size N are drawn with replacement from an original training set of size N. Sampling with replacement means that when an example is randomly selected, it is replaced back into the training set before the next draw. Thus a bootstrap sample can have repeated examples or observations. Using each bootstrap sample as a separate training data set, the procedure fits a predictive model and then generates predictions. For a regression problem (continuous response variable), the M predictions are combined into a single predicted value by averaging. For classification (categorical response variable), majority vote is used.
Originally proposed for decision trees, bagging leads to "improvements for unstable procedures," such as neural networks, classification and regression trees, and subset selection in linear regression. On the other hand, it can mildly degrade the performance of stable methods such as K-nearest neighbors (Breiman, 1996).
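The procedure described above can be illustrated with a short standalone sketch. It is not the library's implementation: the class name BaggingSketch and its helper methods are hypothetical, and java.util.Random is assumed as the random number source. It shows drawing a bootstrap sample of size N with replacement, then combining M predictions by averaging (regression) or by majority vote (classification).

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch of the bagging steps; names here are hypothetical. */
class BaggingSketch {

    /** Draws n row indices uniformly at random, with replacement, from 0..n-1. */
    static int[] bootstrapIndices(int n, Random rng) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = rng.nextInt(n); // the same row may be drawn more than once
        }
        return idx;
    }

    /** Regression: combines the M per-sample predictions by averaging. */
    static double average(double[] predictions) {
        double sum = 0.0;
        for (double p : predictions) {
            sum += p;
        }
        return sum / predictions.length;
    }

    /** Classification: returns the label predicted most often (ties go to the lowest label). */
    static int majorityVote(int[] predictedClasses, int nClasses) {
        int[] counts = new int[nClasses];
        for (int c : predictedClasses) {
            counts[c]++;
        }
        int best = 0;
        for (int c = 1; c < nClasses; c++) {
            if (counts[c] > counts[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Random rng = new Random(42L);
        System.out.println(Arrays.toString(bootstrapIndices(10, rng)));
        System.out.println(average(new double[] {1.0, 2.0, 6.0}));
        System.out.println(majorityVote(new int[] {0, 2, 2, 1, 2}, 3));
    }
}
```

In a full bagging run, each index array would select the training rows for one model fit, and the combining step would be applied per test example across the M fitted models.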
| Constructor and Description |
|---|
| BootstrapAggregation(PredictiveModel pm) Constructs a BootstrapAggregation class in order to generate predictions of a PredictiveModel using bootstrap aggregation. |
| Modifier and Type | Method and Description |
|---|---|
| void | aggregate() Performs the bootstrap aggregation. |
| double | getMeanSquaredPredictionError() Deprecated. Renamed to BootstrapAggregation.getPredictionError(). |
| int | getNumberOfSamples() Returns the number of bootstrap samples. |
| int | getNumberOfThreads() Returns the maximum number of java.lang.Thread instances that may be used for parallel processing. |
| double | getOutOfBagPredictionError() Returns the out-of-bag mean squared prediction error for regression problems, or the out-of-bag classification percentage error for classification problems. |
| double[] | getOutOfBagPredictions() Returns the out-of-bag predicted values. |
| double | getPredictionError() Returns the mean squared prediction error for regression problems, or the classification percentage error for classification problems. |
| double[] | getPredictions() Returns the predicted values. |
| int | getPrintLevel() Returns the current print level. |
| double[] | getVariableImportance() Returns the variable importance measure based on the out-of-bag prediction error. |
| boolean | isCalculateVariableImportance() Returns the boolean indicating whether or not to calculate variable importance during bootstrap aggregation. |
| void | setCalculateVariableImportance(boolean calculate) Sets the boolean to calculate variable importance. |
| void | setNumberOfSamples(int nSamples) Sets the number of bootstrap samples. |
| void | setNumberOfThreads(int numberOfThreads) Sets the maximum number of threads for multithreaded runs. |
| void | setPrintLevel(int printLevel) Sets the print level for the predictive model. |
| void | setRandomObject(Random r) Sets a random object for the bootstrap random sampling scheme. |
| void | setTestData(double[][] testData) Sets the test data to be predicted. |
| void | setTestData(double[][] testData, double[] testDataWeights) Sets the test data to be predicted using bootstrap aggregation along with weights for each row in the test data. |
| void | setTestData(double[][] testX, double[][] testY) Sets the test data to be predicted using bootstrap aggregation. |
| void | setTestData(double[][] testX, double[][] testY, double[] testWts) Sets the test data to be predicted using bootstrap aggregation along with weights for each row in the test data. |
public BootstrapAggregation(PredictiveModel pm)
Constructs a BootstrapAggregation class in order to generate predictions of a PredictiveModel using bootstrap aggregation.
Parameters:
pm - a PredictiveModel for which the predictions are to be generated
public void aggregate()
throws PredictiveModel.PredictiveModelException,
NoSuchMethodException,
InstantiationException,
IllegalAccessException,
InvocationTargetException
Performs the bootstrap aggregation.
Throws:
NoSuchMethodException - is thrown when the PredictiveModel subclass is missing a constructor with the expected signature (see PredictiveModel(double[][], int, com.imsl.datamining.PredictiveModel.VariableType[])).
InstantiationException - is thrown when an application tries to create an instance of a class using the newInstance method in class Class, but the specified class object cannot be instantiated.
IllegalAccessException - is thrown when an application tries to reflectively create an instance (other than an array), set or get a field, or invoke a method, but the currently executing method does not have access to the definition of the specified class, field, method or constructor.
InvocationTargetException - is thrown when a wrapped exception is thrown by an invoked method or constructor.
PredictiveModel.PredictiveModelException - is thrown when an exception has occurred in the com.imsl.datamining.PredictiveModel. Superclass exceptions should be considered such as com.imsl.datamining.PredictiveModel.StateChangeException and com.imsl.datamining.PredictiveModel.SumOfProbabilitiesNotOneException.
public double getOutOfBagPredictionError()
Returns the out-of-bag mean squared prediction error for regression problems, or the out-of-bag classification percentage error for classification problems.
Returns:
a double, the out-of-bag prediction error
Note: An out-of-bag prediction for a particular example (observation or row) is generated from only those bootstrap training sets which exclude the example. The out-of-bag predictions are done on the training data.
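The note above can be sketched for the regression case as follows. This is an illustration under assumptions, not the library's code: the class OutOfBagSketch, the inBag membership matrix, and the predictions matrix are hypothetical names and layouts. A training row that happens to appear in every bootstrap sample has no out-of-bag prediction, which this sketch marks with Double.NaN.

```java
/** Illustrative sketch only; array names and layout are assumptions. */
class OutOfBagSketch {

    /**
     * inBag[b][i] is true when training row i was drawn into bootstrap sample b.
     * predictions[b][i] is the prediction for row i from the model fitted on sample b.
     */
    static double[] outOfBagPredictions(boolean[][] inBag, double[][] predictions) {
        int nSamples = inBag.length;
        int nRows = inBag[0].length;
        double[] oob = new double[nRows];
        for (int i = 0; i < nRows; i++) {
            double sum = 0.0;
            int count = 0;
            for (int b = 0; b < nSamples; b++) {
                if (!inBag[b][i]) { // only models that never saw row i contribute
                    sum += predictions[b][i];
                    count++;
                }
            }
            oob[i] = (count > 0) ? sum / count : Double.NaN;
        }
        return oob;
    }
}
```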
public double getMeanSquaredPredictionError()
Deprecated. Renamed to BootstrapAggregation.getPredictionError().
Returns:
a double, the prediction error
Note: The error is the in-sample fitted error unless the user specifies
the test data using setTestData().
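public double getPredictionError()
Returns the mean squared prediction error for regression problems, or the classification percentage error for classification problems.
Returns:
a double, the prediction error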
Note: The error is the in-sample fitted error unless the user specifies
the test data using setTestData().
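The two kinds of error this method reports, mean squared prediction error for regression and percentage error for classification, might be computed along the following lines. This is a sketch under assumptions: the class PredictionErrorSketch is hypothetical, class labels are assumed to be stored as double values, the percentage is assumed to be on a 0 to 100 scale, and observation weights are ignored.

```java
/** Illustrative error measures; scaling and label encoding are assumptions. */
class PredictionErrorSketch {

    /** Mean of the squared differences between actual and predicted responses. */
    static double meanSquaredError(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return sum / actual.length;
    }

    /** Percentage (0 to 100) of examples whose predicted class differs from the actual class. */
    static double percentageError(double[] actual, double[] predicted) {
        int wrong = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != predicted[i]) {
                wrong++;
            }
        }
        return 100.0 * wrong / actual.length;
    }
}
```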
public double[] getVariableImportance()
Returns the variable importance measure based on the out-of-bag prediction error.
Variable importance for a predictor is obtained by randomly permuting the out-of-bag values of the predictor and calculating the difference in predictive accuracy, before and after the permutation. The measure is averaged over all the bootstrap samples.
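For a single bootstrap sample, the permutation step described above could look like the sketch below. The class PermutationImportanceSketch and its methods are hypothetical, the fitted model is represented by a plain java.util.function.ToDoubleFunction, and the sign convention (error after permutation minus error before) is an assumption. Averaging the result over all bootstrap samples would give the final measure.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

/** Illustrative permutation importance for one bootstrap sample; names are hypothetical. */
class PermutationImportanceSketch {

    static double meanSquaredError(ToDoubleFunction<double[]> model, double[][] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = model.applyAsDouble(x[i]) - y[i];
            sum += diff * diff;
        }
        return sum / x.length;
    }

    /** Increase in out-of-bag error after shuffling one predictor column. */
    static double importance(ToDoubleFunction<double[]> model, double[][] oobX,
                             double[] oobY, int column, Random rng) {
        double before = meanSquaredError(model, oobX, oobY);

        // Work on a copy so the caller's out-of-bag rows stay untouched.
        double[][] permuted = new double[oobX.length][];
        for (int i = 0; i < oobX.length; i++) {
            permuted[i] = oobX[i].clone();
        }
        // Fisher-Yates shuffle of the chosen column only.
        for (int i = permuted.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            double tmp = permuted[i][column];
            permuted[i][column] = permuted[j][column];
            permuted[j][column] = tmp;
        }
        double after = meanSquaredError(model, permuted, oobY);
        return after - before;
    }
}
```

A predictor the model does not use leaves the error unchanged when shuffled, so its importance under this convention is zero.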
Returns:
a double array containing variable importance for each predictor
public int getNumberOfThreads()
Returns the maximum number of java.lang.Thread instances that may be used for parallel processing.
Returns:
an int containing the maximum number of java.lang.Thread instances that may be used for parallel processing
The actual number of threads used in parallel processing will be the
lesser of numberOfThreads and nSamples, the
number of bootstrap samples set for bootstrap aggregation. This
assessment is made to optimize use of resources.
public void setNumberOfThreads(int numberOfThreads)
Sets the maximum number of threads for multithreaded runs.
Parameters:
numberOfThreads - an int specifying the maximum number of java.lang.Thread instances
The actual number of threads used will be the lesser of
numberOfThreads and nSamples, the number of
bootstrap samples set for bootstrap aggregation. This assessment is made
to optimize use of resources.
Default: numberOfThreads = 1.
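The rule that the actual thread count is the lesser of numberOfThreads and nSamples can be illustrated with a standard java.util.concurrent thread pool. This sketch is not how the library schedules its work: the class ThreadCountSketch is hypothetical, both arguments are assumed to be positive, and each task simply squares its sample index as a stand-in for fitting a model on one bootstrap sample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative sketch; the per-sample task is a placeholder. */
class ThreadCountSketch {

    /** There is no benefit in more threads than there are bootstrap samples. */
    static int effectiveThreads(int numberOfThreads, int nSamples) {
        return Math.min(numberOfThreads, nSamples);
    }

    /** Runs one placeholder task per bootstrap sample on a bounded pool. */
    static double[] runSamples(int numberOfThreads, int nSamples) {
        ExecutorService pool =
                Executors.newFixedThreadPool(effectiveThreads(numberOfThreads, nSamples));
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (int s = 0; s < nSamples; s++) {
                final int sample = s;
                Callable<Double> task = () -> (double) (sample * sample);
                futures.add(pool.submit(task));
            }
            double[] results = new double[nSamples];
            for (int s = 0; s < nSamples; s++) {
                results[s] = futures.get(s).get(); // wait for each task in order
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```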
public int getPrintLevel()
Returns the current print level.
Returns:
an int, the current print level
| printLevel | Action |
|---|---|
| 0 | No printing. |
| 1 | Prints final results only. |
| 2 | Prints intermediate and final results. |
public void setPrintLevel(int printLevel)
Sets the print level for the predictive model.
Parameters:
printLevel - an int specifying the level of printing to perform
| printLevel | Action |
|---|---|
| 0 | No printing. |
| 1 | Prints final results only. |
| 2 | Prints intermediate and final results. |
Default: printLevel = 0.
public double[] getPredictions()
Returns the predicted values.
Returns:
a double array of predicted values of the response variable for the examples in the test data
To generate the predicted values, use the method aggregate.
If testData is not specified, in-sample predictions are
produced.
public double[] getOutOfBagPredictions()
Returns the out-of-bag predicted values.
Returns:
a double array containing the out-of-bag predicted values of the response variable for the examples in the training data
public int getNumberOfSamples()
Returns the number of bootstrap samples.
Returns:
an int, the number of bootstrap samples
public void setNumberOfSamples(int nSamples)
Sets the number of bootstrap samples.
Parameters:
nSamples - an int specifying the number of bootstrap samples
Default: nSamples = 50.
public void setRandomObject(Random r)
Sets a random object for the bootstrap random sampling scheme.
Parameters:
r - a Random object
Default: r is created inside the code and the seed is set by
the computer clock.
To obtain repeatable results, set the seed of the input r
before calling this method. See Random for other
options.
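Why a fixed seed gives repeatable results can be seen in a small standalone sketch. It assumes java.util.Random as the generator and uses a hypothetical class SeededSamplingSketch. Whether the Random type expected by this method behaves identically is not confirmed here.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch: two generators with the same seed draw the same bootstrap sample. */
class SeededSamplingSketch {

    /** Draws n indices in 0..n-1 with replacement using the supplied generator. */
    static int[] bootstrapIndices(int n, Random rng) {
        return rng.ints(n, 0, n).toArray();
    }

    public static void main(String[] args) {
        int[] first = bootstrapIndices(8, new Random(123457L));
        int[] second = bootstrapIndices(8, new Random(123457L));
        // Same seed, same sequence of draws, so the two samples match.
        System.out.println(Arrays.equals(first, second)); // prints "true"
    }
}
```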
public void setTestData(double[][] testData,
double[] testDataWeights)
Sets the test data to be predicted using bootstrap aggregation along with weights for each row in the test data.
Parameters:
testData - a double matrix containing the test data. testData must have the same number of columns in the same arrangement as xy. Missing response variable values should be indicated with Double.NaN.
testDataWeights - a double array containing observation weights for the test data
Default: If testData is not specified using this method or
other setTestData methods, in-sample predictions are
produced (i.e., the original training set serves as the test data).
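Marking a missing response with Double.NaN has one practical consequence worth showing: NaN never compares equal to anything, including itself, so missing values have to be detected with Double.isNaN. The sketch below uses a hypothetical class MissingResponseSketch and a hypothetical responseColumn argument giving the position of the response variable in each row.

```java
/** Illustrative sketch; responseColumn is an assumed parameter, not part of this API. */
class MissingResponseSketch {

    /** Counts test rows whose response value is missing (NaN). */
    static int countMissingResponses(double[][] testData, int responseColumn) {
        int missing = 0;
        for (double[] row : testData) {
            // row[responseColumn] == Double.NaN would always be false; use isNaN instead.
            if (Double.isNaN(row[responseColumn])) {
                missing++;
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        double[][] testData = {
            {1.5, 0.2, 3.0},
            {2.5, 0.4, Double.NaN}, // response unknown for this row
        };
        System.out.println(countMissingResponses(testData, 2));
    }
}
```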
public void setTestData(double[][] testX,
double[][] testY)
Sets the test data to be predicted using bootstrap aggregation.
Parameters:
testX - a double matrix containing the test data predictors. testX must have the same number of columns in the same arrangement as the predictors in xy.
testY - a double matrix containing the test data response variable. Missing response variable values should be indicated with Double.NaN.
Default: If test data is not specified using this method or
other setTestData methods, in-sample predictions are
produced (i.e., the original training set serves as the test data).
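When test data is held as one combined matrix, it can be split into separate predictor and response matrices before calling this overload. The sketch below is an assumption-laden illustration: the class SplitTestDataSketch and the responseColumn argument are hypothetical, and the response is assumed to occupy a single column, so the resulting response matrix has one column per row.

```java
/** Illustrative sketch; the single-column response layout is an assumption. */
class SplitTestDataSketch {

    /** Returns every column except the response column, preserving column order. */
    static double[][] predictors(double[][] xy, int responseColumn) {
        double[][] x = new double[xy.length][];
        for (int i = 0; i < xy.length; i++) {
            double[] row = new double[xy[i].length - 1];
            int k = 0;
            for (int j = 0; j < xy[i].length; j++) {
                if (j != responseColumn) {
                    row[k++] = xy[i][j];
                }
            }
            x[i] = row;
        }
        return x;
    }

    /** Returns the response column as a matrix with one column. */
    static double[][] responses(double[][] xy, int responseColumn) {
        double[][] y = new double[xy.length][1];
        for (int i = 0; i < xy.length; i++) {
            y[i][0] = xy[i][responseColumn];
        }
        return y;
    }
}
```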
public void setTestData(double[][] testX,
double[][] testY,
double[] testWts)
Sets the test data to be predicted using bootstrap aggregation along with weights for each row in the test data.
Parameters:
testX - a double matrix containing the test data predictors. testX must have the same number of columns in the same arrangement as the predictors in xy.
testY - a double matrix containing the test data response variable. Missing response variable values should be indicated with Double.NaN.
testWts - a double array containing observation weights for the test data
Default: If test data is not specified using this method or
other setTestData methods, in-sample predictions are
produced (i.e., the original training set serves as the test data).
public void setTestData(double[][] testData)
Sets the test data to be predicted.
Parameters:
testData - a double matrix containing test data for which predictions are to be made using bagging. testData must have the same number of columns in the same arrangement as xy. Missing response variable values should be indicated with Double.NaN.
Default: If testData is not specified using this method or
other setTestData methods, in-sample predictions are
produced (i.e., the original training set serves as the test data).
public void setCalculateVariableImportance(boolean calculate)
Sets the boolean to calculate variable importance. When true, a permutation-type variable importance measure is calculated during bootstrap aggregation.
Parameters:
calculate - a boolean indicating whether or not to calculate variable importance
Default: calculate = false
public boolean isCalculateVariableImportance()
Returns the boolean indicating whether or not to calculate variable importance during bootstrap aggregation.
Returns:
a boolean, the flag indicating whether or not to calculate variable importance
Copyright © 2022 Rogue Wave Software. All rights reserved.