public abstract class PredictiveModel extends Object implements Serializable, Cloneable
| Modifier and Type | Class and Description |
|---|---|
| static class | PredictiveModel.CloneNotSupportedException: Wraps the java.lang.CloneNotSupportedException to indicate that the clone method in class Object has been called to clone an object, but that the object's class does not implement the Cloneable interface. |
| static class | PredictiveModel.PredictiveModelException: An exception class intended to be the parent of all nested Exception classes where the enclosing class extends PredictiveModel. |
| static class | PredictiveModel.StateChangeException: Exception thrown when an input parameter has changed that might affect the model estimates or predictions. |
| static class | PredictiveModel.SumOfProbabilitiesNotOneException: Exception thrown when the sum of probabilities is not approximately one. |
| static class | PredictiveModel.VariableType: Enumerates different variable types. |
| Modifier | Constructor and Description |
|---|---|
| protected | PredictiveModel(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType): Constructs a PredictiveModel object for a single response variable and multiple predictor variables. |
| protected | PredictiveModel(PredictiveModel pm): Constructs a PredictiveModel from an existing instance. |
| Modifier and Type | Method and Description |
|---|---|
| abstract PredictiveModel | clone(): Abstract clone method. |
| void | fitModel(): Fits the predictive model to the training data (estimates the model using the training data and current configuration settings). |
| double[] | getClassCounts(): Returns the counts of each class (level) of the categorical response variable. |
| int[][] | getClassErrors(double[] knownValues, double[] predictedValues): Returns classification error information. |
| String[] | getClassLabels(): Returns the current class labels for a categorical response variable. |
| double[][] | getClassProbabilities(): Returns a matrix containing the predicted class probabilities for each observation in the training data. |
| double[][] | getCostMatrix(): Returns the cost matrix for a categorical response variable. |
| int | getMaxNumberOfCategories(): Returns the maximum number of categories allowed. |
| int | getMaxNumberOfIterations(): Returns the maximum number of iterations allowed for the fitting procedure or training algorithm. |
| int | getNumberOfClasses(): Returns the number of unique classes found in the categorical response data. |
| int | getNumberOfColumns(): Returns the number of columns in the training data xy. |
| int | getNumberOfMissing(): Returns the number of missing values of the response variable found in the data xy. |
| int | getNumberOfPredictors(): Returns the number of predictors. |
| int | getNumberOfRows(): Returns the number of rows in xy (observations). |
| int[] | getNumberOfUniquePredictorValues(): Returns an array containing the number of distinct values of each predictor found in the input data. |
| int[] | getPredictorIndexes(): Returns the column indices of xy in which the predictor variables reside. |
| PredictiveModel.VariableType[] | getPredictorTypes(): Returns an array of VariableType objects that correspond to the predictor data types in xy. |
| int | getPrintLevel(): Returns the current print level. |
| double[] | getPriorProbabilities(): Returns an array containing the prior probabilities. |
| Random | getRandomObject(): Returns the random object being used in the permutation of the observations. |
| int | getResponseColumnIndex(): Returns the column index in xy containing the response variable. |
| double | getResponseVariableAverage(): Returns the weighted average value of the response variable. |
| int | getResponseVariableMostFrequentClass(): Returns the most frequent value of the response variable. |
| PredictiveModel.VariableType | getResponseVariableType(): Returns the variable type of the response variable. |
| double | getTotalWeight(): Returns the sum of the active case weights. |
| PredictiveModel.VariableType[] | getVariableType(): Returns an array containing the variable types in xy. |
| double[] | getWeights(): Returns an array containing the case weights. |
| double[][] | getXY(): Returns a copy of the xy data. |
| boolean | isConstantSeries(): Returns the current value of the constantSeries flag. |
| boolean | isMustFitModel(): Returns the current value of the mustFitModel flag. |
| boolean | isUserFixedNClasses(): Returns true if the number of classes was fixed by the user. |
| abstract double[] | predict(): Predicts the response variable using the most recent fit. |
| abstract double[] | predict(double[][] testData): Predicts the response values using the most recent fit and the provided test data. |
| abstract double[] | predict(double[][] testData, double[] testDataWeights): Predicts the response values using the most recent fit, the provided test data, and the test data case weights. |
| void | setClassCounts(double[] classCounts): Sets the counts of each class of the response variable. |
| void | setClassLabels(String[] classLabels): Sets the class names or labels for a categorical response variable. |
| void | setClassProbabilities(double[][] probs): Sets the class probabilities. |
| protected abstract void | setConfiguration(PredictiveModel pm): Sets the configuration of PredictiveModel to that of the input model. |
| void | setCostMatrix(double[][] costMatrix): Specifies the cost matrix for a categorical response variable. |
| void | setMaxNumberOfCategories(int maxCategories): Sets the maximum number of categories allowed within categorical predictor and response variables. |
| void | setMaxNumberOfIterations(int maxIterations): Sets the maximum number of iterations allowed for the fitting procedure or training algorithm. |
| void | setMustFitModel(boolean mustFitModel): Sets the flag of whether or not the model needs to be fit or re-estimated because of a change in the data or configuration. |
| void | setNumberOfClasses(int nClasses): Sets the number of distinct classes of the response variable. |
| void | setPredictorIndex(int[] predIdx): Sets the column indices of xy in which the predictor variables reside. |
| void | setPredictorTypes(PredictiveModel.VariableType[] predVarType): Sets the VariableType objects that correspond to the predictor data types in xy. |
| void | setPrintLevel(int printLevel): Sets the print level for a PredictiveModel. |
| void | setPriorProbabilities(double[] priors): Sets the prior probabilities for class membership. |
| void | setRandomObject(Random r): Sets the random object to be used in the permutation of observation data. |
| void | setResponseColumnIndex(int index): Sets the column index in xy containing the response variable. |
| void | setTrainingData(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType): Sets up the training data for the predictive model. |
| void | setVariableType(PredictiveModel.VariableType[] varType): Sets the variable types for the data. |
| void | setWeights(double[] weights): Specifies the case weights. |
protected PredictiveModel(PredictiveModel pm)

Constructs a PredictiveModel from an existing instance.

Parameters: pm - an instance of a PredictiveModel

protected PredictiveModel(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)

Constructs a PredictiveModel object for a single response variable and multiple predictor variables. This constructor should be called by all classes extending PredictiveModel.

Parameters:
- xy - a double matrix containing the training data and associated response values
- responseColumnIndex - an int specifying the column index in xy of the response variable
- varType - a PredictiveModel.VariableType array of length equal to xy[0].length containing the type of each variable

public abstract PredictiveModel clone()

Abstract clone method. Each subclass of PredictiveModel must override this method.

protected abstract void setConfiguration(PredictiveModel pm) throws PredictiveModel.PredictiveModelException

Sets the configuration of PredictiveModel to that of the input model.

Each subclass of PredictiveModel must override this method. The implementation should use the subclass's own setter methods to copy the parameter settings of the input PredictiveModel instance, essentially creating a copy of the input model. This method is used for model parameter tuning, as in CrossValidation, where several variations of the same model are evaluated, and in ensemble methods such as BootstrapAggregation, where several identical instances are fit to random samples.
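The copy pattern described above can be sketched in plain Java. It has three parts: an abstract clone(), a copy constructor, and a setConfiguration step that copies settings from another instance. ToyModel, ToyTree, maxIterations, and maxDepth are hypothetical names invented for this illustration. They are not IMSL classes, and the sketch copies configuration only, not fitted results.

```java
/** Hypothetical base class mirroring the abstract clone()/setConfiguration shape. */
abstract class ToyModel implements Cloneable {
    private int maxIterations = 1000;

    protected ToyModel() { }

    /** Copy constructor: copies only the settings this base class owns. */
    protected ToyModel(ToyModel other) {
        this.maxIterations = other.maxIterations;
    }

    int getMaxIterations() { return maxIterations; }

    void setMaxIterations(int maxIterations) { this.maxIterations = maxIterations; }

    /** Copies every configuration setting of other into this instance. */
    protected abstract void setConfiguration(ToyModel other);

    /** Re-declared abstract so each subclass must supply its own copy. */
    @Override
    public abstract ToyModel clone();
}

/** Hypothetical concrete model with one extra setting of its own. */
final class ToyTree extends ToyModel {
    private int maxDepth = 5;

    ToyTree() { }

    ToyTree(ToyTree other) {
        super(other);                   // copies the base-class settings
        this.maxDepth = other.maxDepth; // then the subclass's own settings
    }

    int getMaxDepth() { return maxDepth; }

    void setMaxDepth(int maxDepth) { this.maxDepth = maxDepth; }

    @Override
    protected void setConfiguration(ToyModel other) {
        // Copy through setter methods rather than raw fields.
        setMaxIterations(other.getMaxIterations());
        if (other instanceof ToyTree) {
            setMaxDepth(((ToyTree) other).getMaxDepth());
        }
    }

    @Override
    public ToyTree clone() {
        return new ToyTree(this);
    }
}
```

The copy constructor avoids calling the overridable setConfiguration from a constructor. If it did, a subclass field initializer (such as maxDepth = 5) would run afterwards and overwrite the copied value.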
Parameters: pm - a PredictiveModel object

Throws: PredictiveModel.PredictiveModelException - is thrown when exceptions occur in the enclosing class that extends PredictiveModel.

public abstract double[] predict() throws PredictiveModel.PredictiveModelException

Predicts the response variable using the most recent fit. Each PredictiveModel subclass must override this method.

Returns: a double array containing the predicted values.

Throws: PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions. Exceptions thrown from these methods necessarily extend PredictiveModelException.

public abstract double[] predict(double[][] testData) throws PredictiveModel.PredictiveModelException

Predicts the response values using the most recent fit and the provided test data. Each PredictiveModel subclass must override this method.

Parameters: testData - a double matrix containing data to be predicted. testData must have the same number of columns, in the same arrangement, as xy (the observations).

Returns: a double array containing the predicted values.

Throws: PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions. Exceptions thrown from these methods necessarily extend PredictiveModelException.

public abstract double[] predict(double[][] testData, double[] testDataWeights) throws PredictiveModel.PredictiveModelException

Predicts the response values using the most recent fit, the provided test data, and the test data case weights. Each PredictiveModel subclass must override this method.

Parameters:
- testData - a double matrix containing data to be predicted. testData must have the same number of columns, in the same arrangement, as xy (the observations).
- testDataWeights - a double array containing weights for each row of testData.

Returns: a double array containing the predicted values.

Throws: PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions. Exceptions thrown from these methods necessarily extend PredictiveModelException.

public void fitModel() throws PredictiveModel.PredictiveModelException

Fits the predictive model to the training data (estimates the model using the training data and current configuration settings). Subclasses of PredictiveModel, such as DecisionTrees, override this method with specific model fitting algorithms.

Throws: PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions. Exceptions thrown from these methods necessarily extend PredictiveModelException.

public int[][] getClassErrors(double[] knownValues, double[] predictedValues)

Returns classification error information.

Parameters:
- knownValues - a double array containing the known target classifications
- predictedValues - a double array containing the predicted classifications

Arrays knownValues and predictedValues must be the same length.

Returns: an int matrix of size (nClasses+1) by 2 containing the number of classification errors and the number of non-missing classifications for each target classification, plus the overall totals. For each \(i < \mathrm{nClasses}\), the i-th row contains the number of classification errors for the i-th class and the number of patterns with non-missing classifications for that class. The last row contains the number of classification errors totaled over all target classifications and the total number of patterns with non-missing target classifications.
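The following is a minimal sketch of how such a table could be assembled. It assumes that class codes are the integers 0 through nClasses-1 and that missing values are encoded as Double.NaN; the text above does not specify either encoding. A prediction that differs from the known class, including a missing prediction, counts as an error. ClassErrorsSketch is a hypothetical helper, not part of PredictiveModel.

```java
/**
 * Hypothetical helper (not part of PredictiveModel) that builds the
 * (nClasses + 1) x 2 error table described for getClassErrors.
 * Assumes class codes 0..nClasses-1 and missing values encoded as Double.NaN.
 */
final class ClassErrorsSketch {
    private ClassErrorsSketch() { }

    static int[][] classErrors(double[] knownValues, double[] predictedValues, int nClasses) {
        if (knownValues.length != predictedValues.length) {
            throw new IllegalArgumentException("knownValues and predictedValues must be the same length");
        }
        // Column 0: number of errors; column 1: number of non-missing patterns.
        // Rows 0..nClasses-1 are per class; row nClasses holds the totals.
        int[][] table = new int[nClasses + 1][2];
        for (int k = 0; k < knownValues.length; k++) {
            double known = knownValues[k];
            if (Double.isNaN(known)) {
                continue; // a missing target classification is not counted at all
            }
            int cls = (int) known;
            if (cls != known || cls < 0 || cls >= nClasses) {
                throw new IllegalArgumentException("invalid class code: " + known);
            }
            table[cls][1]++;
            table[nClasses][1]++;
            // NaN != known is true, so a missing prediction counts as an error.
            if (predictedValues[k] != known) {
                table[cls][0]++;
                table[nClasses][0]++;
            }
        }
        return table;
    }
}
```

For known values {0, 0, 1, 1, NaN} and predictions {0, 1, 1, 1, 0} with nClasses = 2, the sketch returns {{1, 2}, {0, 2}, {1, 4}}. The pattern with a missing known value is skipped entirely.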
public double[] getClassCounts()

Returns the counts of each class (level) of the categorical response variable. If the response variable is neither PredictiveModel.VariableType.CATEGORICAL nor PredictiveModel.VariableType.ORDERED_DISCRETE, null is returned.

Returns: a double array containing the sum of the case weights for each occurrence of a particular class found in the categorical response data.

public void setClassCounts(double[] classCounts)

Sets the counts of each class of the response variable. Use this method to set the class counts when one or more classes do not occur in the training data because of sampling but are otherwise valid, or when the data is distributed and the global counts are available. This method applies only when the response variable is of type PredictiveModel.VariableType.CATEGORICAL or PredictiveModel.VariableType.ORDERED_DISCRETE.

Parameters: classCounts - a double array containing the class counts of the response variable

Default: the class counts found in the input matrix xy, weighted by the values in weights.
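The weighted class counts that getClassCounts reports can be illustrated with a short sketch. It makes three assumptions: class codes are 0..nClasses-1, missing responses are stored as Double.NaN, and a null weights array means every weight is 1.0 (mirroring the documented default weights[i] = 1.0). ClassCountsSketch is hypothetical. A class that never appears keeps a count of 0.0, which is the situation setClassCounts is meant to correct.

```java
/** Hypothetical sketch: per-class sums of case weights (not the IMSL implementation). */
final class ClassCountsSketch {
    private ClassCountsSketch() { }

    /** Sums the case weight of each non-missing response into its class slot. */
    static double[] classCounts(double[] response, double[] weights, int nClasses) {
        double[] counts = new double[nClasses]; // classes never seen stay at 0.0
        for (int i = 0; i < response.length; i++) {
            double y = response[i];
            if (Double.isNaN(y)) {
                continue; // skip missing responses
            }
            int cls = (int) y;
            if (cls != y || cls < 0 || cls >= nClasses) {
                throw new IllegalArgumentException("invalid class code: " + y);
            }
            counts[cls] += (weights == null) ? 1.0 : weights[i];
        }
        return counts;
    }
}
```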
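public void setClassLabels(String[] classLabels)

Sets the class names or labels for a categorical response variable.

Parameters: classLabels - a String array containing class names or labels. The array classLabels must have length nClasses.

Default: classLabels = {"1", "2", ..., "K"}, where K = nClasses.

public String[] getClassLabels()

Returns the current class labels for a categorical response variable.

Note: The labels are null unless they have been set using the method setClassLabels.

Returns: a String array containing the labels for each class level

public void setMustFitModel(boolean mustFitModel)

Sets the flag indicating whether the model needs to be fit or re-estimated because of a change in the data or configuration.

Parameters: mustFitModel - a boolean giving the value of the flag

Default: mustFitModel=true.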
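The mustFitModel flag follows a common "stale flag" pattern: changes to the data or configuration set it, and a successful fit clears it. The class below is a hypothetical illustration of that pattern. StaleFlagSketch, setMaxIterations, and ensureFitted are invented names, and the sketch makes no claim about how PredictiveModel manages the flag internally.

```java
/** Hypothetical illustration of a "must fit" flag; not a PredictiveModel API. */
final class StaleFlagSketch {
    private int maxIterations = 1000;
    private boolean mustFitModel = true; // nothing has been fit yet
    private int fitCount = 0;

    /** A configuration change marks any previous fit as stale. */
    void setMaxIterations(int maxIterations) {
        if (maxIterations != this.maxIterations) {
            this.maxIterations = maxIterations;
            mustFitModel = true;
        }
    }

    /** Stand-in for the real estimation work; clears the flag when done. */
    void fit() {
        fitCount++;
        mustFitModel = false;
    }

    /** Refits only when the flag says the current fit is out of date. */
    void ensureFitted() {
        if (mustFitModel) {
            fit();
        }
    }

    boolean isMustFitModel() { return mustFitModel; }

    int getFitCount() { return fitCount; }
}
```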
public double[][] getCostMatrix()

Returns the cost matrix for a categorical response variable. The cost matrix has elements C(i, j) = cost of misclassifying a response in class j as in class i. The diagonal elements of the cost matrix must be 0. If nClasses has not been determined (usually because PredictiveModel.fitModel() has not been called), an array of length zero is returned.

Returns: a double matrix of dimension nClasses by nClasses containing the cost matrix for a categorical response variable, where nClasses is the number of classes the response variable may assume.

public void setCostMatrix(double[][] costMatrix)

Specifies the cost matrix for a categorical response variable.

Parameters: costMatrix - a square double matrix of dimension nClasses by nClasses containing elements C(i, j), the cost of misclassifying a response in class j as in class i. The diagonal elements of the cost matrix must be 0.

Both dimensions of costMatrix must agree with the number of classes found in the data; otherwise an exception is thrown.

Default: costMatrix[i][j] = 1.0 for \(i \ne j\) and costMatrix[i][i] = 0.0.
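The sketch below makes the C(i, j) orientation concrete. It builds the documented default cost matrix and checks the documented shape and zero-diagonal constraints. It also computes the expected cost of predicting class i for a vector of class probabilities, \(\sum_j C(i,j)\,p_j\). The expected-cost helper is a standard decision-theoretic illustration, not a statement of how PredictiveModel uses the matrix. CostMatrixSketch is a hypothetical name.

```java
/** Hypothetical helpers for a cost matrix with C(i, j) = cost of predicting i when the truth is j. */
final class CostMatrixSketch {
    private CostMatrixSketch() { }

    /** Documented default: 1.0 off the diagonal, 0.0 on it. */
    static double[][] defaultCostMatrix(int nClasses) {
        double[][] cost = new double[nClasses][nClasses];
        for (int i = 0; i < nClasses; i++) {
            for (int j = 0; j < nClasses; j++) {
                cost[i][j] = (i == j) ? 0.0 : 1.0;
            }
        }
        return cost;
    }

    /** Rejects matrices that are not nClasses x nClasses or that have a nonzero diagonal. */
    static void validate(double[][] cost, int nClasses) {
        if (cost.length != nClasses) {
            throw new IllegalArgumentException("expected " + nClasses + " rows, got " + cost.length);
        }
        for (int i = 0; i < nClasses; i++) {
            if (cost[i].length != nClasses) {
                throw new IllegalArgumentException("row " + i + " has " + cost[i].length + " columns");
            }
            if (cost[i][i] != 0.0) {
                throw new IllegalArgumentException("diagonal element C(" + i + ", " + i + ") must be 0");
            }
        }
    }

    /** Expected cost of predicting class i: sum over true classes j of C(i, j) * p[j]. */
    static double expectedCost(double[][] cost, double[] p, int i) {
        double total = 0.0;
        for (int j = 0; j < p.length; j++) {
            total += cost[i][j] * p[j];
        }
        return total;
    }
}
```

With the default matrix, the expected cost of predicting class i reduces to 1 - p[i], the probability that the prediction is wrong.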
public int getMaxNumberOfIterations()

Returns the maximum number of iterations allowed for the fitting procedure or training algorithm.

Returns: int, the maximum number of iterations

public void setMaxNumberOfIterations(int maxIterations)

Sets the maximum number of iterations allowed for the fitting procedure or training algorithm. Most predictive models use iterative procedures to fit or train the model. Adjusting the maximum number of iterations up or down can help in diagnosing problems.

Parameters: maxIterations - an int specifying the maximum number of iterations

Default: maxIterations=1000

public int getMaxNumberOfCategories()

Returns the maximum number of categories allowed.

Returns: an int indicating the maximum number of categories allowed within the predictor and response variables

public void setMaxNumberOfCategories(int maxCategories)

Sets the maximum number of categories allowed within categorical predictor and response variables.

Parameters: maxCategories - an int specifying the maximum number of categories a predictor or response variable can have

Default: maxCategories = Math.max(10, maxCat + 1), where maxCat is the maximum category within all categorical predictor and response variables.
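The default maxCategories formula can be evaluated directly from the training matrix. This sketch makes three assumptions: categorical codes are non-negative integers stored as doubles, missing (NaN) entries are ignored, and a boolean[] marks the categorical columns in place of PredictiveModel.VariableType values. MaxCategoriesSketch is hypothetical.

```java
/** Hypothetical evaluation of maxCategories = Math.max(10, maxCat + 1). */
final class MaxCategoriesSketch {
    private MaxCategoriesSketch() { }

    /**
     * isCategorical[j] marks column j of xy as categorical (an assumption of
     * this sketch; the library uses VariableType values instead).
     */
    static int defaultMaxCategories(double[][] xy, boolean[] isCategorical) {
        int maxCat = 0;
        for (double[] row : xy) {
            for (int j = 0; j < row.length; j++) {
                if (isCategorical[j] && !Double.isNaN(row[j])) {
                    maxCat = Math.max(maxCat, (int) row[j]);
                }
            }
        }
        return Math.max(10, maxCat + 1);
    }
}
```

Because of the floor of 10, small category codes never push the default limit below 10.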
public int getNumberOfClasses()

Returns the number of unique classes found in the categorical response data.

Returns: an int indicating the number of unique classes found in the categorical response data

public void setNumberOfClasses(int nClasses)

Sets the number of distinct classes of the response variable. This method applies only when the response variable is of type PredictiveModel.VariableType.CATEGORICAL or PredictiveModel.VariableType.ORDERED_DISCRETE.

Parameters: nClasses - an int representing the number of distinct classes or categories of the response variable

An error is generated if more than nClasses categories are discovered in the data.

Default: nClasses is 0.

public int getNumberOfColumns()

Returns the number of columns in the training data xy.

Returns: int, the number of columns in xy. If xy is null, 0 is returned.

public int getNumberOfMissing()

Returns the number of missing values of the response variable found in the data xy.

Returns: int, the number of missing values

public int getNumberOfPredictors()

Returns the number of predictors.

Returns: int, the number of predictors

public int getNumberOfRows()

Returns the number of rows in xy (observations).

Returns: int, the number of rows in xy (observations)

public int[] getPredictorIndexes()

Returns the column indices of xy in which the predictor variables reside.

Returns: an int array containing the column indices

public void setPredictorIndex(int[] predIdx)

Sets the column indices of xy in which the predictor variables reside. This may be used to subset the full set of predictor variables (PredictiveModel.getPredictorTypes()).

Parameters: predIdx - an int array containing the column index for each predictor variable

Default: all columns other than the column containing the response variable.

public PredictiveModel.VariableType[] getPredictorTypes()

Returns an array of VariableType objects that correspond to the predictor data types in xy.

Returns: a VariableType array that corresponds to the predictor data types in xy

public void setPredictorTypes(PredictiveModel.VariableType[] predVarType)

Sets the VariableType objects that correspond to the predictor data types in xy.

Parameters: predVarType - a VariableType array of length equal to the number of predictors, specifying the data type of each predictor

public void setRandomObject(Random r)

Sets the random object to be used in the permutation of observation data.

Parameters: r - a Random object to be used in the random permutation of observation data

Specifying a seed for the Random object can produce repeatable (deterministic) output.
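The reproducibility note can be checked with java.util.Random alone. Two generators created with the same seed produce the same sequence, so any permutation driven by them is identical. The Fisher-Yates shuffle below is an assumed stand-in for whatever permutation PredictiveModel performs internally. SeededPermutationSketch is hypothetical.

```java
import java.util.Random;

/** Hypothetical seeded row permutation (a stand-in, not the library's internal routine). */
final class SeededPermutationSketch {
    private SeededPermutationSketch() { }

    /** Fisher-Yates shuffle of the row indices 0..n-1 driven by the supplied Random. */
    static int[] permutation(int n, Random r) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        for (int i = n - 1; i > 0; i--) {
            int j = r.nextInt(i + 1); // uniform in [0, i]
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return idx;
    }
}
```

With the library itself, the corresponding step would be to pass a seeded generator, for example setRandomObject(new Random(seed)), before fitting.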
public Random getRandomObject()

Returns the random object being used in the permutation of the observations.

Returns: the Random object being used for permutations

public int[] getNumberOfUniquePredictorValues()

Returns an array containing the number of distinct values of each predictor found in the input data. For continuous predictor variables, the value is set to 0 and is not meaningful.

Returns: an int array containing the number of distinct values for each predictor

public int getPrintLevel()

Returns the current print level.

Returns: int, the current print level

| printLevel | Action |
|---|---|
| 0 | No printing. |
| 1 | Prints final results only. |
| 2 | Prints intermediate and final results. |

Default: printLevel = 0.

public void setPrintLevel(int printLevel)

Sets the print level for a PredictiveModel.

Parameters: printLevel - an int specifying the level of printing to perform

| printLevel | Action |
|---|---|
| 0 | No printing. |
| 1 | Prints final results only. |
| 2 | Prints intermediate and final results. |

Default: printLevel = 0.

public double[][] getClassProbabilities()

Returns a matrix containing the predicted class probabilities for each observation in the training data.

Returns: a double matrix containing the class probabilities

public void setClassProbabilities(double[][] probs) throws PredictiveModel.SumOfProbabilitiesNotOneException

Sets the class probabilities.

Parameters: probs - a double matrix specifying class probabilities for each pattern or observation in a data set

The probabilities must range between 0.0 and 1.0 inclusive, and the probabilities in each row must sum to 1.0. The number of columns in probs must agree with the number of classes found in the data; otherwise an exception is thrown. Calling this method overwrites any existing values.

Default: probs=null unless estimated by an overriding method or set by the user.

Throws: PredictiveModel.SumOfProbabilitiesNotOneException - is thrown when class probabilities do not sum to 1.0.

public double[] getPriorProbabilities()

Returns an array containing the prior probabilities.

Returns: a double array containing the prior probabilities

public void setPriorProbabilities(double[] priors) throws PredictiveModel.SumOfProbabilitiesNotOneException

Sets the prior probabilities for class membership.

Parameters: priors - a double array specifying the prior probabilities

The prior probabilities must range between 0.0 and 1.0 inclusive and sum to 1.0. The length of priors must agree with the number of classes found in the data; otherwise an exception is thrown. Calling this method overwrites any existing values.

Default: Determined from the data.
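setClassProbabilities (for each row of probs) and setPriorProbabilities state the same constraints: each entry lies in [0, 1], the entries sum to 1.0, and the length equals the number of classes. The check below is a sketch of those rules. The tolerance of 1.0e-8 is an assumption, since the text only says the sum must be approximately one. IllegalArgumentException stands in for PredictiveModel.SumOfProbabilitiesNotOneException, and ProbabilityCheckSketch is hypothetical.

```java
/** Hypothetical validation of probability vectors; not the library's own check. */
final class ProbabilityCheckSketch {
    /** Assumed tolerance for "approximately one"; not taken from the library. */
    static final double TOLERANCE = 1.0e-8;

    private ProbabilityCheckSketch() { }

    /** Throws if p has the wrong length, an entry outside [0, 1], or a sum not within TOLERANCE of 1. */
    static void check(double[] p, int nClasses) {
        if (p.length != nClasses) {
            throw new IllegalArgumentException("expected " + nClasses + " values, got " + p.length);
        }
        double sum = 0.0;
        for (double v : p) {
            if (!(v >= 0.0 && v <= 1.0)) { // written this way so NaN is rejected too
                throw new IllegalArgumentException("probability out of range: " + v);
            }
            sum += v;
        }
        if (Math.abs(sum - 1.0) > TOLERANCE) {
            throw new IllegalArgumentException("probabilities sum to " + sum + ", not 1.0");
        }
    }

    /** Applies the same check to every row of a class-probability matrix. */
    static void checkRows(double[][] probs, int nClasses) {
        for (double[] row : probs) {
            check(row, nClasses);
        }
    }
}
```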
Throws: PredictiveModel.SumOfProbabilitiesNotOneException - is thrown when prior probabilities do not sum to 1.0.

public int getResponseColumnIndex()

Returns the column index in xy containing the response variable.

Returns: int, the column index for the response variable

public void setResponseColumnIndex(int index)

Sets the column index in xy containing the response variable.

Parameters: index - an int, the column index for the response variable

public double getResponseVariableAverage()

Returns the weighted average value of the response variable.

Returns: double, the weighted average value of the response variable

public int getResponseVariableMostFrequentClass()

Returns the most frequent value of the response variable. This method applies only when the response variable is of type VariableType.CATEGORICAL or VariableType.ORDERED_DISCRETE.

Returns: int, the level of the most frequent class

public PredictiveModel.VariableType getResponseVariableType()

Returns the variable type of the response variable.

Returns: the VariableType of the response variable

public double getTotalWeight()

Returns the sum of the active case weights.

Returns: double, the sum of the active case weights

public PredictiveModel.VariableType[] getVariableType()

Returns an array containing the variable types in xy.

Returns: a VariableType array containing the variable types in xy

public void setVariableType(PredictiveModel.VariableType[] varType)

Sets the variable types for the data.

Parameters: varType - a PredictiveModel.VariableType array of length equal to xy[0].length containing the type of each variable

public double[] getWeights()

Returns an array containing the case weights.

Returns: a double array containing the case weights

public void setWeights(double[] weights)

Specifies the case weights.

Parameters: weights - a double array specifying case weights

Default: weights[i] = 1.0 for all i.
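The case weights feed quantities such as getTotalWeight and getResponseVariableAverage. The following is a minimal sketch under two assumptions: "active" cases are those with a non-missing (non-NaN) response, and a null weights array means all weights equal 1.0 (the documented default). WeightedResponseSketch is hypothetical. The weighted average is \(\sum_i w_i y_i / \sum_i w_i\) over the active cases.

```java
/** Hypothetical weighted summaries of a response column; not the IMSL implementation. */
final class WeightedResponseSketch {
    private WeightedResponseSketch() { }

    private static double weight(double[] w, int i) {
        return (w == null) ? 1.0 : w[i]; // documented default: weights[i] = 1.0
    }

    /** Sum of the weights of cases whose response is not missing. */
    static double totalWeight(double[] y, double[] w) {
        double total = 0.0;
        for (int i = 0; i < y.length; i++) {
            if (!Double.isNaN(y[i])) {
                total += weight(w, i);
            }
        }
        return total;
    }

    /** Weighted mean of the non-missing responses; NaN if the total weight is 0. */
    static double weightedAverage(double[] y, double[] w) {
        double weightedSum = 0.0;
        for (int i = 0; i < y.length; i++) {
            if (!Double.isNaN(y[i])) {
                weightedSum += weight(w, i) * y[i];
            }
        }
        double total = totalWeight(y, w);
        return (total == 0.0) ? Double.NaN : weightedSum / total;
    }
}
```

For y = {1, 2, 3, NaN} and w = {1, 1, 2, 5}, the last case is inactive. The total weight is 4.0, and the weighted average is 9.0 / 4.0 = 2.25.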
public void setTrainingData(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)

Sets up the training data for the predictive model. Calling this method either initializes the problem or resets it to use the data in the arguments.

Parameters:
- xy - a double matrix containing the training data and associated response values
- responseColumnIndex - an int specifying the column index in xy of the response variable
- varType - a PredictiveModel.VariableType array of length equal to xy[0].length containing the type of each variable

public double[][] getXY()

Returns a copy of the xy data.

Returns: a double matrix containing the training data

public boolean isMustFitModel()

Returns the current value of the mustFitModel flag. When true, the PredictiveModel.fitModel() method should be called before any predictions or other analysis are done.

Returns: a boolean indicating the state of the flag

public boolean isConstantSeries()

Returns the current value of the constantSeries flag. The flag is set to true if the code determines that the response variable is constant in the training data. The method fitModel will fail if the series is constant. The flag is reset if the training data is changed using setTrainingData and the new response variable is not constant.
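The constantSeries check amounts to asking whether every non-missing response value is equal. The sketch assumes missing values are Double.NaN. It treats an all-missing column as constant, a choice made for this illustration only. ConstantSeriesSketch is hypothetical.

```java
/** Hypothetical constant-response check; not the library's own test. */
final class ConstantSeriesSketch {
    private ConstantSeriesSketch() { }

    /** True when every non-missing value equals the first non-missing value. */
    static boolean isConstant(double[] y) {
        double first = Double.NaN;
        for (double v : y) {
            if (Double.isNaN(v)) {
                continue; // ignore missing responses
            }
            if (Double.isNaN(first)) {
                first = v; // remember the first observed value
            } else if (v != first) {
                return false;
            }
        }
        return true; // also true when every value is missing (sketch's choice)
    }
}
```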
Returns: a boolean indicating the state of the flag

public boolean isUserFixedNClasses()

Returns true if the number of classes was fixed by the user.

Returns: a boolean indicating the state of the flag

Copyright © 2020 Rogue Wave Software. All rights reserved.