public abstract class PredictiveModel extends Object implements Serializable, Cloneable
Modifier and Type | Class and Description |
---|---|
static class | PredictiveModel.CloneNotSupportedException - Wraps java.lang.CloneNotSupportedException to indicate that the clone method in class Object has been called to clone an object, but that the object's class does not implement the Cloneable interface. |
static class | PredictiveModel.PredictiveModelException - An exception class intended to be the parent of all nested Exception classes where the enclosing class extends PredictiveModel. |
static class | PredictiveModel.StateChangeException - Exception thrown when an input parameter has changed that might affect the model estimates or predictions. |
static class | PredictiveModel.SumOfProbabilitiesNotOneException - Exception thrown when the sum of probabilities is not approximately one. |
static class | PredictiveModel.VariableType - Enumerates different variable types. |
Modifier | Constructor and Description |
---|---|
protected | PredictiveModel(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType) - Constructs a PredictiveModel object for a single response variable and multiple predictor variables. |
protected | PredictiveModel(PredictiveModel pm) - Constructs a PredictiveModel from an existing instance. |
Modifier and Type | Method and Description |
---|---|
abstract PredictiveModel | clone() - Abstract clone method. |
void | fitModel() - Fits the predictive model to the training data (estimates the model using the training data and current configuration settings). |
double[] | getClassCounts() - Returns the counts of each class (level) of the categorical response variable. |
int[][] | getClassErrors(double[] knownValues, double[] predictedValues) - Returns classification error information. |
String[] | getClassLabels() - Returns the current class labels for a categorical response variable. |
double[][] | getClassProbabilities() - Returns a matrix containing the predicted class probabilities for each observation in the training data. |
double[][] | getCostMatrix() - Returns the cost matrix for a categorical response variable. |
int | getMaxNumberOfCategories() - Returns the maximum number of categories allowed. |
int | getMaxNumberOfIterations() - Returns the maximum number of iterations allowed for the fitting procedure or training algorithm. |
int | getNumberOfClasses() - Returns the number of unique classes found in the categorical response data. |
int | getNumberOfColumns() - Returns the number of columns in the training data xy. |
int | getNumberOfMissing() - Returns the number of missing values of the response variable found in the data xy. |
int | getNumberOfPredictors() - Returns the number of predictors. |
int | getNumberOfRows() - Returns the number of rows in xy (observations). |
int[] | getNumberOfUniquePredictorValues() - Returns an array containing the number of distinct values of each predictor found in the input data. |
int[] | getPredictorIndexes() - Returns the column indices of xy in which the predictor variables reside. |
PredictiveModel.VariableType[] | getPredictorTypes() - Returns an array of VariableType objects that correspond to the predictor data types in xy. |
int | getPrintLevel() - Returns the current print level. |
double[] | getPriorProbabilities() - Returns an array containing the prior probabilities. |
Random | getRandomObject() - Returns the random object being used in the permutation of the observations. |
int | getResponseColumnIndex() - Returns the column index in xy containing the response variable. |
double | getResponseVariableAverage() - Returns the weighted average value of the response variable. |
int | getResponseVariableMostFrequentClass() - Returns the most frequent value of the response variable. |
PredictiveModel.VariableType | getResponseVariableType() - Returns the variable type of the response variable. |
double | getTotalWeight() - Returns the sum of the active case weights. |
PredictiveModel.VariableType[] | getVariableType() - Returns an array containing the variable types in xy. |
double[] | getWeights() - Returns an array containing the case weights. |
double[][] | getXY() - Returns a copy of the xy data. |
boolean | isConstantSeries() - Returns the current value of the constantSeries flag. |
boolean | isMustFitModel() - Returns the current value of the mustFitModel flag. |
boolean | isUserFixedNClasses() - Returns true if the number of classes was fixed by the user. |
abstract double[] | predict() - Predicts the response variable using the most recent fit. |
abstract double[] | predict(double[][] testData) - Predicts the response values using the most recent fit and the provided test data. |
abstract double[] | predict(double[][] testData, double[] testDataWeights) - Predicts the response values using the most recent fit, the provided test data, and the test data case weights. |
void | setClassCounts(double[] classCounts) - Sets the counts of each class of the response variable. |
void | setClassLabels(String[] classLabels) - Sets the class names or labels for a categorical response variable. |
void | setClassProbabilities(double[][] probs) - Sets the class probabilities. |
protected abstract void | setConfiguration(PredictiveModel pm) - Sets the configuration of PredictiveModel to that of the input model. |
void | setCostMatrix(double[][] costMatrix) - Specifies the cost matrix for a categorical response variable. |
void | setMaxNumberOfCategories(int maxCategories) - Sets the maximum number of categories allowed within categorical predictor and response variables. |
void | setMaxNumberOfIterations(int maxIterations) - Sets the maximum number of iterations allowed for the fitting procedure or training algorithm. |
void | setMustFitModel(boolean mustFitModel) - Sets the flag indicating whether the model needs to be fit or re-estimated because of a change in the data or configuration. |
void | setNumberOfClasses(int nClasses) - Sets the number of distinct classes of the response variable. |
void | setPredictorIndex(int[] predIdx) - Sets the column indices of xy in which the predictor variables reside. |
void | setPredictorTypes(PredictiveModel.VariableType[] predVarType) - Sets the VariableType objects that correspond to the predictor data types in xy. |
void | setPrintLevel(int printLevel) - Sets the print level for a PredictiveModel. |
void | setPriorProbabilities(double[] priors) - Sets the prior probabilities for class membership. |
void | setRandomObject(Random r) - Sets the random object to be used in the permutation of observation data. |
void | setResponseColumnIndex(int index) - Sets the column index in xy containing the response variable. |
void | setTrainingData(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType) - Sets up the training data for the predictive model. |
void | setVariableType(PredictiveModel.VariableType[] varType) - Sets the variable types for the data. |
void | setWeights(double[] weights) - Specifies the case weights. |
protected PredictiveModel(PredictiveModel pm)
Constructs a PredictiveModel from an existing instance.
Parameters:
pm - an instance of a PredictiveModel
protected PredictiveModel(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
Constructs a PredictiveModel object for a single response variable and multiple predictor variables.
This constructor should be called by all classes extending PredictiveModel.
Parameters:
xy - a double matrix containing the training data and associated response values
responseColumnIndex - an int specifying the column index in xy of the response variable
varType - a PredictiveModel.VariableType array of length equal to xy[0].length containing the type of each variable
public abstract PredictiveModel clone()
Abstract clone method. Each PredictiveModel subclass must override this method.
protected abstract void setConfiguration(PredictiveModel pm) throws PredictiveModel.PredictiveModelException
Sets the configuration of PredictiveModel to that of the input model.
Each PredictiveModel subclass must override this method. The implementation should use class-specific methods to set the parameter settings to those of the input PredictiveModel instance, essentially creating a copy of the input model. This method is used for model parameter tuning, as in CrossValidation, where several variations of the same model are evaluated, and in ensemble methods such as BootstrapAggregation, where several identical instances are fit to random samples.
Parameters:
pm - a PredictiveModel object
Throws:
PredictiveModel.PredictiveModelException - is thrown when exceptions occur in the enclosing class that extends PredictiveModel.
public abstract double[] predict() throws PredictiveModel.PredictiveModelException
Predicts the response variable using the most recent fit.
Each PredictiveModel subclass must override this method.
Returns:
a double array containing the predicted values
Throws:
PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions. Exceptions thrown from these methods will necessarily extend PredictiveModelException.
public abstract double[] predict(double[][] testData) throws PredictiveModel.PredictiveModelException
Predicts the response values using the most recent fit and the provided test data.
Each PredictiveModel subclass must override this method.
Parameters:
testData - a double matrix containing data to be predicted. testData must have the same number of columns in the same arrangement as xy (the observations).
Returns:
a double array containing the predicted values
Throws:
PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions. Exceptions thrown from these methods will necessarily extend PredictiveModelException.
public abstract double[] predict(double[][] testData, double[] testDataWeights) throws PredictiveModel.PredictiveModelException
Predicts the response values using the most recent fit, the provided test data, and the test data case weights.
Each PredictiveModel subclass must override this method.
Parameters:
testData - a double matrix containing data to be predicted. testData must have the same number of columns in the same arrangement as xy (the observations).
testDataWeights - a double array containing weights for each row of testData
Returns:
a double array containing the predicted values
Throws:
PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions. Exceptions thrown from these methods will necessarily extend PredictiveModelException.
public void fitModel() throws PredictiveModel.PredictiveModelException
Fits the predictive model to the training data (estimates the model using the training data and current configuration settings).
Subclasses of PredictiveModel, such as DecisionTrees, override this method with specific model fitting algorithms.
Throws:
PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions. Exceptions thrown from these methods will necessarily extend PredictiveModelException.
public int[][] getClassErrors(double[] knownValues, double[] predictedValues)
Returns classification error information.
Parameters:
knownValues - a double array containing the known target classifications
predictedValues - a double array containing the predicted classifications
Arrays knownValues and predictedValues must be the same length.
Returns:
an int matrix of size (nClasses+1) by 2 containing the number of classification errors and the number of non-missing classifications for each target classification, plus the overall totals for these errors.
For \(i \lt\) nClasses, the i-th row contains the number of classification errors for the i-th class and the number of patterns with non-missing classifications for that class. The last row contains the number of classification errors totaled over all target classifications, and the total number of patterns with non-missing target classifications.
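The layout of the returned matrix can be illustrated with a small standalone sketch. It is not the library's implementation: the class name `ClassErrorSketch`, the coding of classes as 0 to nClasses-1, and the use of NaN to mark a missing known value are all assumptions made for the example.

```java
import java.util.Arrays;

final class ClassErrorSketch {
    /**
     * Builds an (nClasses + 1) x 2 table. Column 0 counts misclassifications,
     * column 1 counts non-missing known classifications; the last row holds totals.
     * Assumptions: classes are coded 0..nClasses-1 and NaN marks a missing known value.
     */
    static int[][] classErrors(double[] known, double[] predicted, int nClasses) {
        if (known.length != predicted.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int[][] errors = new int[nClasses + 1][2];
        for (int i = 0; i < known.length; i++) {
            if (Double.isNaN(known[i])) {
                continue; // missing target classification: not counted anywhere
            }
            int k = (int) known[i];
            errors[k][1]++;          // non-missing pattern for class k
            errors[nClasses][1]++;   // overall non-missing total
            if (predicted[i] != known[i]) {
                errors[k][0]++;        // error for class k
                errors[nClasses][0]++; // overall error total
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        double[] known = {0, 0, 1, 1, 2, Double.NaN};
        double[] predicted = {0, 1, 1, 1, 0, 2};
        // prints [[1, 2], [0, 2], [1, 1], [2, 5]]
        System.out.println(Arrays.deepToString(classErrors(known, predicted, 3)));
    }
}
```

In the printed result, the last row reads as 2 errors out of 5 non-missing patterns.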
public double[] getClassCounts()
Returns the counts of each class (level) of the categorical response variable.
If the response variable is neither PredictiveModel.VariableType.CATEGORICAL nor PredictiveModel.VariableType.ORDERED_DISCRETE, null is returned.
Returns:
a double array containing the summation of the case weights for each occurrence of a particular class found in the categorical response data
public void setClassCounts(double[] classCounts)
Sets the counts of each class of the response variable.
Use this method to set the class counts when one or more classes do not occur in the training data due to sampling but are otherwise valid, or when the data is distributed and the global counts are available. Only applies when the response variable is of type PredictiveModel.VariableType.CATEGORICAL or PredictiveModel.VariableType.ORDERED_DISCRETE.
Parameters:
classCounts - a double array containing the class counts of the response variable
The default is to use the class counts discovered in the input matrix, xy, weighted by the values in weights.
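As a rough illustration of weighted class counts, the sketch below sums the case weight of every observation that falls in each class. The class name `ClassCountSketch` and the coding of classes as 0 to nClasses-1 are assumptions for the example, not part of the documented API.

```java
import java.util.Arrays;

final class ClassCountSketch {
    /**
     * Sums case weights per class.
     * Assumption: response values are class codes 0..nClasses-1 with no missing values.
     */
    static double[] weightedClassCounts(double[] response, double[] weights, int nClasses) {
        double[] counts = new double[nClasses];
        for (int i = 0; i < response.length; i++) {
            counts[(int) response[i]] += weights[i];
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] response = {0, 1, 1, 2, 1};
        double[] weights = {1.0, 0.5, 0.5, 2.0, 1.0};
        // prints [1.0, 2.0, 2.0]
        System.out.println(Arrays.toString(weightedClassCounts(response, weights, 3)));
    }
}
```

With all weights equal to 1.0 the result reduces to plain frequency counts, which is why the counts are returned as double rather than int.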
public void setClassLabels(String[] classLabels)
Sets the class names or labels for a categorical response variable.
Parameters:
classLabels - a String array containing class names or labels. The array classLabels must have length nClasses.
Default: classLabels = {"1", "2", ..., "K"}, where K = nClasses
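The default labeling described above amounts to numbering the classes from 1 to K. A minimal sketch, using a hypothetical helper class `DefaultLabelSketch` that is not part of the library:

```java
import java.util.Arrays;

final class DefaultLabelSketch {
    /** Builds the labels "1", "2", ..., "K" for K = nClasses. */
    static String[] defaultLabels(int nClasses) {
        String[] labels = new String[nClasses];
        for (int i = 0; i < nClasses; i++) {
            labels[i] = Integer.toString(i + 1);
        }
        return labels;
    }

    public static void main(String[] args) {
        // prints [1, 2, 3]
        System.out.println(Arrays.toString(defaultLabels(3)));
    }
}
```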
public String[] getClassLabels()
Returns the current class labels for a categorical response variable.
Note: The labels will be null unless they have been set using the method setClassLabels.
Returns:
a String array containing the labels for each class level
public void setMustFitModel(boolean mustFitModel)
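Sets the flag indicating whether the model needs to be fit or re-estimated because of a change in the data or configuration.
Parameters:
mustFitModel - a boolean giving the value of the flag
Default: mustFitModel = true.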
public double[][] getCostMatrix()
Returns the cost matrix for a categorical response variable.
The cost matrix has elements C(i, j) = cost of misclassifying a response in class j as in class i. The diagonal elements of the cost matrix must be 0. If nClasses has not been determined (usually because PredictiveModel.fitModel() has not been called), an array of length zero is returned.
Returns:
a double matrix of dimension nClasses by nClasses containing the cost matrix for a categorical response variable, where nClasses is the number of classes the response variable may assume
public void setCostMatrix(double[][] costMatrix)
Specifies the cost matrix for a categorical response variable.
Parameters:
costMatrix - a square double matrix of dimension nClasses by nClasses containing elements C(i, j), the cost of misclassifying a response in class j as in class i. The diagonal elements of the cost matrix must be 0.
Both dimensions of costMatrix should agree with the number of classes found in the data. Otherwise an exception will be thrown.
Default: costMatrix[i][j] = 1.0 where \(i \ne j\) and costMatrix[i][i] = 0.0.
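The default cost matrix and the documented constraints can be sketched in plain Java. The class `CostMatrixSketch`, its method names, and the choice of IllegalArgumentException are assumptions for illustration; the source does not say which exception type the library throws.

```java
final class CostMatrixSketch {
    /** Default described above: 1.0 off the diagonal and 0.0 on it. */
    static double[][] defaultCostMatrix(int nClasses) {
        double[][] c = new double[nClasses][nClasses];
        for (int i = 0; i < nClasses; i++) {
            for (int j = 0; j < nClasses; j++) {
                c[i][j] = (i == j) ? 0.0 : 1.0;
            }
        }
        return c;
    }

    /** Checks the documented constraints: nClasses by nClasses with a zero diagonal. */
    static void validate(double[][] c, int nClasses) {
        if (c.length != nClasses) {
            throw new IllegalArgumentException("expected " + nClasses + " rows");
        }
        for (int i = 0; i < nClasses; i++) {
            if (c[i].length != nClasses) {
                throw new IllegalArgumentException("row " + i + " must have " + nClasses + " columns");
            }
            if (c[i][i] != 0.0) {
                throw new IllegalArgumentException("diagonal element " + i + " must be 0");
            }
        }
    }

    /** Total cost, reading C(i, j) as: true class j was predicted as class i. */
    static double totalCost(double[][] c, int[] trueClass, int[] predictedClass) {
        double total = 0.0;
        for (int n = 0; n < trueClass.length; n++) {
            total += c[predictedClass[n]][trueClass[n]];
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] c = defaultCostMatrix(3);
        validate(c, 3);
        // One misclassification (true class 1 predicted as class 2) costs 1.0 by default.
        System.out.println(totalCost(c, new int[]{0, 1, 2}, new int[]{0, 2, 2}));
    }
}
```

Under the default matrix the total cost equals the number of misclassifications; unequal off-diagonal entries make some mistakes more expensive than others.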
public int getMaxNumberOfIterations()
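Returns the maximum number of iterations allowed for the fitting procedure or training algorithm.
Returns:
an int, the maximum number of iterations
public void setMaxNumberOfIterations(int maxIterations)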
Sets the maximum number of iterations allowed for the fitting procedure or training algorithm.
Most predictive models use iterative procedures to fit or train the model. Adjusting the maximum number of iterations up or down can assist in diagnosing problems.
Parameters:
maxIterations - an int specifying the maximum number of iterations
Default: maxIterations = 1000
public int getMaxNumberOfCategories()
Returns the maximum number of categories allowed.
Returns:
an int indicating the maximum number of categories allowed within the predictor and response variables
public void setMaxNumberOfCategories(int maxCategories)
Sets the maximum number of categories allowed within categorical predictor and response variables.
Parameters:
maxCategories - an int specifying the maximum number of categories a predictor or response variable can have.
Default: maxCategories = Math.max(10, maxCat + 1), where maxCat is the maximum category within all categorical predictor and response variables.
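The default formula can be worked through on a small data set. The sketch below assumes that categories are coded as non-negative integers stored in a double matrix and that a boolean array flags the categorical columns; both the class `MaxCategoriesSketch` and that flag array are hypothetical.

```java
final class MaxCategoriesSketch {
    /**
     * Computes Math.max(10, maxCat + 1), where maxCat is the largest category
     * code found in any column flagged as categorical.
     * Assumption: category codes are non-negative integers.
     */
    static int defaultMaxCategories(double[][] xy, boolean[] categorical) {
        int maxCat = 0;
        for (double[] row : xy) {
            for (int j = 0; j < row.length; j++) {
                if (categorical[j]) {
                    maxCat = Math.max(maxCat, (int) row[j]);
                }
            }
        }
        return Math.max(10, maxCat + 1);
    }

    public static void main(String[] args) {
        boolean[] categorical = {true, false};
        // Largest category code is 3, so the floor of 10 applies.
        System.out.println(defaultMaxCategories(new double[][]{{0, 1.5}, {3, 2.7}}, categorical));
        // Largest category code is 12, so the result is 13.
        System.out.println(defaultMaxCategories(new double[][]{{12, 0.1}}, categorical));
    }
}
```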
public int getNumberOfClasses()
Returns the number of unique classes found in the categorical response data.
Returns:
an int indicating the number of unique classes found in the categorical response data
public void setNumberOfClasses(int nClasses)
Sets the number of distinct classes of the response variable. Only applies when the response variable is of type PredictiveModel.VariableType.CATEGORICAL or PredictiveModel.VariableType.ORDERED_DISCRETE.
Parameters:
nClasses - an int representing the number of distinct classes or categories of the response variable
An error is generated if more than nClasses categories are discovered in the data.
Default: nClasses is 0.
public int getNumberOfColumns()
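Returns the number of columns in the training data xy.
Returns:
an int, the number of columns in xy. If xy is null, the returned value is 0.
public int getNumberOfMissing()
Returns the number of missing values of the response variable found in the data xy.
Returns:
an int, the number of missing values
public int getNumberOfPredictors()
Returns the number of predictors.
Returns:
an int, the number of predictors
public int getNumberOfRows()
Returns the number of rows in xy (observations).
Returns:
an int, the number of rows in xy (observations)
public int[] getPredictorIndexes()
Returns the column indices of xy in which the predictor variables reside.
Returns:
an int array containing the column indices
public void setPredictorIndex(int[] predIdx)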
Sets the column indices of xy in which the predictor variables reside.
This may be used to subset the full set of predictor variables (PredictiveModel.getPredictorTypes()).
Parameters:
predIdx - an int array containing the column index for each predictor variable
Default: All columns other than the column containing the response variable are selected.
public PredictiveModel.VariableType[] getPredictorTypes()
Returns an array of VariableType objects that correspond to the predictor data types in xy.
Returns:
a VariableType array that corresponds to the predictor data types in xy
public void setPredictorTypes(PredictiveModel.VariableType[] predVarType)
Sets the VariableType objects that correspond to the predictor data types in xy.
Parameters:
predVarType - a VariableType array of length equal to the number of predictors specifying the data type of each predictor
public void setRandomObject(Random r)
Sets the random object to be used in the permutation of observation data.
Parameters:
r - a Random object to be used in the random permutation of observation data
Specifying a seed for the Random object can produce repeatable/deterministic output.
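The effect of seeding can be seen with java.util.Random alone. The sketch below uses a Fisher-Yates shuffle as a stand-in for whatever permutation the library performs internally, which the source does not describe; the class `SeededPermutationSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

final class SeededPermutationSketch {
    /** Returns a random permutation of 0..n-1 using a Fisher-Yates shuffle. */
    static int[] permutation(int n, Random r) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        for (int i = n - 1; i > 0; i--) {
            int j = r.nextInt(i + 1); // uniform in 0..i
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return idx;
    }

    public static void main(String[] args) {
        // Two generators with the same seed yield the same permutation.
        int[] first = permutation(8, new Random(12345L));
        int[] second = permutation(8, new Random(12345L));
        System.out.println(Arrays.equals(first, second)); // prints true
    }
}
```

Passing an unseeded `new Random()` instead would generally give a different ordering on each run.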
public Random getRandomObject()
Returns the random object being used in the permutation of the observations.
Returns:
a Random object being used for permutations
public int[] getNumberOfUniquePredictorValues()
Returns an array containing the number of distinct values of each predictor found in the input data.
For continuous predictor variables, the value is set to 0 and is not meaningful.
Returns:
an int array containing the number of distinct values for each predictor
public int getPrintLevel()
Returns the current print level.
Returns:
an int, the current print level
printLevel | Action |
---|---|
0 | No printing. |
1 | Prints final results only. |
2 | Prints intermediate and final results. |
Default: printLevel = 0.
public void setPrintLevel(int printLevel)
Sets the print level for a PredictiveModel.
Parameters:
printLevel - an int specifying the level of printing to perform
printLevel | Action |
---|---|
0 | No printing. |
1 | Prints final results only. |
2 | Prints intermediate and final results. |
Default: printLevel = 0.
public double[][] getClassProbabilities()
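Returns a matrix containing the predicted class probabilities for each observation in the training data.
Returns:
a double matrix containing the class probabilities
public void setClassProbabilities(double[][] probs) throws PredictiveModel.SumOfProbabilitiesNotOneException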
Sets the class probabilities.
Parameters:
probs - a double matrix specifying class probabilities for each pattern or observation in a data set
The probabilities must range between 0.0 and 1.0 inclusive, and sum to 1.0. The number of columns in probs should agree with the number of classes found in the data. Otherwise an exception is thrown. Calling this method overwrites any existing values.
Default: probs = null unless estimated by an overriding method or set by the user.
Throws:
PredictiveModel.SumOfProbabilitiesNotOneException - is thrown when class probabilities do not sum to 1.0.
public double[] getPriorProbabilities()
Returns an array containing the prior probabilities.
Returns:
a double array containing the prior probabilities
public void setPriorProbabilities(double[] priors) throws PredictiveModel.SumOfProbabilitiesNotOneException
Sets the prior probabilities for class membership.
Parameters:
priors - a double array specifying the prior probabilities
The prior probabilities must range between 0.0 and 1.0 inclusive, and sum to 1.0. The length of priors should agree with the number of classes found in the data. Otherwise an exception is thrown. Calling this method overwrites any existing values.
Default: Determined from the data.
Throws:
PredictiveModel.SumOfProbabilitiesNotOneException - is thrown when prior probabilities do not sum to 1.0.
public int getResponseColumnIndex()
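Returns the column index in xy containing the response variable.
Returns:
an int, the column index for the response variable
public void setResponseColumnIndex(int index)
Sets the column index in xy containing the response variable.
Parameters:
index - an int, the column index for the response variable
public double getResponseVariableAverage()
Returns the weighted average value of the response variable.
Returns:
a double, the weighted average value of the response variable
public int getResponseVariableMostFrequentClass()
Returns the most frequent value of the response variable. Only applies when the response variable is of type VariableType.CATEGORICAL or VariableType.ORDERED_DISCRETE.
Returns:
an int, the level of the most frequent class
public PredictiveModel.VariableType getResponseVariableType()
Returns the variable type of the response variable.
Returns:
the VariableType of the response variable
public double getTotalWeight()
Returns the sum of the active case weights.
Returns:
a double, the sum of the active case weights
public PredictiveModel.VariableType[] getVariableType()
Returns an array containing the variable types in xy.
Returns:
a VariableType array containing the variable types in xy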
public void setVariableType(PredictiveModel.VariableType[] varType)
Sets the variable types for the data.
Parameters:
varType - a PredictiveModel.VariableType array of length equal to xy[0].length containing the type of each variable
public double[] getWeights()
Returns an array containing the case weights.
Returns:
a double array containing the case weights
public void setWeights(double[] weights)
Specifies the case weights.
Parameters:
weights - a double array specifying case weights
Default: weights[i] = 1.0 for all i.
public void setTrainingData(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
Sets up the training data for the predictive model.
Calling this method either initializes the problem or resets it to use the data in the arguments.
Parameters:
xy - a double matrix containing the training data and associated response values
responseColumnIndex - an int specifying the column index in xy of the response variable
varType - a PredictiveModel.VariableType array of length equal to xy[0].length containing the type of each variable
public double[][] getXY()
Returns a copy of the xy data.
Returns:
a double matrix containing the training data
public boolean isMustFitModel()
Returns the current value of the mustFitModel flag.
When true, the PredictiveModel.fitModel() method should be called before doing any predictions or other analysis.
Returns:
a boolean indicating the state of the flag
public boolean isConstantSeries()
Returns the current value of the constantSeries flag.
The flag is set to true if the code determines that the response variable is constant in the training data. The method fitModel will fail if the series is constant. The flag will be reset if the training data is changed using setTrainingData and the response variable is not constant.
Returns:
a boolean indicating the state of the flag
public boolean isUserFixedNClasses()
Returns true if the number of classes was fixed by the user.
Returns:
a boolean indicating the state of the flag
Copyright © 2020 Rogue Wave Software. All rights reserved.