public abstract class PredictiveModel extends Object implements Serializable, Cloneable
Modifier and Type | Class and Description |
---|---|
static class | PredictiveModel.PredictiveModelException - An exception class intended to be the parent of all nested Exception classes where the enclosing class extends PredictiveModel. |
static class | PredictiveModel.StateChangeException - Exception thrown when an input parameter has changed that might affect the model estimates or predictions. |
static class | PredictiveModel.SumOfProbabilitiesNotOneException - Exception thrown when the sum of probabilities is not approximately one. |
static class | PredictiveModel.VariableType - An enumeration of data types/characteristics. |
Modifier | Constructor and Description |
---|---|
protected | PredictiveModel(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType) - Constructs a PredictiveModel object for a single response variable and multiple predictor variables. |
protected | PredictiveModel(PredictiveModel pm) - Constructs a PredictiveModel from an existing instance. |
Modifier and Type | Method and Description |
---|---|
void | fitModel() - Fits the predictive model to the training data (estimates the model using the training data and current configuration settings). |
double[] | getClassCounts() - Returns the counts of each class (level) of the categorical response variable. |
double[][] | getCostMatrix() - Returns the cost matrix for a categorical response variable. |
int | getMaxNumberOfCategories() - Returns the maximum number of categories allowed within categorical predictor variables. |
int | getNumberOfClasses() - Returns the number of unique classes found in the categorical response data. |
int | getNumberOfColumns() - Returns the number of columns in xy. |
int | getNumberOfMissing() - Returns the number of missing values of the response variable found in the data xy. |
int | getNumberOfPredictors() - Returns the number of predictors. |
int | getNumberOfRows() - Returns the number of rows in xy (observations). |
int[] | getNumberOfUniquePredictorValues() - Returns an array containing the number of distinct values of each categorical or ordinal predictor found in the input data. |
int[] | getPredictorIndexes() - Returns an array of indices into xy where the predictor variables reside. |
PredictiveModel.VariableType[] | getPredictorTypes() - Returns an array of VariableType objects that correspond to the predictor data types in xy. |
int | getPrintLevel() - Returns the current print level. |
double[] | getPriorProbabilities() - Returns an array containing the prior probabilities. |
Random | getRandomObject() - Returns the random object being used in the permutation of the observations. |
int | getResponseColumnIndex() - Returns the column index in xy containing the response variable. |
double | getResponseVariableAverage() - Returns the weighted average value of the response variable. |
int | getResponseVariableMostFrequentClass() - Returns the most frequent value of the response variable. |
PredictiveModel.VariableType | getResponseVariableType() - Returns the variable type of the response variable. |
double | getTotalWeight() - Returns the sum of the active case weights. |
PredictiveModel.VariableType[] | getVariableType() - Returns an array containing the variable types in xy. |
double[] | getWeights() - Returns an array containing the case weights. |
double[][] | getXY() - Returns a copy of the xy data. |
boolean | isMustFitModelFlag() - Returns the current value of the mustFitModel flag. |
boolean | isUserFixedNClasses() - Returns true if the number of classes was fixed by the user. |
abstract double[] | predict() - Predicts the response variable using the most recent fit. |
abstract double[] | predict(double[][] testData) - Predicts the response values using the most recent fit and the provided test data. |
abstract double[] | predict(double[][] testData, double[] testDataWeights) - Predicts the response values using the most recent fit, the provided test data, and the test data case weights. |
void | setClassCounts(double[] classCounts) - Sets the counts of each class of the response variable. |
protected abstract void | setConfiguration(PredictiveModel pm) - Sets the configuration of PredictiveModel to that of the input model. |
void | setCostMatrix(double[][] costMatrix) - Specifies the cost matrix for a categorical response variable. |
void | setFitModelFlag(boolean fitModelFlag) - Sets the flag of whether or not the model needs to be fit or re-estimated because of a change in the data or configuration. |
void | setMaxNumberOfCategories(int maxCategories) - Sets the maximum number of categories allowed within categorical predictor variables. |
void | setNumberOfClasses(int nClasses) - Sets the number of distinct classes of the response variable. |
void | setPredictorIndex(int[] predIdx) - Sets the array of indices into xy where the predictor variables reside. |
void | setPredictorTypes(PredictiveModel.VariableType[] predVarType) - Sets the VariableType objects that correspond to the predictor data types in xy. |
void | setPrintLevel(int printLevel) - Sets a print level that determines the information printed for a PredictiveModel. |
void | setPriorProbabilities(double[] priors) - Sets the prior probabilities for class membership. |
void | setRandomObject(Random r) - Sets the random object to be used in the permutation of observation data. |
void | setWeights(double[] weights) - Specifies the case weights. |
protected PredictiveModel(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
Constructs a PredictiveModel object for a single response variable and multiple predictor variables.
This constructor should be called by all classes extending PredictiveModel.
Parameters:
xy - a double matrix with one row per observation and one column per variable.
responseColumnIndex - an int specifying the column index of the response variable.
varType - a PredictiveModel.VariableType array of length equal to xy[0].length containing the type of each variable.
protected PredictiveModel(PredictiveModel pm)
Constructs a PredictiveModel from an existing instance.
Parameters:
pm - an instance of a PredictiveModel.
public void fitModel() throws PredictiveModel.PredictiveModelException
Fits the predictive model to the training data (estimates the model using the training data and current configuration settings).
Specific model fitting algorithms are implemented by overriding this method in the PredictiveModel subclasses.
Throws:
PredictiveModel.PredictiveModelException - an exception has occurred in the common PredictiveModel methods. Implementing or overriding methods from this class may require that exceptions be thrown. Exceptions thrown from these methods will necessarily extend PredictiveModelException.
public double[] getClassCounts()
Returns the counts of each class (level) of the categorical response variable.
If the response variable is neither PredictiveModel.VariableType.CATEGORICAL nor PredictiveModel.VariableType.ORDERED_DISCRETE, null is returned.
Returns:
a double array containing the summation of the case weights for each occurrence of a particular class found in the categorical response data.
public double[][] getCostMatrix()
Returns the cost matrix for a categorical response variable.
The cost matrix has elements C(i, j) = cost of misclassifying a response in class j as in class i. The diagonal elements of the cost matrix must be 0. In the case that nClasses has not been determined (usually because fitModel() has not been called), an array of length zero is returned.
Returns:
a double matrix of dimension nClasses by nClasses containing the cost matrix for a categorical response variable, where nClasses is the number of classes the response variable may assume.
public int getMaxNumberOfCategories()
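Returns the maximum number of categories allowed within categorical predictor variables.
Returns:
an int indicating the maximum number of categories a categorical predictor variable can have.
public int getNumberOfClasses()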
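Returns the number of unique classes found in the categorical response data.
Returns:
an int indicating the number of unique classes found in the categorical response data.
public int getNumberOfColumns()
Returns the number of columns in xy.
Returns:
an int that indicates the number of columns.
public int getNumberOfMissing()
Returns the number of missing values of the response variable found in the data xy.
Returns:
an int indicating the number of missing values.
public int getNumberOfPredictors()
Returns the number of predictors.
Returns:
an int equal to the number of predictors.
public int getNumberOfRows()
Returns the number of rows in xy (observations).
Returns:
an int equal to the number of rows in xy (observations).
public int[] getNumberOfUniquePredictorValues()
Returns an array containing the number of distinct values of each categorical or ordinal predictor found in the input data.
For predictors with PredictiveModel.VariableType.QUANTITATIVE_CONTINUOUS, the value is set to 0 but is not meaningful.
Returns:
an int array containing the number of distinct values.
public int[] getPredictorIndexes()
Returns an array of indices into xy where the predictor variables reside.
Returns:
an int array containing indices into xy where the predictor variables reside.
public PredictiveModel.VariableType[] getPredictorTypes()
Returns an array of VariableType objects that correspond to the predictor data types in xy.
Returns:
a VariableType array that corresponds to the predictor data types in xy.
public int getPrintLevel()
Returns the current print level.
Returns:
an int indicating the current print level.
printLevel | Action |
---|---|
0 | No printing. |
1 | Prints final results only. |
2 | Prints intermediate and final results. |
Default: printLevel = 0.
public double[] getPriorProbabilities()
Returns an array containing the prior probabilities.
Returns:
a double array containing the prior probabilities.
public Random getRandomObject()
Returns the random object being used in the permutation of the observations.
Returns:
a Random object.
public int getResponseColumnIndex()
Returns the column index in xy containing the response variable.
Returns:
an int specifying the column index for the response variable.
public double getResponseVariableAverage()
Returns the weighted average value of the response variable.
Returns:
a double equal to the weighted average value.
public int getResponseVariableMostFrequentClass()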
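Returns the most frequent value of the response variable.
Applies when the response variable is of type VariableType.CATEGORICAL or VariableType.ORDERED_DISCRETE.
Returns:
an int equal to the most frequent class of the response variable.
public PredictiveModel.VariableType getResponseVariableType()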
Returns the variable type of the response variable.
Returns:
the VariableType of the response variable.
public double getTotalWeight()
Returns the sum of the active case weights.
Returns:
a double indicating the sum of the active case weights.
public PredictiveModel.VariableType[] getVariableType()
Returns an array containing the variable types in xy.
Returns:
a VariableType array containing the variable types in xy.
public double[] getWeights()
Returns an array containing the case weights.
Returns:
a double array containing the case weights.
public double[][] getXY()
Returns a copy of the xy data.
Returns:
a double matrix containing the training data.
public boolean isMustFitModelFlag()
Returns the current value of the mustFitModel flag.
When true, the fitModel() method should be called before doing any predictions or other analysis.
Returns:
a boolean value indicating the state of the flag.
public boolean isUserFixedNClasses()
Returns true if the number of classes was fixed by the user.
Returns:
a boolean value indicating whether or not the number of classes has been fixed by the user.
public abstract double[] predict() throws PredictiveModel.PredictiveModelException
Predicts the response variable using the most recent fit.
Each PredictiveModel subclass must override this method.
Returns:
a double array containing the predicted values.
Throws:
PredictiveModel.PredictiveModelException - an exception has occurred in the common PredictiveModel methods. Implementing or overriding methods from this class may require that exceptions be thrown. Exceptions thrown from these methods will necessarily extend PredictiveModelException.
public abstract double[] predict(double[][] testData) throws PredictiveModel.PredictiveModelException
Predicts the response values using the most recent fit and the provided test data.
Each PredictiveModel subclass must override this method.
Parameters:
testData - a double matrix containing data to be predicted. testData must have the same number of columns, in the same arrangement, as xy (the observations).
Returns:
a double array containing the predicted values.
Throws:
PredictiveModel.PredictiveModelException - an exception has occurred in the common PredictiveModel methods. Implementing or overriding methods from this class may require that exceptions be thrown. Exceptions thrown from these methods will necessarily extend PredictiveModelException.
public abstract double[] predict(double[][] testData, double[] testDataWeights) throws PredictiveModel.PredictiveModelException
Predicts the response values using the most recent fit, the provided test data, and the test data case weights.
Each PredictiveModel subclass must override this method.
Parameters:
testData - a double matrix containing data to be predicted. testData must have the same number of columns, in the same arrangement, as xy (the observations).
testDataWeights - a double array containing weights for each row of testData.
Returns:
a double array containing the predicted values.
Throws:
PredictiveModel.PredictiveModelException - an exception has occurred in the common PredictiveModel methods. Implementing or overriding methods from this class may require that exceptions be thrown. Exceptions thrown from these methods will necessarily extend PredictiveModelException.
public void setClassCounts(double[] classCounts)
Sets the counts of each class of the response variable.
Use this method to set the class counts, when one or more classes do not occur in the training data due to sampling, but are otherwise valid, or when the data is distributed and the global counts are available. Only applies when the response variable is of type PredictiveModel.VariableType.CATEGORICAL or PredictiveModel.VariableType.ORDERED_DISCRETE.
Parameters:
classCounts - a double array containing the class counts of the response variable.
The default is to use the class counts discovered in the input matrix, xy, weighted by the values in weights.
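The weighted counting that getClassCounts() describes, and that serves as the default here, can be sketched as follows. This is a minimal illustration and not the library's implementation: the class ClassCountsSketch and the method weightedClassCounts are hypothetical names, and the sketch assumes the classes are coded 0, 1, ..., nClasses - 1 in the response column.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of PredictiveModel. */
final class ClassCountsSketch {

    /**
     * Sums the case weights of the observations that fall in each class.
     * Assumption: class labels are coded 0, 1, ..., nClasses - 1.
     */
    static double[] weightedClassCounts(double[] response, double[] weights, int nClasses) {
        if (response.length != weights.length) {
            throw new IllegalArgumentException("response and weights must have equal length");
        }
        double[] counts = new double[nClasses];
        for (int i = 0; i < response.length; i++) {
            int cls = (int) response[i];
            if (cls < 0 || cls >= nClasses) {
                throw new IllegalArgumentException("class label out of range: " + response[i]);
            }
            counts[cls] += weights[i];
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] response = {0, 1, 1, 2, 1};
        double[] weights = {1.0, 1.0, 2.0, 0.5, 1.0};
        // Class 3 never occurs in this sample, so its count stays at 0.0.
        System.out.println(Arrays.toString(weightedClassCounts(response, weights, 4)));
    }
}
```

A class that is valid but absent from the sample ends up with a count of zero in this sketch, which is the situation where supplying global counts through setClassCounts may be useful.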
protected abstract void setConfiguration(PredictiveModel pm) throws PredictiveModel.PredictiveModelException
Sets the configuration of PredictiveModel to that of the input model.
The implementation should include model-specific input methods (subclass methods) that must necessarily be set. An example would be problem-specific constraints such as tree depth for a DecisionTree in an ensemble prediction algorithm (see BootstrapAggregation). Here DecisionTree.setMaxDepth(int) is called within this overridden method to constrain the permuted trees to the same maximum dimensions.
Each PredictiveModel subclass must override this method.
Parameters:
pm - a PredictiveModel object.
Throws:
PredictiveModel.PredictiveModelException - an exception class intended to be the parent of all nested Exception classes where the enclosing class extends PredictiveModel.
public void setCostMatrix(double[][] costMatrix)
Specifies the cost matrix for a categorical response variable.
Parameters:
costMatrix - a square double matrix of dimension nClasses by nClasses containing elements C(i, j), the cost of misclassifying a response in class j as in class i. The diagonal elements of the cost matrix must be 0.
Both dimensions of costMatrix should agree with the number of classes found in the data. Otherwise an exception will be thrown.
Default: costMatrix[i][j] = 1.0 where i ≠ j and costMatrix[i][i] = 0.0.
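The default cost matrix and the stated constraints (square, nClasses by nClasses, zero diagonal) can be illustrated with a small self-contained sketch. The class CostMatrixSketch and its methods are hypothetical and only mirror the rules described above; the library's own validation and the exception type it throws are not shown in this document.

```java
/** Hypothetical helper for illustration; not part of PredictiveModel. */
final class CostMatrixSketch {

    /** Builds the documented default: 1.0 off the diagonal and 0.0 on it. */
    static double[][] defaultCostMatrix(int nClasses) {
        double[][] cost = new double[nClasses][nClasses];
        for (int i = 0; i < nClasses; i++) {
            for (int j = 0; j < nClasses; j++) {
                cost[i][j] = (i == j) ? 0.0 : 1.0;
            }
        }
        return cost;
    }

    /** Returns true if cost is nClasses by nClasses with a zero diagonal. */
    static boolean isValid(double[][] cost, int nClasses) {
        if (cost == null || cost.length != nClasses) {
            return false;
        }
        for (int i = 0; i < nClasses; i++) {
            if (cost[i] == null || cost[i].length != nClasses) {
                return false;
            }
            if (cost[i][i] != 0.0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] cost = defaultCostMatrix(3);
        // C(0, 2): make misclassifying a class-2 response as class 0 five times as costly.
        cost[0][2] = 5.0;
        System.out.println("valid: " + isValid(cost, 3));
    }
}
```

Starting from the default and raising individual off-diagonal entries keeps the diagonal at zero, so the matrix continues to satisfy the documented constraints.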
public void setFitModelFlag(boolean fitModelFlag)
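Sets the flag of whether or not the model needs to be fit or re-estimated because of a change in the data or configuration.
Parameters:
fitModelFlag - a boolean.
Default: fitModelFlag = true.
public void setMaxNumberOfCategories(int maxCategories)
Sets the maximum number of categories allowed within categorical predictor variables.
Parameters:
maxCategories - an int specifying the maximum number of categories a predictor variable can have.
Default: maxCategories = 10.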
public void setNumberOfClasses(int nClasses)
Sets the number of distinct classes of the response variable.
Applies when the response variable is of type PredictiveModel.VariableType.CATEGORICAL or PredictiveModel.VariableType.ORDERED_DISCRETE.
Parameters:
nClasses - an int representing the number of distinct classes or categories of the response variable.
An error is generated if more than nClasses categories are discovered in the data.
Default: nClasses is 0.
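The rule that an error is generated when the data contain more than nClasses categories can be sketched as below. Everything here is hypothetical: the class NumberOfClassesSketch, its methods, and the exception type are illustrative only. The sketch also assumes that NaN marks a missing response and that the default nClasses = 0 means the number of classes has not been fixed by the user.

```java
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of PredictiveModel. */
final class NumberOfClassesSketch {

    /** Counts distinct response values, skipping NaN (assumed to mark missing values). */
    static int countDistinctClasses(double[] response) {
        TreeSet<Double> seen = new TreeSet<>();
        for (double v : response) {
            if (!Double.isNaN(v)) {
                seen.add(v);
            }
        }
        return seen.size();
    }

    /**
     * Throws if the data contain more classes than the fixed limit.
     * Assumption: nClasses <= 0 means the limit was not fixed by the user.
     */
    static void checkClassLimit(double[] response, int nClasses) {
        if (nClasses <= 0) {
            return;
        }
        int found = countDistinctClasses(response);
        if (found > nClasses) {
            throw new IllegalStateException(
                    "found " + found + " classes but nClasses is " + nClasses);
        }
    }

    public static void main(String[] args) {
        double[] response = {0, 1, 1, 2, Double.NaN};
        System.out.println("distinct classes: " + countDistinctClasses(response));
        checkClassLimit(response, 4); // passes: fewer classes found than allowed
    }
}
```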
public void setPredictorIndex(int[] predIdx)
Sets the array of indices into xy where the predictor variables reside.
This may be used to subset the full set of predictor variables (getPredictorTypes()).
Parameters:
predIdx - an int array containing the column index for each predictor variable.
Default: All columns other than the column containing the response variable are indicated.
public void setPredictorTypes(PredictiveModel.VariableType[] predVarType)
Sets the VariableType objects that correspond to the predictor data types in xy.
Parameters:
predVarType - a VariableType array of length equal to the number of predictors specifying the data type of each predictor.
public void setPrintLevel(int printLevel)
Sets a print level that determines the information printed for a PredictiveModel.
Parameters:
printLevel - an int specifying the level of printing to perform.
printLevel | Action |
---|---|
0 | No printing. |
1 | Prints final results only. |
2 | Prints intermediate and final results. |
Default: printLevel = 0.
public void setPriorProbabilities(double[] priors) throws PredictiveModel.SumOfProbabilitiesNotOneException
Sets the prior probabilities for class membership.
Parameters:
priors - a double array specifying the prior probabilities of class membership for each class.
The prior probabilities must range between 0.0 and 1.0 inclusive, and sum to 1.0. The length of priors should agree with the number of classes found in the data. Otherwise an exception is thrown. Calling this method overwrites any existing values.
Default: Determined from the data.
Throws:
PredictiveModel.SumOfProbabilitiesNotOneException - prior probabilities must sum to 1.0.
public void setRandomObject(Random r)
Sets the random object to be used in the permutation of observation data.
Parameters:
r - a Random object to be used in random permutation of observation data.
Specifying a seed for the Random object can produce repeatable/deterministic output.
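Why a seeded Random gives repeatable output can be seen in a small stand-alone sketch. The class SeededPermutationSketch and the method permuteRowIndices are hypothetical, and the Fisher-Yates shuffle used here is only a stand-in for whatever permutation the library performs internally.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper for illustration; not part of PredictiveModel. */
final class SeededPermutationSketch {

    /** Fisher-Yates shuffle of the row indices 0..n-1, driven by the supplied Random. */
    static int[] permuteRowIndices(int n, Random r) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        for (int i = n - 1; i > 0; i--) {
            int j = r.nextInt(i + 1); // uniform draw from 0..i
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return idx;
    }

    public static void main(String[] args) {
        // Two generators with the same seed produce the same sequence of draws,
        // so the two permutations are identical.
        int[] first = permuteRowIndices(10, new Random(123457L));
        int[] second = permuteRowIndices(10, new Random(123457L));
        System.out.println(Arrays.equals(first, second));
    }
}
```

With this in mind, passing something like new Random(seed) to setRandomObject is the usual way to make repeated runs comparable; the seed value itself is arbitrary.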
public void setWeights(double[] weights)
Specifies the case weights.
Parameters:
weights - a double array specifying case weights.
Default: weights[i] = 1.0 for all i.
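How case weights feed into quantities such as getTotalWeight() and getResponseVariableAverage() can be sketched as follows. The class CaseWeightsSketch and its methods are hypothetical, and the sketch assumes every case is active, since this page does not define which weights count as active.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of PredictiveModel. */
final class CaseWeightsSketch {

    /** Builds the documented default: weights[i] = 1.0 for all i. */
    static double[] defaultWeights(int nRows) {
        double[] w = new double[nRows];
        Arrays.fill(w, 1.0);
        return w;
    }

    /** Sums the case weights (assumption: every case is active). */
    static double totalWeight(double[] weights) {
        double sum = 0.0;
        for (double w : weights) {
            sum += w;
        }
        return sum;
    }

    /** Weighted average of the response: sum(w[i] * y[i]) / sum(w[i]). */
    static double weightedAverage(double[] y, double[] weights) {
        if (y.length != weights.length) {
            throw new IllegalArgumentException("y and weights must have equal length");
        }
        double total = totalWeight(weights);
        if (total == 0.0) {
            throw new IllegalArgumentException("total weight must be non-zero");
        }
        double sum = 0.0;
        for (int i = 0; i < y.length; i++) {
            sum += weights[i] * y[i];
        }
        return sum / total;
    }

    public static void main(String[] args) {
        double[] y = {1.0, 2.0, 3.0};
        // With the default weights the weighted average is the plain mean.
        System.out.println(weightedAverage(y, defaultWeights(y.length)));
        // Doubling the weight of the last case pulls the average toward 3.0.
        System.out.println(weightedAverage(y, new double[] {1.0, 1.0, 2.0}));
    }
}
```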
Copyright © 1970-2015 Rogue Wave Software
Built October 13 2015.