Class PredictiveModel
- All Implemented Interfaces:
Serializable, Cloneable
- Direct Known Subclasses:
DecisionTree, GradientBoosting, LogisticRegression, RandomTrees, SupportVectorMachine
-
Nested Class Summary
Nested Classes
Modifier and Type / Class / Description
- static class: Wraps the java.lang.CloneNotSupportedException to indicate that the clone method in class Object has been called to clone an object, but that the object's class does not implement the Cloneable interface.
- static class PredictiveModel.PredictiveModelException: An exception class intended to be the parent of all nested Exception classes where the enclosing class extends PredictiveModel.
- static class: Exception thrown when an input parameter has changed that might affect the model estimates or predictions.
- static class PredictiveModel.SumOfProbabilitiesNotOneException: Exception thrown when the sum of probabilities is not approximately one.
- static enum PredictiveModel.VariableType: Enumerates different variable types.
-
Constructor Summary
Constructors
Modifier / Constructor / Description
- protected PredictiveModel(double[][] x, double[][] y, PredictiveModel.VariableType[] predictorVarType, PredictiveModel.VariableType responseVarType): Constructs a PredictiveModel object for a single response variable and multiple predictor variables.
- protected PredictiveModel(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType): Constructs a PredictiveModel object for a single response variable and multiple predictor variables.
- protected PredictiveModel(PredictiveModel pm): Constructs a PredictiveModel from an existing instance.
-
Method Summary
Modifier and Type / Method / Description
- abstract PredictiveModel clone(): Abstract clone method.
- void fitModel(): Fits the predictive model to the training data (estimates the model using the training data and current configuration settings).
- double[] getClassCounts(): Returns the counts of each class (level) of the categorical response variable.
- int[][] getClassErrors(double[] knownValues, double[] predictedValues): Returns classification error information.
- int[][] getClassErrors(int[] knownValues, int[] predictedValues): Returns classification error information.
- String[] getClassLabels(): Returns the current class labels for a categorical response variable.
- double[][] getClassProbabilities(): Returns a matrix containing the predicted class probabilities for each observation in the training data.
- double[][] getCostMatrix(): Returns the cost matrix for a categorical response variable.
- int getMaxNumberOfCategories(): Returns the maximum number of categories allowed.
- int getMaxNumberOfIterations(): Returns the maximum number of iterations allowed for the fitting procedure or training algorithm.
- int getNumberOfClasses(): Returns the number of distinct classes found (or set) in the categorical response data.
- int getNumberOfColumns(): Returns the number of columns in the training data xy.
- int getNumberOfMissing(): Returns the number of missing values of the response variable found in the data xy.
- int getNumberOfPredictors(): Returns the number of predictors.
- int getNumberOfRows(): Returns the number of rows (observations) in the training data.
- int[] getNumberOfUniquePredictorValues(): Returns an array containing the number of distinct values of each predictor found in the input data.
- int[] getPredictorIndexes(): Returns the column indices of xy in which the predictor variables reside.
- PredictiveModel.VariableType[] getPredictorTypes(): Returns an array of VariableType objects that correspond to the predictor data types in xy.
- int getPrintLevel(): Returns the current print level.
- double[] getPriorProbabilities(): Returns an array containing the prior probabilities.
- Random getRandomObject(): Returns the random object being used in the permutation of the observations.
- int getResponseColumnIndex(): Returns the column index in xy containing the response variable.
- double getResponseVariableAverage(): Returns the weighted average value of the response variable.
- int getResponseVariableMostFrequentClass(): Returns the most frequent value of the response variable.
- PredictiveModel.VariableType getResponseVariableType(): Returns the variable type of the response variable.
- double getTotalWeight(): Returns the sum of the active case weights.
- PredictiveModel.VariableType[] getVariableType(): Returns an array containing the variable types in xy.
- double[] getWeights(): Returns an array containing the case weights.
- double[][] getXY(): Returns a copy of the xy data.
- boolean isConstantSeries(): Returns the current value of the constantSeries flag.
- boolean isMustFitModel(): Returns the current value of the mustFitModel flag.
- boolean isUserFixedNClasses(): Returns true if the number of classes was fixed by the user.
- abstract double[] predict(): Predicts the response variable using the most recent fit.
- abstract double[] predict(double[][] testData): Predicts the response values using the most recent fit and the provided test data.
- abstract double[] predict(double[][] testData, double[] testDataWeights): Predicts the response values using the most recent fit, the provided test data, and the test data case weights.
- void setClassCounts(double[] classCounts): Sets the counts of each class of the response variable.
- void setClassLabels(String[] classLabels): Sets the class names or labels for a categorical response variable.
- void setClassProbabilities(double[][] probs): Sets the class probabilities.
- protected abstract void setConfiguration(PredictiveModel pm): Sets the configuration of PredictiveModel to that of the input model.
- void setCostMatrix(double[][] costMatrix): Specifies the cost matrix for a categorical response variable.
- void setMaxNumberOfCategories(int maxCategories): Sets the maximum number of categories allowed within categorical predictor and response variables.
- void setMaxNumberOfIterations(int maxIterations): Sets the maximum number of iterations allowed for the fitting procedure or training algorithm.
- void setMustFitModel(boolean mustFitModel): Sets the flag of whether or not the model needs to be fit or re-estimated because of a change in the data or configuration.
- void setNumberOfClasses(int nClasses): Sets the number of distinct classes or categories the response variable may assume.
- void setPredictorIndex(int[] predIdx): Sets the column indices of xy in which the predictor variables reside.
- void setPredictorTypes(PredictiveModel.VariableType[] predVarType): Sets the VariableType objects that correspond to the predictor data types in xy.
- void setPrintLevel(int printLevel): Sets the print level for a PredictiveModel.
- void setPriorProbabilities(double[] priors): Sets the prior probabilities for class membership.
- void setRandomObject(Random r): Sets the random object to be used in the permutation of observation data.
- void setResponseColumnIndex(int index): Sets the column index in xy containing the response variable.
- void setTrainingData(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType): Sets up the training data for the predictive model.
- void setVariableType(PredictiveModel.VariableType[] varType): Sets the variable types for the data.
- void setWeights(double[] weights): Specifies the case weights.
-
Constructor Details
-
PredictiveModel
Constructs a PredictiveModel from an existing instance.
- Parameters:
pm - an instance of a PredictiveModel
-
PredictiveModel
protected PredictiveModel(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
Constructs a PredictiveModel object for a single response variable and multiple predictor variables.
This constructor should be called by all classes extending PredictiveModel.
- Parameters:
xy - a double matrix containing the training data and associated response values
responseColumnIndex - an int specifying the column index in xy of the response variable
varType - a PredictiveModel.VariableType array of length equal to xy[0].length containing the type of each variable
-
PredictiveModel
protected PredictiveModel(double[][] x, double[][] y, PredictiveModel.VariableType[] predictorVarType, PredictiveModel.VariableType responseVarType)
Constructs a PredictiveModel object for a single response variable and multiple predictor variables.
This constructor may be called by all classes extending PredictiveModel.
- Parameters:
x - a double matrix containing the training data for the predictor variables
y - a double matrix containing training data for the response variable. The number of columns will depend on the response variable type.
predictorVarType - a PredictiveModel.VariableType array of length equal to x[0].length containing the type of each variable
responseVarType - a PredictiveModel.VariableType, the response variable type
-
-
Method Details
-
clone
Abstract clone method. Each subclass of PredictiveModel must override this method.
-
setConfiguration
protected abstract void setConfiguration(PredictiveModel pm) throws PredictiveModel.PredictiveModelException
Sets the configuration of PredictiveModel to that of the input model.
Each PredictiveModel subclass must override this method. The implementation should use the subclass's own methods to copy the parameter settings of the input PredictiveModel instance, essentially creating a copy of the input model. This method is used for model parameter tuning, as in CrossValidation, where several variations of the same model are evaluated, and in ensemble methods such as BootstrapAggregation, where several identical instances are fit to random samples.
- Parameters:
pm - a PredictiveModel object
- Throws:
PredictiveModel.PredictiveModelException - is thrown when exceptions occur in the enclosing class that extends PredictiveModel.
-
predict
Predicts the response variable using the most recent fit.
Each PredictiveModel subclass must override this method.
- Returns:
- a double array containing the predicted values.
- Throws:
PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions. Exceptions thrown from these methods necessarily extend PredictiveModelException.
-
predict
public abstract double[] predict(double[][] testData) throws PredictiveModel.PredictiveModelException
Predicts the response values using the most recent fit and the provided test data.
Each PredictiveModel subclass must override this method.
- Parameters:
testData - a double matrix containing data to be predicted. testData must have the same number of columns in the same arrangement as xy (the observations).
- Returns:
- a double array containing the predicted values.
- Throws:
PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions. Exceptions thrown from these methods necessarily extend PredictiveModelException.
-
predict
public abstract double[] predict(double[][] testData, double[] testDataWeights) throws PredictiveModel.PredictiveModelException
Predicts the response values using the most recent fit, the provided test data, and the test data case weights.
Each PredictiveModel subclass must override this method.
- Parameters:
testData - a double matrix containing data to be predicted. testData must have the same number of columns in the same arrangement as xy (the observations).
testDataWeights - a double array containing weights for each row of testData.
- Returns:
- a double array containing the predicted values.
- Throws:
PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions. Exceptions thrown from these methods necessarily extend PredictiveModelException.
-
fitModel
Fits the predictive model to the training data (estimates the model using the training data and current configuration settings).
Subclasses of PredictiveModel, such as DecisionTree, override this method with specific model fitting algorithms.
- Throws:
PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions. Exceptions thrown from these methods necessarily extend PredictiveModelException.
-
getClassErrors
public int[][] getClassErrors(double[] knownValues, double[] predictedValues)
Returns classification error information.
- Parameters:
knownValues - a double array containing the known target classifications
predictedValues - a double array containing the predicted classifications
Arrays knownValues and predictedValues must be the same length.
- Returns:
- an int matrix of size (nClasses+1) by 2 containing the number of classification errors and the number of non-missing classifications for each target classification, plus the overall totals for these errors.
For \(i \lt\) nClasses, the i-th row contains the number of classification errors for the i-th class and the number of patterns with non-missing classifications for that class. The last row contains the number of classification errors totaled over all target classifications, and the total number of patterns with non-missing target classifications.
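The layout of the returned matrix can be illustrated with a minimal sketch. This is not the library's implementation: the class ClassErrorsSketch, its method, the 0-based class coding, and the treatment of out-of-range codes as missing are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of the library: tallies per-class errors. */
final class ClassErrorsSketch {

    /**
     * Builds an (nClasses + 1) x 2 table. Column 0 counts misclassified
     * observations; column 1 counts observations with a non-missing known class.
     * Assumption: classes are coded 0..nClasses-1 and any other code is missing.
     */
    static int[][] classErrors(int[] known, int[] predicted, int nClasses) {
        if (known.length != predicted.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int[][] errors = new int[nClasses + 1][2];
        for (int i = 0; i < known.length; i++) {
            int k = known[i];
            if (k < 0 || k >= nClasses) {
                continue; // out-of-range code: treated as a missing classification
            }
            errors[k][1]++;          // non-missing count for class k
            errors[nClasses][1]++;   // overall non-missing count (last row)
            if (predicted[i] != k) {
                errors[k][0]++;        // error count for class k
                errors[nClasses][0]++; // overall error count (last row)
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        int[] known = {0, 0, 1, 1, 2};
        int[] predicted = {0, 1, 1, 1, 0};
        // prints [[1, 2], [0, 2], [1, 1], [2, 5]]
        System.out.println(Arrays.deepToString(classErrors(known, predicted, 3)));
    }
}
```

In the sample run, the last row reports 2 errors out of 5 non-missing observations, which is the sum of the per-class rows above it.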
-
getClassErrors
public int[][] getClassErrors(int[] knownValues, int[] predictedValues)
Returns classification error information.
- Parameters:
knownValues - an int array containing the known target classifications
predictedValues - an int array containing the predicted classifications
Arrays knownValues and predictedValues must be the same length.
- Returns:
- an int matrix of size (nClasses+1) by 2 containing the number of classification errors and the number of non-missing classifications for each target classification, plus the overall totals for these errors.
For \(i \lt\) nClasses, the i-th row contains the number of classification errors for the i-th class and the number of patterns with non-missing classifications for that class. The last row contains the number of classification errors totaled over all target classifications, and the total number of patterns with non-missing target classifications.
-
getClassCounts
public double[] getClassCounts()
Returns the counts of each class (level) of the categorical response variable.
If the response variable is neither PredictiveModel.VariableType.CATEGORICAL nor PredictiveModel.VariableType.ORDERED_DISCRETE, null is returned.
- Returns:
- a double array containing the summation of the case weights for each occurrence of a particular class found in the categorical response data.
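Because each count is a sum of case weights rather than a plain tally, the counts are fractional whenever the weights are. A minimal sketch of that summation, assuming classes are coded 0..nClasses-1 and NaN marks a missing response (the class ClassCountsSketch and both conventions are hypothetical, not taken from the library):

```java
/** Hypothetical helper, not part of the library. */
final class ClassCountsSketch {

    /**
     * Sums the case weight of every observation into the slot of its class.
     * Assumptions: classes are coded 0..nClasses-1 as doubles, NaN is missing.
     */
    static double[] weightedClassCounts(double[] response, double[] weights, int nClasses) {
        double[] counts = new double[nClasses];
        for (int i = 0; i < response.length; i++) {
            if (Double.isNaN(response[i])) {
                continue; // skip missing responses
            }
            int k = (int) response[i];
            if (k >= 0 && k < nClasses) {
                counts[k] += weights[i];
            }
        }
        return counts;
    }
}
```

With unit weights this reduces to ordinary frequency counts.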
-
setClassCounts
public void setClassCounts(double[] classCounts)
Sets the counts of each class of the response variable.
Use this method to set the class counts when one or more classes do not occur in the training data due to sampling but are otherwise valid, or when the data is distributed and the global counts are available. Only applies when the response variable is of type PredictiveModel.VariableType.CATEGORICAL or PredictiveModel.VariableType.ORDERED_DISCRETE.
- Parameters:
classCounts - a double array containing the class counts of the response variable
The default is to use the class counts discovered in the input matrix, xy, weighted by the values in weights.
-
setClassLabels
Sets the class names or labels for a categorical response variable.
- Parameters:
classLabels - a String array containing class names or labels. The array classLabels must have length = nClasses.
Default: classLabels = {"1", "2", ..., "K"}, where K = nClasses
-
getClassLabels
Returns the current class labels for a categorical response variable.
Note: The labels will be null unless they have been set using the method setClassLabels.
- Returns:
- a String array containing the labels for each class level
-
setMustFitModel
public void setMustFitModel(boolean mustFitModel)
Sets the flag of whether or not the model needs to be fit or re-estimated because of a change in the data or configuration.
- Parameters:
mustFitModel - a boolean giving the value of the flag
Default: mustFitModel = true.
-
getCostMatrix
public double[][] getCostMatrix()
Returns the cost matrix for a categorical response variable.
The cost matrix has elements C(i, j) = cost of misclassifying a response in class j as in class i. The diagonal elements of the cost matrix must be 0. In the case that nClasses has not been determined (usually because fitModel() has not been called), an array of length zero is returned.
- Returns:
- a square double matrix of dimension nClasses by nClasses containing the cost matrix for a categorical response variable, where nClasses is the number of classes the response variable may assume.
-
setCostMatrix
public void setCostMatrix(double[][] costMatrix)
Specifies the cost matrix for a categorical response variable.
- Parameters:
costMatrix - a square double matrix of dimension nClasses by nClasses containing elements C(i, j), the cost of misclassifying a response in class j as in class i. The diagonal elements of the cost matrix must be 0.
Both dimensions of costMatrix should agree with the number of classes found in the data. Otherwise an exception will be thrown.
Default: costMatrix[i][j] = 1.0 where \(i \ne j\) and costMatrix[i][i] = 0.0.
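The default described above is a unit-cost matrix with a zero diagonal. The following sketch builds such a matrix and checks the two documented constraints (square with dimension nClasses, zero diagonal); the class CostMatrixSketch and its method names are hypothetical and only illustrate the shape of a valid argument.

```java
/** Hypothetical helper, not part of the library. */
final class CostMatrixSketch {

    /** Unit cost for every misclassification, zero cost on the diagonal. */
    static double[][] defaultCostMatrix(int nClasses) {
        double[][] cost = new double[nClasses][nClasses];
        for (int i = 0; i < nClasses; i++) {
            for (int j = 0; j < nClasses; j++) {
                cost[i][j] = (i == j) ? 0.0 : 1.0;
            }
        }
        return cost;
    }

    /** Checks that the matrix is nClasses by nClasses with a zero diagonal. */
    static boolean isValid(double[][] cost, int nClasses) {
        if (cost == null || cost.length != nClasses) {
            return false;
        }
        for (int i = 0; i < nClasses; i++) {
            if (cost[i] == null || cost[i].length != nClasses || cost[i][i] != 0.0) {
                return false;
            }
        }
        return true;
    }
}
```

To penalize one kind of mistake more heavily, start from the default and raise the single entry C(i, j) for that pair of classes, leaving the diagonal at 0.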
-
getMaxNumberOfIterations
public int getMaxNumberOfIterations()
Returns the maximum number of iterations allowed for the fitting procedure or training algorithm.
- Returns:
- an int, the maximum number of iterations
-
setMaxNumberOfIterations
public void setMaxNumberOfIterations(int maxIterations)
Sets the maximum number of iterations allowed for the fitting procedure or training algorithm.
Most predictive models use iterative procedures to fit or train the model. Adjusting the maximum number of iterations up or down can assist in diagnosing problems.
- Parameters:
maxIterations - an int specifying the maximum number of iterations
Default: maxIterations = 1000
-
getMaxNumberOfCategories
public int getMaxNumberOfCategories()
Returns the maximum number of categories allowed.
- Returns:
- an int indicating the maximum number of categories allowed within the predictor and response variables.
-
setMaxNumberOfCategories
public void setMaxNumberOfCategories(int maxCategories)
Sets the maximum number of categories allowed within categorical predictor and response variables.
- Parameters:
maxCategories - an int, the maximum number of categories a predictor or response variable can have
Default: maxCategories = StrictMath.max(10, maxCat + 1), where maxCat is the maximum category within all categorical predictor and response variables.
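The default formula can be worked through with a short sketch. It assumes that categories are stored in xy as non-negative integer codes, that NaN marks a missing value, and that a boolean flag per column identifies the categorical variables; the class MaxCategoriesSketch and these conventions are hypothetical, and only the final StrictMath.max expression comes from the text above.

```java
/** Hypothetical helper, not part of the library. */
final class MaxCategoriesSketch {

    /**
     * Scans the categorical columns for the largest category code (maxCat)
     * and applies the documented default, StrictMath.max(10, maxCat + 1).
     * Assumption: isCategorical[j] is true when column j is categorical.
     */
    static int defaultMaxCategories(double[][] xy, boolean[] isCategorical) {
        int maxCat = 0;
        for (double[] row : xy) {
            for (int j = 0; j < row.length; j++) {
                if (isCategorical[j] && !Double.isNaN(row[j])) {
                    maxCat = StrictMath.max(maxCat, (int) row[j]);
                }
            }
        }
        return StrictMath.max(10, maxCat + 1);
    }
}
```

Under these assumptions the default never drops below 10, and it grows only when some categorical column contains a code of 10 or more.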
-
getNumberOfClasses
public int getNumberOfClasses()
Returns the number of distinct classes found (or set) in the categorical response data.
- Returns:
- an int, the number of classes
-
setNumberOfClasses
public void setNumberOfClasses(int nClasses)
Sets the number of distinct classes or categories the response variable may assume.
This setting has no effect when the response variable type is PredictiveModel.VariableType.QUANTITATIVE_CONTINUOUS.
- Parameters:
nClasses - an int, the number of distinct classes or categories of the response variable
An error is generated if more than nClasses categories are discovered in the data.
Default: nClasses is 0.
-
getNumberOfColumns
public int getNumberOfColumns()
Returns the number of columns in the training data xy.
- Returns:
- an int, the number of columns in xy. If xy is null, the value is 0.
-
getNumberOfMissing
public int getNumberOfMissing()
Returns the number of missing values of the response variable found in the data xy.
- Returns:
- an int, the number of missing values
-
getNumberOfPredictors
public int getNumberOfPredictors()
Returns the number of predictors.
- Returns:
- an int, the number of predictors
-
getNumberOfRows
public int getNumberOfRows()
Returns the number of rows (observations) in the training data.
- Returns:
- an int, the number of rows (observations) in the training data
-
getPredictorIndexes
public int[] getPredictorIndexes()
Returns the column indices of xy in which the predictor variables reside.
- Returns:
- an int array containing the column indices
-
setPredictorIndex
public void setPredictorIndex(int[] predIdx)
Sets the column indices of xy in which the predictor variables reside.
This may be used to subset the full set of predictor variables (getPredictorTypes()).
- Parameters:
predIdx - an int array containing the column index for each predictor variable
Default: All columns other than the column containing the response variable are included.
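As a minimal sketch of the default described above, the index array simply lists every column of xy except the response column. The class PredictorIndexSketch and its method are hypothetical and stand in for whatever the library does internally.

```java
/** Hypothetical helper, not part of the library. */
final class PredictorIndexSketch {

    /** Returns every column index in 0..nCols-1 except the response column. */
    static int[] defaultPredictorIndex(int nCols, int responseColumnIndex) {
        if (responseColumnIndex < 0 || responseColumnIndex >= nCols) {
            throw new IllegalArgumentException("response column out of range");
        }
        int[] idx = new int[nCols - 1];
        int k = 0;
        for (int j = 0; j < nCols; j++) {
            if (j != responseColumnIndex) {
                idx[k++] = j;
            }
        }
        return idx;
    }
}
```

For a four-column xy with the response in column 1, this yields {0, 2, 3}; passing a shorter array such as {0, 3} to setPredictorIndex would restrict the model to those two predictors.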
-
getPredictorTypes
Returns an array of VariableType objects that correspond to the predictor data types in xy.
- Returns:
- a VariableType array that corresponds to the predictor data types in xy
-
setPredictorTypes
Sets the VariableType objects that correspond to the predictor data types in xy.
- Parameters:
predVarType - a VariableType array of length equal to the number of predictors specifying the data type of each predictor
-
setRandomObject
Sets the random object to be used in the permutation of observation data.
- Parameters:
r - a Random object to be used in the random permutation of observation data
Constructing the Random object with a fixed seed makes the output repeatable.
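The effect of seeding can be seen in a small sketch that permutes row indices with a Fisher-Yates shuffle driven by java.util.Random. The shuffle algorithm and the class PermutationSketch are assumptions for illustration; the text above does not say how the library permutes observations.

```java
import java.util.Random;

/** Hypothetical helper, not part of the library. */
final class PermutationSketch {

    /** Returns a random permutation of 0..n-1 using a Fisher-Yates shuffle. */
    static int[] permute(int n, Random r) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        for (int i = n - 1; i > 0; i--) {
            int j = r.nextInt(i + 1); // uniform in 0..i
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return idx;
    }
}
```

Two calls such as permute(10, new Random(42)) return the same ordering, which is the repeatability that a seeded Random provides.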
-
getRandomObject
Returns the random object being used in the permutation of the observations.
- Returns:
- a Random object being used for permutations
-
getNumberOfUniquePredictorValues
public int[] getNumberOfUniquePredictorValues()
Returns an array containing the number of distinct values of each predictor found in the input data.
For continuous predictor variables, the value is set to 0 and is not meaningful.
- Returns:
- an int array containing the number of distinct values for each predictor
-
getPrintLevel
public int getPrintLevel()
Returns the current print level.
- Returns:
- an int, the current print level
printLevel / Action
0 / No printing.
1 / Prints final results only.
2 / Prints intermediate and final results.
Default: printLevel = 0.
-
setPrintLevel
public void setPrintLevel(int printLevel)
Sets the print level for a PredictiveModel.
- Parameters:
printLevel - an int specifying the level of printing to perform
printLevel / Action
0 / No printing.
1 / Prints final results only.
2 / Prints intermediate and final results.
Default: printLevel = 0.
-
getClassProbabilities
public double[][] getClassProbabilities()
Returns a matrix containing the predicted class probabilities for each observation in the training data.
- Returns:
- a double matrix containing the class probabilities
-
setClassProbabilities
public void setClassProbabilities(double[][] probs) throws PredictiveModel.SumOfProbabilitiesNotOneException
Sets the class probabilities.
- Parameters:
probs - a double matrix specifying class probabilities for each pattern or observation in a data set
Each probability must lie between 0.0 and 1.0 inclusive, and the probabilities for each observation must sum to 1.0. The number of columns in probs should agree with the number of classes found in the data. Otherwise an exception is thrown. Calling this method overwrites any existing values.
Default: probs = null unless estimated by an overriding method or set by the user.
- Throws:
PredictiveModel.SumOfProbabilitiesNotOneException - is thrown when class probabilities do not sum to 1.0.
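A caller can pre-check a probability matrix before passing it in. The sketch below tests the constraints stated above (column count, range, and row sums); the class ProbabilityCheckSketch and the tolerance of 1e-8 are assumptions, since the text only says the sum must be approximately one and does not give the tolerance the library uses.

```java
/** Hypothetical helper, not part of the library. */
final class ProbabilityCheckSketch {

    /** Assumed tolerance for "approximately one"; the library's value is not stated. */
    static final double TOLERANCE = 1e-8;

    /**
     * Returns true when every row has nClasses entries in [0, 1]
     * whose sum is within TOLERANCE of 1.0. NaN entries fail the check.
     */
    static boolean rowsAreValid(double[][] probs, int nClasses) {
        for (double[] row : probs) {
            if (row.length != nClasses) {
                return false;
            }
            double sum = 0.0;
            for (double p : row) {
                if (!(p >= 0.0 && p <= 1.0)) {
                    return false; // out of range or NaN
                }
                sum += p;
            }
            if (!(Math.abs(sum - 1.0) <= TOLERANCE)) {
                return false;
            }
        }
        return true;
    }
}
```

The same check applies to a vector of prior probabilities by wrapping it as a single-row matrix.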
-
getPriorProbabilities
public double[] getPriorProbabilities()
Returns an array containing the prior probabilities.
- Returns:
- a double array containing the prior probabilities
-
setPriorProbabilities
public void setPriorProbabilities(double[] priors) throws PredictiveModel.SumOfProbabilitiesNotOneException
Sets the prior probabilities for class membership.
- Parameters:
priors - a double array specifying the prior probabilities
The prior probabilities must range between 0.0 and 1.0 inclusive, and sum to 1.0. The length of priors should agree with the number of classes found in the data. Otherwise an exception is thrown. Calling this method overwrites any existing values.
Default: Determined from the data.
- Throws:
PredictiveModel.SumOfProbabilitiesNotOneException - is thrown when prior probabilities do not sum to 1.0.
-
getResponseColumnIndex
public int getResponseColumnIndex()
Returns the column index in xy containing the response variable.
- Returns:
- an int, the column index for the response variable
-
setResponseColumnIndex
public void setResponseColumnIndex(int index)
Sets the column index in xy containing the response variable.
- Parameters:
index - an int, the column index for the response variable
-
getResponseVariableAverage
public double getResponseVariableAverage()
Returns the weighted average value of the response variable.
- Returns:
- a double, the weighted average value of the response variable
-
getResponseVariableMostFrequentClass
public int getResponseVariableMostFrequentClass()
Returns the most frequent value of the response variable. Only meaningful for VariableType.CATEGORICAL or VariableType.ORDERED_DISCRETE.
- Returns:
- an int, the level of the most frequent class
-
getResponseVariableType
Returns the variable type of the response variable.
- Returns:
- the VariableType of the response variable
-
getTotalWeight
public double getTotalWeight()
Returns the sum of the active case weights.
- Returns:
- a double, the sum of the active case weights
-
getVariableType
Returns an array containing the variable types in xy.
- Returns:
- a VariableType array containing the variable types in xy
-
setVariableType
Sets the variable types for the data.
- Parameters:
varType - a PredictiveModel.VariableType array of length equal to xy[0].length containing the type of each variable
-
getWeights
public double[] getWeights()
Returns an array containing the case weights.
- Returns:
- a double array containing the case weights
-
setWeights
public void setWeights(double[] weights)
Specifies the case weights.
- Parameters:
weights - a double array specifying case weights
Default: weights[i] = 1.0 for all i.
-
setTrainingData
public void setTrainingData(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
Sets up the training data for the predictive model.
By calling this method, the problem is either initialized or reset to use the data in the arguments.
- Parameters:
xy - a double matrix containing the training data and associated response values
responseColumnIndex - an int specifying the column index in xy of the response variable
varType - a PredictiveModel.VariableType array of length equal to xy[0].length containing the type of each variable
-
getXY
public double[][] getXY()
Returns a copy of the xy data.
- Returns:
- a double matrix containing the training data
-
isMustFitModel
public boolean isMustFitModel()
Returns the current value of the mustFitModel flag.
When true, the fitModel() method should be called before doing any predictions or other analysis.
- Returns:
- a boolean, the current state of the flag
-
isConstantSeries
public boolean isConstantSeries()
Returns the current value of the constantSeries flag.
The flag is set to true if the code determines that the response variable is constant in the training data. The method fitModel will fail if the series is constant. The flag is reset when the training data is changed using setTrainingData and the new response variable is not constant.
- Returns:
- a boolean, the current state of the flag
-
isUserFixedNClasses
public boolean isUserFixedNClasses()
Returns true if the number of classes was fixed by the user.
- Returns:
- a boolean, the current state of the flag
-