Class PredictiveModel

java.lang.Object
com.imsl.datamining.PredictiveModel
All Implemented Interfaces:
Serializable, Cloneable
Direct Known Subclasses:
DecisionTree, GradientBoosting, LogisticRegression, RandomTrees, SupportVectorMachine

public abstract class PredictiveModel extends Object implements Serializable, Cloneable
Specifies a predictive model. This class defines the members and methods common to predictive models in univariate prediction or classification problems.
  • Constructor Details

    • PredictiveModel

      protected PredictiveModel(PredictiveModel pm)
      Constructs a PredictiveModel from an existing instance.
      Parameters:
      pm - an instance of a PredictiveModel
    • PredictiveModel

      protected PredictiveModel(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
      Constructs a PredictiveModel object for a single response variable and multiple predictor variables.

      This constructor should be called by all classes extending PredictiveModel.

      Parameters:
      xy - a double matrix containing the training data and associated response values
      responseColumnIndex - an int specifying the column index in xy of the response variable
      varType - a PredictiveModel.VariableType array of length equal to xy[0].length containing the type of each variable
    • PredictiveModel

      protected PredictiveModel(double[][] x, double[][] y, PredictiveModel.VariableType[] predictorVarType, PredictiveModel.VariableType responseVarType)
      Constructs a PredictiveModel object for a single response variable and multiple predictor variables.

      This constructor may be called by all classes extending PredictiveModel.

      Parameters:
      x - a double matrix containing the training data for the predictor variables
      y - a double matrix containing training data for the response variable. The number of columns will depend on the response variable type.
      predictorVarType - a PredictiveModel.VariableType array of length equal to x[0].length containing the type of each variable
      responseVarType - a PredictiveModel.VariableType, the response variable type
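
      The two data constructors describe the same training set in different layouts. As a minimal sketch, assuming a single-column response (the column count of y can differ by response type, as noted above), the helper below splits a combined xy matrix into x and y. The class and method names are hypothetical and are not part of the library.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of com.imsl.datamining. */
final class SplitXYSketch {

    /** Returns the predictor columns of xy, i.e. every column except responseColumnIndex. */
    static double[][] predictors(double[][] xy, int responseColumnIndex) {
        double[][] x = new double[xy.length][];
        for (int i = 0; i < xy.length; i++) {
            x[i] = new double[xy[i].length - 1];
            int k = 0;
            for (int j = 0; j < xy[i].length; j++) {
                if (j != responseColumnIndex) {
                    x[i][k++] = xy[i][j];
                }
            }
        }
        return x;
    }

    /** Returns the response column as an n-by-1 matrix (assumption: single-column response). */
    static double[][] response(double[][] xy, int responseColumnIndex) {
        double[][] y = new double[xy.length][1];
        for (int i = 0; i < xy.length; i++) {
            y[i][0] = xy[i][responseColumnIndex];
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] xy = {{1.0, 10.0, 0.0}, {2.0, 20.0, 1.0}};
        System.out.println(Arrays.deepToString(predictors(xy, 2)));
        System.out.println(Arrays.deepToString(response(xy, 2)));
    }
}
```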
  • Method Details

    • clone

      public abstract PredictiveModel clone()
      Abstract clone method. Each PredictiveModel subclass must override this method.
      Overrides:
      clone in class Object
      Returns:
      a clone object of a specific PredictiveModel
    • setConfiguration

      protected abstract void setConfiguration(PredictiveModel pm) throws PredictiveModel.PredictiveModelException
      Sets the configuration of PredictiveModel to that of the input model.

      Each PredictiveModel subclass must override this method. The implementation should use subclass-specific methods to copy the parameter settings of the input PredictiveModel instance, so that this model has the same configuration as the input model. This method supports model parameter tuning, as in CrossValidation, where several variations of the same model are evaluated, and ensemble methods such as BootstrapAggregation, where several identical instances are fit to random samples.

      Parameters:
      pm - a PredictiveModel object
      Throws:
      PredictiveModel.PredictiveModelException - is thrown when exceptions occur in the enclosing class that extends PredictiveModel.
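
      The clone and setConfiguration contract can be illustrated without the library. The sketch below uses hypothetical stand-in classes (SketchModel, SketchTree) and hypothetical settings (maxIterations, maxDepth); it shows one plausible way a subclass might copy shared and subclass-specific settings, and is not the library's implementation.

```java
/** Hypothetical stand-ins mirroring the clone/setConfiguration contract; not library classes. */
final class ConfigurationSketch {

    abstract static class SketchModel implements Cloneable {
        private int maxIterations = 1000;

        int getMaxIterations() { return maxIterations; }
        void setMaxIterations(int n) { maxIterations = n; }

        /** Copies the settings of another model into this one. */
        protected abstract void setConfiguration(SketchModel other);

        @Override
        public abstract SketchModel clone();
    }

    static final class SketchTree extends SketchModel {
        private int maxDepth = 10;

        int getMaxDepth() { return maxDepth; }
        void setMaxDepth(int d) { maxDepth = d; }

        @Override
        protected void setConfiguration(SketchModel other) {
            // Copy the shared settings first, then the subclass-specific ones.
            setMaxIterations(other.getMaxIterations());
            if (other instanceof SketchTree) {
                setMaxDepth(((SketchTree) other).getMaxDepth());
            }
        }

        @Override
        public SketchTree clone() {
            // A fresh instance configured like this one; no fitted state is copied.
            SketchTree copy = new SketchTree();
            copy.setConfiguration(this);
            return copy;
        }
    }

    public static void main(String[] args) {
        SketchTree tuned = new SketchTree();
        tuned.setMaxDepth(3);
        SketchTree copy = tuned.clone();
        System.out.println(copy.getMaxDepth());
    }
}
```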
    • predict

      public abstract double[] predict() throws PredictiveModel.PredictiveModelException
      Predicts the response variable using the most recent fit.

      Each PredictiveModel subclass must override this method.

      Returns:
      a double array containing the predicted values.
      Throws:
      PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions; any such exception must extend PredictiveModelException.
    • predict

      public abstract double[] predict(double[][] testData) throws PredictiveModel.PredictiveModelException
      Predicts the response values using the most recent fit and the provided test data.

      Each PredictiveModel subclass must override this method.

      Parameters:
      testData - a double matrix containing data to be predicted. testData must have the same number of columns in the same arrangement as xy (the observations).
      Returns:
      a double array containing the predicted values.
      Throws:
      PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions; any such exception must extend PredictiveModelException.
    • predict

      public abstract double[] predict(double[][] testData, double[] testDataWeights) throws PredictiveModel.PredictiveModelException
      Predicts the response values using the most recent fit, the provided test data, and the test data case weights.

      Each PredictiveModel subclass must override this method.

      Parameters:
      testData - a double matrix containing data to be predicted. testData must have the same number of columns in the same arrangement as xy (the observations).
      testDataWeights - a double array containing weights for each row of testData.
      Returns:
      a double array containing the predicted values.
      Throws:
      PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions; any such exception must extend PredictiveModelException.
    • fitModel

      public void fitModel() throws PredictiveModel.PredictiveModelException
      Fits the predictive model to the training data (estimates the model using the training data and current configuration settings).

      Subclasses of PredictiveModel, such as DecisionTree, override this method with specific model fitting algorithms.

      Throws:
      PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions; any such exception must extend PredictiveModelException.
    • getClassErrors

      public int[][] getClassErrors(double[] knownValues, double[] predictedValues)
      Returns classification error information.
      Parameters:
      knownValues - a double array containing the known target classifications
      predictedValues - a double array containing the predicted classifications

      Arrays knownValues and predictedValues must be the same length.

      Returns:
      An int matrix of size (nClasses+1) by 2 containing the number of classification errors and the number of non-missing classifications for each target classification, plus the overall totals for these errors.

      For \(i \lt \text{nClasses}\), the i-th row contains the number of classification errors for the i-th class and the number of patterns with non-missing classifications for that class. The last row contains the number of classification errors totaled over all target classifications and the total number of patterns with non-missing target classifications.
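
      The layout of the returned matrix can be sketched in plain Java. The code below is an illustration only, not the library's implementation. It assumes classes are coded 0 to nClasses - 1, that NaN marks a missing known classification, and that nClasses is passed explicitly; all names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical illustration of the documented output layout; not the library implementation. */
final class ClassErrorsSketch {

    /**
     * Builds an (nClasses + 1)-by-2 matrix: column 0 counts errors, column 1 counts
     * non-missing cases. Row nClasses holds the totals over all classes.
     */
    static int[][] classErrors(double[] known, double[] predicted, int nClasses) {
        if (known.length != predicted.length) {
            throw new IllegalArgumentException("arrays must be the same length");
        }
        int[][] errors = new int[nClasses + 1][2];
        for (int i = 0; i < known.length; i++) {
            if (Double.isNaN(known[i])) {
                continue; // skip patterns with a missing known classification
            }
            int c = (int) known[i];
            errors[c][1]++;
            errors[nClasses][1]++;
            if (known[i] != predicted[i]) {
                errors[c][0]++;
                errors[nClasses][0]++;
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        double[] known = {0, 0, 1, 1, 1};
        double[] predicted = {0, 1, 1, 1, 0};
        System.out.println(Arrays.deepToString(classErrors(known, predicted, 2)));
    }
}
```

      With the data in main, class 0 has one error in two cases, class 1 has one error in three cases, and the last row totals two errors in five cases.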

    • getClassErrors

      public int[][] getClassErrors(int[] knownValues, int[] predictedValues)
      Returns classification error information.
      Parameters:
      knownValues - an int array containing the known target classifications
      predictedValues - an int array containing the predicted classifications

      Arrays knownValues and predictedValues must be the same length.

      Returns:
      An int matrix of size (nClasses+1) by 2 containing the number of classification errors and the number of non-missing classifications for each target classification, plus the overall totals for these errors.

      For \(i \lt \text{nClasses}\), the i-th row contains the number of classification errors for the i-th class and the number of patterns with non-missing classifications for that class. The last row contains the number of classification errors totaled over all target classifications and the total number of patterns with non-missing target classifications.

    • getClassCounts

      public double[] getClassCounts()
      Returns the counts of each class (level) of the categorical response variable.

      If the response variable is neither PredictiveModel.VariableType.CATEGORICAL nor PredictiveModel.VariableType.ORDERED_DISCRETE, null is returned.

      Returns:
      a double array containing the summation of the case weights for each occurrence of a particular class found in the categorical response data.
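
      Because the counts are sums of case weights, they need not be whole numbers. A minimal sketch of that summation, assuming classes are coded 0 to nClasses - 1 (the class and method names are hypothetical, not library API):

```java
import java.util.Arrays;

/** Hypothetical illustration of weighted class counts; not the library implementation. */
final class ClassCountsSketch {

    /** Sums the case weight of every observation into the slot of its class. */
    static double[] weightedClassCounts(int[] classes, double[] weights, int nClasses) {
        double[] counts = new double[nClasses];
        for (int i = 0; i < classes.length; i++) {
            counts[classes[i]] += weights[i];
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] classes = {0, 1, 1, 2};
        double[] weights = {1.0, 0.5, 0.5, 2.0};
        System.out.println(Arrays.toString(weightedClassCounts(classes, weights, 3)));
    }
}
```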
    • setClassCounts

      public void setClassCounts(double[] classCounts)
      Sets the counts of each class of the response variable.

      Use this method to set the class counts when one or more classes do not occur in the training data due to sampling but are otherwise valid, or when the data is distributed and the global counts are available. This method applies only when the response variable is of type PredictiveModel.VariableType.CATEGORICAL or PredictiveModel.VariableType.ORDERED_DISCRETE.

      Parameters:
      classCounts - a double array containing the class counts of the response variable

      The default is to use the class counts discovered in the input matrix, xy, weighted by the values in weights.

    • setClassLabels

      public void setClassLabels(String[] classLabels)
      Sets the class names or labels for a categorical response variable.
      Parameters:
      classLabels - a string array containing class names or labels. The array classLabels must have length = nClasses.

      Default: classLabels = {"1", "2", ...,"K"}, where K = nClasses

    • getClassLabels

      public String[] getClassLabels()
      Returns the current class labels for a categorical response variable.

      Note: The labels will be null unless they have been set using the method setClassLabels.

      Returns:
      a string array containing the labels for each class level
    • setMustFitModel

      public void setMustFitModel(boolean mustFitModel)
      Sets the flag indicating whether the model must be fit or re-estimated because of a change in the data or configuration.
      Parameters:
      mustFitModel - a boolean giving the value of the flag

      Default: mustFitModel=true.

    • getCostMatrix

      public double[][] getCostMatrix()
      Returns the cost matrix for a categorical response variable.

      The cost matrix has elements C(i, j) = cost of misclassifying a response in class j as in class i. The diagonal elements of the cost matrix must be 0. In the case that nClasses has not been determined (usually because fitModel() has not been called), an array of length zero is returned.

      Returns:
      a square double matrix of dimension nClasses by nClasses containing the cost matrix for a categorical response variable, where nClasses is the number of classes the response variable may assume.
    • setCostMatrix

      public void setCostMatrix(double[][] costMatrix)
      Specifies the cost matrix for a categorical response variable.
      Parameters:
      costMatrix - a square double matrix of dimension nClasses by nClasses containing elements C(i, j), the cost of misclassifying a response in class j as in class i. The diagonal elements of the cost matrix must be 0.

      Both dimensions of costMatrix should agree with the number of classes found in the data. Otherwise an exception will be thrown.

      Default: costMatrix[i][j]=1.0 where \(i\ne j \) and costMatrix[i][i]=0.0.
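
      The default and the stated constraints (square, nClasses by nClasses, zero diagonal) can be written out directly. The following is a sketch with hypothetical helper names, not the library's validation code:

```java
import java.util.Arrays;

/** Hypothetical helpers illustrating the documented cost-matrix rules; not library code. */
final class CostMatrixSketch {

    /** Builds the documented default: 1.0 off the diagonal, 0.0 on it. */
    static double[][] defaultCostMatrix(int nClasses) {
        double[][] cost = new double[nClasses][nClasses];
        for (int i = 0; i < nClasses; i++) {
            for (int j = 0; j < nClasses; j++) {
                cost[i][j] = (i == j) ? 0.0 : 1.0;
            }
        }
        return cost;
    }

    /** Checks that cost is nClasses-by-nClasses with a zero diagonal. */
    static boolean isValid(double[][] cost, int nClasses) {
        if (cost.length != nClasses) {
            return false;
        }
        for (int i = 0; i < nClasses; i++) {
            if (cost[i].length != nClasses || cost[i][i] != 0.0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] cost = defaultCostMatrix(3);
        System.out.println(Arrays.deepToString(cost));
        System.out.println(isValid(cost, 3));
    }
}
```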

    • getMaxNumberOfIterations

      public int getMaxNumberOfIterations()
      Returns the maximum number of iterations allowed for the fitting procedure or training algorithm.
      Returns:
      an int, the maximum number of iterations
    • setMaxNumberOfIterations

      public void setMaxNumberOfIterations(int maxIterations)
      Sets the maximum number of iterations allowed for the fitting procedure or training algorithm.

      Most predictive models use iterative procedures to fit or train the model. Adjusting the maximum number of iterations up or down can assist in diagnosing problems.

      Parameters:
      maxIterations - an int specifying the maximum number of iterations

      Default: maxIterations=1000

    • getMaxNumberOfCategories

      public int getMaxNumberOfCategories()
      Returns the maximum number of categories allowed.
      Returns:
      an int indicating the maximum number of categories allowed within the predictor and response variables.
    • setMaxNumberOfCategories

      public void setMaxNumberOfCategories(int maxCategories)
      Sets the maximum number of categories allowed within categorical predictor and response variables.
      Parameters:
      maxCategories - an int, the maximum number of categories a predictor or response variable can have

      Default: maxCategories=StrictMath.max(10, maxCat + 1), where maxCat is the maximum category within all categorical predictor and response variables.
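
      As a rough sketch of the default above, the code below computes StrictMath.max(10, maxCat + 1) for a small data set. It assumes categories are coded as non-negative integers, that NaN marks a missing value, and that a boolean mask identifies the categorical columns; the mask and all names are hypothetical stand-ins, not library API.

```java
/** Hypothetical illustration of the documented default; not the library implementation. */
final class MaxCategoriesSketch {

    /** Finds the largest category code in the categorical columns and applies the default rule. */
    static int defaultMaxCategories(double[][] xy, boolean[] isCategorical) {
        int maxCat = 0;
        for (double[] row : xy) {
            for (int j = 0; j < row.length; j++) {
                if (isCategorical[j] && !Double.isNaN(row[j])) {
                    maxCat = Math.max(maxCat, (int) row[j]);
                }
            }
        }
        return StrictMath.max(10, maxCat + 1);
    }

    public static void main(String[] args) {
        double[][] xy = {{0, 1.5}, {3, 2.5}};
        boolean[] isCategorical = {true, false};
        System.out.println(defaultMaxCategories(xy, isCategorical));
    }
}
```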

    • getNumberOfClasses

      public int getNumberOfClasses()
      Returns the number of distinct classes found (or set) in the categorical response data.
      Returns:
      an int, the number of classes
    • setNumberOfClasses

      public void setNumberOfClasses(int nClasses)
      Sets the number of distinct classes or categories the response variable may assume.

      It will not have an effect for response variable type PredictiveModel.VariableType.QUANTITATIVE_CONTINUOUS.

      Parameters:
      nClasses - an int, the number of distinct classes or categories of the response variable

      An error is generated if more than nClasses categories are discovered in the data.

      Default: nClasses is 0.

    • getNumberOfColumns

      public int getNumberOfColumns()
      Returns the number of columns in the training data xy.
      Returns:
      an int, the number of columns in xy. If xy is null, 0 is returned.
    • getNumberOfMissing

      public int getNumberOfMissing()
      Returns the number of missing values of the response variable found in the data xy.
      Returns:
      an int, the number of missing values
    • getNumberOfPredictors

      public int getNumberOfPredictors()
      Returns the number of predictors.
      Returns:
      an int, the number of predictors
    • getNumberOfRows

      public int getNumberOfRows()
      Returns the number of rows (observations) in the training data.
      Returns:
      an int, the number of rows (observations) in the training data
    • getPredictorIndexes

      public int[] getPredictorIndexes()
      Returns the column indices of xy in which the predictor variables reside.
      Returns:
      an int array containing the column indices
    • setPredictorIndex

      public void setPredictorIndex(int[] predIdx)
      Sets the column indices of xy in which the predictor variables reside.

      This may be used to subset the full set of predictor variables (getPredictorTypes()).

      Parameters:
      predIdx - an int array containing the column index for each predictor variable

      Default: All columns other than the column containing the response variable are indicated.
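
      The default index set described above is simply every column index except the response column. A minimal sketch, with hypothetical names that are not part of the library:

```java
import java.util.Arrays;

/** Hypothetical illustration of the default predictor index set; not library code. */
final class PredictorIndexSketch {

    /** Returns all column indices in [0, nColumns) except responseColumnIndex. */
    static int[] defaultPredictorIndexes(int nColumns, int responseColumnIndex) {
        int[] idx = new int[nColumns - 1];
        int k = 0;
        for (int j = 0; j < nColumns; j++) {
            if (j != responseColumnIndex) {
                idx[k++] = j;
            }
        }
        return idx;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(defaultPredictorIndexes(4, 1)));
    }
}
```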
    • getPredictorTypes

      public PredictiveModel.VariableType[] getPredictorTypes()
      Returns an array of VariableType objects that correspond to the predictor data types in xy.
      Returns:
      a VariableType array that corresponds to the predictor data types in xy
    • setPredictorTypes

      public void setPredictorTypes(PredictiveModel.VariableType[] predVarType)
      Sets the VariableType objects that correspond to the predictor data types in xy.
      Parameters:
      predVarType - a VariableType array of length equal to the number of predictors specifying the data type of each predictor
    • setRandomObject

      public void setRandomObject(Random r)
      Sets the random object to be used in the permutation of observation data.
      Parameters:
      r - a Random object to be used in the random permutation of observation data

      Specifying a seed for the Random object can produce repeatable/deterministic output.
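
      The effect of seeding can be seen with java.util.Random alone. The sketch below permutes row indices with a Fisher-Yates shuffle; this is one common way to permute observations and is shown only as an illustration, not as the algorithm the library uses. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration of a seeded permutation; not the library implementation. */
final class SeededPermutationSketch {

    /** Returns a random permutation of 0..n-1 driven by the supplied generator. */
    static int[] permutation(int n, Random r) {
        int[] p = new int[n];
        for (int i = 0; i < n; i++) {
            p[i] = i;
        }
        // Fisher-Yates shuffle: swap each position with a random earlier-or-equal one.
        for (int i = n - 1; i > 0; i--) {
            int j = r.nextInt(i + 1);
            int t = p[i];
            p[i] = p[j];
            p[j] = t;
        }
        return p;
    }

    public static void main(String[] args) {
        // Two generators with the same seed produce the same permutation.
        int[] first = permutation(10, new Random(123457L));
        int[] second = permutation(10, new Random(123457L));
        System.out.println(Arrays.equals(first, second));
    }
}
```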

    • getRandomObject

      public Random getRandomObject()
      Returns the random object being used in the permutation of the observations.
      Returns:
      a Random object being used for permutations
    • getNumberOfUniquePredictorValues

      public int[] getNumberOfUniquePredictorValues()
      Returns an array containing the number of distinct values of each predictor found in the input data.

      For continuous predictor variables, the value is set to 0 and is not meaningful.

      Returns:
      an int array containing the number of distinct values for each predictor
    • getPrintLevel

      public int getPrintLevel()
      Returns the current print level.
      Returns:
      an int, the current print level

      printLevel  Action
      0           No printing.
      1           Prints final results only.
      2           Prints intermediate and final results.

      Default: printLevel = 0.
    • setPrintLevel

      public void setPrintLevel(int printLevel)
      Sets the print level for a PredictiveModel.
      Parameters:
      printLevel - an int specifying the level of printing to perform
      printLevel  Action
      0           No printing.
      1           Prints final results only.
      2           Prints intermediate and final results.

      Default: printLevel = 0.

    • getClassProbabilities

      public double[][] getClassProbabilities()
      Returns a matrix containing the predicted class probabilities for each observation in the training data.
      Returns:
      a double matrix containing the class probabilities
    • setClassProbabilities

      public void setClassProbabilities(double[][] probs) throws PredictiveModel.SumOfProbabilitiesNotOneException
      Sets the class probabilities.
      Parameters:
      probs - a double matrix specifying class probabilities for each pattern or observation in a data set

      The probabilities must range between 0.0 and 1.0 inclusive, and sum to 1.0. The number of columns in probs should agree with the number of classes found in the data. Otherwise an exception is thrown. Calling this method overwrites any existing values.

      Default: probs=null unless estimated by an overriding method or set by the user.

      Throws:
      PredictiveModel.SumOfProbabilitiesNotOneException - is thrown when class probabilities do not sum to 1.0.
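
      The constraints on probs can be checked before calling the setter. The sketch below is a hypothetical pre-check, not the library's validation: the tolerance used to compare each row sum with 1.0 is an assumption, since the documentation does not state one.

```java
/** Hypothetical pre-check for class probability rows; not the library implementation. */
final class ProbabilityCheckSketch {

    /** Assumed tolerance for comparing a row sum with 1.0. */
    static final double TOLERANCE = 1e-10;

    /** Returns true if every row has nClasses entries in [0, 1] that sum to 1.0. */
    static boolean rowsAreValid(double[][] probs, int nClasses) {
        for (double[] row : probs) {
            if (row.length != nClasses) {
                return false;
            }
            double sum = 0.0;
            for (double p : row) {
                if (!(p >= 0.0 && p <= 1.0)) {
                    return false; // also rejects NaN
                }
                sum += p;
            }
            if (Math.abs(sum - 1.0) > TOLERANCE) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] probs = {{0.2, 0.8}, {0.5, 0.5}};
        System.out.println(rowsAreValid(probs, 2));
    }
}
```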
    • getPriorProbabilities

      public double[] getPriorProbabilities()
      Returns an array containing the prior probabilities.
      Returns:
      a double array containing the prior probabilities
    • setPriorProbabilities

      public void setPriorProbabilities(double[] priors) throws PredictiveModel.SumOfProbabilitiesNotOneException
      Sets the prior probabilities for class membership.
      Parameters:
      priors - a double array specifying the prior probabilities

      The prior probabilities must range between 0.0 and 1.0 inclusive, and sum to 1.0. The length of priors should agree with the number of classes found in the data. Otherwise an exception is thrown. Calling this method overwrites any existing values.

      Default: Determined from the data.

      Throws:
      PredictiveModel.SumOfProbabilitiesNotOneException - is thrown when prior probabilities do not sum to 1.0.
    • getResponseColumnIndex

      public int getResponseColumnIndex()
      Returns the column index in xy containing the response variable.
      Returns:
      an int, the column index for the response variable
    • setResponseColumnIndex

      public void setResponseColumnIndex(int index)
      Sets the column index in xy containing the response variable.
      Parameters:
      index - an int, the column index for the response variable
    • getResponseVariableAverage

      public double getResponseVariableAverage()
      Returns the weighted average value of the response variable.
      Returns:
      a double, the weighted average value of the response variable
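
      A weighted average divides the weighted sum of the responses by the sum of the weights. The sketch below illustrates that arithmetic; skipping NaN responses is an assumption about how missing values might be treated, and the names are hypothetical rather than library API.

```java
/** Hypothetical illustration of a weighted average; not the library implementation. */
final class WeightedAverageSketch {

    /** Returns sum(w[i] * y[i]) / sum(w[i]) over the non-missing responses. */
    static double weightedAverage(double[] y, double[] w) {
        double sumW = 0.0;
        double sumWY = 0.0;
        for (int i = 0; i < y.length; i++) {
            if (Double.isNaN(y[i])) {
                continue; // assumption: missing responses carry no weight
            }
            sumW += w[i];
            sumWY += w[i] * y[i];
        }
        return sumWY / sumW;
    }

    public static void main(String[] args) {
        double[] y = {1.0, 2.0, 3.0};
        double[] w = {1.0, 1.0, 2.0};
        System.out.println(weightedAverage(y, w));
    }
}
```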
    • getResponseVariableMostFrequentClass

      public int getResponseVariableMostFrequentClass()
      Returns the most frequent value of the response variable. Only meaningful for VariableType.CATEGORICAL or VariableType.ORDERED_DISCRETE.
      Returns:
      an int, the level of the most frequent class
    • getResponseVariableType

      public PredictiveModel.VariableType getResponseVariableType()
      Returns the variable type of the response variable.
      Returns:
      the VariableType of the response variable
    • getTotalWeight

      public double getTotalWeight()
      Returns the sum of the active case weights.
      Returns:
      a double, the sum of the active case weights
    • getVariableType

      public PredictiveModel.VariableType[] getVariableType()
      Returns an array containing the variable types in xy.
      Returns:
      a VariableType array containing the variable types in xy
    • setVariableType

      public void setVariableType(PredictiveModel.VariableType[] varType)
      Sets the variable types for the data.
      Parameters:
      varType - a PredictiveModel.VariableType array of length equal to xy[0].length containing the type of each variable
    • getWeights

      public double[] getWeights()
      Returns an array containing the case weights.
      Returns:
      a double array containing the case weights
    • setWeights

      public void setWeights(double[] weights)
      Specifies the case weights.
      Parameters:
      weights - a double array specifying case weights

      Default: weights[i] = 1.0 for all i.

    • setTrainingData

      public void setTrainingData(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
      Sets up the training data for the predictive model.

      By calling this method, the problem is either initialized or reset to use the data in the arguments.

      Parameters:
      xy - a double matrix containing the training data and associated response values
      responseColumnIndex - an int specifying the column index in xy of the response variable
      varType - a PredictiveModel.VariableType array of length equal to xy[0].length containing the type of each variable
    • getXY

      public double[][] getXY()
      Returns a copy of the xy data.
      Returns:
      a double matrix containing the training data
    • isMustFitModel

      public boolean isMustFitModel()
      Returns the current value of the mustFitModel flag.

      When true, the fitModel() method should be called before doing any predictions or other analysis.

      Returns:
      a boolean, the current state of the flag
    • isConstantSeries

      public boolean isConstantSeries()
      Returns the current value of the constantSeries flag.

      The flag is set to true if the code determines that the response variable is constant in the training data. The method fitModel will fail if the series is constant. The flag will be reset if the training data is changed using setTrainingData, and the response variable is not constant.

      Returns:
      a boolean, the current state of the flag
    • isUserFixedNClasses

      public boolean isUserFixedNClasses()
      Returns true if the number of classes was fixed by the user.
      Returns:
      a boolean, the current state of the flag