com.imsl.datamining.LogisticRegression

All Implemented Interfaces:: Serializable, Cloneable

public class LogisticRegression extends PredictiveModel implements Serializable, Cloneable

Performs binomial or multinomial logistic regression.

This function fits a logistic regression model for discrete dependent variables with two or more mutually exclusive outcomes or classes. For a binary response $y$, the objective is to model the conditional probability of success, $\pi_1 (x) = Pr[y = 1∣ x]$, where $x = (x_1, x_2, …, x_p)$ is a realization of $p$ independent variables. Logistic regression models the conditional probability $\pi_1 (x)$ using the cdf of the logistic distribution. In particular, $$ \pi_1(x) = \frac{\exp(\eta_1)}{1+\exp(\eta_1)} $$ where the linear predictor $$\eta_1 = \beta_{01} + x_1\beta_{11} + x_2\beta_{21} + \cdots + x_p\beta_{p1} $$ The coefficients $\{\beta_{01},\beta_{11},\ldots,\beta_{p1}\}$ are unknown and are estimated using the training data.

Solving for the linear component $\eta_1$ results in the log-odds or logit transformation of $\pi_1 (x)$: $$ \text{logit}(\pi_1(x)) = \text{log}\frac{\pi_1(x)}{1-\pi_1(x)} = \eta_1 $$ Given a set of $N$ observations, $\{y_i, x_i\}$, where $y_i$ follows a binomial $(n, \pi) $ distribution with parameters $n = 1$ and $\pi = \pi_1 (x_i)$, the likelihood and log-likelihood are, respectively, $$ L = \prod_{i=1}^N{{\pi_1(x_i)}^{y_i}{(1-\pi_1(x_i))}^{1-y_i}}$$ $$ \mathcal{L} = \log(L) = \sum_{i=1}^{N} \left[y_i \text{logit}(\pi_1 (x_i))+ \log(1-\pi_1 (x_i))\right]$$ The log-likelihood in terms of the $\eta_1$ and the unknown parameters $\{\beta_{01},\beta_1\}$ is therefore $$ \mathcal{L}(\beta_{01},\beta_1) = \sum_{i=1}^{N} \left[y_i \eta_1(x_i) - \log(1+\exp(\eta_1(x_i)))\right]$$ where $\eta_1(x_i) = \beta_{01} + x_i^{T}\beta_{1}$. With a binary outcome, only one probability needs to be modeled. The second probability can be obtained from the constraint, $\pi_1(x) + \pi_2(x) = 1$. If each $y_i$ is the number of successes in $n_i \gt 1$ independent trials, the log-likelihood becomes $$ \mathcal{L} = \sum_{i=1}^{N} \left[ y_i \text{logit}(\pi_1 (x_i))+ n_i\log(1-\pi_1 (x_i)) \right]$$ or $$ \mathcal{L}(\beta_{01},\beta_1) = \sum_{i=1}^N \left[y_i \eta_1(x_i) - n_i\log(1+\exp(\eta_1(x_i)))\right]$$ The parameters are estimated using the method of maximum likelihood. To test the significance of the model, the log-likelihood of the fitted model is compared to that of an intercept‑only model. In particular, $G = -2(\mathcal{L}(\beta_{01}) - \mathcal{L}(\beta_{01},\beta_1))$ is a likelihood-ratio test statistic and under the null hypothesis, $H_0 : \beta_{11} = \beta_{12} = \ldots = \beta_{1p} = 0$, $G$ is distributed as chi-squared with $p - 1$ degrees of freedom. A significant result suggests that at least one parameter in the model is non‑zero. See Hosmer and Lemeshow (2000) for further discussion.

In the multinomial case, the response vector is $y_i = (y_{i1}, y_{i2},\ldots, y_{iK})^T$, where $y_{ik} = 1$ when the $i$-th observation belongs to class $k$ and $y_{ik} = 0$, otherwise. Furthermore, because the outcomes are mutually exclusive, $\sum_{k=1}^{K} y_{ik} = 1$ and $\pi_1(x) + \pi_2(x) + \cdots + \pi_K(x) = 1$. Without loss of generality the last class $K$ serves as the baseline or reference class in the sense that it is not modeled directly but found from $\pi_K(x) = 1 - \sum_{k=1}^{K-1} \pi_k(x) $. Note that if there are multiple trials $n_i > 1$, the constraint on the responses becomes $\sum_{k=1}^{K} y_{ik} = n_i$.

The log-likelihood in the multinomial case is $$\mathcal{L}(\beta_{0l}, \beta_{l})_{l=1}^{K} = \sum_{i=1}^N \left[ \sum_{l=1}^{K} y_{il}\eta_{il} - \log \left ( \sum_{l=1}^{K} \exp(\eta_{il})\right ) \right ] $$ where $ \eta_{il} = \beta_{0l} + x_i^T\beta_{l} $.

With $\pi_l(x_i) = \frac{\exp(\eta_{il})}{\sum_{k=1}^{K}\exp(\eta_{ik})}$, the constraint $\sum_{l=1}^{K} \pi_l(x_i) = 1 $ is handled by setting $\eta_{iK}=0$. Then the log-likelihood is $$\mathcal{L}(\beta_{0l}, \beta_{l})_{l=1}^{K-1} = \sum_{i=1}^N \left[ \sum_{l=1}^{K-1} y_{il}\eta_{il} - \log \left ( \sum_{l=1}^{K-1} \exp(\eta_{il})\right ) \right ] $$ Note that the logit for class $k$ is $$\log(\frac{\pi_{ik}}{\pi_{iK}}) = \eta_{ik} $$ which is the log-odds of class $k$ versus the reference class $K$.

Maximum likelihood estimates are found by solving the score equation for each parameter $$\begin{array}{l} \frac{\partial{\mathcal{L}(\beta_{0l},\beta_{l})}}{\partial{\beta_{jl}}} & = \sum_{i=1}^N \left [x_{ij}y_{il}-\frac{x_{ij}\exp(\eta_{il})}{1+\sum_{k=1}^{K-1}\exp(\eta_{ik})} \right]\\ & = \sum_{i=1}^N x_{ij}(y_{il} - \pi_{il}) \\ & = 0 \end{array} $$ To solve the score equations, the function employs a method known as iteratively re-weighted least squares or IRLS. In this case the IRLS is equivalent to the Newton-Raphson algorithm (Hastie, et. al., 2009, Thisted, 1988).

Consider the full vector of parameters $$ \beta = (\beta_{01},\beta_{11}, \beta_{21},\ldots,\beta_{p1},\beta_{02},\ldots,\beta_{p2},\ldots, \beta_{0K-1},\beta_{1K-1},\ldots,\beta_{pK-1})^T$$ The Newton-Raphson step at iteration $n+1$ is $$\beta^{n+1} = \beta^n - \matrix{H}^{-1}(\beta^n)\matrix{G}(\beta^n)$$ where $\matrix{H}$ is the Hessian matrix, the matrix of the second partial derivatives defined by $$ \begin{array}{l} \frac{\partial^2{\mathcal{L}(\beta_{0l},\beta_{l})}}{\partial{\beta_{kl}}\partial{\beta_{jl}}} & = -\sum_{i=1}^N \left [ \frac{x_{ij}x_{ik}\exp(\eta_{il})\sum_{m=1}^K \exp(\eta_{im}) - x_{ij}x_{ik}\exp(\eta_{il})\exp(\eta_{il})}{\left ( \sum_{m=1}^K \exp(\eta_{im})\right )^2} \right ]\\ & = -\sum_{i=1}^N x_{ij}x_{ik} \pi_{il}(1-\pi_{il}) \end{array} $$ for $l=1,2,\ldots, K$, $k,j = 1,2,\ldots,p$, and $$ \begin{array}{l} \frac{\partial^2{\mathcal{L}(\beta_{0l},\beta_{l})}}{\partial{\beta_{k\nu}}\partial{\beta_{jl}}} & = -\frac{1}{N}\sum_{i=1}^N \left [ -\frac{x_{ij}x_{ik}\exp(\eta_{il})\exp(\eta_{i\nu})}{\left ( \sum_{m=1}^K \exp(\eta_{im})\right )^2} \right ]\\ & = \frac{1}{N}\sum_{i=1}^N x_{ij}x_{ik} \pi_{il}\pi_{i\nu} \end{array} $$ where $l \ne \nu =1,2,\ldots,K $, and $k,j = 1,2,\ldots,p$. The $\matrix{G}$ denotes the the gradient, the vector of the first partial derivatives of the log-likelihood $$\frac{\partial{\mathcal{L}(\beta_{0l},\beta_{l})}}{\partial{\beta_{jl}}}. $$

Both the gradient and the Hessian are evaluated at the most recent estimate of the parameters, $\beta^n$. The iteration continues until convergence or until maximum iterations are reached. Following the theory of maximum likelihood estimation (Kendall and Stuart, 1979), standard errors are obtained from Fisher’s information matrix $ \matrix{I}= -\matrix{H}^{-1} $ evaluated at the final estimates.

See Also:

Nested Class Summary

Nested classes/interfaces inherited from class com.imsl.datamining.PredictiveModel
PredictiveModel.CloneNotSupportedException, PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType
Constructor Summary

Constructors

Constructor

Description

LogisticRegression(double[][] x, double[][] y, PredictiveModel.VariableType[] predictorVarType, PredictiveModel.VariableType responseVarType)

Constructs a logistic regression predictive model.

LogisticRegression(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)

Constructs a logistic regression predictive model.

LogisticRegression(LogisticRegression lrModel)

Constructs a copy of the input LogisticRegression predictive model.
Method Summary

Modifier and Type

Method

Description

LogisticRegression

clone()

Clones a LogisticRegression predictive model.

void

fitModel()

Fits the logistic regression predictive model.

double[]

getClassMeans()

Returns the class means.

double[]

getCoefficients()

Returns the current values of the coefficient estimates.

double[][]

getConfidenceIntervals()

Returns the confidence intervals for the predictions.

double

getConfidenceLevel()

Returns the confidence level for calculating the confidence limits for the predictions.

double[]

getGradient()

Returns the gradient.

double[][]

getHessian()

Returns the Hessian matrix.

double

getInitialLoglikelihood()

Returns the log-likelihood of the model evaluated at the starting coefficients.

double

getLoglikelihood()

Returns the log-likelihood evaluated at the estimated coefficients.

double

getLogLikelihoodRatio()

Returns the log-likelihood ratio statistic.

int

getNumberOfCoefficients()

Returns the number of coefficients (per class).

double[]

getStandardErrors()

Returns the standard errors of the estimated coefficients.

int[]

getXInteractionsIndex()

Returns the array containing the indices specifying interaction terms.

boolean

isIncludeIntercept()

Returns the current setting of the flag to include or not include an intercept in the model.

double[]

predict()

Returns the fitted values on the training data.

double[]

predict(double[][] testData)

Returns the predicted values on the test data.

double[]

predict(double[][] testData, double[] testDataWeights)

Returns predicted values on the test data using the given weights.

void

setConfidenceLevel(double confidence)

Sets the confidence level for calculating the confidence limits for the predictions.

void

setIncludeIntercept(boolean includeIntercept)

Sets the flag to include or not include an intercept in the model.

void

setInteractionIndex(int[] xInteractions)

Sets the column indices for pairwise interactions to include in the model.

void

setStartingCoefficients(double[] startCoefs)

Sets the starting values for the coefficients.

void

setTolerance(double tolerance)

Sets the error tolerance criteria for convergence.

Methods inherited from class com.imsl.datamining.PredictiveModel
getClassCounts, getClassErrors, getClassErrors, getClassLabels, getClassProbabilities, getCostMatrix, getMaxNumberOfCategories, getMaxNumberOfIterations, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getRandomObject, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isConstantSeries, isMustFitModel, isUserFixedNClasses, setClassCounts, setClassLabels, setClassProbabilities, setCostMatrix, setMaxNumberOfCategories, setMaxNumberOfIterations, setMustFitModel, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setRandomObject, setResponseColumnIndex, setTrainingData, setVariableType, setWeights

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- LogisticRegression
  
  public LogisticRegression(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
  
  Constructs a logistic regression predictive model.
  
  Parameters:
  
  xy - a double matrix containing the training data
  
  responseColumnIndex - an int, the column index for the response variable
  
  varType - a PredictiveModel.VariableType array containing the type of each variable
- LogisticRegression
  
  public LogisticRegression(double[][] x, double[][] y, PredictiveModel.VariableType[] predictorVarType, PredictiveModel.VariableType responseVarType)
  
  Constructs a logistic regression predictive model.
  
  Parameters:
  
  x - a double matrix containing the training data for the predictor variables
  
  y - a double matrix containing the training data for the response variable
  
  predictorVarType - a PredictiveModel.VariableType array containing the type of each variable
  
  responseVarType - a PredictiveModel.VariableType, the type of the response variable
- LogisticRegression
  
  public LogisticRegression(LogisticRegression lrModel)
  
  Constructs a copy of the input LogisticRegression predictive model.
  
  Parameters:
  
  lrModel - a LogisticRegression predictive model
Method Details
- clone
  
  public LogisticRegression clone()
  
  Clones a LogisticRegression predictive model.
  
  Specified by:
  
  clone in class PredictiveModel
  
  Returns:
  
  a clone of the LogisticRegression predictive model
- fitModel
  
  public void fitModel() throws PredictiveModel.PredictiveModelException
  
  Fits the logistic regression predictive model.
  
  Overrides:
  
  fitModel in class PredictiveModel
  
  Throws:
  
  PredictiveModel.PredictiveModelException
- setConfidenceLevel
  
  public void setConfidenceLevel(double confidence)
  
  Sets the confidence level for calculating the confidence limits for the predictions.
  
  Parameters:
  
  confidence - a double value in the interval $(0,1)$
- getConfidenceLevel
  
  public double getConfidenceLevel()
  
  Returns the confidence level for calculating the confidence limits for the predictions.
  
  Returns:
  
  a double, the confidence level
- predict
  
  public double[] predict() throws PredictiveModel.PredictiveModelException
  
  Returns the fitted values on the training data.
  The fitted value at index i*nClasses+k is the expected number of outcomes in class k for training example (observation) i. If the number of trials (see PredictiveModel.getWeights()) is equal to 1, the fitted value is a probability.
  
  Specified by:
  
  predict in class PredictiveModel
  
  Returns:
  
  a double array of length numberOfObservations*numberOfClasses containing the fitted values
  
  Throws:
  
  PredictiveModel.PredictiveModelException
- predict
  
  public double[] predict(double[][] testData) throws PredictiveModel.PredictiveModelException
  
  Returns the predicted values on the test data.
  The predicted value at index i*nClasses+k is the expected number of outcomes in class k for test example (observation) i. If the number of trials (see PredictiveModel.getWeights()) is equal to 1, the predicted value is a probability.
  
  Specified by:
  
  predict in class PredictiveModel
  
  Parameters:
  
  testData - double matrix containing data to be predicted. testData must have the same number of columns in the same arrangement as the training data x or xy.
  
  Returns:
  
  a double array containing the predicted values.
  
  Throws:
  
  PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions. Exceptions thrown from these methods will necessarily extend the PredictiveModelException.
- predict
  
  public double[] predict(double[][] testData, double[] testDataWeights) throws PredictiveModel.PredictiveModelException
  
  Returns predicted values on the test data using the given weights.
  The predicted value at index i*nClasses+k is the expected number of outcomes in class k for the test example (observation) i. If the number of trials or frequency provided in testDataWeights is equal to 1, the predicted value is a probability.
  
  Specified by:
  
  predict in class PredictiveModel
  
  Parameters:
  
  testData - a double matrix containing the test data predictors
  
  testDataWeights - a double array containing the test data weights. For the binomial/multinomial response the weights are typically the frequencies, or number of trials.
  
  Returns:
  
  a double array containing predicted outcomes
  
  Throws:
  
  PredictiveModel.PredictiveModelException
- getClassMeans
  
  public double[] getClassMeans()
  
  Returns the class means.
  Run the method fitModel() to populate the class means.
  
  Returns:
  
  a double array containing the class means
- getCoefficients
  
  public double[] getCoefficients()
  
  Returns the current values of the coefficient estimates.
  Run the method fitModel() to populate the coefficient estimates.
  
  Returns:
  
  a double array containing the coefficient estimates
- getGradient
  
  public double[] getGradient()
  
  Returns the gradient.
  Run the method fitModel() to populate the gradient.
  
  Returns:
  
  a double array containing the final gradient
- getConfidenceIntervals
  
  public double[][] getConfidenceIntervals()
  
  Returns the confidence intervals for the predictions.
  Run one of the prediction methods (e.g., predict()) to populate the confidence intervals. The lower limit and upper limit are given for each observation in the training data or test data, according to which prediction method is used.
  
  Returns:
  
  a double matrix containing the lower and upper confidence limits
- getHessian
  
  public double[][] getHessian()
  
  Returns the Hessian matrix.
  Run the method fitModel() to populate the Hessian.
  
  Returns:
  
  a double matrix containing the Hessian matrix
- getInitialLoglikelihood
  
  public double getInitialLoglikelihood()
  
  Returns the log-likelihood of the model evaluated at the starting coefficients.
  This is the log-likelihood at the default starting values coefficients[i]=0 or the starting coefficients set using setStartingCoefficients(double[]). Run the method fitModel() to populate the initial log-likelihood.
  
  Returns:
  
  a double, the initial log-likelihood
- getLoglikelihood
  
  public double getLoglikelihood()
  
  Returns the log-likelihood evaluated at the estimated coefficients.
  Run the method fitModel() to populate the log-likelihood.
  
  Returns:
  
  a double, the log-likelihood of the fitted or trained model
- getLogLikelihoodRatio
  
  public double getLogLikelihoodRatio()
  
  Returns the log-likelihood ratio statistic.
  Run the method fitModel() to populate the likelihood ratio statistic.
  
  Returns:
  
  a double, the value of the log-likelihood ratio statistic
- getStandardErrors
  
  public double[] getStandardErrors()
  
  Returns the standard errors of the estimated coefficients.
  Run the method fitModel() to populate the standard errors.
  
  Returns:
  
  a double array containing the standard errors
- setStartingCoefficients
  
  public void setStartingCoefficients(double[] startCoefs)
  
  Sets the starting values for the coefficients.
  
  Parameters:
  
  startCoefs - a double array containing starting values
- setTolerance
  
  public void setTolerance(double tolerance)
  
  Sets the error tolerance criteria for convergence.
  
  Parameters:
  
  tolerance - a double, the error tolerance criteria for convergence
- setIncludeIntercept
  
  public void setIncludeIntercept(boolean includeIntercept)
  
  Sets the flag to include or not include an intercept in the model.
  The default is true, the model will include an intercept.
  
  Parameters:
  
  includeIntercept - a boolean, indicating whether or not to include an intercept
- setInteractionIndex
  
  public void setInteractionIndex(int[] xInteractions)
  
  Sets the column indices for pairwise interactions to include in the model.
  For each desired interaction pair, i,j, set xIntIndices[2*k]=i and xIntIndices[2*k+1]=j where k=0,1,.. and i,j are column indices contained in PredictiveModel.getPredictorIndexes(). Setting i==j defines a squared term. The default is that no interactions are included.
  
  Parameters:
  
  xInteractions - an int array containing the column indices of desired interactions
- isIncludeIntercept
  
  public boolean isIncludeIntercept()
  
  Returns the current setting of the flag to include or not include an intercept in the model.
  
  Returns:
  
  a boolean, the indicator of whether or not the model includes an intercept
- getXInteractionsIndex
  
  public int[] getXInteractionsIndex()
  
  Returns the array containing the indices specifying interaction terms.
  
  Returns:
  
  an int array containing the index pairs for interaction terms
- getNumberOfCoefficients
  
  public int getNumberOfCoefficients()
  
  Returns the number of coefficients (per class).
  
  Returns:
  
  an int, the number of coefficients

Class LogisticRegression

Nested Class Summary

Nested classes/interfaces inherited from class com.imsl.datamining.PredictiveModel

Constructor Summary

Method Summary

Methods inherited from class com.imsl.datamining.PredictiveModel

Methods inherited from class java.lang.Object

Constructor Details

LogisticRegression

LogisticRegression

LogisticRegression

Method Details

clone

fitModel

setConfidenceLevel

getConfidenceLevel

predict

predict

predict

getClassMeans

getCoefficients

getGradient

getConfidenceIntervals

getHessian

getInitialLoglikelihood

getLoglikelihood

getLogLikelihoodRatio

getStandardErrors

setStartingCoefficients

setTolerance

setIncludeIntercept

setInteractionIndex

isIncludeIntercept

getXInteractionsIndex

getNumberOfCoefficients