Class LogisticRegression

java.lang.Object
com.imsl.datamining.PredictiveModel
com.imsl.datamining.LogisticRegression
All Implemented Interfaces:
Serializable, Cloneable

public class LogisticRegression extends PredictiveModel implements Serializable, Cloneable
Performs binomial or multinomial logistic regression.

This function fits a logistic regression model for discrete dependent variables with two or more mutually exclusive outcomes or classes. For a binary response \(y\), the objective is to model the conditional probability of success, \(\pi_1 (x) = Pr[y = 1∣ x]\), where \(x = (x_1, x_2, …, x_p)\) is a realization of \(p\) independent variables. Logistic regression models the conditional probability \(\pi_1 (x)\) using the cdf of the logistic distribution. In particular, $$ \pi_1(x) = \frac{\exp(\eta_1)}{1+\exp(\eta_1)} $$ where the linear predictor $$\eta_1 = \beta_{01} + x_1\beta_{11} + x_2\beta_{21} + \cdots + x_p\beta_{p1} $$ The coefficients \(\{\beta_{01},\beta_{11},\ldots,\beta_{p1}\}\) are unknown and are estimated using the training data.

Solving for the linear component \(\eta_1\) results in the log-odds or logit transformation of \(\pi_1 (x)\): $$ \text{logit}(\pi_1(x)) = \text{log}\frac{\pi_1(x)}{1-\pi_1(x)} = \eta_1 $$ Given a set of \(N\) observations, \(\{y_i, x_i\}\), where \(y_i\) follows a binomial \((n, \pi) \) distribution with parameters \(n = 1\) and \(\pi = \pi_1 (x_i)\), the likelihood and log-likelihood are, respectively, $$ L = \prod_{i=1}^N{{\pi_1(x_i)}^{y_i}{(1-\pi_1(x_i))}^{1-y_i}}$$ $$ \mathcal{L} = \log(L) = \sum_{i=1}^{N} \left[y_i \text{logit}(\pi_1 (x_i))+ \log(1-\pi_1 (x_i))\right]$$ The log-likelihood in terms of the \(\eta_1\) and the unknown parameters \(\{\beta_{01},\beta_1\}\) is therefore $$ \mathcal{L}(\beta_{01},\beta_1) = \sum_{i=1}^{N} \left[y_i \eta_1(x_i) - \log(1+\exp(\eta_1(x_i)))\right]$$ where \(\eta_1(x_i) = \beta_{01} + x_i^{T}\beta_{1}\). With a binary outcome, only one probability needs to be modeled. The second probability can be obtained from the constraint, \(\pi_1(x) + \pi_2(x) = 1\). If each \(y_i\) is the number of successes in \(n_i \gt 1\) independent trials, the log-likelihood becomes $$ \mathcal{L} = \sum_{i=1}^{N} \left[ y_i \text{logit}(\pi_1 (x_i))+ n_i\log(1-\pi_1 (x_i)) \right]$$ or $$ \mathcal{L}(\beta_{01},\beta_1) = \sum_{i=1}^N \left[y_i \eta_1(x_i) - n_i\log(1+\exp(\eta_1(x_i)))\right]$$ The parameters are estimated using the method of maximum likelihood. To test the significance of the model, the log-likelihood of the fitted model is compared to that of an intercept‑only model. In particular, \(G = -2(\mathcal{L}(\beta_{01}) - \mathcal{L}(\beta_{01},\beta_1))\) is a likelihood-ratio test statistic and under the null hypothesis, \(H_0 : \beta_{11} = \beta_{12} = \ldots = \beta_{1p} = 0\), \(G\) is distributed as chi-squared with \(p - 1\) degrees of freedom. A significant result suggests that at least one parameter in the model is non‑zero. See Hosmer and Lemeshow (2000) for further discussion.

In the multinomial case, the response vector is \(y_i = (y_{i1}, y_{i2},\ldots, y_{iK})^T\), where \(y_{ik} = 1\) when the \(i\)-th observation belongs to class \(k\) and \(y_{ik} = 0\), otherwise. Furthermore, because the outcomes are mutually exclusive, \(\sum_{k=1}^{K} y_{ik} = 1\) and \(\pi_1(x) + \pi_2(x) + \cdots + \pi_K(x) = 1\). Without loss of generality the last class \(K\) serves as the baseline or reference class in the sense that it is not modeled directly but found from \(\pi_K(x) = 1 - \sum_{k=1}^{K-1} \pi_k(x) \). Note that if there are multiple trials \(n_i > 1\), the constraint on the responses becomes \(\sum_{k=1}^{K} y_{ik} = n_i\).

The log-likelihood in the multinomial case is $$\mathcal{L}(\beta_{0l}, \beta_{l})_{l=1}^{K} = \sum_{i=1}^N \left[ \sum_{l=1}^{K} y_{il}\eta_{il} - \log \left ( \sum_{l=1}^{K} \exp(\eta_{il})\right ) \right ] $$ where \( \eta_{il} = \beta_{0l} + x_i^T\beta_{l} \).

With \(\pi_l(x_i) = \frac{\exp(\eta_{il})}{\sum_{k=1}^{K}\exp(\eta_{ik})}\), the constraint \(\sum_{l=1}^{K} \pi_l(x_i) = 1 \) is handled by setting \(\eta_{iK}=0\). Then the log-likelihood is $$\mathcal{L}(\beta_{0l}, \beta_{l})_{l=1}^{K-1} = \sum_{i=1}^N \left[ \sum_{l=1}^{K-1} y_{il}\eta_{il} - \log \left ( \sum_{l=1}^{K-1} \exp(\eta_{il})\right ) \right ] $$ Note that the logit for class \(k\) is $$\log(\frac{\pi_{ik}}{\pi_{iK}}) = \eta_{ik} $$ which is the log-odds of class \(k\) versus the reference class \(K\).

Maximum likelihood estimates are found by solving the score equation for each parameter $$\begin{array}{l} \frac{\partial{\mathcal{L}(\beta_{0l},\beta_{l})}}{\partial{\beta_{jl}}} & = \sum_{i=1}^N \left [x_{ij}y_{il}-\frac{x_{ij}\exp(\eta_{il})}{1+\sum_{k=1}^{K-1}\exp(\eta_{ik})} \right]\\ & = \sum_{i=1}^N x_{ij}(y_{il} - \pi_{il}) \\ & = 0 \end{array} $$ To solve the score equations, the function employs a method known as iteratively re-weighted least squares or IRLS. In this case the IRLS is equivalent to the Newton-Raphson algorithm (Hastie, et. al., 2009, Thisted, 1988).

Consider the full vector of parameters $$ \beta = (\beta_{01},\beta_{11}, \beta_{21},\ldots,\beta_{p1},\beta_{02},\ldots,\beta_{p2},\ldots, \beta_{0K-1},\beta_{1K-1},\ldots,\beta_{pK-1})^T$$ The Newton-Raphson step at iteration \(n+1\) is $$\beta^{n+1} = \beta^n - \matrix{H}^{-1}(\beta^n)\matrix{G}(\beta^n)$$ where \(\matrix{H}\) is the Hessian matrix, the matrix of the second partial derivatives defined by $$ \begin{array}{l} \frac{\partial^2{\mathcal{L}(\beta_{0l},\beta_{l})}}{\partial{\beta_{kl}}\partial{\beta_{jl}}} & = -\sum_{i=1}^N \left [ \frac{x_{ij}x_{ik}\exp(\eta_{il})\sum_{m=1}^K \exp(\eta_{im}) - x_{ij}x_{ik}\exp(\eta_{il})\exp(\eta_{il})}{\left ( \sum_{m=1}^K \exp(\eta_{im})\right )^2} \right ]\\ & = -\sum_{i=1}^N x_{ij}x_{ik} \pi_{il}(1-\pi_{il}) \end{array} $$ for \(l=1,2,\ldots, K\), \(k,j = 1,2,\ldots,p\), and $$ \begin{array}{l} \frac{\partial^2{\mathcal{L}(\beta_{0l},\beta_{l})}}{\partial{\beta_{k\nu}}\partial{\beta_{jl}}} & = -\frac{1}{N}\sum_{i=1}^N \left [ -\frac{x_{ij}x_{ik}\exp(\eta_{il})\exp(\eta_{i\nu})}{\left ( \sum_{m=1}^K \exp(\eta_{im})\right )^2} \right ]\\ & = \frac{1}{N}\sum_{i=1}^N x_{ij}x_{ik} \pi_{il}\pi_{i\nu} \end{array} $$ where \(l \ne \nu =1,2,\ldots,K \), and \(k,j = 1,2,\ldots,p\). The \(\matrix{G}\) denotes the the gradient, the vector of the first partial derivatives of the log-likelihood $$\frac{\partial{\mathcal{L}(\beta_{0l},\beta_{l})}}{\partial{\beta_{jl}}}. $$

Both the gradient and the Hessian are evaluated at the most recent estimate of the parameters, \(\beta^n\). The iteration continues until convergence or until maximum iterations are reached. Following the theory of maximum likelihood estimation (Kendall and Stuart, 1979), standard errors are obtained from Fisher’s information matrix \( \matrix{I}= -\matrix{H}^{-1} \) evaluated at the final estimates.

See Also:
  • Constructor Details

    • LogisticRegression

      public LogisticRegression(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
      Constructs a logistic regression predictive model.
      Parameters:
      xy - a double matrix containing the training data
      responseColumnIndex - an int, the column index for the response variable
      varType - a PredictiveModel.VariableType array containing the type of each variable
    • LogisticRegression

      public LogisticRegression(double[][] x, double[][] y, PredictiveModel.VariableType[] predictorVarType, PredictiveModel.VariableType responseVarType)
      Constructs a logistic regression predictive model.
      Parameters:
      x - a double matrix containing the training data for the predictor variables
      y - a double matrix containing the training data for the response variable
      predictorVarType - a PredictiveModel.VariableType array containing the type of each variable
      responseVarType - a PredictiveModel.VariableType, the type of the response variable
    • LogisticRegression

      public LogisticRegression(LogisticRegression lrModel)
      Constructs a copy of the input LogisticRegression predictive model.
      Parameters:
      lrModel - a LogisticRegression predictive model
  • Method Details

    • clone

      public LogisticRegression clone()
      Clones a LogisticRegression predictive model.
      Specified by:
      clone in class PredictiveModel
      Returns:
      a clone of the LogisticRegression predictive model
    • fitModel

      public void fitModel() throws PredictiveModel.PredictiveModelException
      Fits the logistic regression predictive model.
      Overrides:
      fitModel in class PredictiveModel
      Throws:
      PredictiveModel.PredictiveModelException
    • setConfidenceLevel

      public void setConfidenceLevel(double confidence)
      Sets the confidence level for calculating the confidence limits for the predictions.
      Parameters:
      confidence - a double value in the interval \((0,1)\)
    • getConfidenceLevel

      public double getConfidenceLevel()
      Returns the confidence level for calculating the confidence limits for the predictions.
      Returns:
      a double, the confidence level
    • predict

      public double[] predict() throws PredictiveModel.PredictiveModelException
      Returns the fitted values on the training data.

      The fitted value at index i*nClasses+k is the expected number of outcomes in class k for training example (observation) i. If the number of trials (see PredictiveModel.getWeights()) is equal to 1, the fitted value is a probability.

      Specified by:
      predict in class PredictiveModel
      Returns:
      a double array of length numberOfObservations*numberOfClasses containing the fitted values
      Throws:
      PredictiveModel.PredictiveModelException
    • predict

      public double[] predict(double[][] testData) throws PredictiveModel.PredictiveModelException
      Returns the predicted values on the test data.

      The predicted value at index i*nClasses+k is the expected number of outcomes in class k for test example (observation) i. If the number of trials (see PredictiveModel.getWeights()) is equal to 1, the predicted value is a probability.

      Specified by:
      predict in class PredictiveModel
      Parameters:
      testData - double matrix containing data to be predicted. testData must have the same number of columns in the same arrangement as the training data x or xy.
      Returns:
      a double array containing the predicted values.
      Throws:
      PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions. Exceptions thrown from these methods will necessarily extend the PredictiveModelException.
    • predict

      public double[] predict(double[][] testData, double[] testDataWeights) throws PredictiveModel.PredictiveModelException
      Returns predicted values on the test data using the given weights.

      The predicted value at index i*nClasses+k is the expected number of outcomes in class k for the test example (observation) i. If the number of trials or frequency provided in testDataWeights is equal to 1, the predicted value is a probability.

      Specified by:
      predict in class PredictiveModel
      Parameters:
      testData - a double matrix containing the test data predictors
      testDataWeights - a double array containing the test data weights. For the binomial/multinomial response the weights are typically the frequencies, or number of trials.
      Returns:
      a double array containing predicted outcomes
      Throws:
      PredictiveModel.PredictiveModelException
    • getClassMeans

      public double[] getClassMeans()
      Returns the class means.

      Run the method fitModel() to populate the class means.

      Returns:
      a double array containing the class means
    • getCoefficients

      public double[] getCoefficients()
      Returns the current values of the coefficient estimates.

      Run the method fitModel() to populate the coefficient estimates.

      Returns:
      a double array containing the coefficient estimates
    • getGradient

      public double[] getGradient()
      Returns the gradient.

      Run the method fitModel() to populate the gradient.

      Returns:
      a double array containing the final gradient
    • getConfidenceIntervals

      public double[][] getConfidenceIntervals()
      Returns the confidence intervals for the predictions.

      Run one of the prediction methods (e.g., predict()) to populate the confidence intervals. The lower limit and upper limit are given for each observation in the training data or test data, according to which prediction method is used.

      Returns:
      a double matrix containing the lower and upper confidence limits
    • getHessian

      public double[][] getHessian()
      Returns the Hessian matrix.

      Run the method fitModel() to populate the Hessian.

      Returns:
      a double matrix containing the Hessian matrix
    • getInitialLoglikelihood

      public double getInitialLoglikelihood()
      Returns the log-likelihood of the model evaluated at the starting coefficients.

      This is the log-likelihood at the default starting values coefficients[i]=0 or the starting coefficients set using setStartingCoefficients(double[]). Run the method fitModel() to populate the initial log-likelihood.

      Returns:
      a double, the initial log-likelihood
    • getLoglikelihood

      public double getLoglikelihood()
      Returns the log-likelihood evaluated at the estimated coefficients.

      Run the method fitModel() to populate the log-likelihood.

      Returns:
      a double, the log-likelihood of the fitted or trained model
    • getLogLikelihoodRatio

      public double getLogLikelihoodRatio()
      Returns the log-likelihood ratio statistic.

      Run the method fitModel() to populate the likelihood ratio statistic.

      Returns:
      a double, the value of the log-likelihood ratio statistic
    • getStandardErrors

      public double[] getStandardErrors()
      Returns the standard errors of the estimated coefficients.

      Run the method fitModel() to populate the standard errors.

      Returns:
      a double array containing the standard errors
    • setStartingCoefficients

      public void setStartingCoefficients(double[] startCoefs)
      Sets the starting values for the coefficients.
      Parameters:
      startCoefs - a double array containing starting values
    • setTolerance

      public void setTolerance(double tolerance)
      Sets the error tolerance criteria for convergence.
      Parameters:
      tolerance - a double, the error tolerance criteria for convergence
    • setIncludeIntercept

      public void setIncludeIntercept(boolean includeIntercept)
      Sets the flag to include or not include an intercept in the model.

      The default is true, the model will include an intercept.

      Parameters:
      includeIntercept - a boolean, indicating whether or not to include an intercept
    • setInteractionIndex

      public void setInteractionIndex(int[] xInteractions)
      Sets the column indices for pairwise interactions to include in the model.

      For each desired interaction pair, i,j, set xIntIndices[2*k]=i and xIntIndices[2*k+1]=j where k=0,1,.. and i,j are column indices contained in PredictiveModel.getPredictorIndexes(). Setting i==j defines a squared term. The default is that no interactions are included.

      Parameters:
      xInteractions - an int array containing the column indices of desired interactions
    • isIncludeIntercept

      public boolean isIncludeIntercept()
      Returns the current setting of the flag to include or not include an intercept in the model.
      Returns:
      a boolean, the indicator of whether or not the model includes an intercept
    • getXInteractionsIndex

      public int[] getXInteractionsIndex()
      Returns the array containing the indices specifying interaction terms.
      Returns:
      an int array containing the index pairs for interaction terms
    • getNumberOfCoefficients

      public int getNumberOfCoefficients()
      Returns the number of coefficients (per class).
      Returns:
      an int, the number of coefficients