Class LogisticRegression
- All Implemented Interfaces:
Serializable, Cloneable
This class fits a logistic regression model for discrete dependent variables with two or more mutually exclusive outcomes or classes. For a binary response \(y\), the objective is to model the conditional probability of success, \(\pi_1(x) = \Pr[y = 1 \mid x]\), where \(x = (x_1, x_2, \ldots, x_p)\) is a realization of \(p\) independent variables. Logistic regression models the conditional probability \(\pi_1(x)\) using the cdf of the logistic distribution. In particular, $$ \pi_1(x) = \frac{\exp(\eta_1)}{1+\exp(\eta_1)} $$ where the linear predictor is $$\eta_1 = \beta_{01} + x_1\beta_{11} + x_2\beta_{21} + \cdots + x_p\beta_{p1} $$ The coefficients \(\{\beta_{01},\beta_{11},\ldots,\beta_{p1}\}\) are unknown and are estimated using the training data.
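For illustration, the following minimal sketch (a hypothetical helper, not part of this class's API) evaluates \(\pi_1(x)\) from a given coefficient vector \(\{\beta_{01},\beta_{11},\ldots,\beta_{p1}\}\):

public class LogisticCdfExample {
    // Evaluates pi_1(x) = exp(eta_1) / (1 + exp(eta_1)), where
    // eta_1 = beta[0] + x[0]*beta[1] + ... + x[p-1]*beta[p].
    public static double probabilityOfSuccess(double[] beta, double[] x) {
        double eta = beta[0];                  // intercept beta_01
        for (int j = 0; j < x.length; j++) {
            eta += beta[j + 1] * x[j];         // x_j * beta_j1
        }
        return Math.exp(eta) / (1.0 + Math.exp(eta));
    }
}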
Solving for the linear component \(\eta_1\) results in the log-odds or logit transformation of \(\pi_1(x)\): $$ \text{logit}(\pi_1(x)) = \log\frac{\pi_1(x)}{1-\pi_1(x)} = \eta_1 $$ Given a set of \(N\) observations, \(\{y_i, x_i\}\), where \(y_i\) follows a binomial \((n, \pi)\) distribution with parameters \(n = 1\) and \(\pi = \pi_1(x_i)\), the likelihood and log-likelihood are, respectively, $$ L = \prod_{i=1}^N{{\pi_1(x_i)}^{y_i}{(1-\pi_1(x_i))}^{1-y_i}}$$ $$ \mathcal{L} = \log(L) = \sum_{i=1}^{N} \left[y_i \text{logit}(\pi_1 (x_i))+ \log(1-\pi_1 (x_i))\right]$$ The log-likelihood in terms of the linear predictor \(\eta_1\) and the unknown parameters \(\{\beta_{01},\beta_1\}\) is therefore $$ \mathcal{L}(\beta_{01},\beta_1) = \sum_{i=1}^{N} \left[y_i \eta_1(x_i) - \log(1+\exp(\eta_1(x_i)))\right]$$ where \(\eta_1(x_i) = \beta_{01} + x_i^{T}\beta_{1}\). With a binary outcome, only one probability needs to be modeled; the second probability follows from the constraint \(\pi_1(x) + \pi_2(x) = 1\). If each \(y_i\) is the number of successes in \(n_i \gt 1\) independent trials, the log-likelihood becomes $$ \mathcal{L} = \sum_{i=1}^{N} \left[ y_i \text{logit}(\pi_1 (x_i))+ n_i\log(1-\pi_1 (x_i)) \right]$$ or $$ \mathcal{L}(\beta_{01},\beta_1) = \sum_{i=1}^N \left[y_i \eta_1(x_i) - n_i\log(1+\exp(\eta_1(x_i)))\right]$$
The parameters are estimated using the method of maximum likelihood. To test the significance of the model, the log-likelihood of the fitted model is compared to that of an intercept-only model. In particular, \(G = -2(\mathcal{L}(\beta_{01}) - \mathcal{L}(\beta_{01},\beta_1))\) is a likelihood-ratio test statistic, and under the null hypothesis \(H_0 : \beta_{11} = \beta_{21} = \ldots = \beta_{p1} = 0\), \(G\) is distributed as chi-squared with \(p\) degrees of freedom. A significant result suggests that at least one slope parameter in the model is non-zero. See Hosmer and Lemeshow (2000) for further discussion.
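As a sketch of the quantities above (hypothetical helper code, not part of the API), the binomial log-likelihood and the statistic \(G\) can be computed as follows:

public class BinomialLoglikelihoodExample {
    // Evaluates L(beta_01, beta_1) = sum_i [ y_i*eta_i - n_i*log(1 + exp(eta_i)) ],
    // where eta_i = beta[0] + x_i^T beta_1 and n_i is the number of trials.
    public static double logLikelihood(double[] beta, double[][] x,
                                       double[] y, double[] n) {
        double ll = 0.0;
        for (int i = 0; i < x.length; i++) {
            double eta = beta[0];
            for (int j = 0; j < x[i].length; j++) {
                eta += beta[j + 1] * x[i][j];
            }
            ll += y[i] * eta - n[i] * Math.log(1.0 + Math.exp(eta));
        }
        return ll;
    }

    // Likelihood-ratio statistic comparing the fitted model to an
    // intercept-only model: G = -2 * (llInterceptOnly - llFull).
    public static double likelihoodRatio(double llInterceptOnly, double llFull) {
        return -2.0 * (llInterceptOnly - llFull);
    }
}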
In the multinomial case, the response vector is \(y_i = (y_{i1}, y_{i2},\ldots, y_{iK})^T\), where \(y_{ik} = 1\) when the \(i\)-th observation belongs to class \(k\) and \(y_{ik} = 0\) otherwise. Furthermore, because the outcomes are mutually exclusive, \(\sum_{k=1}^{K} y_{ik} = 1\) and \(\pi_1(x) + \pi_2(x) + \cdots + \pi_K(x) = 1\). Without loss of generality, the last class \(K\) serves as the baseline or reference class in the sense that it is not modeled directly but is obtained from \(\pi_K(x) = 1 - \sum_{k=1}^{K-1} \pi_k(x) \). Note that if there are multiple trials, \(n_i > 1\), the constraint on the responses becomes \(\sum_{k=1}^{K} y_{ik} = n_i\).
The log-likelihood in the multinomial case is $$\mathcal{L}(\beta_{0l}, \beta_{l})_{l=1}^{K} = \sum_{i=1}^N \left[ \sum_{l=1}^{K} y_{il}\eta_{il} - \log \left ( \sum_{l=1}^{K} \exp(\eta_{il})\right ) \right ] $$ where \( \eta_{il} = \beta_{0l} + x_i^T\beta_{l} \).
With \(\pi_l(x_i) = \frac{\exp(\eta_{il})}{\sum_{k=1}^{K}\exp(\eta_{ik})}\), the constraint \(\sum_{l=1}^{K} \pi_l(x_i) = 1 \) is handled by setting \(\eta_{iK}=0\). The log-likelihood then becomes $$\mathcal{L}(\beta_{0l}, \beta_{l})_{l=1}^{K-1} = \sum_{i=1}^N \left[ \sum_{l=1}^{K-1} y_{il}\eta_{il} - \log \left ( 1 + \sum_{l=1}^{K-1} \exp(\eta_{il})\right ) \right ] $$ Note that the logit for class \(k\) is $$\log\left(\frac{\pi_{ik}}{\pi_{iK}}\right) = \eta_{ik} $$ which is the log-odds of class \(k\) versus the reference class \(K\).
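A minimal sketch (hypothetical helper, not part of the API) of recovering all \(K\) class probabilities from the \(K-1\) modeled linear predictors, with \(\eta_{iK} = 0\):

public class MultinomialProbabilityExample {
    // eta holds the K-1 modeled linear predictors eta_1, ..., eta_{K-1}
    // for one observation; class K is the reference class with eta_K = 0.
    public static double[] classProbabilities(double[] eta) {
        int nClasses = eta.length + 1;       // total number of classes K
        double denom = 1.0;                  // exp(eta_K) = exp(0) = 1
        for (double e : eta) {
            denom += Math.exp(e);
        }
        double[] pi = new double[nClasses];
        for (int l = 0; l < nClasses - 1; l++) {
            pi[l] = Math.exp(eta[l]) / denom;
        }
        pi[nClasses - 1] = 1.0 / denom;      // pi_K = 1 - sum of the others
        return pi;
    }
}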
Maximum likelihood estimates are found by solving the score equations for each parameter: $$\begin{aligned} \frac{\partial{\mathcal{L}(\beta_{0l},\beta_{l})}}{\partial{\beta_{jl}}} & = \sum_{i=1}^N \left [x_{ij}y_{il}-\frac{x_{ij}\exp(\eta_{il})}{1+\sum_{k=1}^{K-1}\exp(\eta_{ik})} \right]\\ & = \sum_{i=1}^N x_{ij}(y_{il} - \pi_{il}) \\ & = 0 \end{aligned} $$ To solve the score equations, the class employs a method known as iteratively re-weighted least squares (IRLS). In this case, IRLS is equivalent to the Newton-Raphson algorithm (Hastie et al., 2009; Thisted, 1988).
Consider the full vector of parameters $$ \beta = (\beta_{01},\beta_{11}, \beta_{21},\ldots,\beta_{p1},\beta_{02},\ldots,\beta_{p2},\ldots, \beta_{0,K-1},\beta_{1,K-1},\ldots,\beta_{p,K-1})^T$$ The Newton-Raphson step at iteration \(n+1\) is $$\beta^{n+1} = \beta^n - \mathbf{H}^{-1}(\beta^n)\mathbf{G}(\beta^n)$$ where \(\mathbf{H}\) is the Hessian matrix, the matrix of second partial derivatives defined by $$ \begin{aligned} \frac{\partial^2{\mathcal{L}(\beta_{0l},\beta_{l})}}{\partial{\beta_{kl}}\partial{\beta_{jl}}} & = -\sum_{i=1}^N \left [ \frac{x_{ij}x_{ik}\exp(\eta_{il})\sum_{m=1}^K \exp(\eta_{im}) - x_{ij}x_{ik}\exp(\eta_{il})\exp(\eta_{il})}{\left ( \sum_{m=1}^K \exp(\eta_{im})\right )^2} \right ]\\ & = -\sum_{i=1}^N x_{ij}x_{ik} \pi_{il}(1-\pi_{il}) \end{aligned} $$ for \(l=1,2,\ldots, K\), \(k,j = 1,2,\ldots,p\), and $$ \begin{aligned} \frac{\partial^2{\mathcal{L}(\beta_{0l},\beta_{l})}}{\partial{\beta_{k\nu}}\partial{\beta_{jl}}} & = -\sum_{i=1}^N \left [ -\frac{x_{ij}x_{ik}\exp(\eta_{il})\exp(\eta_{i\nu})}{\left ( \sum_{m=1}^K \exp(\eta_{im})\right )^2} \right ]\\ & = \sum_{i=1}^N x_{ij}x_{ik} \pi_{il}\pi_{i\nu} \end{aligned} $$ where \(l \ne \nu = 1,2,\ldots,K \) and \(k,j = 1,2,\ldots,p\). \(\mathbf{G}\) denotes the gradient, the vector of first partial derivatives of the log-likelihood, $$\frac{\partial{\mathcal{L}(\beta_{0l},\beta_{l})}}{\partial{\beta_{jl}}}. $$
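The following sketch illustrates a single Newton-Raphson (IRLS) update for the binary case only; it is hypothetical illustration code, not the class's implementation, and uses a naive linear solver in place of a production routine. The predictor matrix x is assumed to include a leading column of ones for the intercept.

public class NewtonRaphsonStepExample {
    // One update beta <- beta - H^{-1} G for binary logistic regression,
    // where G_j = sum_i x_ij (y_i - pi_i) and
    // H_jk = -sum_i x_ij x_ik pi_i (1 - pi_i).
    public static double[] newtonStep(double[][] x, double[] y, double[] beta) {
        int n = x.length, p = beta.length;
        double[] g = new double[p];          // gradient G(beta)
        double[][] h = new double[p][p];     // Hessian H(beta)
        for (int i = 0; i < n; i++) {
            double eta = 0.0;
            for (int j = 0; j < p; j++) eta += x[i][j] * beta[j];
            double pi = Math.exp(eta) / (1.0 + Math.exp(eta));
            for (int j = 0; j < p; j++) {
                g[j] += x[i][j] * (y[i] - pi);
                for (int k = 0; k < p; k++) {
                    h[j][k] -= x[i][j] * x[i][k] * pi * (1.0 - pi);
                }
            }
        }
        double[] step = solve(h, g);         // step = H^{-1} G
        double[] next = new double[p];
        for (int j = 0; j < p; j++) next[j] = beta[j] - step[j];
        return next;
    }

    // Naive Gaussian elimination with partial pivoting, solving A z = b.
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        double[][] m = new double[n][n + 1];
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n] = b[i];
        }
        for (int c = 0; c < n; c++) {
            int piv = c;
            for (int r = c + 1; r < n; r++) {
                if (Math.abs(m[r][c]) > Math.abs(m[piv][c])) piv = r;
            }
            double[] tmp = m[c]; m[c] = m[piv]; m[piv] = tmp;
            for (int r = c + 1; r < n; r++) {
                double f = m[r][c] / m[c][c];
                for (int k = c; k <= n; k++) m[r][k] -= f * m[c][k];
            }
        }
        double[] z = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = m[r][n];
            for (int k = r + 1; k < n; k++) s -= m[r][k] * z[k];
            z[r] = s / m[r][r];
        }
        return z;
    }
}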
Both the gradient and the Hessian are evaluated at the most recent estimate of the parameters, \(\beta^n\). The iteration continues until convergence or until the maximum number of iterations is reached. Following the theory of maximum likelihood estimation (Kendall and Stuart, 1979), the standard errors of the coefficient estimates are obtained from the inverse of Fisher's information matrix, \( \mathbf{I}^{-1} = -\mathbf{H}^{-1} \), evaluated at the final estimates.
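A typical call sequence is sketched below with hypothetical data. The package location and the PredictiveModel.VariableType constants shown are assumptions made for illustration; consult the installed Javadoc for the exact names.

import com.imsl.datamining.LogisticRegression;   // assumed package location
import com.imsl.datamining.PredictiveModel;

public class LogisticRegressionUsageExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical training data: two continuous predictors (columns 0-1)
        // and a binary categorical response (column 2).
        double[][] xy = {
            {0.5, 1.2, 0.0},
            {1.5, 0.3, 1.0},
            {2.1, 0.8, 1.0},
            {0.2, 1.9, 0.0},
            {1.8, 0.4, 1.0},
            {0.7, 1.1, 0.0}
        };
        // The enum constants below are assumed for illustration.
        PredictiveModel.VariableType[] varType = {
            PredictiveModel.VariableType.QUANTITATIVE_CONTINUOUS,
            PredictiveModel.VariableType.QUANTITATIVE_CONTINUOUS,
            PredictiveModel.VariableType.CATEGORICAL
        };

        LogisticRegression lr = new LogisticRegression(xy, 2, varType);
        lr.fitModel();                                 // estimate the coefficients

        double[] coefficients = lr.getCoefficients();  // estimated coefficients
        double[] stdErrors = lr.getStandardErrors();   // from the information matrix
        double[] fitted = lr.predict();                // fitted values on the training data
        double logLikRatio = lr.getLogLikelihoodRatio();
    }
}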
Nested Class Summary
Nested classes/interfaces inherited from class com.imsl.datamining.PredictiveModel
PredictiveModel.CloneNotSupportedException, PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType -
Constructor Summary
Constructors
LogisticRegression(double[][] x, double[][] y, PredictiveModel.VariableType[] predictorVarType, PredictiveModel.VariableType responseVarType)
  Constructs a logistic regression predictive model.
LogisticRegression(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
  Constructs a logistic regression predictive model.
LogisticRegression(LogisticRegression lrModel)
  Constructs a copy of the input LogisticRegression predictive model.
Method Summary
Modifier and Type / Method / Description
clone()
  Clones a LogisticRegression predictive model.
void fitModel()
  Fits the logistic regression predictive model.
double[] getClassMeans()
  Returns the class means.
double[] getCoefficients()
  Returns the current values of the coefficient estimates.
double[][] getConfidenceIntervals()
  Returns the confidence intervals for the predictions.
double getConfidenceLevel()
  Returns the confidence level for calculating the confidence limits for the predictions.
double[] getGradient()
  Returns the gradient.
double[][] getHessian()
  Returns the Hessian matrix.
double getInitialLoglikelihood()
  Returns the log-likelihood of the model evaluated at the starting coefficients.
double getLoglikelihood()
  Returns the log-likelihood evaluated at the estimated coefficients.
double getLogLikelihoodRatio()
  Returns the log-likelihood ratio statistic.
int getNumberOfCoefficients()
  Returns the number of coefficients (per class).
double[] getStandardErrors()
  Returns the standard errors of the estimated coefficients.
int[] getXInteractionsIndex()
  Returns the array containing the indices specifying interaction terms.
boolean isIncludeIntercept()
  Returns the current setting of the flag to include or not include an intercept in the model.
double[] predict()
  Returns the fitted values on the training data.
double[] predict(double[][] testData)
  Returns the predicted values on the test data.
double[] predict(double[][] testData, double[] testDataWeights)
  Returns predicted values on the test data using the given weights.
void setConfidenceLevel(double confidence)
  Sets the confidence level for calculating the confidence limits for the predictions.
void setIncludeIntercept(boolean includeIntercept)
  Sets the flag to include or not include an intercept in the model.
void setInteractionIndex(int[] xInteractions)
  Sets the column indices for pairwise interactions to include in the model.
void setStartingCoefficients(double[] startCoefs)
  Sets the starting values for the coefficients.
void setTolerance(double tolerance)
  Sets the error tolerance criteria for convergence.
Methods inherited from class com.imsl.datamining.PredictiveModel
getClassCounts, getClassErrors, getClassErrors, getClassLabels, getClassProbabilities, getCostMatrix, getMaxNumberOfCategories, getMaxNumberOfIterations, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getRandomObject, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isConstantSeries, isMustFitModel, isUserFixedNClasses, setClassCounts, setClassLabels, setClassProbabilities, setCostMatrix, setMaxNumberOfCategories, setMaxNumberOfIterations, setMustFitModel, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setRandomObject, setResponseColumnIndex, setTrainingData, setVariableType, setWeights
-
Constructor Details
-
LogisticRegression
public LogisticRegression(double[][] xy, int responseColumnIndex, PredictiveModel.VariableType[] varType)
Constructs a logistic regression predictive model.
- Parameters:
xy - a double matrix containing the training data
responseColumnIndex - an int, the column index for the response variable
varType - a PredictiveModel.VariableType array containing the type of each variable
-
LogisticRegression
public LogisticRegression(double[][] x, double[][] y, PredictiveModel.VariableType[] predictorVarType, PredictiveModel.VariableType responseVarType)
Constructs a logistic regression predictive model.
- Parameters:
x - a double matrix containing the training data for the predictor variables
y - a double matrix containing the training data for the response variable
predictorVarType - a PredictiveModel.VariableType array containing the type of each variable
responseVarType - a PredictiveModel.VariableType, the type of the response variable
-
LogisticRegression
public LogisticRegression(LogisticRegression lrModel)
Constructs a copy of the input LogisticRegression predictive model.
- Parameters:
lrModel - a LogisticRegression predictive model
-
-
Method Details
-
clone
Clones a LogisticRegression predictive model.
- Specified by:
clone in class PredictiveModel
- Returns:
a clone of the LogisticRegression predictive model
-
fitModel
Fits the logistic regression predictive model.
- Overrides:
fitModel in class PredictiveModel
- Throws:
PredictiveModel.PredictiveModelException
-
setConfidenceLevel
public void setConfidenceLevel(double confidence)
Sets the confidence level for calculating the confidence limits for the predictions.
- Parameters:
confidence - a double value in the interval \((0,1)\)
-
getConfidenceLevel
public double getConfidenceLevel()
Returns the confidence level for calculating the confidence limits for the predictions.
- Returns:
a double, the confidence level
-
predict
Returns the fitted values on the training data.
The fitted value at index i*nClasses+k is the expected number of outcomes in class k for training example (observation) i. If the number of trials (see PredictiveModel.getWeights()) is equal to 1, the fitted value is a probability.
- Specified by:
predict in class PredictiveModel
- Returns:
a double array of length numberOfObservations*numberOfClasses containing the fitted values
- Throws:
PredictiveModel.PredictiveModelException
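For example, the flat array returned by predict() can be reshaped into an observations-by-classes matrix using the documented layout i*nClasses+k (a hypothetical helper, not part of the API):

public class FittedValuesLayoutExample {
    // Reshapes the array returned by predict() so that out[i][k] is the
    // fitted value (probability or expected count) for observation i, class k.
    public static double[][] toMatrix(double[] fitted, int nClasses) {
        int nObservations = fitted.length / nClasses;
        double[][] out = new double[nObservations][nClasses];
        for (int i = 0; i < nObservations; i++) {
            for (int k = 0; k < nClasses; k++) {
                out[i][k] = fitted[i * nClasses + k];
            }
        }
        return out;
    }
}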
-
predict
Returns the predicted values on the test data.
The predicted value at index i*nClasses+k is the expected number of outcomes in class k for test example (observation) i. If the number of trials (see PredictiveModel.getWeights()) is equal to 1, the predicted value is a probability.
- Specified by:
predict in class PredictiveModel
- Parameters:
testData - a double matrix containing data to be predicted. testData must have the same number of columns in the same arrangement as the training data x or xy.
- Returns:
a double array containing the predicted values
- Throws:
PredictiveModel.PredictiveModelException - is thrown when an exception occurs in the common PredictiveModel methods. Implementing or overriding methods from this class may require throwing exceptions. Exceptions thrown from these methods will necessarily extend the PredictiveModelException.
-
predict
public double[] predict(double[][] testData, double[] testDataWeights) throws PredictiveModel.PredictiveModelException
Returns predicted values on the test data using the given weights.
The predicted value at index i*nClasses+k is the expected number of outcomes in class k for the test example (observation) i. If the number of trials or frequency provided in testDataWeights is equal to 1, the predicted value is a probability.
- Specified by:
predict in class PredictiveModel
- Parameters:
testData - a double matrix containing the test data predictors
testDataWeights - a double array containing the test data weights. For the binomial/multinomial response the weights are typically the frequencies, or number of trials.
- Returns:
a double array containing predicted outcomes
- Throws:
PredictiveModel.PredictiveModelException
-
getClassMeans
public double[] getClassMeans()
Returns the class means.
Run the method fitModel() to populate the class means.
- Returns:
a double array containing the class means
-
getCoefficients
public double[] getCoefficients()
Returns the current values of the coefficient estimates.
Run the method fitModel() to populate the coefficient estimates.
- Returns:
a double array containing the coefficient estimates
-
getGradient
public double[] getGradient()
Returns the gradient.
Run the method fitModel() to populate the gradient.
- Returns:
a double array containing the final gradient
-
getConfidenceIntervals
public double[][] getConfidenceIntervals()
Returns the confidence intervals for the predictions.
Run one of the prediction methods (e.g., predict()) to populate the confidence intervals. The lower limit and upper limit are given for each observation in the training data or test data, according to which prediction method is used.
- Returns:
a double matrix containing the lower and upper confidence limits
-
getHessian
public double[][] getHessian()
Returns the Hessian matrix.
Run the method fitModel() to populate the Hessian.
- Returns:
a double matrix containing the Hessian matrix
-
getInitialLoglikelihood
public double getInitialLoglikelihood()
Returns the log-likelihood of the model evaluated at the starting coefficients.
This is the log-likelihood at the default starting values coefficients[i]=0 or at the starting coefficients set using setStartingCoefficients(double[]). Run the method fitModel() to populate the initial log-likelihood.
- Returns:
a double, the initial log-likelihood
-
getLoglikelihood
public double getLoglikelihood()
Returns the log-likelihood evaluated at the estimated coefficients.
Run the method fitModel() to populate the log-likelihood.
- Returns:
a double, the log-likelihood of the fitted or trained model
-
getLogLikelihoodRatio
public double getLogLikelihoodRatio()
Returns the log-likelihood ratio statistic.
Run the method fitModel() to populate the likelihood ratio statistic.
- Returns:
a double, the value of the log-likelihood ratio statistic
-
getStandardErrors
public double[] getStandardErrors()
Returns the standard errors of the estimated coefficients.
Run the method fitModel() to populate the standard errors.
- Returns:
a double array containing the standard errors
-
setStartingCoefficients
public void setStartingCoefficients(double[] startCoefs)
Sets the starting values for the coefficients.
- Parameters:
startCoefs - a double array containing starting values
-
setTolerance
public void setTolerance(double tolerance)
Sets the error tolerance criteria for convergence.
- Parameters:
tolerance - a double, the error tolerance criteria for convergence
-
setIncludeIntercept
public void setIncludeIntercept(boolean includeIntercept)
Sets the flag to include or not include an intercept in the model.
The default is true, meaning the model will include an intercept.
- Parameters:
includeIntercept - a boolean indicating whether or not to include an intercept
-
setInteractionIndex
public void setInteractionIndex(int[] xInteractions)
Sets the column indices for pairwise interactions to include in the model.
For each desired interaction pair, i,j, set xInteractions[2*k]=i and xInteractions[2*k+1]=j, where k=0,1,... and i,j are column indices contained in PredictiveModel.getPredictorIndexes(). Setting i==j defines a squared term. The default is that no interactions are included.
- Parameters:
xInteractions - an int array containing the column indices of desired interactions
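For example, the following sketch (continuing from a previously constructed LogisticRegression named lr; the column indices are hypothetical) requests the interaction between predictor columns 0 and 2 and a squared term for column 1:

int[] xInteractions = {
    0, 2,   // k=0: xInteractions[0]=0, xInteractions[1]=2 -> interaction x0*x2
    1, 1    // k=1: xInteractions[2]=1, xInteractions[3]=1 -> squared term x1*x1
};
lr.setInteractionIndex(xInteractions);
lr.fitModel();   // fit (or refit) the model with the interaction terms included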
-
isIncludeIntercept
public boolean isIncludeIntercept()
Returns the current setting of the flag to include or not include an intercept in the model.
- Returns:
a boolean, the indicator of whether or not the model includes an intercept
-
getXInteractionsIndex
public int[] getXInteractionsIndex()
Returns the array containing the indices specifying interaction terms.
- Returns:
an int array containing the index pairs for interaction terms
-
getNumberOfCoefficients
public int getNumberOfCoefficients()
Returns the number of coefficients (per class).
- Returns:
an int, the number of coefficients
-