public class LogisticRegression extends PredictiveModel implements Serializable, Cloneable
This function fits a logistic regression model for discrete dependent variables with two or more mutually exclusive outcomes or classes. For a binary response \(y\), the objective is to model the conditional probability of success, \(\pi_1 (x) = Pr[y = 1∣ x]\), where \(x = (x_1, x_2, …, x_p)\) is a realization of \(p\) independent variables. Logistic regression models the conditional probability \(\pi_1 (x)\) using the cdf of the logistic distribution. In particular, $$ \pi_1(x) = \frac{\exp(\eta_1)}{1+\exp(\eta_1)} $$ where the linear predictor $$\eta_1 = \beta_{01} + x_1\beta_{11} + x_2\beta_{21} + \cdots + x_p\beta_{p1} $$ The coefficients \(\{\beta_{01},\beta_{11},\ldots,\beta_{p1}\}\) are unknown and are estimated using the training data.
Solving for the linear component \(\eta_1\) results in the log-odds or logit transformation of \(\pi_1 (x)\): $$ \text{logit}(\pi_1(x)) = \text{log}\frac{\pi_1(x)}{1-\pi_1(x)} = \eta_1 $$ Given a set of \(N\) observations, \(\{y_i, x_i\}\), where \(y_i\) follows a binomial \((n, \pi) \) distribution with parameters \(n = 1\) and \(\pi = \pi_1 (x_i)\), the likelihood and log-likelihood are, respectively, $$ L = \prod_{i=1}^N{{\pi_1(x_i)}^{y_i}{(1-\pi_1(x_i))}^{1-y_i}}$$ $$ \mathcal{L} = \log(L) = \sum_{i=1}^{N} \left[y_i \text{logit}(\pi_1 (x_i))+ \log(1-\pi_1 (x_i))\right]$$ The log-likelihood in terms of the \(\eta_1\) and the unknown parameters \(\{\beta_{01},\beta_1\}\) is therefore $$ \mathcal{L}(\beta_{01},\beta_1) = \sum_{i=1}^{N} \left[y_i \eta_1(x_i) - \log(1+\exp(\eta_1(x_i)))\right]$$ where \(\eta_1(x_i) = \beta_{01} + x_i^{T}\beta_{1}\). With a binary outcome, only one probability needs to be modeled. The second probability can be obtained from the constraint, \(\pi_1(x) + \pi_2(x) = 1\). If each \(y_i\) is the number of successes in \(n_i \gt 1\) independent trials, the log-likelihood becomes $$ \mathcal{L} = \sum_{i=1}^{N} \left[ y_i \text{logit}(\pi_1 (x_i))+ n_i\log(1-\pi_1 (x_i)) \right]$$ or $$ \mathcal{L}(\beta_{01},\beta_1) = \sum_{i=1}^N \left[y_i \eta_1(x_i) - n_i\log(1+\exp(\eta_1(x_i)))\right]$$ The parameters are estimated using the method of maximum likelihood. To test the significance of the model, the log-likelihood of the fitted model is compared to that of an intercept‑only model. In particular, \(G = -2(\mathcal{L}(\beta_{01}) - \mathcal{L}(\beta_{01},\beta_1))\) is a likelihood-ratio test statistic and under the null hypothesis, \(H_0 : \beta_{11} = \beta_{12} = \ldots = \beta_{1p} = 0\), \(G\) is distributed as chi-squared with \(p - 1\) degrees of freedom. A significant result suggests that at least one parameter in the model is non‑zero. See Hosmer and Lemeshow (2000) for further discussion.
In the multinomial case, the response vector is \(y_i = (y_{i1}, y_{i2},\ldots, y_{iK})^T\), where \(y_{ik} = 1\) when the \(i\)-th observation belongs to class \(k\) and \(y_{ik} = 0\), otherwise. Furthermore, because the outcomes are mutually exclusive, \(\sum_{k=1}^{K} y_{ik} = 1\) and \(\pi_1(x) + \pi_2(x) + \cdots + \pi_K(x) = 1\). Without loss of generality the last class \(K\) serves as the baseline or reference class in the sense that it is not modeled directly but found from \(\pi_K(x) = 1 - \sum_{k=1}^{K-1} \pi_k(x) \). Note that if there are multiple trials \(n_i > 1\), the constraint on the responses becomes \(\sum_{k=1}^{K} y_{ik} = n_i\).
The log-likelihood in the multinomial case is $$\mathcal{L}(\beta_{0l}, \beta_{l})_{l=1}^{K} = \sum_{i=1}^N \left[ \sum_{l=1}^{K} y_{il}\eta_{il} - \log \left ( \sum_{l=1}^{K} \exp(\eta_{il})\right ) \right ] $$ where \( \eta_{il} = \beta_{0l} + x_i^T\beta_{l} \).
With \(\pi_l(x_i) = \frac{\exp(\eta_{il})}{\sum_{k=1}^{K}\exp(\eta_{ik})}\), the constraint \(\sum_{l=1}^{K} \pi_l(x_i) = 1 \) is handled by setting \(\eta_{iK}=0\). Then the log-likelihood is $$\mathcal{L}(\beta_{0l}, \beta_{l})_{l=1}^{K-1} = \sum_{i=1}^N \left[ \sum_{l=1}^{K-1} y_{il}\eta_{il} - \log \left ( \sum_{l=1}^{K-1} \exp(\eta_{il})\right ) \right ] $$ Note that the logit for class \(k\) is $$\log(\frac{\pi_{ik}}{\pi_{iK}}) = \eta_{ik} $$ which is the log-odds of class \(k\) versus the reference class \(K\).
Maximum likelihood estimates are found by solving the score equation for each parameter $$\begin{array}{l} \frac{\partial{\mathcal{L}(\beta_{0l},\beta_{l})}}{\partial{\beta_{jl}}} & = \sum_{i=1}^N \left [x_{ij}y_{il}-\frac{x_{ij}\exp(\eta_{il})}{1+\sum_{k=1}^{K-1}\exp(\eta_{ik})} \right]\\ & = \sum_{i=1}^N x_{ij}(y_{il} - \pi_{il}) \\ & = 0 \end{array} $$ To solve the score equations, the function employs a method known as iteratively re-weighted least squares or IRLS. In this case the IRLS is equivalent to the Newton-Raphson algorithm (Hastie, et. al., 2009, Thisted, 1988).
Consider the full vector of parameters $$ \beta = (\beta_{01},\beta_{11}, \beta_{21},\ldots,\beta_{p1},\beta_{02},\ldots,\beta_{p2},\ldots, \beta_{0K-1},\beta_{1K-1},\ldots,\beta_{pK-1})^T$$ The Newton-Raphson step at iteration \(n+1\) is $$\beta^{n+1} = \beta^n - \matrix{H}^{-1}(\beta^n)\matrix{G}(\beta^n)$$ where \(\matrix{H}\) is the Hessian matrix, the matrix of the second partial derivatives defined by $$ \begin{array}{l} \frac{\partial^2{\mathcal{L}(\beta_{0l},\beta_{l})}}{\partial{\beta_{kl}}\partial{\beta_{jl}}} & = -\sum_{i=1}^N \left [ \frac{x_{ij}x_{ik}\exp(\eta_{il})\sum_{m=1}^K \exp(\eta_{im}) - x_{ij}x_{ik}\exp(\eta_{il})\exp(\eta_{il})}{\left ( \sum_{m=1}^K \exp(\eta_{im})\right )^2} \right ]\\ & = -\sum_{i=1}^N x_{ij}x_{ik} \pi_{il}(1-\pi_{il}) \end{array} $$ for \(l=1,2,\ldots, K\), \(k,j = 1,2,\ldots,p\), and $$ \begin{array}{l} \frac{\partial^2{\mathcal{L}(\beta_{0l},\beta_{l})}}{\partial{\beta_{k\nu}}\partial{\beta_{jl}}} & = -\frac{1}{N}\sum_{i=1}^N \left [ -\frac{x_{ij}x_{ik}\exp(\eta_{il})\exp(\eta_{i\nu})}{\left ( \sum_{m=1}^K \exp(\eta_{im})\right )^2} \right ]\\ & = \frac{1}{N}\sum_{i=1}^N x_{ij}x_{ik} \pi_{il}\pi_{i\nu} \end{array} $$ where \(l \ne \nu =1,2,\ldots,K \), and \(k,j = 1,2,\ldots,p\). The \(\matrix{G}\) denotes the the gradient, the vector of the first partial derivatives of the log-likelihood $$\frac{\partial{\mathcal{L}(\beta_{0l},\beta_{l})}}{\partial{\beta_{jl}}}. $$
Both the gradient and the Hessian are evaluated at the most recent estimate of the parameters, \(\beta^n\). The iteration continues until convergence or until maximum iterations are reached. Following the theory of maximum likelihood estimation (Kendall and Stuart, 1979), standard errors are obtained from Fisher’s information matrix \( \matrix{I}= -\matrix{H}^{-1} \) evaluated at the final estimates.
PredictiveModel.CloneNotSupportedException, PredictiveModel.PredictiveModelException, PredictiveModel.StateChangeException, PredictiveModel.SumOfProbabilitiesNotOneException, PredictiveModel.VariableType| Constructor and Description |
|---|
LogisticRegression(double[][] x,
double[][] y,
PredictiveModel.VariableType[] predictorVarType,
PredictiveModel.VariableType responseVarType)
Constructs a logistic regression predictive model.
|
LogisticRegression(double[][] xy,
int responseColumnIndex,
PredictiveModel.VariableType[] varType)
Constructs a logistic regression predictive model.
|
LogisticRegression(LogisticRegression lrModel)
Constructs a copy of the input
LogisticRegression predictive
model. |
| Modifier and Type | Method and Description |
|---|---|
LogisticRegression |
clone()
Clones a
LogisticRegression predictive model. |
void |
fitModel()
Fits the logistic regression predictive model.
|
double[] |
getClassMeans()
Returns the class means.
|
double[] |
getCoefficients()
Returns the current values of the coefficient estimates.
|
double[][] |
getConfidenceIntervals()
Returns the confidence intervals for the predictions.
|
double |
getConfidenceLevel()
Returns the confidence level for calculating the confidence limits for the
predictions.
|
double[] |
getGradient()
Returns the gradient.
|
double[][] |
getHessian()
Returns the Hessian matrix.
|
double |
getInitialLoglikelihood()
Returns the log-likelihood of the model evaluated at the starting
coefficients.
|
double |
getLoglikelihood()
Returns the log-likelihood evaluated at the estimated
coefficients.
|
double |
getLogLikelihoodRatio()
Returns the log-likelihood ratio statistic.
|
int |
getNumberOfCoefficients()
Returns the number of coefficients (per class).
|
double[] |
getStandardErrors()
Returns the standard errors of the estimated coefficients.
|
int[] |
getXInteractionsIndex()
Returns the array containing the indices specifying interaction terms.
|
boolean |
isIncludeIntercept()
Returns the current setting of the flag to include or not include an
intercept in the model.
|
double[] |
predict()
Returns the fitted values on the training data.
|
double[] |
predict(double[][] testData)
Returns the predicted values on the test data.
|
double[] |
predict(double[][] testData,
double[] testDataWeights)
Returns predicted values on the test data using the given weights.
|
void |
setConfidenceLevel(double confidence)
Sets the confidence level for calculating the confidence limits for the
predictions.
|
void |
setIncludeIntercept(boolean includeIntercept)
Sets the flag to include or not include an intercept in the model.
|
void |
setInteractionIndex(int[] xInteractions)
Sets the column indices for pairwise interactions to include in the
model.
|
void |
setStartingCoefficients(double[] startCoefs)
Sets the starting values for the coefficients.
|
void |
setTolerance(double tolerance)
Sets the error tolerance criteria for convergence.
|
getClassCounts, getClassErrors, getClassErrors, getClassLabels, getClassProbabilities, getCostMatrix, getMaxNumberOfCategories, getMaxNumberOfIterations, getNumberOfClasses, getNumberOfColumns, getNumberOfMissing, getNumberOfPredictors, getNumberOfRows, getNumberOfUniquePredictorValues, getPredictorIndexes, getPredictorTypes, getPrintLevel, getPriorProbabilities, getRandomObject, getResponseColumnIndex, getResponseVariableAverage, getResponseVariableMostFrequentClass, getResponseVariableType, getTotalWeight, getVariableType, getWeights, getXY, isConstantSeries, isMustFitModel, isUserFixedNClasses, setClassCounts, setClassLabels, setClassProbabilities, setConfiguration, setCostMatrix, setMaxNumberOfCategories, setMaxNumberOfIterations, setMustFitModel, setNumberOfClasses, setPredictorIndex, setPredictorTypes, setPrintLevel, setPriorProbabilities, setRandomObject, setResponseColumnIndex, setTrainingData, setVariableType, setWeightspublic LogisticRegression(double[][] xy,
int responseColumnIndex,
PredictiveModel.VariableType[] varType)
xy - a double matrix containing the training dataresponseColumnIndex - an int, the column index for the
response variablevarType - a PredictiveModel.VariableType
array containing the type of each variablepublic LogisticRegression(double[][] x,
double[][] y,
PredictiveModel.VariableType[] predictorVarType,
PredictiveModel.VariableType responseVarType)
x - a double matrix containing the training data for
the predictor variablesy - a double matrix containing the training data for
the response variablepredictorVarType - a
PredictiveModel.VariableType array containing
the type of each variableresponseVarType - a
PredictiveModel.VariableType, the type of the
response variablepublic LogisticRegression(LogisticRegression lrModel)
LogisticRegression predictive
model.lrModel - a LogisticRegression predictive modelpublic LogisticRegression clone()
LogisticRegression predictive model.clone in class PredictiveModelLogisticRegression predictive modelpublic void fitModel()
throws PredictiveModel.PredictiveModelException
fitModel in class PredictiveModelPredictiveModel.PredictiveModelExceptionpublic void setConfidenceLevel(double confidence)
confidence - a double value in the interval \((0,1)\)public double getConfidenceLevel()
double, the confidence levelpublic double[] predict()
throws PredictiveModel.PredictiveModelException
The fitted value at index i*nClasses+k is the expected
number of outcomes in class k for training example
(observation) i. If the number of trials (see
PredictiveModel.getWeights()) is equal to 1, the fitted value is a probability.
predict in class PredictiveModeldouble array of length
numberOfObservations*numberOfClasses containing the fitted
valuesPredictiveModel.PredictiveModelExceptionpublic double[] predict(double[][] testData)
throws PredictiveModel.PredictiveModelException
The predicted value at index i*nClasses+k is the expected
number of outcomes in class k for test example (observation)
i. If the number of trials (see PredictiveModel.getWeights()) is
equal to 1, the predicted value is a probability.
predict in class PredictiveModeltestData - double matrix containing data to be
predicted. testData must have the same number of columns in
the same arrangement as the training data x or
xy.double array containing the predicted values.PredictiveModel.PredictiveModelException - is thrown when an exception occurs in
the common PredictiveModel methods. Implementing or
overriding methods from this class may require throwing exceptions.
Exceptions thrown from these methods will necessarily extend the
PredictiveModelException.public double[] predict(double[][] testData,
double[] testDataWeights)
throws PredictiveModel.PredictiveModelException
The predicted value at index i*nClasses+k is the expected
number of outcomes in class k for the test example
(observation) i. If the number of trials or frequency
provided in testDataWeights is equal to 1, the predicted
value is a probability.
predict in class PredictiveModeltestData - a double matrix containing the test data
predictorstestDataWeights - a double array containing the test
data weights. For the binomial/multinomial response the weights are
typically the frequencies, or number of trials.double array containing predicted outcomesPredictiveModel.PredictiveModelExceptionpublic double[] getClassMeans()
Run the method LogisticRegression.fitModel() to populate the class means.
double array containing the class meanspublic double[] getCoefficients()
Run the method LogisticRegression.fitModel() to populate the coefficient estimates.
double array containing the coefficient estimatespublic double[] getGradient()
Run the method LogisticRegression.fitModel() to populate the gradient.
double array containing the final gradientpublic double[][] getConfidenceIntervals()
Run one of the prediction methods (e.g., LogisticRegression.predict()) to populate the
confidence intervals. The lower limit and upper limit are given for each
observation in the training data or test data, according to which
prediction method is used.
double matrix containing the lower and upper
confidence limitspublic double[][] getHessian()
Run the method LogisticRegression.fitModel() to populate the Hessian.
double matrix containing the Hessian matrixpublic double getInitialLoglikelihood()
This is the log-likelihood at the default starting values
coefficients[i]=0 or the starting coefficients set using
LogisticRegression.setStartingCoefficients(double[]). Run the method LogisticRegression.fitModel()
to populate the initial log-likelihood.
double, the initial log-likelihoodpublic double getLoglikelihood()
Run the method LogisticRegression.fitModel() to populate the log-likelihood.
double, the log-likelihood of the fitted or
trained modelpublic double getLogLikelihoodRatio()
Run the method LogisticRegression.fitModel() to populate the likelihood ratio
statistic.
double, the value of the log-likelihood ratio
statisticpublic double[] getStandardErrors()
Run the method LogisticRegression.fitModel() to populate the standard errors.
double array containing the standard errorspublic void setStartingCoefficients(double[] startCoefs)
startCoefs - a double array containing starting valuespublic void setTolerance(double tolerance)
tolerance - a double, the error tolerance criteria for
convergencepublic void setIncludeIntercept(boolean includeIntercept)
The default is true, the model will include an intercept.
includeIntercept - a boolean, indicating whether or not
to include an interceptpublic void setInteractionIndex(int[] xInteractions)
For each desired interaction pair, i,j, set
xIntIndices[2*k]=i and xIntIndices[2*k+1]=j
where k=0,1,.. and i,j are column indices
contained in PredictiveModel.getPredictorIndexes(). Setting
i==j defines a squared term. The default is that no
interactions are included.
xInteractions - an int array containing the column
indices of desired interactionspublic boolean isIncludeIntercept()
boolean, the indicator of whether or not the model
includes an interceptpublic int[] getXInteractionsIndex()
int array containing the index pairs for
interaction termspublic int getNumberOfCoefficients()
int, the number of coefficientsCopyright © 2022 Rogue Wave Software. All rights reserved.