imsl.regression.LogisticRegression

class LogisticRegression(n_predictors, n_classes, ref_class=None, intercept=True, x_interact=None)

Generate a logistic regression model.

Generate a binomial or multinomial logistic regression model using iteratively re-weighted least squares.

Parameters:
  • n_predictors (int) – The number of predictors in the model.
  • n_classes (int) – The number of discrete outcomes, or classes, in the model.
  • ref_class (int, optional) –

    A number specifying which class or outcome category to use as the reference class. Outcome categories are assumed to be numbered 1,…, n_classes.

    Default: ref_class = n_classes.

  • intercept (bool, optional) –

    Specifies if the model will include an intercept (constant).

    Default: intercept = True.

  • x_interact ((n_x_interact, 2) array_like, optional) –

    Array providing pairs of column indices of the predictor variables that define the interaction terms in the model. The two indices within each pair must be distinct. For example, suppose there are two independent variables x0 and x1. To fit a model that includes their interaction term x0*x1, set x_interact = [[0, 1]] (see the sketch following this parameter list).

    Default: No interaction terms are included.
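
For instance, a minimal sketch (using only the constructor arguments documented above) that sets up a model with two predictors, a binary response, and their interaction term:

>>> import imsl.regression as reg
>>> # Two predictors, two response classes, plus the interaction of
>>> # predictor columns 0 and 1.
>>> model = reg.LogisticRegression(2, 2, x_interact=[[0, 1]])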

Notes

Class LogisticRegression fits a logistic regression model for discrete dependent variables with two or more mutually exclusive outcomes or classes.

1. Binary Response Classes

For a binary response y, the objective is to model the conditional probability of success, \(\pi_1(x)=\Pr(y=1|x)\), where \(x=(x_1,x_2,\ldots,x_p)^T\) is a realization of p independent variables (predictors). Logistic regression models the conditional probability, \(\pi_1(x)\), using the cdf of the logistic distribution. In particular,

\[\pi_1(x) = \frac{\exp(\eta_1)}{1+\exp(\eta_1)} \equiv \frac{1}{1+\exp(-\eta_1)}\;,\]

where

\[\eta_1 = \beta_{10}+x^T\beta_1\]

and

\[\beta_{10},\; \beta_1=(\beta_{11}, \beta_{12},\ldots,\beta_{1p})^T\]

are unknown coefficients that are to be estimated.

Solving for the linear component \(\eta_1\) results in the log-odds or logit transformation of \(\pi_1(x)\):

\[\text{logit}(\pi_1(x)) = \log \frac{\pi_1(x)}{1-\pi_1(x)} = \eta_1\]
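
As a quick numerical check (plain NumPy, independent of the class), the logit transform recovers the linear component from the logistic cdf; the value of \(\eta_1\) below is arbitrary:

>>> import numpy as np
>>> eta = 0.75                            # arbitrary value of the linear component
>>> pi1 = 1.0 / (1.0 + np.exp(-eta))      # logistic cdf
>>> log_odds = np.log(pi1 / (1.0 - pi1))  # logit transform of pi1
>>> bool(round(log_odds, 10) == round(eta, 10))
True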

Given a set of N observations \((y_i, x_i)\), where \(y_i\) follows a binomial \((n, \pi)\) distribution with parameters n = 1 and \(\pi = \pi_1(x_i)\), the likelihood and log-likelihood are, respectively,

\[L = \prod_{i=1}^N \pi(x_i)^{y_i}(1-\pi(x_i))^{1-y_i}\]
\[l = \sum_{i=1}^N \left\{ y_i \log\left(\frac{\pi(x_i)}{1-\pi(x_i)} \right)+\log(1-\pi(x_i)) \right\}\]

The log-likelihood in terms of the parameters \({\beta_{10}, \beta_1}\) is therefore

\[l(\beta_{10},\beta_1) = \sum_{i=1}^N \left\{ y_i\eta_{i1}- \log(1+\exp(\eta_{i1})) \right\}\]

where

\[\eta_{i1} = \beta_{10}+x_i^T\beta_1\]

With a binary outcome, only one probability needs to be modeled. The second probability can be obtained from the constraint \(\pi_1(x)+\pi_2(x)=1\). If each \(y_i\) is the number of successes in \(n_i\) independent trials, the log-likelihood becomes

\[l = \sum_{i=1}^N \left\{ y_i \log\left(\frac{\pi(x_i)}{1-\pi(x_i)} \right)+n_i \log(1-\pi(x_i)) \right\}\]

or

\[l(\beta_{10},\beta_1) = \sum_{i=1}^N \left\{ y_i\eta_{i1}- n_i \log(1+\exp(\eta_{i1})) \right\}\]
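
A minimal sketch of this grouped-data log-likelihood for a single predictor (plain NumPy; the function name is made up for illustration and is not part of the class API):

>>> import numpy as np
>>> def binomial_loglik(beta0, beta1, x, y, n):
...     """Log-likelihood l(beta0, beta1) with y[i] successes in n[i] trials."""
...     eta = beta0 + beta1 * x                       # linear component per observation
...     return float(np.sum(y * eta - n * np.log1p(np.exp(eta))))

Evaluating such a function at the fitted coefficients should agree with the model's log_likeli attribute up to terms that do not depend on the coefficients.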

To test the significance of the model, the log-likelihood of the fitted model is compared to that of an intercept-only model. In particular, \(G = -2(l(\beta_{10},0)-l(\beta_{10},\beta_1))\) is a likelihood-ratio test statistic, and under the null hypothesis \(H_0:\beta_{11}=\beta_{12}=\ldots=\beta_{1p}=0\), G is distributed as chi-squared with p degrees of freedom. A significant result suggests that at least one parameter in the model is non-zero. See [2] for further discussion.

2. More than 2 Response Classes

In the multinomial case, the response vector is \(y_i=(y_{i1}, y_{i2},\ldots,y_{iK})^T\), where \(y_{ik}=1\) when the i-th observation belongs to class k and \(y_{ik}=0\) otherwise. Furthermore, because the outcomes are mutually exclusive,

\[\sum_{k=1}^Ky_{ik} = 1 \,,\]

and

\[\pi_1(x)+\pi_2(x)+\ldots+\pi_K(x) = 1.\]

The last class K serves as the baseline or reference class in the sense that it is not modeled directly but found from

\[\pi_K(x) = 1 - \sum_{k=1}^{K-1}\pi_k(x) \,.\]

If there are multiple trials, i.e. \(n_i>1\), then the constraint on the responses is

\[\sum_{k=1}^K y_{ik} = n_i \,.\]

Define

\[\beta_{m0},\; \beta_m:=(\beta_{m1},\ldots,\beta_{mp})^T\]

as the regression coefficients of response class m, m=1,…,K.

For i=1,…,N and m=1,…,K, set

\[\eta_{im} = \beta_{m0} + x_i^T \beta_m\]

and

\[\pi_m(x_i) := \pi_{im} = \frac{\exp(\eta_{im})}{1+\sum_{k=1}^{K-1} \exp(\eta_{ik})}\]

The log-likelihood in the multinomial case becomes

\[l(\beta_{10},\beta_1,\ldots,\beta_{K0},\beta_K) = \sum_{i=1}^N\left\{\sum_{l=1}^Ky_{il}\eta_{il}- n_i\log\left(\sum_{j=1}^K\exp(\eta_{ij})\right)\right\}\,.\]

The constraint

\[\sum_{k=1}^K\pi_{ik} = 1\]

is handled by setting \(\eta_{iK}=0\) for the K-th class, reducing the full vector of regression coefficients to

\[\beta := (\beta_{10},\beta_1,\beta_{20},\beta_2,\ldots,\beta_{K-1,0}, \beta_{K-1})^T\,.\]

Then, the log-likelihood is

\[l(\beta) = \sum_{i=1}^N\left\{\sum_{l=1}^{K-1}y_{il}\eta_{il}-n_i\log\left(1+ \sum_{j=1}^{K-1} \exp(\eta_{ij})\right)\right\}\]

or

\[l(\beta) = \sum_{i=1}^N\left\{\sum_{l=1}^{K-1}y_{il}(\beta_{l0}+x_i^T\beta_l) -n_i\log\left(1+\sum_{j=1}^{K-1}\exp(\beta_{j0}+x_i^T\beta_j)\right) \right\}\,.\]

Note that for the multinomial case, the log-odds (or logit) is

\[\log \frac{\pi_{il}}{\pi_{iK}} = \beta_{l0}+x_i^T\beta_l, \; l=1,\ldots,K-1\,.\]

Each logit involves the odds of being in class l versus class K, the reference class.
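
As a hedged illustration (plain NumPy, not the class's own code), class probabilities follow from the coefficients under this constraint; the layout assumed below matches the printed coefficients in Example 2 (one row per class, with the reference-class row equal to zero):

>>> import numpy as np
>>> def class_probs(coefs, x):
...     """coefs: (K, p+1) rows of [intercept, slopes]; reference row is zero.
...     x: (p,) predictor vector. Returns the K class probabilities."""
...     eta = coefs[:, 0] + coefs[:, 1:] @ x   # eta_1, ..., eta_{K-1}, eta_K = 0
...     p = np.exp(eta)
...     return p / p.sum()                     # probabilities sum to 1
>>> coefs = np.array([[0.5, 1.0], [-0.25, 0.5], [0.0, 0.0]])  # made-up K=3, p=1
>>> probs = class_probs(coefs, np.array([0.3]))
>>> float(round(probs.sum(), 10))
1.0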

3. Maximum-Likelihood Estimation

Maximum likelihood estimates can be obtained by solving the score equation for each parameter:

\[\begin{split}\begin{array}{rcl} \frac{\partial l(\beta)}{\partial \beta_{mj}}&=& \sum_{i=1}^N\left\{x_{ij}y_{im}-n_i \frac{x_{ij}\exp(\eta_{im})}{1+\sum_{k=1}^{K-1}\exp(\eta_{ik})} \right\}\\ &=& \sum_{i=1}^N \left\{x_{ij}\left(y_{im}-n_i\pi_{im}\right) \right\}\\ &\stackrel{!}{=}&0\,. \end{array}\end{split}\]

Here, \(x_{ij}\) denotes the j-th component of observation \(x_i\), and \(x_{i0}:=1\).

To solve the score equations, the class employs iteratively re-weighted least squares (IRLS). In this setting, IRLS is equivalent to the Newton-Raphson algorithm ([1], [5]).

The Newton-Raphson iteration is

\[\beta^{n+1} = \beta^n - \mathbf{H}^{-1}(\beta^n) \mathbf{G}(\beta^n)\,,\]

where \(\mathbf{H}\) denotes the Hessian of the log-likelihood, i.e., the matrix of second partial derivatives defined by

\[\begin{split}\frac{\partial^2 l(\beta)}{\partial \beta_{\nu k}\partial \beta_{mj}}= -\sum_{i=1}^Nn_ix_{ij} \frac{\partial \pi_{im}}{\partial \beta_{\nu k}}= \left\{ \begin{array}{ll} -\sum_{i=1}^Nn_ix_{ij}x_{ik}\pi_{im}(1-\pi_{im})\,, & \nu =m\\ \sum_{i=1}^N n_i x_{ij}x_{ik}\pi_{im}\pi_{i\nu}\,, & \nu \ne m \end{array} \right.\end{split}\]

and \(\mathbf{G}\) denotes the gradient of the log-likelihood, the vector of first partial derivatives,

\[\frac{\partial l(\beta)}{\partial \beta_{mj}}\,.\]

Both the gradient and the Hessian are evaluated at the most recent estimate of the parameters, \(\beta^n\). The iteration continues until convergence or until the maximum number of iterations is reached. Following the theory of maximum-likelihood estimation ([3]), standard errors are obtained from the inverse of the observed Fisher information matrix, \(-\mathbf{H}^{-1}\), evaluated at the final estimates.
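
For illustration only (this is not the class's internal implementation), a minimal Newton-Raphson sketch for the binary, single-predictor case with \(n_i\) trials per observation might look as follows; the function name and start value are assumptions:

>>> import numpy as np
>>> def newton_logistic(x, y, n, max_iter=20, tol=1e-8):
...     """Minimal Newton-Raphson (equivalently IRLS) for a binary logistic
...     model with an intercept and one predictor; y[i] successes in n[i] trials."""
...     X = np.column_stack((np.ones_like(x), x))   # design matrix [1, x_i]
...     beta = np.zeros(2)                          # assumed starting value
...     for _ in range(max_iter):
...         eta = X @ beta
...         pi = 1.0 / (1.0 + np.exp(-eta))         # logistic cdf
...         grad = X.T @ (y - n * pi)               # score vector G(beta)
...         w = n * pi * (1.0 - pi)
...         hess = -(X * w[:, None]).T @ X          # Hessian H(beta)
...         step = np.linalg.solve(hess, grad)      # H^{-1} G
...         beta = beta - step                      # beta_{n+1} = beta_n - H^{-1} G
...         if np.max(np.abs(step)) < tol:
...             break
...     return beta

Applied to the beetle data of Example 1 below (with y1 as the success counts and freqs as the trial counts), this sketch should converge to estimates close to the first row of model.coefficients.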

4. Model Aggregation

When methods fit (with optional argument update) or aggregate are called, estimates of the same model from separate fits are combined using the method presented in [6]. To illustrate, let \(\beta_1\) and \(\beta_2\) be the MLEs from separate fits to two different sets of data, and let \(\mathbf{H_1}\) and \(\mathbf{H_2}\) be the associated Hessian matrices. Then, the combined estimate

\[\beta=(\mathbf{H_1} + \mathbf{H_2})^{-1} (\mathbf{H_1}\beta_1 + \mathbf{H_2}\beta_2)\]

approximates the MLE of the combined data set. The model structure contains the combined estimates as well as other elements; see the attributes of the LogisticRegression class.
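
The pooling step itself is a single linear solve. A minimal sketch (the helper name and the small matrices below are made up for illustration; in practice the class performs this internally through fit with update=True or through aggregate):

>>> import numpy as np
>>> def combine_fits(h1, b1, h2, b2):
...     """Pool two fits: beta = (H1 + H2)^{-1} (H1 b1 + H2 b2).
...     The signs cancel, so Hessians and negated Hessians work equally."""
...     return np.linalg.solve(h1 + h2, h1 @ b1 + h2 @ b2)
>>> h1 = np.array([[2.0, 0.0], [0.0, 2.0]])
>>> h2 = np.array([[1.0, 0.0], [0.0, 1.0]])
>>> b1 = np.array([1.0, 0.0])
>>> b2 = np.array([4.0, 3.0])
>>> combine_fits(h1, b1, h2, b2)
array([2., 1.])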

References

[1] Hastie, Trevor, Robert Tibshirani, and Jerome Friedman (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer, New York.
[2] Hosmer, David W., Stanley Lemeshow, and Rodney X. Sturdivant (2013), Applied Logistic Regression, 3rd ed., John Wiley & Sons, New Jersey.
[3] Kendall, Maurice G., and Alan Stuart (1979), The Advanced Theory of Statistics, Volume 2: Inference and Relationship, 4th ed., Oxford University Press, New York.
[4] Prentice, Ross L. (1976), A generalization of the probit and logit methods for dose response curves, Biometrics, 32, 761-768.
[5] Thisted, Ronald A. (1988), Elements of Statistical Computing: Numerical Computation, Chapman & Hall, New York.
[6] Xi, Ruibin, Nan Lin, and Yixin Chen (2008), Compression and Aggregation for Logistic Regression Analysis in Data Cubes, IEEE Transactions on Knowledge and Data Engineering, Vol. 1, No. 1.

Examples

Example 1:

The first example is from [4] and involves the mortality of beetles after five hours of exposure to eight different concentrations of carbon disulphide. The table below lists the number of beetles exposed (N) to each concentration level of carbon disulphide (x1, given as log dosage) and the number of deaths that result (y1):

Log Dosage   Number of Beetles Exposed   Number of Deaths
1.690        59                          6
1.724        60                          13
1.755        62                          18
1.784        56                          28
1.811        63                          52
1.836        59                          53
1.861        62                          61
1.883        60                          60

The number of deaths at each concentration level is the binomial response (n_classes = 2) and the log-dosage is the single predictor variable. Note that this example illustrates the use of class ResponseFormatRef to generate the matrix of response class counts. The reference class is defined as the class of beetles that survive a certain log dosage.

>>> import numpy as np
>>> import imsl.regression as reg
>>> y1 = np.array([6.0, 13.0, 18.0, 28.0, 52.0, 53.0, 61.0, 60.0])
>>> x1 = np.array([1.69, 1.724, 1.755, 1.784, 1.811, 1.836, 1.861, 1.883])
>>> freqs = np.array([59.0, 60.0, 62.0, 56.0, 63.0, 59.0, 62.0, 60.0])
>>> n_predictors = 1
>>> n_classes = 2
>>> response_counts = reg.ResponseFormatRef(y1, frequencies=freqs)
>>> model = reg.LogisticRegression(n_predictors, n_classes)
>>> model.fit(x1, response_counts)
>>> np.set_printoptions(precision=2)
>>> print("Coefficient estimates:\n" +
...       str(model.coefficients)) 
Coefficient estimates:
[[-60.76  34.3 ]
 [  0.     0.  ]]

Example 2:

In the second example, the response is a multinomial random variable with four outcome classes. The three predictor variables represent two categorical variables and one continuous variable. A subset of two predictor variables along with the intercept defines the logistic regression model. For each observation, the ID of the outcome class is defined in array y. A test of significance is performed.

>>> import numpy as np
>>> import imsl.regression as reg
>>> from scipy.stats import chi2
>>> # Array of predictors
>>> x = np.array([[3, 25.92869, 1], [2,51.63245, 2], [2, 25.78432, 1],
...               [1, 39.37948, 1], [3,24.65058, 1], [3, 45.20084, 1],
...               [3, 52.6796, 2], [2, 44.28342, 2], [3, 40.63523, 2],
...               [3, 51.76094, 1], [3, 26.30368, 1], [3, 20.70230, 2],
...               [3, 38.74273, 2], [3,19.47333, 1], [2, 26.42211, 1],
...               [3, 37.05986, 2], [2, 51.67043, 2], [1, 42.40156, 1],
...               [3, 33.90027, 2], [2, 35.43282, 1], [2, 44.30369, 1],
...               [1, 46.72387, 1], [2, 46.99262, 1], [1, 36.05923, 1],
...               [3, 36.83197, 2], [2, 61.66257, 2], [1, 25.67714, 1],
...               [2, 39.08567, 2], [1, 48.84341, 2], [2, 39.34391, 1],
...               [3, 24.73522, 1], [2, 50.55251, 2], [1, 31.34263, 2],
...               [2, 27.15795, 2], [1, 31.72685, 1], [1, 25.00408, 1],
...               [2, 26.35457, 2], [3, 38.12343, 1], [1, 49.9403, 1],
...               [2, 42.45779, 2], [1, 38.80948, 2], [1, 43.22799, 2],
...               [1, 41.87624, 1], [3, 48.0782, 1], [1, 43.23673, 2],
...               [3, 39.41294, 1], [2, 23.93346, 1], [3, 42.8413, 2],
...               [3, 30.40669, 1], [1, 37.77389, 1]])
>>> x1 = np.asarray(x[:, 0:2])
>>> # Array of response IDs
>>> y = np.array([1, 2, 3, 4, 3, 3, 4, 4, 4, 4, 2, 1, 4, 1, 1, 1,
... 4, 4, 3, 1, 2, 3, 3, 4, 2, 3, 4, 1, 2, 4, 3, 4, 4, 1, 3, 4, 4,
... 2, 3, 4, 2, 2, 4, 3, 1, 4, 3, 4, 2, 3])
>>> n_predictors = 2
>>> n_classes = 4
>>> response_counts = reg.ResponseFormatID(y, n_classes)
>>> model = reg.LogisticRegression(n_predictors, n_classes)
>>> model.fit(x1, response_counts)
>>> n_coefs = model.n_coeffs
>>> lrstat = model.likeli_ratio_test_stat
>>> dof = n_coefs * (n_classes - 1) - (n_classes - 1)
>>> model_pval = 1.0 - chi2.cdf(lrstat, dof)
>>> np.set_printoptions(formatter={'float': '{: 0.3f}'.format})
>>> print("Coefficients:\n" +
...       str(model.coefficients)) 
Coefficients:
[[ 2.292  0.408 -0.111]
 [-1.162  0.245 -0.002]
 [-0.067  0.178 -0.017]
 [ 0.000  0.000  0.000]]
>>> print("Standard Errors:\n" +
...       str(model.stderrs)) 
Standard Errors:
[[ 2.259  0.548  0.051]
 [ 2.122  0.500  0.044]
 [ 1.862  0.442  0.039]]
>>> print("\nLog-likelihood: {0:5.2f}".
...       format(model.log_likeli)) 
Log-likelihood: -62.92
>>> print("\nLR test statistic: {0:5.2f}".
...       format(lrstat)) 
LR test statistic:  7.68
>>> print("{0:2d} deg. freedom, p-value: {1:5.4f}".format(
...       dof, model_pval)) 
6 deg. freedom, p-value: 0.2623
>>> # Put back the default options
>>> np.set_printoptions()

Example 3:

The third example uses the same data as in the second example and an additional set of 50 observations using the same data generating process. The model structure includes all three predictor variables and an intercept, and a single model fit is approximated from two separate model fits. Example 3 also includes a fit on the full data set for comparison purposes.

>>> import numpy as np
>>> import imsl.regression as reg
>>> # Array 1 of predictors
>>> x1 = np.array([[3, 25.92869, 1], [2,51.63245, 2], [2, 25.78432, 1],
...                [1, 39.37948, 1], [3,24.65058, 1], [3, 45.20084, 1],
...                [3, 52.6796, 2], [2, 44.28342, 2], [3, 40.63523, 2],
...                [3, 51.76094, 1], [3, 26.30368, 1], [3, 20.70230, 2],
...                [3, 38.74273, 2], [3,19.47333, 1], [2, 26.42211, 1],
...                [3, 37.05986, 2], [2, 51.67043, 2], [1, 42.40156, 1],
...                [3, 33.90027, 2], [2, 35.43282, 1], [2, 44.30369, 1],
...                [1, 46.72387, 1], [2, 46.99262, 1], [1, 36.05923, 1],
...                [3, 36.83197, 2], [2, 61.66257, 2], [1, 25.67714, 1],
...                [2, 39.08567, 2], [1, 48.84341, 2], [2, 39.34391, 1],
...                [3, 24.73522, 1], [2, 50.55251, 2], [1, 31.34263, 2],
...                [2, 27.15795, 2], [1, 31.72685, 1], [1, 25.00408, 1],
...                [2, 26.35457, 2], [3, 38.12343, 1], [1, 49.9403, 1],
...                [2, 42.45779, 2], [1, 38.80948, 2], [1, 43.22799, 2],
...                [1, 41.87624, 1], [3, 48.0782, 1], [1, 43.23673, 2],
...                [3, 39.41294, 1], [2, 23.93346, 1], [3, 42.8413, 2],
...                [3, 30.40669, 1], [1, 37.77389, 1]])
>>> # Array 2 of predictors
>>> x2 = np.array([[1, 35.66064, 1], [1, 26.68771, 1], [3, 23.11251, 2],
...                [3, 58.14765, 1], [2, 44.95038, 1], [3, 42.45634, 1],
...                [3, 34.97379, 2], [3, 53.54269, 2], [2, 32.57257, 2],
...                [1, 46.91201, 1], [1, 30.93306, 1], [1, 51.63743, 2],
...                [1, 34.67712, 2], [3, 53.84584, 1], [3, 14.97474, 1],
...                [2, 44.4485, 2], [2, 47.10448, 1], [3, 43.96467, 1],
...                [3, 55.55741, 2], [2, 36.63123, 2], [3, 32.35164, 2],
...                [2, 55.75668, 1], [1, 36.83637, 2], [3, 46.7913, 1],
...                [3, 44.24153, 2], [2, 49.94011, 1], [2, 41.91916, 1],
...                [3, 24.78584, 2], [3, 50.79019, 2], [2, 39.97886, 2],
...                [1, 34.42149, 2], [2, 41.93271, 2], [1, 28.59433, 2],
...                [2, 38.47255, 2], [3, 32.11676, 2], [3, 37.19347, 1],
...                [1, 52.89337, 1], [1, 34.64874, 1], [2, 48.61935, 2],
...                [2, 33.99104, 1], [3, 38.32489, 2], [1, 35.53967, 2],
...                [1, 29.59645, 1], [2, 21.14665, 1], [2, 51.11257, 2],
...                [1, 34.20155, 1], [1, 44.40374, 1], [2, 49.67626, 2],
...                [3, 58.35377, 1], [1, 28.03744, 1]])
>>> # Array 1 of response counts
>>> y1 = np.array([1, 2, 3, 4, 3, 3, 4, 4, 4, 4, 2, 1, 4, 1, 1, 1, 4, 4,
...                3, 1, 2, 3, 3, 4, 2, 3, 4, 1, 2, 4, 3, 4, 4, 1, 3, 4,
...                4, 2, 3, 4, 2, 2, 4, 3, 1, 4, 3, 4, 2, 3])
>>> # Array 2 of response counts
>>> y2 = np.array([1, 4, 1, 4, 1, 1, 3, 1, 2, 4, 3, 1, 3, 2, 4, 4, 4, 2,
...                3, 2, 1, 4, 4, 4, 4, 3, 1, 1, 3, 1, 4, 2, 4, 2, 1, 2,
...                3, 1, 1, 4, 1, 2, 4, 3, 4, 2, 4, 3, 2, 4])
>>> x3 = np.empty((100, 3))
>>> y3 = np.empty((100,))
>>> n_predictors = 3
>>> n_classes = 4
>>> resp1 = reg.ResponseFormatID(y1, n_classes)
>>> # Fit first model to x1, resp1
>>> model1 = reg.LogisticRegression(n_predictors, n_classes)
>>> model1.fit(x1, resp1)
>>> np.set_printoptions(formatter={'float': '{: 0.3f}'.format})
>>> print("First Model Coefficients:\n" +
...       str(model1.coefficients)) 
First Model Coefficients:
[[ 1.691  0.350 -0.137  1.057]
 [-1.254  0.242 -0.004  0.115]
 [ 1.032  0.278  0.016 -1.954]
 [ 0.000  0.000  0.000  0.000]]
>>> print("First Model Standard Errors:\n" +
...       str(model1.stderrs)) 
First Model Standard Errors:
[[ 2.389  0.565  0.061  1.025]
 [ 2.197  0.509  0.047  0.885]
 [ 2.007  0.461  0.043  0.958]]
>>> # Update first model with x2, y2
>>> resp2 = reg.ResponseFormatID(y2, n_classes)
>>> model1.fit(x2, resp2, update=True)
>>> print("Combined Model Coefficients:\n" +
...       str(model1.coefficients)) 
Combined Model Coefficients:
[[-1.169  0.649 -0.038  0.608]
 [-1.935  0.435  0.002  0.215]
 [-0.193  0.282  0.002 -0.630]
 [ 0.000  0.000  0.000  0.000]]
>>> print("Combined Model Standard Errors:\n" +
...       str(model1.stderrs)) 
Combined Model Standard Errors:
[[ 1.489  0.359  0.029  0.588]
 [ 1.523  0.358  0.030  0.584]
 [ 1.461  0.344  0.030  0.596]]
>>> # Combine data, using model1 instance
>>> y3[0:50] = y1[:]
>>> y3[50:100] = y2[:]
>>> x3[0:50, :] = x1[:, :]
>>> x3[50:100, :] = x2[:, :]
>>> resp3 = reg.ResponseFormatID(y3, n_classes)
>>> model1.fit(x3, resp3)
>>> print("Full Data Model Coefficients:\n" +
...       str(model1.coefficients)) 
Full Data Model Coefficients:
[[-1.009  0.640 -0.051  0.764]
 [-2.008  0.436  0.003  0.263]
 [-0.413  0.299  0.004 -0.593]
 [ 0.000  0.000  0.000  0.000]]
>>> print("Full Data Model Standard Errors:\n" +
...       str(model1.stderrs)) 
Full Data Model Standard Errors:
[[ 1.466  0.350  0.029  0.579]
 [ 1.520  0.357  0.029  0.581]
 [ 1.389  0.336  0.028  0.577]]
>>> # Put back the default options
>>> np.set_printoptions()

Methods

aggregate(*models) Combine separate fits of the logistic regression model.
fit(x, y[, update, guess, tol, max_iter]) Fit or update the logistic regression model using the given data.
predict(data[, responses, frequencies, …]) Predict responses for new predictor samples.

Attributes

class_means Return the overall means for each class variable.
coefficients Return the logistic regression coefficients.
gradient Return the gradient of the log-likelihood.
hessian Return the Hessian of the log-likelihood.
intercept Return information about an intercept in the model.
likeli_ratio_test_stat Return the likelihood ratio test statistic.
log_likeli Return the log-likelihood at the estimated coefficients.
n_classes Return the number of classes (categories) in the model.
n_coeffs Return the number of coefficients in the model.
n_obs Return the total number of observations.
n_predictors Return the number of predictors.
n_updates Return the number of updates.
n_x_interact Return the number of interaction terms in the model.
ref_class Return the number of the reference class.
stderrs Return the standard errors for the estimated coefficients.