logisticRegPredict¶
Predict a binomial or multinomial outcome given an estimated model and new values of the independent variables.
Synopsis¶
logisticRegPredict (nIndependent, nClasses, coefs, x)
Required Arguments¶
- int nIndependent (Input)
  The number of independent variables.
- int nClasses (Input)
  The number of discrete outcomes, or classes.
- float coefs[] (Input)
  Array of length nCoefficients × nClasses containing the coefficient estimates of the logistic regression model, where nCoefficients is the number of coefficients in the model.
- float x[] (Input)
  Array of length nObservations × nIndependent containing the values of the independent variables.
Return Value¶
An array containing the predicted responses. The predicted value is the predicted number of outcomes in each class for each new observation provided in x. If frequencies[i] = 1 for all observations, the return value is equivalent to the predicted probabilities. If the optional argument confidence is specified, the return array has length nObservations × nClasses × 3 and includes the lower and upper prediction limits; otherwise, it has length nObservations × nClasses. Note that if the data is column-oriented (see columnWise), the return value is also column-oriented.
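The layout of the return array can be pictured with plain NumPy. This is only an illustration of the row-oriented case; the dimensions and values below are made up and are not part of the pyimsl API:

```python
import numpy as np

# Hypothetical dimensions for illustration.
n_observations, n_classes = 3, 2

# Without `confidence`, a row-oriented result holds nClasses contiguous
# predicted values per observation; view it as (nObservations, nClasses).
flat = np.arange(n_observations * n_classes, dtype=float)
yhat = flat.reshape(n_observations, n_classes)
print(yhat[1, 0])  # predicted value for observation 1, class 0 -> 2.0

# With `confidence`, each predicted value gains lower and upper limits,
# tripling the length to nObservations * nClasses * 3.
flat_ci = np.arange(n_observations * n_classes * 3, dtype=float)
yhat_ci = flat_ci.reshape(n_observations, n_classes, 3)
estimate, lower, upper = yhat_ci[1, 0]  # observation 1, class 0
```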
Optional Arguments¶
- y, float[] (Input)
  Array containing the actual responses corresponding to the independent variables. If present, the expected length of y is nObservations × nClasses unless one of groups or groupCounts is also present. y is required when prederr is requested.
  Default: The function expects that y is not given.
- groupCounts (Input)
  or
  groups (Input)
  These optional arguments specify alternative formats of the input array y. If groupCounts is present, y is of length nObservations × (nClasses − 1) and contains counts for all but one of the classes for each observation; the missing class is treated as the reference class. If groupCounts is present and any y[i] > 1, frequencies is required. If groups is present, the input array y is of length nObservations and y[i] contains the group number to which the observation belongs. In this case, frequencies[i] is set to 1 for all observations.
  Default: Unless one of these arguments is present, the function expects y to be of length nObservations × nClasses and to contain counts for all the classes.
- columnWise (Input)
  If present, the input arrays are column-oriented. That is, contiguous elements in x are values of the same independent variable, or column, except at multiples of nObservations.
  Default: Input arrays are row-oriented.
- frequencies, int[] (Input)
  Array of length nObservations containing the number of replications or trials for each observation. This argument is required if groupCounts is present and any y[i] > 1.
  Default: frequencies[i] = 1.
- referenceClass, int (Input)
  Number specifying which class or outcome category to use as the reference class. The purpose of the reference class is explained in the Description section.
  Default: referenceClass = nClasses.
- noIntercept (Input)
  If present, the model does not include an intercept term.
  Default: The intercept term is included.
- xIndices, xin (Input)
  An array of length nXin providing the variable indices of x that correspond to the independent variables to include in the logistic regression model.
  Default: All nIndependent variables are included.
- xInteractions, xinteract (Input)
  An array of length nXinteract × 2 providing pairs of variable indices of x that define the interaction terms in the model. Adjacent indices should be unique.
  Default: No interaction terms are included.
- confidence, float (Input)
  The confidence level to use in the calculation of the prediction intervals. If this argument is present and valid (0 < confidence < 100), confidence% prediction intervals are provided for each predicted value.
  Default: Prediction intervals are not provided.
- model, structure (Input)
  A structure containing information about the logistic regression fit; see logisticRegression. Required when confidence is present.
  Default: Not needed if confidence is not present.
- prederr (Output)
  The mean squared prediction error when y is present.
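The difference between the two orientations can be illustrated with NumPy (the observation values below are made up):

```python
import numpy as np

# Two observations of three independent variables (hypothetical values).
obs = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])

# Row-oriented (the default): each observation is contiguous.
row_wise = obs.flatten()        # [1, 2, 3, 4, 5, 6]

# Column-oriented (columnWise): each variable's values are contiguous.
column_wise = obs.flatten('F')  # [1, 4, 2, 5, 3, 6]
```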
Description¶
Function logisticRegPredict
calculates the predicted outcomes for a
binomial or multinomial response variable given an estimated logistic
regression model and new observations of the independent variables.
For a binary response y, the objective is to estimate the conditional probability of success, \(\pi_1(x)=\Pr\left[ y=1 \mid x \right]\), where \(x=\left( x_1,x_2,\ldots,x_p \right)'\) is a realization of p independent variables. In particular, the estimated probability of success is

\[\hat{\pi}_1\left( x_i \right)=\frac{\exp\left( \hat{\eta}_{i1} \right)}{1+\exp\left( \hat{\eta}_{i1} \right)}\]

where

\[\hat{\eta}_{i1}=\hat{\beta}_{01}+x_i^T \hat{\beta}_1\]

and \(\hat{\beta}_{01},\hat{\beta}_1=\left( \hat{\beta}_{11},\hat{\beta}_{12},\ldots,\hat{\beta}_{1p} \right)'\) are the coefficient estimates. Then \(\hat{y}_i=n_i \hat{\pi}_1 \left( x_i \right)\), where \(n_i\) is the number of trials for observation i. That is, \(\hat{y}\) is the expected value of the response under the estimated model given the values of the independent variables.
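As a sanity check, the binary prediction can be reproduced directly from a fitted intercept and slope. The sketch below uses the rounded coefficient estimates printed in Example 1 (intercept ≈ −60.76, slope ≈ 34.30), so the result only approximately matches the function's output:

```python
import math

def predicted_deaths(beta0, beta1, dose, n_trials):
    """Expected count n_i * pi_1(x_i) under a fitted binary logistic model."""
    eta = beta0 + beta1 * dose               # the estimated logit
    pi1 = math.exp(eta) / (1.0 + math.exp(eta))  # logistic transform
    return n_trials * pi1

# Rounded coefficient estimates from Example 1 below.
beta0, beta1 = -60.76, 34.30
expected = predicted_deaths(beta0, beta1, 1.87, 22)  # roughly 21.28
```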
Similarly, for a multinomial response, with class K the reference class,

\[\hat{\pi}_k\left( x_i \right)=\frac{\exp\left( \hat{\eta}_{ik} \right)}{1+\sum_{l=1}^{K-1} \exp\left( \hat{\eta}_{il} \right)}, \quad k=1,\ldots,K-1\]

Then

\[\hat{\pi}_K\left( x_i \right)=1-\sum_{k=1}^{K-1} \hat{\pi}_k\left( x_i \right)\]

and \(\hat{y}_{ik}=n_i \hat{\pi}_k \left( x_i \right)\). If the actual responses are given, the mean squared prediction error is

\[\mathrm{mspe}=\frac{1}{NK}\sum_{i=1}^{N}\sum_{k=1}^{K}\left( \hat{y}_{ik}-y_{ik} \right)^2\]

If requested, \(100(1-\alpha)\%\) prediction intervals are provided for the predicted values by first finding the prediction standard errors, \(\hat{\sigma}_{ik}\), of the logits, \(\hat{\eta}_{ik}=\hat{\beta}_{0k}+x_i^T \hat{\beta}_k\), and then evaluating

\[\frac{\exp\left( \hat{\eta}_{ik} \pm z_{\alpha/2}\hat{\sigma}_{ik} \right)}{1+\exp\left( \hat{\eta}_{ik} \pm z_{\alpha/2}\hat{\sigma}_{ik} \right)}\]

to obtain the upper and lower limits for \(\hat{\pi}_k \left( x_i \right)\), where \(z_{\alpha/2}\) is the upper \(\alpha/2\) quantile of the standard normal distribution. Note that the properties of the prediction intervals are valid only when the new observations are inside the range of the original data used to fit the model. Generally, the model should not be used to extrapolate outside the range of the original data. See Hosmer and Lemeshow (2000) for further details.
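The back-transformation of the interval from the logit scale to the probability scale can be sketched in plain Python. The logit and standard error values here are hypothetical, chosen only for illustration:

```python
import math

def logit_interval(eta, se, z=1.96):
    """Back-transform eta +/- z*se through the logistic function to get
    lower and upper limits for pi (z=1.96 for a 95% interval)."""
    lo = math.exp(eta - z * se) / (1.0 + math.exp(eta - z * se))
    hi = math.exp(eta + z * se) / (1.0 + math.exp(eta + z * se))
    return lo, hi

# Hypothetical logit estimate and its prediction standard error.
lo, hi = logit_interval(eta=0.5, se=0.4)
```

Because the logistic function is monotone, the transformed endpoints bracket the point estimate \(\hat{\pi}_k\left( x_i \right)\), and the limits always stay inside (0, 1).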
Examples¶
Example 1¶
The model fit to the beetle mortality data of Prentice (1976) is used to
predict the expected mortality at three new doses. For the original data,
see Example 1 in logisticRegression
.
Log Dosage | Number of Beetles Exposed | Number of Deaths |
---|---|---|
1.66 | 16 | ?? |
1.87 | 22 | ?? |
1.71 | 11 | ?? |
from numpy import *
from pyimsl.stat.logisticRegPredict import logisticRegPredict
from pyimsl.stat.logisticRegression import logisticRegression
from pyimsl.stat.writeMatrix import writeMatrix

y1 = [6, 13, 18, 28, 52, 53, 61, 60]
x1 = [1.69, 1.724, 1.755, 1.784, 1.811, 1.836, 1.861, 1.883]
x2 = [1.66, 1.87, 1.71]
freqs1 = [59, 60, 62, 56, 63, 59, 62, 60]
freqs2 = [16, 22, 11]
n_classes = 2
n_observations = 8
n_independent = 1
n_coefs = 2
n_new_observations = 3

# Fit the binomial logistic regression model to the original data.
coefs = logisticRegression(n_independent, n_classes, x1, y1,
                           groupCounts=True,
                           frequencies=freqs1)

rowLabels = ['1', '2']
writeMatrix("Coefficient Estimates", coefs.reshape(4)[0:2],
            transpose=True,
            noColLabels=True,
            rowLabels=rowLabels)

# Predict the expected number of deaths at the three new doses.
yhat = logisticRegPredict(n_independent, n_classes, coefs, x2,
                          frequencies=freqs2)

tmp = []
colLabels = '  Dose       N     Expected Deaths'
yhatlst = [yhat[0, 0], yhat[1, 0], yhat[2, 0]]
for i in range(n_new_observations):
    tmp.append([x2[i], freqs2[i], yhatlst[i]])
writeMatrix(colLabels, tmp, noRowLabels=True, noColLabels=True)
Output¶
Coefficient Estimates
1 -60.76
2 34.30
Dose N Expected Deaths
1.66 16.00 0.34
1.87 22.00 21.28
1.71 11.00 1.19
Example 2¶
A logistic regression model is fit to artificial (noisy) data with 4 classes and 3 independent variables and used to predict class probabilities at 10 new values of the independent variables. Also shown are the mean squared prediction error and upper and lower limits of the 95% prediction interval for each predicted value.
from __future__ import print_function
from numpy import *
from pyimsl.stat.logisticRegPredict import logisticRegPredict
from pyimsl.stat.logisticRegression import logisticRegression
from pyimsl.stat.chiSquaredCdf import chiSquaredCdf

# Column-oriented training data: 3 independent variables, 50 observations.
x = [[3, 2, 2, 1, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 2, 3, 2, 1, 3, 2,
      2, 1, 2, 1, 3, 2, 1, 2, 1, 2, 3, 2, 1, 2, 1, 1, 2, 3, 1, 2,
      1, 1, 1, 3, 1, 3, 2, 3, 3, 1],
     [25.92869, 51.63245, 25.78432, 39.37948, 24.65058, 45.20084,
      52.6796, 44.28342, 40.63523, 51.76094, 26.30368, 20.70230,
      38.74273, 19.47333, 26.42211, 37.05986, 51.67043, 42.40156,
      33.90027, 35.43282, 44.30369, 46.72387, 46.99262, 36.05923,
      36.83197, 61.66257, 25.67714, 39.08567, 48.84341, 39.34391,
      24.73522, 50.55251, 31.34263, 27.15795, 31.72685, 25.00408,
      26.35457, 38.12343, 49.9403, 42.45779, 38.80948, 43.22799,
      41.87624, 48.0782, 43.23673, 39.41294, 23.93346,
      42.8413, 30.40669, 37.77389],
     [1, 2, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 2, 1, 1,
      1, 1, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 1, 1, 2, 2, 2,
      1, 1, 2, 1, 1, 2, 1, 1]]
# Group numbers (1-4) for each training observation.
y = [1, 2, 3, 4, 3, 3, 4, 4, 4, 4, 2, 1, 4, 1, 1, 1, 4, 4, 3, 1, 2,
     3, 3, 4, 2, 3, 4, 1, 2, 4, 3, 4, 4, 1, 3, 4, 4, 2, 3, 4, 2, 2,
     4, 3, 1, 4, 3, 4, 2, 3]
# New observations at which to predict class probabilities.
newx = [[2, 2, 1, 3, 3, 3, 2, 3, 3, 3],
        [25.92869, 51.63245, 25.78432, 39.37948, 24.65058, 45.20084,
         52.6796, 44.28342, 40.63523, 51.76094],
        [1, 2, 1, 1, 1, 1, 2, 2, 2, 1]]
newy = [3, 2, 1, 1, 4, 3, 2, 2, 1, 2]
model_info = []
lrstat = []
mspe = []
n_classes = 4
n_observations = 50
n_new_obs = 10
n_independent = 3
n_coefs = 4

# Fit the multinomial logistic regression model.
coefs = logisticRegression(n_independent, n_classes, x, y,
                           groups=True,
                           columnWise=True,
                           lrstat=lrstat,
                           nextResults=model_info)

# Predict class probabilities with 95% prediction intervals and
# compute the mean squared prediction error against the actual newy.
yhat = logisticRegPredict(n_independent, n_classes, coefs, newx,
                          y=newy,
                          groups=True,
                          columnWise=True,
                          confidence=95.0,
                          model=model_info[0],
                          prederr=mspe)

dof = n_coefs * (n_classes - 1) - (n_classes - 1)
model_pval = 1.0 - chiSquaredCdf(lrstat[0], dof)

print("Model Fit Summary:")
print("Log-likelihood: %5.2f " % model_info[0].loglike)
print("LR test statistic: %5.2f " % lrstat[0])
print("Degrees of freedom: ", dof)
print("P-value: %5.4f " % model_pval)
print("")
print("Prediction Summary:")
print("Mean squared prediction error: %4.2f" % mspe[0])
print("")
print("Obs\tClass Estimate Lower Upper")
for i in range(n_new_obs):
    for j in range(n_classes):
        print("%d\t %d\t %5.2f %5.2f %5.2f" %
              (i + 1, j + 1, yhat[j, 0, i], yhat[j, 1, i], yhat[j, 2, i]))
Output¶
Model Fit Summary:
Log-likelihood: -58.58
LR test statistic: 16.37
Degrees of freedom: 9
P-value: 0.0595
Prediction Summary:
Mean squared prediction error: 0.21
Obs Class Estimate Lower Upper
1 1 0.26 0.14 0.35
1 2 0.14 0.06 0.20
1 3 0.31 0.18 0.36
1 4 0.29 0.10 0.62
2 1 0.04 0.01 0.14
2 2 0.27 0.11 0.39
2 3 0.12 0.04 0.25
2 4 0.57 0.22 0.85
3 1 0.23 0.07 0.38
3 2 0.13 0.04 0.20
3 3 0.28 0.12 0.34
3 4 0.36 0.08 0.77
4 1 0.06 0.02 0.14
4 2 0.16 0.07 0.24
4 3 0.49 0.28 0.54
4 4 0.29 0.08 0.63
5 1 0.34 0.17 0.41
5 2 0.13 0.06 0.19
5 3 0.30 0.17 0.34
5 4 0.22 0.05 0.60
6 1 0.03 0.00 0.09
6 2 0.16 0.06 0.24
6 3 0.53 0.27 0.60
6 4 0.29 0.07 0.67
7 1 0.04 0.01 0.13
7 2 0.27 0.10 0.40
7 3 0.13 0.04 0.26
7 4 0.57 0.21 0.86
8 1 0.14 0.04 0.26
8 2 0.29 0.12 0.37
8 3 0.12 0.04 0.21
8 4 0.46 0.15 0.80
9 1 0.21 0.08 0.33
9 2 0.27 0.12 0.35
9 3 0.10 0.03 0.19
9 4 0.42 0.14 0.77
10 1 0.01 0.00 0.05
10 2 0.15 0.04 0.24
10 3 0.57 0.23 0.67
10 4 0.28 0.05 0.73
Warning Errors¶
Error | Description |
---|---|
IMSLS_NO_ACTUALS | The average squared prediction error cannot be calculated because no actual “y” values are given. |
Fatal Errors¶
Error | Description |
---|---|
IMSLS_OVERFLOW | The linear predictor = # is too large and will lead to overflow when exponentiated. |