regressionPrediction¶
Computes predicted values, confidence intervals, and diagnostics after fitting a regression model.
Synopsis¶
regressionPrediction (regressionInfo, x)
Required Argument¶
- structure
regressionInfo
(Input) - A structure containing information about the regression fit. See regression.
- float
x[]
(Input) - Array of size
nPredict
by the number of independent variables containing the combinations of independent variables in each row for which calculations are to be performed.
Return Value¶
An array of length nPredict
containing the predicted values.
Optional Arguments¶
indexRegression
, int (Input)Given a multivariate regression fit, this option allows the user to specify for which regression statistics will be computed.
Default:
indexRegression
= 0xIndices
, intindind[]
, intinddep
, intifrq
, intiwt
(Input)This argument allows an alternative method for data specification. Data (independent, dependent, frequencies, and weights) is all stored in the data matrix
x
. Argumenty
, and keywordweights
are ignored.Each of the four arguments contains indices indicating column numbers of
x
in which particular types of data are stored.Parameter
indind
contains the indices of the independent variables.Parameter
inddep
contains the indices of the dependent variables. If there is to be no dependent variable, this must be indicated by setting the first element of the vector to −1.Parameters
ifrq
andiwt
contain the column numbers ofx
in which the frequencies and weights, respectively, are stored. Setifrq
= −1 if there will be no column for frequencies. Setiwt
= −1 if there will be no column for weights. Weights are rounded to the nearest integer. Negative weights are not allowed.Note that frequencies are not referenced by function
regressionPrediction
, and is included here only for the sake of keyword consistency.Finally, note that
xIndices
andy
are mutually exclusive keywords, and may not be specified in the same call toregressionPrediction
.weights
, float[]
(Input)Array of length
nPredict
containing the weight for each row ofx
. The computed prediction interval uses SSE/(DFE*weights
[i]) for the estimated variance of a future response, where SSE is sum of squares error and DFE is degrees of freedom error.Default:
weights[]
= 1confidence
, float (Input)Confidence level for both two-sided interval estimates on the mean and for two-sided prediction intervals, in percent. Argument
confidence
must be in the range [0.0, 100.0). For one-sided intervals with confidence levelonecl
, where 50.0 ≤onecl
< 100.0, setconfidence
= 100.0 − 2.0 * (100.0 −onecl
).Default:
confidence
= 95.0scheffeCi
,lowerLimit
,upperLimit
(Output)- Array
lowerLimit
is an array of lengthnPredict
containing the lower confidence limits of Scheffé confidence intervals corresponding to the rows ofx
. ArrayupperLimit
is an array of lengthnPredict
containing the upper confidence limits of Scheffé confidence intervals corresponding to the rows ofx
. pointwiseCiPopMean
,lowerLimit
,upperLimit
(Output)- Array
lowerLimit
is an array of lengthnPredict
containing the lower-confidence limits of the confidence intervals for two-sided interval estimates of the means, corresponding to the rows of x. ArrayupperLimit
is an array of lengthnPredict
containing the upper-confidence limits of the confidence intervals for two-sided interval estimates of the means, corresponding to the rows ofx
. pointwiseCiNewSample
,lowerLimit
,upperLimit
(Output)- Array
lowerLimit
is an array of lengthnPredict
containing the lower-confidence limits of the confidence intervals for two-sided prediction intervals, corresponding to the rows ofx
. ArrayupperLimit
is an array of lengthnPredict
containing the upper-confidence limits of the confidence intervals for two-sided prediction intervals, corresponding to the rows ofx
. leverage
(Output)- An array of length
nPredict
containing the leverages. y
, float[]
(Input)- Array of length
nPredict
containing the observed responses.
y (or xIndices ) must be specified if any of the following
optional arguments are specified. |
residual
(Output)- An array of length
nPredict
containing the residuals. standardizedResidual
(Output)- An array of length
nPredict
containing the standardized residuals. deletedResidual
(Output)- An array of length
nPredict
containing the deleted residuals. cooksd
(Output)- An array of length
nPredict
containing the Cook’s D statistics. dffits
(Output)- An array of length
nPredict
containing the DFFITS statistics.
Description¶
The general linear model used by function regressionPrediction
is
where y is the n × 1 vector of responses, X is the n × p matrix of regressors, β is the p × 1 vector of regression coefficients, and ɛ is the n × 1 vector of errors whose elements are independently normally distributed with mean 0 and the variance below.
From a general linear model fit using the \(w_i\)’s as the weights,
function regressionPrediction
computes confidence intervals and
statistics for the individual cases that constitute the data set. Let
\(x_i\) be a column vector containing elements of the i‑th row of
X. The leverage is defined by
where \(W=\mathrm{diag}(w_1,w_2,\ldots,w_n)\) and \((X^T WX)^-\) denotes a generalized inverse of \(X^T WX\).
Put \(D=\mathrm{diag}(d_1,d_2,\ldots,d_n)\) with \(d_j=1\) if the j-th diagonal element of R is positive and 0 otherwise. The leverage is computed as \(h_i=(a^T Da) w_i\) where a is a solution to \(R^T a=x_i\). The estimated variance of
is given by the following:
where
The computation of the remainder of the case statistics follow easily from their definitions. See Diagnostics for Individual Cases for the definition of the case diagnostics.
Informational errors can occur if the input matrix x
is not consistent
with the information from the fit (contained in regressionInfo
), or if
excess rounding has occurred. The warning error IMSLS_NONESTIMABLE
arises when x
contains a row not in the space spanned by the rows of
R. An examination of the model that was fitted and the x
for which
diagnostics are to be computed is required in order to ensure that only
linear combinations of the regression coefficients that can be estimated
from the fitted model are specified in x
. For further details, see the
discussion of estimable functions given in Maindonald (1984, pp. 166−168)
and Searle (1971, pp. 180−188).
Often predicted values and confidence intervals are desired for combinations
of settings of the independent variables not used in computing the
regression fit. This can be accomplished by defining a new data matrix.
Since the information about the model fit is input in regressionInfo
, it
is not necessary to send in the data set used for the original calculation
of the fit, i.e., only variable combinations for which predictions are
desired need be entered in x
.
Examples¶
Example 1¶
from numpy import *
from pyimsl.stat.regression import regression
from pyimsl.stat.regressionPrediction import regressionPrediction
from pyimsl.stat.writeMatrix import writeMatrix
x = array([
[7.0, 26.0, 6.0, 60.0],
[1.0, 29.0, 15.0, 52.0],
[11.0, 56.0, 8.0, 20.0],
[11.0, 31.0, 8.0, 47.0],
[7.0, 52.0, 6.0, 33.0],
[11.0, 55.0, 9.0, 22.0],
[3.0, 71.0, 17.0, 6.0],
[1.0, 31.0, 22.0, 44.0],
[2.0, 54.0, 18.0, 22.0],
[21.0, 47.0, 4.0, 26.0],
[1.0, 40.0, 23.0, 34.0],
[11.0, 66.0, 9.0, 12.0],
[10.0, 68.0, 8.0, 12.0]])
y = array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2,
102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4])
regression_info = []
# Fit the regression model
coefficients = regression(x, y, regressionInfo=regression_info)
regression_info = regression_info[0]
# Generate case statistics
y_hat = regressionPrediction(regression_info, x)
# Print results
writeMatrix("Predicted Responses", y_hat)
Output¶
Predicted Responses
1 2 3 4 5 6
78.5 72.8 106.0 89.3 95.6 105.3
7 8 9 10 11 12
104.1 75.7 91.7 115.6 81.8 112.3
13
111.7
Example 2¶
from numpy import *
from pyimsl.stat.regression import regression
from pyimsl.stat.regressionPrediction import regressionPrediction
from pyimsl.stat.writeMatrix import writeMatrix
x = array([
[7.0, 26.0, 6.0, 60.0],
[1.0, 29.0, 15.0, 52.0],
[11.0, 56.0, 8.0, 20.0],
[11.0, 31.0, 8.0, 47.0],
[7.0, 52.0, 6.0, 33.0],
[11.0, 55.0, 9.0, 22.0],
[3.0, 71.0, 17.0, 6.0],
[1.0, 31.0, 22.0, 44.0],
[2.0, 54.0, 18.0, 22.0],
[21.0, 47.0, 4.0, 26.0],
[1.0, 40.0, 23.0, 34.0],
[11.0, 66.0, 9.0, 12.0],
[10.0, 68.0, 8.0, 12.0]])
y = array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2,
102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4])
regression_info = []
# Fit the regression model
coefficients = regression(x, y, regressionInfo=regression_info)
regression_info = regression_info[0]
# Generate the case statistics
leverage = []
residual = []
standardized_residual = []
deleted_residual = []
cooksd = []
dffits = []
pointwise_ci_pop_mean = {}
pointwise_ci_new_sample = {}
scheffe_ci = {}
y_hat = regressionPrediction(regression_info, x,
y=y, leverage=leverage,
residual=residual,
standardizedResidual=standardized_residual,
deletedResidual=deleted_residual,
cooksd=cooksd, dffits=dffits,
pointwiseCiPopMean=pointwise_ci_pop_mean,
pointwiseCiNewSample=pointwise_ci_new_sample,
scheffeCi=scheffe_ci)
# Print results
writeMatrix("Predicted Responses", y_hat)
writeMatrix("Residuals", residual)
writeMatrix("Standardized Residuals", standardized_residual)
writeMatrix("Leverages", leverage)
writeMatrix("Deleted Residuals", deleted_residual)
writeMatrix("Cooks D", cooksd)
writeMatrix("DFFITS", dffits)
writeMatrix("Scheffe Lower Limit",
scheffe_ci["lowerLimit"])
writeMatrix("Scheffe Upper Limit",
scheffe_ci["upperLimit"])
writeMatrix("Population Mean Lower Limit",
pointwise_ci_pop_mean["lowerLimit"])
writeMatrix("Population Mean Upper Limit",
pointwise_ci_pop_mean["upperLimit"])
writeMatrix("New Sample Lower Limit",
pointwise_ci_new_sample["lowerLimit"])
writeMatrix("New Sample Upper Limit",
pointwise_ci_new_sample["upperLimit"])
Output¶
Predicted Responses
1 2 3 4 5 6
78.5 72.8 106.0 89.3 95.6 105.3
7 8 9 10 11 12
104.1 75.7 91.7 115.6 81.8 112.3
13
111.7
Residuals
1 2 3 4 5 6
0.005 1.511 -1.671 -1.727 0.251 3.925
7 8 9 10 11 12
-1.449 -3.175 1.378 0.282 1.991 0.973
13
-2.294
Standardized Residuals
1 2 3 4 5 6
0.003 0.757 -1.050 -0.841 0.128 1.715
7 8 9 10 11 12
-0.744 -1.688 0.671 0.210 1.074 0.463
13
-1.124
Leverages
1 2 3 4 5 6
0.5503 0.3332 0.5769 0.2952 0.3576 0.1242
7 8 9 10 11 12
0.3671 0.4085 0.2943 0.7004 0.4255 0.2630
13
0.3037
Deleted Residuals
1 2 3 4 5 6
0.003 0.735 -1.058 -0.824 0.120 2.017
7 8 9 10 11 12
-0.722 -1.967 0.646 0.197 1.086 0.439
13
-1.146
Cooks D
1 2 3 4 5 6
0.0000 0.0572 0.3009 0.0593 0.0018 0.0834
7 8 9 10 11 12
0.0643 0.3935 0.0375 0.0207 0.1708 0.0153
13
0.1102
DFFITS
1 2 3 4 5 6
0.003 0.519 -1.236 -0.533 0.089 0.759
7 8 9 10 11 12
-0.550 -1.635 0.417 0.302 0.935 0.262
13
-0.757
Scheffe Lower Limit
1 2 3 4 5 6
70.7 66.7 98.0 83.6 89.4 101.6
7 8 9 10 11 12
97.8 69.0 86.0 106.8 75.0 106.9
13
105.9
Scheffe Upper Limit
1 2 3 4 5 6
86.3 78.9 113.9 95.0 101.9 109.0
7 8 9 10 11 12
110.5 82.4 97.4 124.4 88.7 117.7
13
117.5
Population Mean Lower Limit
1 2 3 4 5 6
74.3 69.5 101.7 86.3 92.3 103.3
7 8 9 10 11 12
100.7 72.1 88.7 110.9 78.1 109.4
13
108.6
Population Mean Upper Limit
1 2 3 4 5 6
82.7 76.0 110.3 92.4 99.0 107.3
7 8 9 10 11 12
107.6 79.3 94.8 120.3 85.5 115.2
13
114.8
New Sample Lower Limit
1 2 3 4 5 6
71.5 66.3 98.9 82.9 89.1 99.3
7 8 9 10 11 12
97.6 69.0 85.3 108.3 75.1 106.0
13
105.3
New Sample Upper Limit
1 2 3 4 5 6
85.5 79.3 113.1 95.7 102.2 111.3
7 8 9 10 11 12
110.7 82.4 98.1 123.0 88.5 118.7
13
118.1
Warning Errors¶
IMSLS_NONESTIMABLE |
Within the preset tolerance, the linear combination of regression coefficients is nonestimable. |
IMSLS_LEVERAGE_GT_1 |
A leverage (= #) much greater than 1.0 is computed. It is set to 1.0. |
IMSLS_DEL_MSE_LT_0 |
A deleted residual mean square (= #) much less than 0 is computed. It is set to 0. |
Fatal Errors¶
IMSLS_NONNEG_WEIGHT_REQUEST_2 |
The weight for row # was #. Weights must be nonnegative. |