regressionPrediction¶
Computes predicted values, confidence intervals, and diagnostics after fitting a regression model.
Synopsis¶
regressionPrediction (regressionInfo, x)
Required Argument¶
- structure
regressionInfo(Input) - A structure containing information about the regression fit. See regression.
- float
x[](Input) - Array of size
nPredictby the number of independent variables containing the combinations of independent variables in each row for which calculations are to be performed.
Return Value¶
An array of length nPredict containing the predicted values.
Optional Arguments¶
indexRegression, int (Input)Given a multivariate regression fit, this option allows the user to specify for which regression statistics will be computed.
Default:
indexRegression= 0xIndices, intindind[], intinddep, intifrq, intiwt(Input)This argument allows an alternative method for data specification. Data (independent, dependent, frequencies, and weights) is all stored in the data matrix
x. Argumenty, and keywordweightsare ignored.Each of the four arguments contains indices indicating column numbers of
xin which particular types of data are stored.Parameter
indindcontains the indices of the independent variables.Parameter
inddepcontains the indices of the dependent variables. If there is to be no dependent variable, this must be indicated by setting the first element of the vector to −1.Parameters
ifrqandiwtcontain the column numbers ofxin which the frequencies and weights, respectively, are stored. Setifrq= −1 if there will be no column for frequencies. Setiwt= −1 if there will be no column for weights. Weights are rounded to the nearest integer. Negative weights are not allowed.Note that frequencies are not referenced by function
regressionPrediction, and is included here only for the sake of keyword consistency.Finally, note that
xIndicesandyare mutually exclusive keywords, and may not be specified in the same call toregressionPrediction.weights, float[](Input)Array of length
nPredictcontaining the weight for each row ofx. The computed prediction interval uses SSE/(DFE*weights[i]) for the estimated variance of a future response, where SSE is sum of squares error and DFE is degrees of freedom error.Default:
weights[]= 1confidence, float (Input)Confidence level for both two-sided interval estimates on the mean and for two-sided prediction intervals, in percent. Argument
confidencemust be in the range [0.0, 100.0). For one-sided intervals with confidence levelonecl, where 50.0 ≤onecl< 100.0, setconfidence= 100.0 − 2.0 * (100.0 −onecl).Default:
confidence= 95.0scheffeCi,lowerLimit,upperLimit(Output)- Array
lowerLimitis an array of lengthnPredictcontaining the lower confidence limits of Scheffé confidence intervals corresponding to the rows ofx. ArrayupperLimitis an array of lengthnPredictcontaining the upper confidence limits of Scheffé confidence intervals corresponding to the rows ofx. pointwiseCiPopMean,lowerLimit,upperLimit(Output)- Array
lowerLimitis an array of lengthnPredictcontaining the lower-confidence limits of the confidence intervals for two-sided interval estimates of the means, corresponding to the rows of x. ArrayupperLimitis an array of lengthnPredictcontaining the upper-confidence limits of the confidence intervals for two-sided interval estimates of the means, corresponding to the rows ofx. pointwiseCiNewSample,lowerLimit,upperLimit(Output)- Array
lowerLimitis an array of lengthnPredictcontaining the lower-confidence limits of the confidence intervals for two-sided prediction intervals, corresponding to the rows ofx. ArrayupperLimitis an array of lengthnPredictcontaining the upper-confidence limits of the confidence intervals for two-sided prediction intervals, corresponding to the rows ofx. leverage(Output)- An array of length
nPredictcontaining the leverages. y, float[](Input)- Array of length
nPredictcontaining the observed responses.
y (or xIndices) must be specified if any of the following
optional arguments are specified. |
residual(Output)- An array of length
nPredictcontaining the residuals. standardizedResidual(Output)- An array of length
nPredictcontaining the standardized residuals. deletedResidual(Output)- An array of length
nPredictcontaining the deleted residuals. cooksd(Output)- An array of length
nPredictcontaining the Cook’s D statistics. dffits(Output)- An array of length
nPredictcontaining the DFFITS statistics.
Description¶
The general linear model used by function regressionPrediction is
where y is the n × 1 vector of responses, X is the n × p matrix of regressors, β is the p × 1 vector of regression coefficients, and ɛ is the n × 1 vector of errors whose elements are independently normally distributed with mean 0 and the variance below.
From a general linear model fit using the \(w_i\)’s as the weights,
function regressionPrediction computes confidence intervals and
statistics for the individual cases that constitute the data set. Let
\(x_i\) be a column vector containing elements of the i‑th row of
X. The leverage is defined by
where \(W=\mathrm{diag}(w_1,w_2,\ldots,w_n)\) and \((X^T WX)^-\) denotes a generalized inverse of \(X^T WX\).
Put \(D=\mathrm{diag}(d_1,d_2,\ldots,d_n)\) with \(d_j=1\) if the j-th diagonal element of R is positive and 0 otherwise. The leverage is computed as \(h_i=(a^T Da) w_i\) where a is a solution to \(R^T a=x_i\). The estimated variance of
is given by the following:
where
The computation of the remainder of the case statistics follow easily from their definitions. See Diagnostics for Individual Cases for the definition of the case diagnostics.
Informational errors can occur if the input matrix x is not consistent
with the information from the fit (contained in regressionInfo), or if
excess rounding has occurred. The warning error IMSLS_NONESTIMABLE
arises when x contains a row not in the space spanned by the rows of
R. An examination of the model that was fitted and the x for which
diagnostics are to be computed is required in order to ensure that only
linear combinations of the regression coefficients that can be estimated
from the fitted model are specified in x. For further details, see the
discussion of estimable functions given in Maindonald (1984, pp. 166−168)
and Searle (1971, pp. 180−188).
Often predicted values and confidence intervals are desired for combinations
of settings of the independent variables not used in computing the
regression fit. This can be accomplished by defining a new data matrix.
Since the information about the model fit is input in regressionInfo, it
is not necessary to send in the data set used for the original calculation
of the fit, i.e., only variable combinations for which predictions are
desired need be entered in x.
Examples¶
Example 1¶
from numpy import *
from pyimsl.stat.regression import regression
from pyimsl.stat.regressionPrediction import regressionPrediction
from pyimsl.stat.writeMatrix import writeMatrix
x = array([
[7.0, 26.0, 6.0, 60.0],
[1.0, 29.0, 15.0, 52.0],
[11.0, 56.0, 8.0, 20.0],
[11.0, 31.0, 8.0, 47.0],
[7.0, 52.0, 6.0, 33.0],
[11.0, 55.0, 9.0, 22.0],
[3.0, 71.0, 17.0, 6.0],
[1.0, 31.0, 22.0, 44.0],
[2.0, 54.0, 18.0, 22.0],
[21.0, 47.0, 4.0, 26.0],
[1.0, 40.0, 23.0, 34.0],
[11.0, 66.0, 9.0, 12.0],
[10.0, 68.0, 8.0, 12.0]])
y = array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2,
102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4])
regression_info = []
# Fit the regression model
coefficients = regression(x, y, regressionInfo=regression_info)
regression_info = regression_info[0]
# Generate case statistics
y_hat = regressionPrediction(regression_info, x)
# Print results
writeMatrix("Predicted Responses", y_hat)
Output¶
Predicted Responses
1 2 3 4 5 6
78.5 72.8 106.0 89.3 95.6 105.3
7 8 9 10 11 12
104.1 75.7 91.7 115.6 81.8 112.3
13
111.7
Example 2¶
from numpy import *
from pyimsl.stat.regression import regression
from pyimsl.stat.regressionPrediction import regressionPrediction
from pyimsl.stat.writeMatrix import writeMatrix
x = array([
[7.0, 26.0, 6.0, 60.0],
[1.0, 29.0, 15.0, 52.0],
[11.0, 56.0, 8.0, 20.0],
[11.0, 31.0, 8.0, 47.0],
[7.0, 52.0, 6.0, 33.0],
[11.0, 55.0, 9.0, 22.0],
[3.0, 71.0, 17.0, 6.0],
[1.0, 31.0, 22.0, 44.0],
[2.0, 54.0, 18.0, 22.0],
[21.0, 47.0, 4.0, 26.0],
[1.0, 40.0, 23.0, 34.0],
[11.0, 66.0, 9.0, 12.0],
[10.0, 68.0, 8.0, 12.0]])
y = array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2,
102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4])
regression_info = []
# Fit the regression model
coefficients = regression(x, y, regressionInfo=regression_info)
regression_info = regression_info[0]
# Generate the case statistics
leverage = []
residual = []
standardized_residual = []
deleted_residual = []
cooksd = []
dffits = []
pointwise_ci_pop_mean = {}
pointwise_ci_new_sample = {}
scheffe_ci = {}
y_hat = regressionPrediction(regression_info, x,
y=y, leverage=leverage,
residual=residual,
standardizedResidual=standardized_residual,
deletedResidual=deleted_residual,
cooksd=cooksd, dffits=dffits,
pointwiseCiPopMean=pointwise_ci_pop_mean,
pointwiseCiNewSample=pointwise_ci_new_sample,
scheffeCi=scheffe_ci)
# Print results
writeMatrix("Predicted Responses", y_hat)
writeMatrix("Residuals", residual)
writeMatrix("Standardized Residuals", standardized_residual)
writeMatrix("Leverages", leverage)
writeMatrix("Deleted Residuals", deleted_residual)
writeMatrix("Cooks D", cooksd)
writeMatrix("DFFITS", dffits)
writeMatrix("Scheffe Lower Limit",
scheffe_ci["lowerLimit"])
writeMatrix("Scheffe Upper Limit",
scheffe_ci["upperLimit"])
writeMatrix("Population Mean Lower Limit",
pointwise_ci_pop_mean["lowerLimit"])
writeMatrix("Population Mean Upper Limit",
pointwise_ci_pop_mean["upperLimit"])
writeMatrix("New Sample Lower Limit",
pointwise_ci_new_sample["lowerLimit"])
writeMatrix("New Sample Upper Limit",
pointwise_ci_new_sample["upperLimit"])
Output¶
Predicted Responses
1 2 3 4 5 6
78.5 72.8 106.0 89.3 95.6 105.3
7 8 9 10 11 12
104.1 75.7 91.7 115.6 81.8 112.3
13
111.7
Residuals
1 2 3 4 5 6
0.005 1.511 -1.671 -1.727 0.251 3.925
7 8 9 10 11 12
-1.449 -3.175 1.378 0.282 1.991 0.973
13
-2.294
Standardized Residuals
1 2 3 4 5 6
0.003 0.757 -1.050 -0.841 0.128 1.715
7 8 9 10 11 12
-0.744 -1.688 0.671 0.210 1.074 0.463
13
-1.124
Leverages
1 2 3 4 5 6
0.5503 0.3332 0.5769 0.2952 0.3576 0.1242
7 8 9 10 11 12
0.3671 0.4085 0.2943 0.7004 0.4255 0.2630
13
0.3037
Deleted Residuals
1 2 3 4 5 6
0.003 0.735 -1.058 -0.824 0.120 2.017
7 8 9 10 11 12
-0.722 -1.967 0.646 0.197 1.086 0.439
13
-1.146
Cooks D
1 2 3 4 5 6
0.0000 0.0572 0.3009 0.0593 0.0018 0.0834
7 8 9 10 11 12
0.0643 0.3935 0.0375 0.0207 0.1708 0.0153
13
0.1102
DFFITS
1 2 3 4 5 6
0.003 0.519 -1.236 -0.533 0.089 0.759
7 8 9 10 11 12
-0.550 -1.635 0.417 0.302 0.935 0.262
13
-0.757
Scheffe Lower Limit
1 2 3 4 5 6
70.7 66.7 98.0 83.6 89.4 101.6
7 8 9 10 11 12
97.8 69.0 86.0 106.8 75.0 106.9
13
105.9
Scheffe Upper Limit
1 2 3 4 5 6
86.3 78.9 113.9 95.0 101.9 109.0
7 8 9 10 11 12
110.5 82.4 97.4 124.4 88.7 117.7
13
117.5
Population Mean Lower Limit
1 2 3 4 5 6
74.3 69.5 101.7 86.3 92.3 103.3
7 8 9 10 11 12
100.7 72.1 88.7 110.9 78.1 109.4
13
108.6
Population Mean Upper Limit
1 2 3 4 5 6
82.7 76.0 110.3 92.4 99.0 107.3
7 8 9 10 11 12
107.6 79.3 94.8 120.3 85.5 115.2
13
114.8
New Sample Lower Limit
1 2 3 4 5 6
71.5 66.3 98.9 82.9 89.1 99.3
7 8 9 10 11 12
97.6 69.0 85.3 108.3 75.1 106.0
13
105.3
New Sample Upper Limit
1 2 3 4 5 6
85.5 79.3 113.1 95.7 102.2 111.3
7 8 9 10 11 12
110.7 82.4 98.1 123.0 88.5 118.7
13
118.1
Warning Errors¶
IMSLS_NONESTIMABLE |
Within the preset tolerance, the linear combination of regression coefficients is nonestimable. |
IMSLS_LEVERAGE_GT_1 |
A leverage (= #) much greater than 1.0 is computed. It is set to 1.0. |
IMSLS_DEL_MSE_LT_0 |
A deleted residual mean square (= #) much less than 0 is computed. It is set to 0. |
Fatal Errors¶
IMSLS_NONNEG_WEIGHT_REQUEST_2 |
The weight for row # was #. Weights must be nonnegative. |