regressionPrediction

Computes predicted values, confidence intervals, and diagnostics after fitting a regression model.

Synopsis

regressionPrediction (regressionInfo, x)

Required Argument

structure regressionInfo (Input)
A structure containing information about the regression fit. See regression.
float x[] (Input)
Array of size nPredict by the number of independent variables containing the combinations of independent variables in each row for which calculations are to be performed.

Return Value

An array of length nPredict containing the predicted values.

Optional Arguments

indexRegression, int (Input)

Given a multivariate regression fit, this option allows the user to specify for which regression statistics will be computed.

Default: indexRegression = 0

xIndices, int indind[], int inddep, int ifrq, int iwt (Input)

This argument allows an alternative method for data specification. Data (independent, dependent, frequencies, and weights) is all stored in the data matrix x. Argument y, and keyword weights are ignored.

Each of the four arguments contains indices indicating column numbers of x in which particular types of data are stored.

Parameter indind contains the indices of the independent variables.

Parameter inddep contains the indices of the dependent variables. If there is to be no dependent variable, this must be indicated by setting the first element of the vector to −1.

Parameters ifrq and iwt contain the column numbers of x in which the frequencies and weights, respectively, are stored. Set ifrq = −1 if there will be no column for frequencies. Set iwt = −1 if there will be no column for weights. Weights are rounded to the nearest integer. Negative weights are not allowed.

Note that frequencies are not referenced by function regressionPrediction, and is included here only for the sake of keyword consistency.

Finally, note that xIndices and y are mutually exclusive keywords, and may not be specified in the same call to regressionPrediction.

weights, float[] (Input)

Array of length nPredict containing the weight for each row of x. The computed prediction interval uses SSE/(DFE*weights[i]) for the estimated variance of a future response, where SSE is sum of squares error and DFE is degrees of freedom error.

Default: weights[] = 1

confidence, float (Input)

Confidence level for both two-sided interval estimates on the mean and for two-sided prediction intervals, in percent. Argument confidence must be in the range [0.0, 100.0). For one-sided intervals with confidence level onecl, where 50.0 ≤ onecl < 100.0, set confidence = 100.0 − 2.0 * (100.0 − onecl).

Default: confidence = 95.0

scheffeCi, lowerLimit, upperLimit (Output)
Array lowerLimit is an array of length nPredict containing the lower confidence limits of Scheffé confidence intervals corresponding to the rows of x. Array upperLimit is an array of length nPredict containing the upper confidence limits of Scheffé confidence intervals corresponding to the rows of x.
pointwiseCiPopMean, lowerLimit, upperLimit (Output)
Array lowerLimit is an array of length nPredict containing the lower-confidence limits of the confidence intervals for two-sided interval estimates of the means, corresponding to the rows of x. Array upperLimit is an array of length nPredict containing the upper-confidence limits of the confidence intervals for two-sided interval estimates of the means, corresponding to the rows of x.
pointwiseCiNewSample, lowerLimit, upperLimit (Output)
Array lowerLimit is an array of length nPredict containing the lower-confidence limits of the confidence intervals for two-sided prediction intervals, corresponding to the rows of x. Array upperLimit is an array of length nPredict containing the upper-confidence limits of the confidence intervals for two-sided prediction intervals, corresponding to the rows of x.
leverage (Output)
An array of length nPredict containing the leverages.
y, float[] (Input)
Array of length nPredict containing the observed responses.
y (or xIndices) must be specified if any of the following optional arguments are specified.
residual (Output)
An array of length nPredict containing the residuals.
standardizedResidual (Output)
An array of length nPredict containing the standardized residuals.
deletedResidual (Output)
An array of length nPredict containing the deleted residuals.
cooksd (Output)
An array of length nPredict containing the Cook’s D statistics.
dffits (Output)
An array of length nPredict containing the DFFITS statistics.

Description

The general linear model used by function regressionPrediction is

\[y = Xβ + ɛ\]

where y is the n × 1 vector of responses, X is the n × p matrix of regressors, β is the p × 1 vector of regression coefficients, and ɛ is the n × 1 vector of errors whose elements are independently normally distributed with mean 0 and the variance below.

\[\frac{\sigma^2}{w_i}\]

From a general linear model fit using the \(w_i\)’s as the weights, function regressionPrediction computes confidence intervals and statistics for the individual cases that constitute the data set. Let \(x_i\) be a column vector containing elements of the i‑th row of X. The leverage is defined by

\[h_i = \left[x_i^T \left(X^T WX\right)^- x_i\right] w_i\]

where \(W=\mathrm{diag}(w_1,w_2,\ldots,w_n)\) and \((X^T WX)^-\) denotes a generalized inverse of \(X^T WX\).

Put \(D=\mathrm{diag}(d_1,d_2,\ldots,d_n)\) with \(d_j=1\) if the j-th diagonal element of R is positive and 0 otherwise. The leverage is computed as \(h_i=(a^T Da) w_i\) where a is a solution to \(R^T a=x_i\). The estimated variance of

\[\hat{y} = x_i^T \hat{B}\]

is given by the following:

\[\frac{h_i s^2}{w_i}\]

where

\[s^2 = \frac{\mathrm{SSE}}{\mathrm{DFE}}\]

The computation of the remainder of the case statistics follow easily from their definitions. See Diagnostics for Individual Cases for the definition of the case diagnostics.

Informational errors can occur if the input matrix x is not consistent with the information from the fit (contained in regressionInfo), or if excess rounding has occurred. The warning error IMSLS_NONESTIMABLE arises when x contains a row not in the space spanned by the rows of R. An examination of the model that was fitted and the x for which diagnostics are to be computed is required in order to ensure that only linear combinations of the regression coefficients that can be estimated from the fitted model are specified in x. For further details, see the discussion of estimable functions given in Maindonald (1984, pp. 166−168) and Searle (1971, pp. 180−188).

Often predicted values and confidence intervals are desired for combinations of settings of the independent variables not used in computing the regression fit. This can be accomplished by defining a new data matrix. Since the information about the model fit is input in regressionInfo, it is not necessary to send in the data set used for the original calculation of the fit, i.e., only variable combinations for which predictions are desired need be entered in x.

Examples

Example 1

from numpy import *
from pyimsl.stat.regression import regression
from pyimsl.stat.regressionPrediction import regressionPrediction
from pyimsl.stat.writeMatrix import writeMatrix

x = array([
    [7.0, 26.0, 6.0, 60.0],
    [1.0, 29.0, 15.0, 52.0],
    [11.0, 56.0, 8.0, 20.0],
    [11.0, 31.0, 8.0, 47.0],
    [7.0, 52.0, 6.0, 33.0],
    [11.0, 55.0, 9.0, 22.0],
    [3.0, 71.0, 17.0, 6.0],
    [1.0, 31.0, 22.0, 44.0],
    [2.0, 54.0, 18.0, 22.0],
    [21.0, 47.0, 4.0, 26.0],
    [1.0, 40.0, 23.0, 34.0],
    [11.0, 66.0, 9.0, 12.0],
    [10.0, 68.0, 8.0, 12.0]])
y = array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2,
           102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4])
regression_info = []

# Fit the regression model
coefficients = regression(x, y, regressionInfo=regression_info)
regression_info = regression_info[0]

# Generate case statistics
y_hat = regressionPrediction(regression_info, x)

# Print results
writeMatrix("Predicted Responses", y_hat)

Output

 
                             Predicted Responses
          1            2            3            4            5            6
       78.5         72.8        106.0         89.3         95.6        105.3
 
          7            8            9           10           11           12
      104.1         75.7         91.7        115.6         81.8        112.3
 
         13
      111.7

Example 2

from numpy import *
from pyimsl.stat.regression import regression
from pyimsl.stat.regressionPrediction import regressionPrediction
from pyimsl.stat.writeMatrix import writeMatrix

x = array([
    [7.0, 26.0, 6.0, 60.0],
    [1.0, 29.0, 15.0, 52.0],
    [11.0, 56.0, 8.0, 20.0],
    [11.0, 31.0, 8.0, 47.0],
    [7.0, 52.0, 6.0, 33.0],
    [11.0, 55.0, 9.0, 22.0],
    [3.0, 71.0, 17.0, 6.0],
    [1.0, 31.0, 22.0, 44.0],
    [2.0, 54.0, 18.0, 22.0],
    [21.0, 47.0, 4.0, 26.0],
    [1.0, 40.0, 23.0, 34.0],
    [11.0, 66.0, 9.0, 12.0],
    [10.0, 68.0, 8.0, 12.0]])
y = array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2,
           102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4])
regression_info = []

# Fit the regression model
coefficients = regression(x, y, regressionInfo=regression_info)
regression_info = regression_info[0]

# Generate the case statistics
leverage = []
residual = []
standardized_residual = []
deleted_residual = []
cooksd = []
dffits = []
pointwise_ci_pop_mean = {}
pointwise_ci_new_sample = {}
scheffe_ci = {}

y_hat = regressionPrediction(regression_info, x,
                             y=y, leverage=leverage,
                             residual=residual,
                             standardizedResidual=standardized_residual,
                             deletedResidual=deleted_residual,
                             cooksd=cooksd, dffits=dffits,
                             pointwiseCiPopMean=pointwise_ci_pop_mean,
                             pointwiseCiNewSample=pointwise_ci_new_sample,
                             scheffeCi=scheffe_ci)

# Print results
writeMatrix("Predicted Responses", y_hat)
writeMatrix("Residuals", residual)
writeMatrix("Standardized Residuals", standardized_residual)
writeMatrix("Leverages", leverage)
writeMatrix("Deleted Residuals", deleted_residual)
writeMatrix("Cooks D", cooksd)
writeMatrix("DFFITS", dffits)
writeMatrix("Scheffe Lower Limit",
            scheffe_ci["lowerLimit"])
writeMatrix("Scheffe Upper Limit",
            scheffe_ci["upperLimit"])
writeMatrix("Population Mean Lower Limit",
            pointwise_ci_pop_mean["lowerLimit"])
writeMatrix("Population Mean Upper Limit",
            pointwise_ci_pop_mean["upperLimit"])
writeMatrix("New Sample Lower Limit",
            pointwise_ci_new_sample["lowerLimit"])
writeMatrix("New Sample Upper Limit",
            pointwise_ci_new_sample["upperLimit"])

Output

 
                             Predicted Responses
          1            2            3            4            5            6
       78.5         72.8        106.0         89.3         95.6        105.3
 
          7            8            9           10           11           12
      104.1         75.7         91.7        115.6         81.8        112.3
 
         13
      111.7
 
                                  Residuals
          1            2            3            4            5            6
      0.005        1.511       -1.671       -1.727        0.251        3.925
 
          7            8            9           10           11           12
     -1.449       -3.175        1.378        0.282        1.991        0.973
 
         13
     -2.294
 
                           Standardized Residuals
          1            2            3            4            5            6
      0.003        0.757       -1.050       -0.841        0.128        1.715
 
          7            8            9           10           11           12
     -0.744       -1.688        0.671        0.210        1.074        0.463
 
         13
     -1.124
 
                                  Leverages
          1            2            3            4            5            6
     0.5503       0.3332       0.5769       0.2952       0.3576       0.1242
 
          7            8            9           10           11           12
     0.3671       0.4085       0.2943       0.7004       0.4255       0.2630
 
         13
     0.3037
 
                              Deleted Residuals
          1            2            3            4            5            6
      0.003        0.735       -1.058       -0.824        0.120        2.017
 
          7            8            9           10           11           12
     -0.722       -1.967        0.646        0.197        1.086        0.439
 
         13
     -1.146
 
                                   Cooks D
          1            2            3            4            5            6
     0.0000       0.0572       0.3009       0.0593       0.0018       0.0834
 
          7            8            9           10           11           12
     0.0643       0.3935       0.0375       0.0207       0.1708       0.0153
 
         13
     0.1102
 
                                   DFFITS
          1            2            3            4            5            6
      0.003        0.519       -1.236       -0.533        0.089        0.759
 
          7            8            9           10           11           12
     -0.550       -1.635        0.417        0.302        0.935        0.262
 
         13
     -0.757
 
                             Scheffe Lower Limit
          1            2            3            4            5            6
       70.7         66.7         98.0         83.6         89.4        101.6
 
          7            8            9           10           11           12
       97.8         69.0         86.0        106.8         75.0        106.9
 
         13
      105.9
 
                             Scheffe Upper Limit
          1            2            3            4            5            6
       86.3         78.9        113.9         95.0        101.9        109.0
 
          7            8            9           10           11           12
      110.5         82.4         97.4        124.4         88.7        117.7
 
         13
      117.5
 
                         Population Mean Lower Limit
          1            2            3            4            5            6
       74.3         69.5        101.7         86.3         92.3        103.3
 
          7            8            9           10           11           12
      100.7         72.1         88.7        110.9         78.1        109.4
 
         13
      108.6
 
                         Population Mean Upper Limit
          1            2            3            4            5            6
       82.7         76.0        110.3         92.4         99.0        107.3
 
          7            8            9           10           11           12
      107.6         79.3         94.8        120.3         85.5        115.2
 
         13
      114.8
 
                           New Sample Lower Limit
          1            2            3            4            5            6
       71.5         66.3         98.9         82.9         89.1         99.3
 
          7            8            9           10           11           12
       97.6         69.0         85.3        108.3         75.1        106.0
 
         13
      105.3
 
                           New Sample Upper Limit
          1            2            3            4            5            6
       85.5         79.3        113.1         95.7        102.2        111.3
 
          7            8            9           10           11           12
      110.7         82.4         98.1        123.0         88.5        118.7
 
         13
      118.1

Warning Errors

IMSLS_NONESTIMABLE Within the preset tolerance, the linear combination of regression coefficients is nonestimable.
IMSLS_LEVERAGE_GT_1 A leverage (= #) much greater than 1.0 is computed. It is set to 1.0.
IMSLS_DEL_MSE_LT_0 A deleted residual mean square (= #) much less than 0 is computed. It is set to 0.

Fatal Errors

IMSLS_NONNEG_WEIGHT_REQUEST_2 The weight for row # was #. Weights must be nonnegative.