polyPrediction

Computes predicted values, confidence intervals, and diagnostics after fitting a polynomial regression model.

Synopsis

polyPrediction (polyInfo, x)

Required Arguments

structure polyInfo (Input)
A structure. See function polyRegression.
float x[] (Input)
Array of length nPredict containing the values of the independent variable for which calculations are to be performed.

Return Value

An array of length nPredict containing the predicted values.

Optional Arguments

confidence, float (Input)

Confidence level for both two-sided interval estimates on the mean and for two-sided prediction intervals in percent. Argument confidence must be in the range [0.0, 100.0). For one-sided intervals with confidence level onecl, where 50.0 ≤ onecl < 100.0, set confidence = 100.0 – 2.0 \* (100.0 − onecl).

Default: confidence = 95.0.

weights, float[] (Input)

Array of length nPredict containing the weight for each row of x. The computed prediction interval uses SSE/(DFE*weights[i]) for the estimated variance of a future response, where SSE is sum of squares error and DFE is degrees of freedom error.

Default: weights[] = 1.

scheffeCi, lowerLimit, upperLimit (Output)
Array lowerLimit is an array of length nPredict containing the lower confidence limits of Scheffé confidence intervals corresponding to the rows of x. Array upperLimit is an array of length nPredict containing the upper confidence limits of Scheffé confidence intervals corresponding to the rows of x.
pointwiseCiPopMean, lowerLimit, upperLimit (Output)
Array lowerLimit is an array of length nPredict containing the lower confidence limits of the confidence intervals for two-sided interval estimates of the means, corresponding to the rows of x. Array upperLimit is an array of length nPredict containing the upper confidence limits of the confidence intervals for two-sided interval estimates of the means, corresponding to the rows of x.
pointwiseCiNewSample, lowerLimit, upperLimit (Output)
Array lowerLimit is an array of length nPredict containing the lower confidence limits of the confidence intervals for two-sided prediction intervals, corresponding to the rows of x. Array upperLimit is an array of length nPredict containing the upper confidence limits of the confidence intervals for two-sided prediction intervals, corresponding to the rows of x.
leverage (Output)
An array of length nPredict containing the leverages.
y float[] (Input)
Array of length nPredict containing the observed responses.
y must be specified if any of the following optional arguments are specified.
residual (Output)
An array of length nPredict containing the residuals.
standardizedResidual (Output)
An array of length nPredict containing the standardized residuals.
deletedResidual (Output)
An array of length nPredict containing the deleted residuals.
cooksd (Output)
An array of length nPredict containing the Cook’s D statistics.
dffits (Output)
An array of length nPredict containing the DFFITS statistics, where DFFITS is the change in the predicted value of a point in the absence of the empirical value.

Description

Function polyPrediction assumes a polynomial model

\[y_i = \beta_0 + \beta_1 x_i + \ldots, \beta_k x_i^k + \varepsilon_i \phantom{...} i=1,2, \ldots n\]

where the observed values of the \(y_i\)’s constitute the response, the \(x_i\)’s are the settings of the independent variable, the \(\beta_j\)’s are the regression coefficients and the \(\varepsilon_i\)’s are the errors that are independently distributed normal with mean 0 and the following variance:

\[\frac{\sigma^2}{w_i}\]

Given the results of a polynomial regression, fitted using orthogonal polynomials and weights \(w_i\), function polyPrediction produces predicted values, residuals, confidence intervals, prediction intervals, and diagnostics for outliers and in influential cases.

Often, a predicted value and confidence interval are desired for a setting of the independent variable not used in computing the regression fit. This is accomplished by simply using a different x matrix when calling polyPrediction than was used for the fit (function polyRegression). See Example 1.

Results from function polyRegression, which produces the fit using orthogonal polynomials, are used for input by the structure polyInfo. The fitted model from polyRegression is

\[\hat{y}_i = \hat{\alpha}_0 p_0 \left(z_i\right) + \hat{\alpha}_1 p_1 \left(z_i\right) + \ldots + \hat{\alpha}_k p_k \left(z_i\right)\]

where the \(z_i\)’s are settings of the independent variable x scaled to the interval [−2, 2] and the \(p_j(z)\)’s are the orthogonal polynomials. The \(X^TX\) matrix for this model is a diagonal matrix with elements \(d_j\). The case statistics are easily computed from this model and are equal to those from the original polynomial model with \(\beta_j\)’s as the regression coefficients.

The leverage is computed as follows:

\[h_i = w_i \sum_{j=0}^{k} d_j^{-1} p_j^2 \left( z_i \right)\]

The estimated variance of

\[\hat{y}_i\]

is given by the following:

\[\frac{h_i s^2}{w_i}\]

The computation of the remainder of the case statistics follows easily from the definitions. See Diagnostics for Individual Cases for the definition of the case diagnostics.

Often, predicted values and confidence intervals are desired for combinations of settings of the independent variables not used in computing the regression fit. This can be accomplished by defining a new data matrix. Since the information about the model fit is input in polyInfo, it is not necessary to send in the data set used for the original calculation of the fit, i.e., only variable combinations for which predictions are desired need be entered in x.

Examples

Example 1

A polynomial model is fit to the data discussed by Neter and Wasserman (1974, pp. 279–285). The data set contains the response variable y measuring coffee sales (in hundred gallons) and the number of self-service dispensers. Responses for 14 similar cafeterias are in the data set.

from __future__ import print_function
from numpy import *
from pyimsl.stat.polyPrediction import polyPrediction
from pyimsl.stat.polyRegression import polyRegression
from pyimsl.stat.writeMatrix import writeMatrix


x = array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 4.0,
           4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0])
y = array([508.1, 498.4, 568.2, 577.3, 651.7, 657.0, 755.3,
           758.9, 787.6, 792.1, 841.4, 831.8, 854.7, 871.4])
x2 = array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
degree = 2
poly_info = []
n_predict = 8

# Generate the polynomial regression fit
coefficients = polyRegression(x, y, degree, polyRegressionInfo=poly_info)

# Compute predicted values
y_hat = polyPrediction(poly_info[0], x2)

# Print predicted values
writeMatrix("Predicted Values", y_hat)

Output

 
                              Predicted Values
          1            2            3            4            5            6
      503.3        578.3        645.4        704.4        755.6        798.8
 
          7            8
      834.1        861.4

Example 2

Predicted values, confidence intervals, and diagnostics are computed for the data set described in the first example.

from __future__ import print_function
from numpy import *
from pyimsl.stat.polyPrediction import polyPrediction
from pyimsl.stat.polyRegression import polyRegression
from pyimsl.stat.writeMatrix import writeMatrix


x = array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 4.0,
           4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0])
y = array([508.1, 498.4, 568.2, 577.3, 651.7, 657.0, 755.3,
           758.9, 787.6, 792.1, 841.4, 831.8, 854.7, 871.4])
degree = 2
poly_info = []
n_predict = 8

# Generate the polynomial regression fit
coefficients = polyRegression(x, y, degree, polyRegressionInfo=poly_info)

# Compute predicted values and case statistics
pointwise_ci_pop_mean = {}
pointwise_ci_new_sample = {}
s_residual = []
d_residual = []
leverage = []
cooksd = []
dffits = []
scheffe_ci = {}
y_hat = polyPrediction(poly_info[0], x,
                       pointwiseCiPopMean=pointwise_ci_pop_mean,
                       pointwiseCiNewSample=pointwise_ci_new_sample,
                       y=y,
                       standardizedResidual=s_residual,
                       deletedResidual=d_residual,
                       leverage=leverage,
                       cooksd=cooksd,
                       dffits=dffits,
                       scheffeCi=scheffe_ci)

# Print results
writeMatrix("Predicted Values", y_hat, writeFormat="%10.1f")
ssl = scheffe_ci["lowerLimit"]
writeMatrix("Lower Scheffe CI", ssl, writeFormat="%10.1f")
ssu = scheffe_ci["upperLimit"]
writeMatrix("Upper Scheffe CI", ssu, writeFormat="%10.1f")
pcil = pointwise_ci_pop_mean["lowerLimit"]
writeMatrix("Lower CI", pcil, writeFormat="%10.1f")
pciu = pointwise_ci_pop_mean["upperLimit"]
writeMatrix("Upper CI", pciu, writeFormat="%10.1f")
pil = pointwise_ci_new_sample["lowerLimit"]
writeMatrix("Lower PI", pil, writeFormat="%10.1f")
piu = pointwise_ci_new_sample["upperLimit"]
writeMatrix("Upper PI", piu, writeFormat="%10.1f")
writeMatrix("Standardized Residual", s_residual, writeFormat="%10.1f")
writeMatrix("Deleted Residual", d_residual, writeFormat="%10.1f")
writeMatrix("Leverage", leverage, writeFormat="%10.1f")
writeMatrix("Cooks Distance", cooksd, writeFormat="%10.1f")
writeMatrix("DFFITS", dffits, writeFormat="%10.1f")

Output

 
                           Predicted Values
         1           2           3           4           5           6
     503.3       503.3       578.3       578.3       645.4       645.4
 
         7           8           9          10          11          12
     755.6       755.6       798.8       798.8       834.1       834.1
 
        13          14
     861.4       861.4
 
                           Lower Scheffe CI
         1           2           3           4           5           6
     489.8       489.8       569.5       569.5       636.5       636.5
 
         7           8           9          10          11          12
     745.7       745.7       790.2       790.2       825.5       825.5
 
        13          14
     847.7       847.7
 
                           Upper Scheffe CI
         1           2           3           4           5           6
     516.9       516.9       587.1       587.1       654.2       654.2
 
         7           8           9          10          11          12
     765.5       765.5       807.4       807.4       842.7       842.7
 
        13          14
     875.1       875.1
 
                               Lower CI
         1           2           3           4           5           6
     492.8       492.8       571.4       571.4       638.4       638.4
 
         7           8           9          10          11          12
     747.9       747.9       792.1       792.1       827.4       827.4
 
        13          14
     850.7       850.7
 
                               Upper CI
         1           2           3           4           5           6
     513.9       513.9       585.2       585.2       652.3       652.3
 
         7           8           9          10          11          12
     763.3       763.3       805.5       805.5       840.8       840.8
 
        13          14
     872.1       872.1
 
                               Lower PI
         1           2           3           4           5           6
     482.7       482.7       559.3       559.3       626.3       626.3
 
         7           8           9          10          11          12
     736.3       736.3       779.9       779.9       815.2       815.2
 
        13          14
     840.7       840.7
 
                               Upper PI
         1           2           3           4           5           6
     524.0       524.0       597.3       597.3       664.4       664.4
 
         7           8           9          10          11          12
     774.9       774.9       817.7       817.7       853.0       853.0
 
        13          14
     882.1       882.1
 
                         Standardized Residual
         1           2           3           4           5           6
       0.7        -0.8        -1.4        -0.1         0.9         1.6
 
         7           8           9          10          11          12
      -0.0         0.5        -1.5        -0.9         1.0        -0.3
 
        13          14
      -1.0         1.6
 
                           Deleted Residual
         1           2           3           4           5           6
       0.7        -0.8        -1.4        -0.1         0.8         1.7
 
         7           8           9          10          11          12
      -0.0         0.4        -1.6        -0.9         1.0        -0.3
 
        13          14
      -1.1         1.7
 
                               Leverage
         1           2           3           4           5           6
       0.4         0.4         0.2         0.2         0.2         0.2
 
         7           8           9          10          11          12
       0.2         0.2         0.1         0.1         0.1         0.1
 
        13          14
       0.4         0.4
 
                            Cooks Distance
         1           2           3           4           5           6
       0.1         0.1         0.1         0.0         0.0         0.1
 
         7           8           9          10          11          12
       0.0         0.0         0.1         0.0         0.1         0.0
 
        13          14
       0.2         0.5
 
                                DFFITS
         1           2           3           4           5           6
       0.5        -0.6        -0.6        -0.1         0.4         0.7
 
         7           8           9          10          11          12
      -0.0         0.2        -0.7        -0.4         0.4        -0.1
 
        13          14
      -0.8         1.3

Warning Errors

IMSLS_LEVERAGE_GT_1 A leverage (= #) much greater than one is computed. It is set to 1.0.
IMSLS_DEL_MSE_LT_0 A deleted residual mean square (= #) much less than zero is computed. It is set to zero.

Fatal Errors

IMSLS_NEG_WEIGHT “weights[#]” = #. Weights must be nonnegative.