regression_prediction

Computes predicted values, confidence intervals, and diagnostics after fitting a regression model.

Synopsis

#include <imsls.h>

float *imsls_f_regression_prediction (Imsls_f_regression *regression_info, int n_predict, float x[], ..., 0)

The type double function is imsls_d_regression_prediction.

Required Arguments

Imsls_f_regression *regression_info (Input)
Pointer to a structure of type Imsls_f_regression containing information about the regression fit. See imsls_f_regression.

int n_predict (Input)
Number of rows in x.

float x[] (Input)
Array of size n_predict by the number of independent variables. Each row contains one combination of independent-variable values for which calculations are to be performed.

Return Value

Pointer to an internally allocated array of length n_predict containing the predicted values.

Synopsis with Optional Arguments

#include <imsls.h>

float *imsls_f_regression_prediction (Imsls_f_regression *regression_info, int n_predict, float x[],

IMSLS_X_COL_DIM, int x_col_dim,

IMSLS_Y_COL_DIM, int y_col_dim,

IMSLS_INDEX_REGRESSION, int idep,

IMSLS_X_INDICES, int indind[], int inddep[], int ifrq, int iwt,

IMSLS_WEIGHTS, float weights[],

IMSLS_CONFIDENCE, float confidence,

IMSLS_SCHEFFE_CI, float **lower_limit, float **upper_limit,

IMSLS_SCHEFFE_CI_USER, float lower_limit[], float upper_limit[],

IMSLS_POINTWISE_CI_POP_MEAN, float **lower_limit, float **upper_limit,

IMSLS_POINTWISE_CI_POP_MEAN_USER, float lower_limit[], float upper_limit[],

IMSLS_POINTWISE_CI_NEW_SAMPLE, float **lower_limit, float **upper_limit,

IMSLS_POINTWISE_CI_NEW_SAMPLE_USER, float lower_limit[], float upper_limit[],

IMSLS_LEVERAGE, float **leverage,

IMSLS_LEVERAGE_USER, float leverage[],

IMSLS_RETURN_USER, float y_hat[],

IMSLS_Y, float y[],

IMSLS_RESIDUAL, float **residual,

IMSLS_RESIDUAL_USER, float residual[],

IMSLS_STANDARDIZED_RESIDUAL, float **standardized_residual,

IMSLS_STANDARDIZED_RESIDUAL_USER, float standardized_residual[],

IMSLS_DELETED_RESIDUAL, float **deleted_residual,

IMSLS_DELETED_RESIDUAL_USER, float deleted_residual[],

IMSLS_COOKSD, float **cooksd,

IMSLS_COOKSD_USER, float cooksd[],

IMSLS_DFFITS, float **dffits,

IMSLS_DFFITS_USER, float dffits[],

0)

Optional Arguments

IMSLS_X_COL_DIM, int x_col_dim (Input)
Number of columns in x.
Default: x_col_dim is equal to the number of independent variables, which is input from the structure regression_info.

IMSLS_Y_COL_DIM, int y_col_dim (Input)
Number of columns in y.
Default: y_col_dim = 1

IMSLS_INDEX_REGRESSION, int idep (Input)
Given a multivariate regression fit, this option specifies which regression the statistics are computed for.
Default: idep = 0

IMSLS_X_INDICES, int indind[], int inddep[], int ifrq, int iwt (Input)
This argument allows an alternative method for data specification. Data (independent, dependent, frequencies, and weights) is all stored in the data matrix x. Argument y and keyword IMSLS_WEIGHTS are ignored.

Each of the four arguments contains indices indicating column numbers of x in which particular types of data are stored. Columns are numbered 0, …, x_col_dim − 1.

Parameter indind contains the indices of the independent variables.

Parameter inddep contains the indices of the dependent variables. If there is to be no dependent variable, this must be indicated by setting the first element of the vector to −1.

Parameters ifrq and iwt contain the column numbers of x in which the frequencies and weights, respectively, are stored. Set ifrq = −1 if there will be no column for frequencies. Set iwt = −1 if there will be no column for weights. Weights are rounded to the nearest integer. Negative weights are not allowed.

Note that frequencies are not referenced by function regression_prediction; ifrq is included here only for the sake of keyword consistency.

Finally, note that IMSLS_X_INDICES and IMSLS_Y are mutually exclusive keywords, and may not be specified in the same call to regression_prediction.

IMSLS_WEIGHTS, float weights[] (Input)
Array of length n_predict containing the weight for each row of x. The computed prediction interval uses SSE/(DFE*weights[i]) for the estimated variance of a future response, where SSE is the error sum of squares and DFE is the error degrees of freedom.
Default: weights[] = 1

IMSLS_CONFIDENCE, float confidence (Input)
Confidence level, in percent, for both two-sided interval estimates on the mean and two-sided prediction intervals. Argument confidence must be in the range [0.0, 100.0). For one-sided intervals with confidence level onecl, where 50.0 ≤ onecl < 100.0, set confidence = 100.0 − 2.0 × (100.0 − onecl).
Default: confidence = 95.0

IMSLS_SCHEFFE_CI, float **lower_limit, float **upper_limit (Output)
Argument lower_limit is the address of a pointer to an internally allocated array of length n_predict containing the lower limits of the Scheffé confidence intervals corresponding to the rows of x. Argument upper_limit is the address of a pointer to an internally allocated array of length n_predict containing the corresponding upper limits.

IMSLS_SCHEFFE_CI_USER, float lower_limit[], float upper_limit[] (Output)
Storage for arrays lower_limit and upper_limit is provided by the user. See IMSLS_SCHEFFE_CI.

IMSLS_POINTWISE_CI_POP_MEAN, float **lower_limit, float **upper_limit (Output)
Argument lower_limit is the address of a pointer to an internally allocated array of length n_predict containing the lower limits of the two-sided confidence intervals for the means, corresponding to the rows of x. Argument upper_limit is the address of a pointer to an internally allocated array of length n_predict containing the corresponding upper limits.

IMSLS_POINTWISE_CI_POP_MEAN_USER, float lower_limit[], float upper_limit[] (Output)
Storage for arrays lower_limit and upper_limit is provided by the user. See IMSLS_POINTWISE_CI_POP_MEAN.

IMSLS_POINTWISE_CI_NEW_SAMPLE, float **lower_limit, float **upper_limit (Output)
Argument lower_limit is the address of a pointer to an internally allocated array of length n_predict containing the lower limits of the two-sided prediction intervals, corresponding to the rows of x. Argument upper_limit is the address of a pointer to an internally allocated array of length n_predict containing the corresponding upper limits.

IMSLS_POINTWISE_CI_NEW_SAMPLE_USER, float lower_limit[], float upper_limit[] (Output)
Storage for arrays lower_limit and upper_limit is provided by the user. See IMSLS_POINTWISE_CI_NEW_SAMPLE.

IMSLS_LEVERAGE, float **leverage (Output)
Address of a pointer to an internally allocated array of length n_predict containing the leverages.

IMSLS_LEVERAGE_USER, float leverage[] (Output)
Storage for array leverage is provided by the user. See IMSLS_LEVERAGE.

IMSLS_RETURN_USER, float y_hat[] (Output)
Storage for array y_hat is provided by the user. The length n_predict array contains the predicted values.

IMSLS_Y, float y[] (Input)
Array of length n_predict containing the observed responses.

Note:  IMSLS_Y (or IMSLS_X_INDICES) must be specified if any of the following optional arguments are specified.

IMSLS_RESIDUAL, float **residual (Output)
Address of a pointer to an internally allocated array of length n_predict containing the residuals.

IMSLS_RESIDUAL_USER, float residual[] (Output)
Storage for array residual is provided by the user. See IMSLS_RESIDUAL.

IMSLS_STANDARDIZED_RESIDUAL, float **standardized_residual (Output)
Address of a pointer to an internally allocated array of length n_predict containing the standardized residuals.

IMSLS_STANDARDIZED_RESIDUAL_USER, float standardized_residual[] (Output)
Storage for array standardized_residual is provided by the user. See IMSLS_STANDARDIZED_RESIDUAL.

IMSLS_DELETED_RESIDUAL, float **deleted_residual (Output)
Address of a pointer to an internally allocated array of length n_predict containing the deleted residuals.

IMSLS_DELETED_RESIDUAL_USER, float deleted_residual[] (Output)
Storage for array deleted_residual is provided by the user. See IMSLS_DELETED_RESIDUAL.

IMSLS_COOKSD, float **cooksd (Output)
Address of a pointer to an internally allocated array of length n_predict containing the Cook’s D statistics.

IMSLS_COOKSD_USER, float cooksd[] (Output)
Storage for array cooksd is provided by the user. See IMSLS_COOKSD.

IMSLS_DFFITS, float **dffits (Output)
Address of a pointer to an internally allocated array of length n_predict containing the DFFITS statistics.

IMSLS_DFFITS_USER, float dffits[] (Output)
Storage for array dffits is provided by the user. See IMSLS_DFFITS.

Description

The general linear model used by function imsls_f_regression_prediction is

$$y = X\beta + \varepsilon$$

where y is the n × 1 vector of responses, X is the n × p matrix of regressors, β is the p × 1 vector of regression coefficients, and ε is the n × 1 vector of errors whose elements are independently normally distributed with mean 0 and variance $\sigma^2 / w_i$.

From a general linear model fit using the $w_i$'s as the weights, function imsls_f_regression_prediction computes confidence intervals and statistics for the individual cases that constitute the data set. Let $x_i$ be a column vector containing the elements of the ith row of X. The leverage is defined by

$$h_i = \left[ x_i^T (X^T W X)^{-} x_i \right] w_i$$

where $W = \mathrm{diag}(w_1, w_2, \ldots, w_n)$ and $(X^T W X)^{-}$ denotes a generalized inverse of $X^T W X$.

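Put $D = \mathrm{diag}(d_1, d_2, \ldots, d_p)$ with $d_j = 1$ if the j-th diagonal element of R is positive and 0 otherwise. The leverage is computed as $h_i = (a^T D a) w_i$, where a is a solution to $R^T a = x_i$. The estimated variance of

$$\hat{y}_i = x_i^T \hat{\beta}$$

is given by the following:

$$\frac{h_i s^2}{w_i}$$

where

$$s^2 = \frac{\mathrm{SSE}}{\mathrm{DFE}}$$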

The computation of the remainder of the case statistics follows easily from their definitions. See Diagnostics for Individual Cases for the definition of the case diagnostics.

Informational errors can occur if the input matrix x is not consistent with the information from the fit (contained in regression_info), or if excess rounding has occurred. The warning error IMSLS_NONESTIMABLE arises when x contains a row not in the space spanned by the rows of R. An examination of the model that was fitted and the x for which diagnostics are to be computed is required in order to ensure that only linear combinations of the regression coefficients that can be estimated from the fitted model are specified in x. For further details, see the discussion of estimable functions given in Maindonald (1984, pp. 166–168) and Searle (1971, pp. 180–188).

Often predicted values and confidence intervals are desired for combinations of settings of the independent variables not used in computing the regression fit. This can be accomplished by defining a new data matrix. Since the information about the model fit is input in regression_info, it is not necessary to send in the data set used for the original calculation of the fit, i.e., only variable combinations for which predictions are desired need be entered in x.

Examples

Example 1

 

#include <imsls.h>

int main()
{
#define INTERCEPT       1
#define N_INDEPENDENT   4
#define N_OBSERVATIONS  13
#define N_COEFFICIENTS  (INTERCEPT + N_INDEPENDENT)
#define N_DEPENDENT     1

    float *y_hat, *coefficients;
    Imsls_f_regression *regression_info;
    float x[][N_INDEPENDENT] = {
         7.0, 26.0,  6.0, 60.0,
         1.0, 29.0, 15.0, 52.0,
        11.0, 56.0,  8.0, 20.0,
        11.0, 31.0,  8.0, 47.0,
         7.0, 52.0,  6.0, 33.0,
        11.0, 55.0,  9.0, 22.0,
         3.0, 71.0, 17.0,  6.0,
         1.0, 31.0, 22.0, 44.0,
         2.0, 54.0, 18.0, 22.0,
        21.0, 47.0,  4.0, 26.0,
         1.0, 40.0, 23.0, 34.0,
        11.0, 66.0,  9.0, 12.0,
        10.0, 68.0,  8.0, 12.0};
    float y[] = {78.5, 74.3, 104.3, 87.6, 95.9, 109.2,
        102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4};

    /* Fit the regression model */
    coefficients = imsls_f_regression(N_OBSERVATIONS, N_INDEPENDENT,
        (float *)x, y,
        IMSLS_REGRESSION_INFO, &regression_info,
        0);

    /* Generate case statistics */
    y_hat = imsls_f_regression_prediction(regression_info,
        N_OBSERVATIONS, (float*)x, 0);

    /* Print results */
    imsls_f_write_matrix("Predicted Responses", 1, N_OBSERVATIONS,
        y_hat, 0);
}

Output

 

              Predicted Responses
       1       2       3       4       5       6
    78.5    72.8   106.0    89.3    95.6   105.3

       7       8       9      10      11      12
   104.1    75.7    91.7   115.6    81.8   112.3

      13
   111.7

Example 2

 

#include <imsls.h>

int main()
{
#define INTERCEPT       1
#define N_INDEPENDENT   4
#define N_OBSERVATIONS  13
#define N_COEFFICIENTS  (INTERCEPT + N_INDEPENDENT)
#define N_DEPENDENT     1

    float *y_hat, *leverage, *residual, *standardized_residual,
        *deleted_residual, *dffits, *cooksd, *mean_lower_limit,
        *mean_upper_limit, *new_sample_lower_limit,
        *new_sample_upper_limit, *scheffe_lower_limit,
        *scheffe_upper_limit, *coefficients;
    Imsls_f_regression *regression_info;
    float x[][N_INDEPENDENT] = {
         7.0, 26.0,  6.0, 60.0,
         1.0, 29.0, 15.0, 52.0,
        11.0, 56.0,  8.0, 20.0,
        11.0, 31.0,  8.0, 47.0,
         7.0, 52.0,  6.0, 33.0,
        11.0, 55.0,  9.0, 22.0,
         3.0, 71.0, 17.0,  6.0,
         1.0, 31.0, 22.0, 44.0,
         2.0, 54.0, 18.0, 22.0,
        21.0, 47.0,  4.0, 26.0,
         1.0, 40.0, 23.0, 34.0,
        11.0, 66.0,  9.0, 12.0,
        10.0, 68.0,  8.0, 12.0};
    float y[] = {78.5, 74.3, 104.3, 87.6, 95.9, 109.2,
        102.7, 72.5, 93.1, 115.9, 83.8, 113.3, 109.4};

    /* Fit the regression model */
    coefficients = imsls_f_regression(N_OBSERVATIONS, N_INDEPENDENT,
        (float *)x, y,
        IMSLS_REGRESSION_INFO, &regression_info,
        0);

    /* Generate the case statistics */
    y_hat = imsls_f_regression_prediction(regression_info,
        N_OBSERVATIONS, (float*)x,
        IMSLS_Y, y,
        IMSLS_LEVERAGE, &leverage,
        IMSLS_RESIDUAL, &residual,
        IMSLS_STANDARDIZED_RESIDUAL, &standardized_residual,
        IMSLS_DELETED_RESIDUAL, &deleted_residual,
        IMSLS_COOKSD, &cooksd,
        IMSLS_DFFITS, &dffits,
        IMSLS_POINTWISE_CI_POP_MEAN, &mean_lower_limit,
            &mean_upper_limit,
        IMSLS_POINTWISE_CI_NEW_SAMPLE, &new_sample_lower_limit,
            &new_sample_upper_limit,
        IMSLS_SCHEFFE_CI, &scheffe_lower_limit,
            &scheffe_upper_limit,
        0);

    /* Print results */
    imsls_f_write_matrix("Predicted Responses", 1, N_OBSERVATIONS,
        y_hat, 0);
    imsls_f_write_matrix("Residuals", 1, N_OBSERVATIONS, residual, 0);
    imsls_f_write_matrix("Standardized Residuals", 1, N_OBSERVATIONS,
        standardized_residual, 0);
    imsls_f_write_matrix("Leverages", 1, N_OBSERVATIONS, leverage, 0);
    imsls_f_write_matrix("Deleted Residuals", 1, N_OBSERVATIONS,
        deleted_residual, 0);
    imsls_f_write_matrix("Cooks D", 1, N_OBSERVATIONS, cooksd, 0);
    imsls_f_write_matrix("DFFITS", 1, N_OBSERVATIONS, dffits, 0);
    imsls_f_write_matrix("Scheffe Lower Limit", 1, N_OBSERVATIONS,
        scheffe_lower_limit, 0);
    imsls_f_write_matrix("Scheffe Upper Limit", 1, N_OBSERVATIONS,
        scheffe_upper_limit, 0);
    imsls_f_write_matrix("Population Mean Lower Limit", 1,
        N_OBSERVATIONS, mean_lower_limit, 0);
    imsls_f_write_matrix("Population Mean Upper Limit", 1,
        N_OBSERVATIONS, mean_upper_limit, 0);
    imsls_f_write_matrix("New Sample Lower Limit", 1, N_OBSERVATIONS,
        new_sample_lower_limit, 0);
    imsls_f_write_matrix("New Sample Upper Limit", 1, N_OBSERVATIONS,
        new_sample_upper_limit, 0);
}

Output

 

              Predicted Responses
       1       2       3       4       5       6
    78.5    72.8   106.0    89.3    95.6   105.3

       7       8       9      10      11      12
   104.1    75.7    91.7   115.6    81.8   112.3

      13
   111.7

                   Residuals
       1       2       3       4       5       6
   0.005   1.511  -1.671  -1.727   0.251   3.925

       7       8       9      10      11      12
  -1.449  -3.175   1.378   0.282   1.991   0.973

      13
  -2.294

             Standardized Residuals
       1       2       3       4       5       6
   0.003   0.757  -1.050  -0.841   0.128   1.715

       7       8       9      10      11      12
  -0.744  -1.688   0.671   0.210   1.074   0.463

      13
  -1.124

                   Leverages
       1       2       3       4       5       6
  0.5503  0.3332  0.5769  0.2952  0.3576  0.1242

       7       8       9      10      11      12
  0.3671  0.4085  0.2943  0.7004  0.4255  0.2630

      13
  0.3037

               Deleted Residuals
       1       2       3       4       5       6
   0.003   0.735  -1.058  -0.824   0.120   2.017

       7       8       9      10      11      12
  -0.722  -1.967   0.646   0.197   1.086   0.439

      13
  -1.146

                    Cooks D
       1       2       3       4       5       6
  0.0000  0.0572  0.3009  0.0593  0.0018  0.0834

       7       8       9      10      11      12
  0.0643  0.3935  0.0375  0.0207  0.1708  0.0153

      13
  0.1102

                    DFFITS
       1       2       3       4       5       6
   0.003   0.519  -1.236  -0.533   0.089   0.759

       7       8       9      10      11      12
  -0.550  -1.635   0.417   0.302   0.935   0.262

      13
  -0.757

              Scheffe Lower Limit
       1       2       3       4       5       6
    70.7    66.7    98.0    83.6    89.4   101.6

       7       8       9      10      11      12
    97.8    69.0    86.0   106.8    75.0   106.9

      13
   105.9

              Scheffe Upper Limit
       1       2       3       4       5       6
    86.3    78.9   113.9    95.0   101.9   109.0

       7       8       9      10      11      12
   110.5    82.4    97.4   124.4    88.7   117.7

      13
   117.5

          Population Mean Lower Limit
       1       2       3       4       5       6
    74.3    69.5   101.7    86.3    92.3   103.3

       7       8       9      10      11      12
   100.7    72.1    88.7   110.9    78.1   109.4

      13
   108.6

          Population Mean Upper Limit
       1       2       3       4       5       6
    82.7    76.0   110.3    92.4    99.0   107.3

       7       8       9      10      11      12
   107.6    79.3    94.8   120.3    85.5   115.2

      13
   114.8

             New Sample Lower Limit
       1       2       3       4       5       6
    71.5    66.3    98.9    82.9    89.1    99.3

       7       8       9      10      11      12
    97.6    69.0    85.3   108.3    75.1   106.0

      13
   105.3

             New Sample Upper Limit
       1       2       3       4       5       6
    85.5    79.3   113.1    95.7   102.2   111.3

       7       8       9      10      11      12
   110.7    82.4    98.1   123.0    88.5   118.7

      13
   118.1

Warning Errors

IMSLS_NONESTIMABLE

Within the preset tolerance, the linear combination of regression coefficients is nonestimable.

IMSLS_LEVERAGE_GT_1

A leverage (= #) much greater than 1.0 is computed. It is set to 1.0.

IMSLS_DEL_MSE_LT_0

A deleted residual mean square (= #) much less than 0 is computed. It is set to 0.

Fatal Errors

IMSLS_NONNEG_WEIGHT_REQUEST_2

The weight for row # was #. Weights must be nonnegative.