poly_regression

Performs a polynomial least-squares regression.

Synopsis

#include <imsl.h>

float *imsl_f_poly_regression (int n_observations, float x[], float y[], int degree, …, 0)

The type double procedure is imsl_d_poly_regression.

Required Arguments

int n_observations (Input)
The number of observations.

float x[] (Input)
Array of length n_observations containing the independent variable.

float y[] (Input)
Array of length n_observations containing the dependent variable.

int degree (Input)
The degree of the polynomial.

Return Value

A pointer to the vector of size degree + 1 containing the coefficients of the fitted polynomial. If a fit cannot be computed, then NULL is returned.

Synopsis with Optional Arguments

#include <imsl.h>

float *imsl_f_poly_regression (int n_observations, float x[], float y[], int degree,
    IMSL_WEIGHTS, float weights[],
    IMSL_SSQ_POLY, float **p_ssq_poly,
    IMSL_SSQ_POLY_USER, float ssq_poly[],
    IMSL_SSQ_POLY_COL_DIM, int ssq_poly_col_dim,
    IMSL_SSQ_LOF, float **p_ssq_lof,
    IMSL_SSQ_LOF_USER, float ssq_lof[],
    IMSL_SSQ_LOF_COL_DIM, int ssq_lof_col_dim,
    IMSL_X_MEAN, float *x_mean,
    IMSL_X_VARIANCE, float *x_variance,
    IMSL_ANOVA_TABLE, float **p_anova_table,
    IMSL_ANOVA_TABLE_USER, float anova_table[],
    IMSL_DF_PURE_ERROR, int *df_pure_error,
    IMSL_SSQ_PURE_ERROR, float *ssq_pure_error,
    IMSL_RESIDUAL, float **p_residual,
    IMSL_RESIDUAL_USER, float residual[],
    IMSL_RETURN_USER, float coefficients[],
    0)

Optional Arguments

IMSL_WEIGHTS, float weights[] (Input)
Array with n_observations components containing the vector of weights for the observations. If this option is not specified, all observations have equal weights of one.
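Assuming the weights enter in the standard weighted least-squares way (an assumption; this page does not spell out the criterion), the returned coefficients β_0, …, β_d, with d = degree and n = n_observations, solve

```latex
\min_{\beta_0,\ldots,\beta_d} \; \sum_{i=1}^{n} w_i \left( y_i - \sum_{k=0}^{d} \beta_k x_i^{k} \right)^{2}
```

so an observation with w_i = 0 does not influence the coefficients, and with all w_i = 1 the criterion reduces to ordinary least squares.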

IMSL_SSQ_POLY, float **p_ssq_poly (Output)
The address of a pointer to the array containing the sequential sums of squares and other statistics. On return, the pointer is initialized (through a memory allocation request to malloc), and the array is stored there. Typically, float *p_ssq_poly is declared; &p_ssq_poly is used as an argument to this function; and imsl_free(p_ssq_poly) is used to free this array. Row i corresponds to x^i, i = 1, …, degree, and the columns are described as follows:

Column	Description
1	degrees of freedom
2	sums of squares
3	F-statistic
4	p-value

IMSL_SSQ_POLY_USER, float ssq_poly[] (Output)
Array of size degree × 4 containing the sequential sums of squares for a polynomial fit described under optional argument IMSL_SSQ_POLY.

IMSL_SSQ_POLY_COL_DIM, int ssq_poly_col_dim (Input)
The column dimension of ssq_poly.
Default: ssq_poly_col_dim = 4

IMSL_SSQ_LOF, float **p_ssq_lof (Output)
The address of a pointer to the array containing the lack-of-fit statistics. On return, the pointer is initialized (through a memory allocation request to malloc), and the array is stored there. Typically, float *p_ssq_lof is declared; &p_ssq_lof is used as an argument to this function; and imsl_free(p_ssq_lof) is used to free this array. Row i corresponds to x^i, i = 1, …, degree, and the columns are described in the following table:

Column	Description
1	degrees of freedom
2	lack-of-fit sums of squares
3	F-statistic for testing lack-of-fit for a polynomial model of degree i
4	p-value for the test
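These columns follow the standard decomposition of the error sum of squares into pure error (scatter among observations sharing an x value) and lack of fit. Assuming IMSL uses this textbook form in the unweighted case, for the model of degree i fitted to n observations:

```latex
\mathrm{SS}_{\mathrm{LOF},i} = \mathrm{SSE}_i - \mathrm{SS}_{\mathrm{PE}}, \qquad
\mathrm{df}_{\mathrm{LOF},i} = (n - i - 1) - \mathrm{df}_{\mathrm{PE}}, \qquad
F_i = \frac{\mathrm{SS}_{\mathrm{LOF},i} / \mathrm{df}_{\mathrm{LOF},i}}{\mathrm{SS}_{\mathrm{PE}} / \mathrm{df}_{\mathrm{PE}}}
```

Here SSE_i is the error sum of squares of the degree-i fit, and SS_PE and df_PE are the pure-error quantities returned through IMSL_SSQ_PURE_ERROR and IMSL_DF_PURE_ERROR. As a check against Example 2 (n = 14, quadratic model, 11 error degrees of freedom, error sum of squares 710.55), the reported lack-of-fit values 4.0 and 405.9 imply df_PE = 11 - 4 = 7 and SS_PE ≈ 710.55 - 405.9 = 304.65, so F ≈ (405.9/4)/(304.65/7) ≈ 101.5/43.5 ≈ 2.33, consistent with the printed 2.3.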

IMSL_SSQ_LOF_USER, float ssq_lof[] (Output)
Array of size degree × 4 containing the matrix of lack-of-fit statistics described under optional argument IMSL_SSQ_LOF.

IMSL_SSQ_LOF_COL_DIM, int ssq_lof_col_dim (Input)
The column dimension of ssq_lof.
Default: ssq_lof_col_dim = 4

IMSL_X_MEAN, float *x_mean (Output)
The mean of x.

IMSL_X_VARIANCE, float *x_variance (Output)
The variance of x.

IMSL_ANOVA_TABLE, float **p_anova_table (Output)
The address of a pointer to the array containing the analysis of variance table. On return, the pointer is initialized (through a memory allocation request to malloc), and the array is stored there. Typically, float *p_anova_table is declared; &p_anova_table is used as an argument to this function; and imsl_free(p_anova_table) is used to free this array.

Element	Analysis of Variance Statistic
0	degrees of freedom for the model
1	degrees of freedom for error
2	total (corrected) degrees of freedom
3	sum of squares for the model
4	sum of squares for error
5	total (corrected) sum of squares
6	model mean square
7	error mean square
8	overall F-statistic
9	p-value
10	R² (in percent)
11	adjusted R² (in percent)
12	estimate of the standard deviation
13	overall mean of y
14	coefficient of variation (in percent)

IMSL_ANOVA_TABLE_USER, float anova_table[] (Output)
Array of size 15 containing the analysis of variance statistics listed under optional argument IMSL_ANOVA_TABLE.

IMSL_DF_PURE_ERROR, int *df_pure_error (Output)
If specified, the degrees of freedom for pure error are returned in df_pure_error.

IMSL_SSQ_PURE_ERROR, float *ssq_pure_error (Output)
If specified, the sum of squares for pure error is returned in ssq_pure_error.

IMSL_RESIDUAL, float **p_residual (Output)
The address of a pointer to the array containing the residuals. On return, the pointer is initialized (through a memory allocation request to malloc), and the array is stored there. Typically, float *p_residual is declared; &p_residual is used as an argument to this function; and imsl_free(p_residual) is used to free this array.

IMSL_RESIDUAL_USER, float residual[] (Output)
If specified, residual is an array of length n_observations provided by the user. On return, residual contains the residuals.

IMSL_RETURN_USER, float coefficients[] (Output)
If specified, the least-squares solution for the regression coefficients is stored in array coefficients of size degree + 1 provided by the user.

Description

The function imsl_f_poly_regression computes estimates of the regression coefficients in a polynomial (curvilinear) regression model. In addition to the computation of the fit, imsl_f_poly_regression computes some summary statistics. Sequential sums of squares attributable to each power of the independent variable (stored in ssq_poly) are computed. These are useful in assessing the importance of the higher-order powers in the fit. Draper and Smith (1981, pp. 101-102) and Neter and Wasserman (1974, pp. 278-287) discuss the interpretation of the sequential sums of squares. The statistic R² is the percentage of the sum of squares of y about its mean explained by the polynomial curve. Specifically,

$$R^2 = 100 \, \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

where $n$ is the number of observations, $\hat{y}_i$ is the fitted y value at $x_i$, and $\bar{y}$ is the mean of y. This statistic is useful in assessing the overall fit of the curve to the data. R² must be between 0% and 100%, inclusive. R² = 100% indicates a perfect fit to the data.

Estimates of the regression coefficients in a polynomial model are computed using orthogonal polynomials as the regressor variables. This reparameterization of the polynomial model in terms of orthogonal polynomials has the advantage that the loss of accuracy resulting from forming powers of the x-values is avoided. All results are returned to the user for the original model (power form).

The function imsl_f_poly_regression is based on the algorithm of Forsythe (1957). A modification to Forsythe’s algorithm suggested by Shampine (1975) is used for computing the polynomial coefficients. A discussion of Forsythe’s algorithm and Shampine’s modification appears in Kennedy and Gentle (1980, pp. 342-347).

Examples

Example 1

A polynomial model is fitted to data discussed by Neter and Wasserman (1974, pp. 279-285). The data set contains the response variable y, coffee sales (in hundred gallons), and the independent variable x, the number of self-service coffee dispensers. Responses for 14 similar cafeterias are in the data set. A graph of the results is also given.

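#include <imsl.h>

#define DEGREE 2
#define NOBS 14

int main(void)
{
    float *coefficients;
    /* x: number of self-service dispensers; y: coffee sales (hundred gallons) */
    float x[] = {0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 4.0,
                 4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0};
    float y[] = {508.1, 498.4, 568.2, 577.3, 651.7, 657.0, 755.3,
                 758.9, 787.6, 792.1, 841.4, 831.8, 854.7, 871.4};

    /* Fit a quadratic; the optional-argument list is terminated by 0. */
    coefficients = imsl_f_poly_regression(NOBS, x, y, DEGREE, 0);
    if (coefficients == NULL)
        return 1;

    /* Print the DEGREE + 1 coefficients as a column, rows numbered from 0. */
    imsl_f_write_matrix("Least-Squares Polynomial Coefficients",
                        DEGREE + 1, 1, coefficients,
                        IMSL_ROW_NUMBER_ZERO,
                        0);

    /* The coefficient array was allocated by the library. */
    imsl_free(coefficients);
    return 0;
}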

Output

Least-Squares Polynomial Coefficients
0      503.3
1       78.9
2       -4.0

Figure 25, Figure 10-1: A Polynomial Fit

Example 2

This example continues Example 1. Here, several optional arguments are used to obtain the sequential and lack-of-fit statistics, the analysis of variance table, and the pure-error statistics.

#include <stdio.h>
#include <imsl.h>

#define DEGREE 2
#define NOBS 14

int main(void)
{
    int iset = 1, dfpe;
    float *coefficients, *anova, sspe, *sspoly, *sslof;
    /* Same coffee-sales data as Example 1. */
    float x[] = {0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 4.0,
                 4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0};
    float y[] = {508.1, 498.4, 568.2, 577.3, 651.7, 657.0, 755.3,
                 758.9, 787.6, 792.1, 841.4, 831.8, 854.7, 871.4};
    char *coef_rlab[2];
    char *coef_clab[] = {" ", "intercept", "linear", "quadratic"};
    char *stat_clab[] = {" ", "Degrees of\nFreedom",
                         "Sum of\nSquares", "\nF-Statistic",
                         "\np-value"};
    char *anova_rlab[] = {
        "degrees of freedom for regression",
        "degrees of freedom for error",
        "total (corrected) degrees of freedom",
        "sum of squares for regression",
        "sum of squares for error",
        "total (corrected) sum of squares",
        "regression mean square",
        "error mean square", "F-statistic",
        "p-value", "R-squared (in percent)",
        "adjusted R-squared (in percent)",
        "est. standard deviation of model error",
        "overall mean of y",
        "coefficient of variation (in percent)"};

    /* Fit the quadratic and request the statistics. The arrays behind
       sspoly, sslof and anova are allocated by the library; dfpe and sspe
       receive the pure-error degrees of freedom and sum of squares. */
    coefficients = imsl_f_poly_regression(NOBS, x, y, DEGREE,
                                          IMSL_SSQ_POLY, &sspoly,
                                          IMSL_SSQ_LOF, &sslof,
                                          IMSL_ANOVA_TABLE, &anova,
                                          IMSL_DF_PURE_ERROR, &dfpe,
                                          IMSL_SSQ_PURE_ERROR, &sspe,
                                          0);
    if (coefficients == NULL)
        return 1;

    imsl_write_options(-1, &iset);
    imsl_f_write_matrix("Least-Squares Polynomial Coefficients",
                        1, DEGREE + 1, coefficients,
                        IMSL_COL_LABELS, coef_clab, 0);

    /* Row i of the DEGREE x 4 tables corresponds to x^(i+1). */
    coef_rlab[0] = coef_clab[2];
    coef_rlab[1] = coef_clab[3];
    imsl_f_write_matrix("Sequential Statistics", DEGREE, 4, sspoly,
                        IMSL_COL_LABELS, stat_clab,
                        IMSL_ROW_LABELS, coef_rlab,
                        IMSL_WRITE_FORMAT, "%3.1f%8.1f%6.1f%6.4f",
                        0);
    imsl_f_write_matrix("Lack-of-Fit Statistics", DEGREE, 4, sslof,
                        IMSL_COL_LABELS, stat_clab,
                        IMSL_ROW_LABELS, coef_rlab,
                        IMSL_WRITE_FORMAT, "%3.1f%8.1f%6.1f%6.4f",
                        0);
    imsl_f_write_matrix("* * * Analysis of Variance * * *\n", 15, 1,
                        anova,
                        IMSL_ROW_LABELS, anova_rlab,
                        IMSL_WRITE_FORMAT, "%9.2f",
                        0);

    /* Release the arrays allocated by imsl_f_poly_regression. */
    imsl_free(coefficients);
    imsl_free(sspoly);
    imsl_free(sslof);
    imsl_free(anova);
    return 0;
}

Output

Least-Squares Polynomial Coefficients
intercept      linear   quadratic
    503.3        78.9        -4.0

Sequential Statistics
             Degrees of     Sum of
                Freedom    Squares  F-Statistic  p-value
linear              1.0   220644.2       3415.8   0.0000
quadratic           1.0     4387.7         67.9   0.0000

Lack-of-Fit Statistics
             Degrees of     Sum of
                Freedom    Squares  F-Statistic  p-value
linear              5.0     4793.7         22.0   0.0004
quadratic           4.0      405.9          2.3   0.1548

* * * Analysis of Variance * * *
degrees of freedom for regression            2.00
degrees of freedom for error                11.00
total (corrected) degrees of freedom        13.00
sum of squares for regression           225031.94
sum of squares for error                   710.55
total (corrected) sum of squares        225742.48
regression mean square                  112515.97
error mean square                           64.60
F-statistic                               1741.86
p-value                                      0.00
R-squared (in percent)                      99.69
adjusted R-squared (in percent)             99.63
est. standard deviation of model error       8.04
overall mean of y                          710.99
coefficient of variation (in percent)        1.13

Warning Errors

IMSL_CONSTANT_YVALUES	The y values are constant. A zero-order polynomial is fit. High order coefficients are set to zero.
IMSL_FEW_DISTINCT_XVALUES	There are too few distinct x values to fit the desired degree polynomial. High order coefficients are set to zero.
IMSL_PERFECT_FIT	A perfect fit was obtained with a polynomial of degree less than degree. High order coefficients are set to zero.

Fatal Errors

IMSL_NONNEG_WEIGHT_REQUEST_2	All weights must be nonnegative.
IMSL_ALL_OBSERVATIONS_MISSING	Each (x, y) point contains NaN (not a number). There are no valid data.
IMSL_CONSTANT_XVALUES	The x values are constant.