Performs a polynomial least-squares regression.
#include <imsl.h>
float *imsl_f_poly_regression (int n_observations, float x[], float y[], int degree, ..., 0)
The type double procedure is imsl_d_poly_regression.
int n_observations (Input)
The number of observations.
float x[] (Input)
Array of length n_observations containing the independent variable.
float y[] (Input)
Array of length n_observations containing the dependent variable.
int degree (Input)
The degree of the polynomial.
A pointer to the vector of size degree + 1 containing the coefficients of the fitted polynomial. If a fit cannot be computed, then NULL is returned.
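The example output later in this section labels the returned coefficients 0, 1, 2 (intercept, linear, quadratic), so the vector appears to be stored with the constant term first. Under that assumption, the fitted polynomial can be evaluated with Horner's rule. The following is a minimal sketch; eval_poly is a hypothetical helper and not part of the library.

```c
/* Hypothetical helper (not an IMSL routine): evaluates
 *   coefficients[0] + coefficients[1]*x + ... + coefficients[degree]*x^degree
 * by Horner's rule. Assumes the constant term is stored first. */
float eval_poly(const float coefficients[], int degree, float x)
{
    float value = coefficients[degree];
    int i;

    for (i = degree - 1; i >= 0; i--)
        value = value * x + coefficients[i];
    return value;
}
```

A call such as eval_poly(coefficients, degree, 3.0f) would then give the fitted response at x = 3.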
#include <imsl.h>
float *imsl_f_poly_regression (int n_observations, float xdata[], float ydata[], int degree,
IMSL_WEIGHTS, float weights[],
IMSL_SSQ_POLY, float **p_ssq_poly,
IMSL_SSQ_POLY_USER, float ssq_poly[],
IMSL_SSQ_POLY_COL_DIM, int ssq_poly_col_dim,
IMSL_SSQ_LOF, float **p_ssq_lof,
IMSL_SSQ_LOF_USER, float ssq_lof[],
IMSL_SSQ_LOF_COL_DIM, int ssq_lof_col_dim,
IMSL_X_MEAN, float *x_mean,
IMSL_X_VARIANCE, float *x_variance,
IMSL_ANOVA_TABLE, float **p_anova_table,
IMSL_ANOVA_TABLE_USER, float anova_table[],
IMSL_DF_PURE_ERROR, int *df_pure_error,
IMSL_SSQ_PURE_ERROR, float *ssq_pure_error,
IMSL_RESIDUAL, float **p_residual,
IMSL_RESIDUAL_USER, float residual[],
IMSL_RETURN_USER, float coefficients[],
0)
IMSL_WEIGHTS, float weights[] (Input)
Array with n_observations components containing the weights for the observations. If this option is not specified, all observations have equal weights of one.
IMSL_SSQ_POLY, float **p_ssq_poly (Output)
The address of a pointer to the array containing the sequential sums of squares and other statistics. On return, the pointer is initialized (through a memory allocation request to malloc), and the array is stored there. Typically, float *p_ssq_poly is declared; &p_ssq_poly is used as an argument to this function; and free(p_ssq_poly) is used to free this array. Row i corresponds to x^i, i = 1, ..., degree, and the columns are described as follows:
Column | Description
1 | degrees of freedom
2 | sums of squares
3 | F-statistic
4 | p-value
IMSL_SSQ_POLY_USER, float ssq_poly[] (Output)
Array of size degree × 4 containing the sequential sums of squares for a polynomial fit described under optional argument IMSL_SSQ_POLY.
IMSL_SSQ_POLY_COL_DIM, int ssq_poly_col_dim (Input)
The column dimension of ssq_poly.
Default: ssq_poly_col_dim = 4
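A column dimension larger than 4 is useful when the statistics are written into part of a wider user-supplied array. The sketch below shows how an element would then be addressed, assuming the array is stored row by row with ssq_poly_col_dim elements per row (an assumption; the storage order is not stated here). The accessor ssq_poly_at is hypothetical and not part of the library.

```c
/* Hypothetical accessor (not an IMSL routine).
 * Assumes row-major storage with ssq_poly_col_dim floats per row.
 * row is 0-based: row 0 holds the statistics for x^1.
 * col is 0-based: 0 = degrees of freedom, 1 = sum of squares,
 *                 2 = F-statistic,        3 = p-value. */
float ssq_poly_at(const float ssq_poly[], int ssq_poly_col_dim,
                  int row, int col)
{
    return ssq_poly[row * ssq_poly_col_dim + col];
}
```

With the default column dimension of 4, ssq_poly_at(ssq_poly, 4, 1, 2) would read the F-statistic for the quadratic term.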
IMSL_SSQ_LOF, float **p_ssq_lof (Output)
The address of a pointer to the array containing the lack-of-fit statistics. On return, the pointer is initialized (through a memory allocation request to malloc), and the array is stored there. Typically, float *p_ssq_lof is declared; &p_ssq_lof is used as an argument to this function; and free(p_ssq_lof) is used to free this array. Row i corresponds to x^i, i = 1, ..., degree, and the columns are described in the following table:
Column | Description
1 | degrees of freedom
2 | lack-of-fit sums of squares
3 | F-statistic for testing lack-of-fit for a polynomial model of degree i
4 | p-value for the test
IMSL_SSQ_LOF_USER, float ssq_lof[] (Output)
Array of size degree × 4 containing the matrix of lack-of-fit statistics described under optional argument IMSL_SSQ_LOF.
IMSL_SSQ_LOF_COL_DIM, int ssq_lof_col_dim (Input)
The column dimension of ssq_lof.
Default: ssq_lof_col_dim = 4
IMSL_X_MEAN, float *x_mean (Output)
The mean of x.
IMSL_X_VARIANCE, float *x_variance (Output)
The variance of x.
IMSL_ANOVA_TABLE, float **p_anova_table (Output)
The address of a pointer to the array containing the analysis of variance table. On return, the pointer is initialized (through a memory allocation request to malloc), and the array is stored there. Typically, float *p_anova_table is declared; &p_anova_table is used as an argument to this function; and free(p_anova_table) is used to free this array.
Element | Analysis of Variance Statistic
0 | degrees of freedom for the model
1 | degrees of freedom for error
2 | total (corrected) degrees of freedom
3 | sum of squares for the model
4 | sum of squares for error
5 | total (corrected) sum of squares
6 | model mean square
7 | error mean square
8 | overall F-statistic
9 | p-value
10 | R^2 (in percent)
11 | adjusted R^2 (in percent)
12 | estimate of the standard deviation
13 | overall mean of y
14 | coefficient of variation (in percent)
IMSL_ANOVA_TABLE_USER, float anova_table[] (Output)
Array of size 15 containing the analysis of variance statistics listed under optional argument IMSL_ANOVA_TABLE.
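Several entries of the analysis of variance table follow from the degrees of freedom and sums of squares by the usual textbook relations. The sketch below applies those relations; the results agree with the table printed in the second example to the displayed precision, but whether the library computes them in exactly this way is an assumption. The index names and the function anova_derived are hypothetical, since only the numeric positions are documented.

```c
/* Hypothetical names for the 15 documented positions. */
enum {
    DF_MODEL, DF_ERROR, DF_TOTAL, SS_MODEL, SS_ERROR, SS_TOTAL,
    MS_MODEL, MS_ERROR, F_STAT, P_VALUE, R_SQ, ADJ_R_SQ,
    STD_DEV, Y_MEAN, COEF_VAR
};

/* Hypothetical helper (not an IMSL routine): fills the derived
 * entries from t[DF_MODEL], t[DF_ERROR], t[SS_MODEL], t[SS_ERROR].
 * The p-value, standard deviation, and coefficient of variation
 * are left untouched (they need the F distribution and a square
 * root, which are omitted from this sketch). */
void anova_derived(double t[15])
{
    t[DF_TOTAL] = t[DF_MODEL] + t[DF_ERROR];
    t[SS_TOTAL] = t[SS_MODEL] + t[SS_ERROR];
    t[MS_MODEL] = t[SS_MODEL] / t[DF_MODEL];
    t[MS_ERROR] = t[SS_ERROR] / t[DF_ERROR];
    t[F_STAT]   = t[MS_MODEL] / t[MS_ERROR];
    t[R_SQ]     = 100.0 * t[SS_MODEL] / t[SS_TOTAL];
    t[ADJ_R_SQ] = 100.0 * (1.0 - t[MS_ERROR]
                                 / (t[SS_TOTAL] / t[DF_TOTAL]));
}
```

Under the same textbook definitions, the estimate of the standard deviation is the square root of the error mean square, and the coefficient of variation is 100 times that estimate divided by the overall mean of y.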
IMSL_DF_PURE_ERROR, int *df_pure_error (Output)
If specified, the degrees of freedom for pure error are returned in df_pure_error.
IMSL_SSQ_PURE_ERROR, float *ssq_pure_error (Output)
If specified, the sum of squares for pure error is returned in ssq_pure_error.
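In the textbook definition, pure error measures the variation of y among observations that share the same x value: its sum of squares is the within-group sum of squares, and its degrees of freedom are the number of observations minus the number of distinct x values. The sketch below computes both under that definition; it is not taken from the library, it compares x values for exact equality, and pure_error is a hypothetical name.

```c
/* Hypothetical helper (not an IMSL routine).
 * Groups observations with exactly equal x values and returns the
 * within-group sum of squares of y. The degrees of freedom
 * (n minus the number of distinct x values) are stored in *df. */
double pure_error(int n, const double x[], const double y[], int *df)
{
    double ss = 0.0;
    int groups = 0, i, j;

    for (i = 0; i < n; i++) {
        int seen = 0, count = 0;
        double sum = 0.0, mean;

        /* Skip x[i] if an earlier observation already handled it. */
        for (j = 0; j < i; j++)
            if (x[j] == x[i]) { seen = 1; break; }
        if (seen)
            continue;

        groups++;
        for (j = i; j < n; j++)
            if (x[j] == x[i]) { sum += y[j]; count++; }
        mean = sum / count;
        for (j = i; j < n; j++)
            if (x[j] == x[i])
                ss += (y[j] - mean) * (y[j] - mean);
    }
    *df = n - groups;
    return ss;
}
```

For the coffee data used in the examples (14 observations at 7 distinct x values), this gives 7 degrees of freedom, which equals the 11 error degrees of freedom minus the 4 lack-of-fit degrees of freedom reported for the quadratic model.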
IMSL_RESIDUAL, float **p_residual (Output)
The address of a pointer to the array containing the residuals. On return, the pointer is initialized (through a memory allocation request to malloc), and the array is stored there. Typically, float *p_residual is declared; &p_residual is used as an argument to this function; and free(p_residual) is used to free this array.
IMSL_RESIDUAL_USER, float residual[] (Output)
If specified, residual is an array of length n_observations provided by the user. On return, residual contains the residuals.
IMSL_RETURN_USER, float coefficients[] (Output)
If specified, the least-squares solution for the regression coefficients is stored in array coefficients of size degree + 1 provided by the user.
The function imsl_f_poly_regression computes estimates of the regression coefficients in a polynomial (curvilinear) regression model. In addition to the computation of the fit, imsl_f_poly_regression computes some summary statistics. Sequential sums of squares attributable to each power of the independent variable (stored in ssq_poly) are computed. These are useful in assessing the importance of the higher order powers in the fit. Draper and Smith (1981, pp. 101-102) and Neter and Wasserman (1974, pp. 278-287) discuss the interpretation of the sequential sums of squares. The statistic R^2 is the percentage of the sum of squares of y about its mean explained by the polynomial curve. Specifically,

$$R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} \times 100\%$$

where $\hat{y}_i$ is the fitted y value at $x_i$ and $\bar{y}$ is the mean of y. This statistic is useful in assessing the overall fit of the curve to the data. R^2 must be between 0% and 100%, inclusive. R^2 = 100% indicates a perfect fit to the data.
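The definition of R^2 translates directly into a short routine. The following is a minimal sketch, assuming the fitted values are already available and that y is not constant (otherwise the denominator is zero); r_squared_percent is a hypothetical helper, not a library function.

```c
/* Hypothetical helper (not an IMSL routine): R^2 in percent,
 * computed from the observed responses y[] and the fitted
 * values fitted[]. Assumes n > 0 and that y is not constant. */
double r_squared_percent(int n, const double y[], const double fitted[])
{
    double mean = 0.0, explained = 0.0, total = 0.0;
    int i;

    for (i = 0; i < n; i++)
        mean += y[i];
    mean /= n;

    for (i = 0; i < n; i++) {
        explained += (fitted[i] - mean) * (fitted[i] - mean);
        total     += (y[i] - mean) * (y[i] - mean);
    }
    return 100.0 * explained / total;
}
```

When the fitted values reproduce y exactly the result is 100, and when every fitted value equals the mean of y the result is 0.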
Estimates of the regression coefficients in a polynomial model are computed using orthogonal polynomials as the regressor variables. This reparameterization of the polynomial model in terms of orthogonal polynomials has the advantage that the loss of accuracy resulting from forming powers of the x-values is avoided. All results are returned to the user for the original model (power form).
The function imsl_f_poly_regression is based on the algorithm of Forsythe (1957). A modification to Forsythe’s algorithm suggested by Shampine (1975) is used for computing the polynomial coefficients. A discussion of Forsythe’s algorithm and Shampine’s modification appears in Kennedy and Gentle (1980, pp. 342−347).
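To illustrate the idea of fitting with orthogonal polynomials, the sketch below generates them with the classical three-term recurrence and accumulates the fitted values, so no powers of x are ever formed. It is a simplified, unweighted illustration only: it does not include Shampine's modification or the conversion of the result back to power form, and it is not the library's implementation. The name forsythe_fit is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical sketch (not the IMSL implementation).
 * Fits a polynomial of the given degree by least squares using
 *   p_0 = 1,  p_{j+1}(x) = (x - a_j) p_j(x) - b_j p_{j-1}(x),
 *   a_j = sum(x p_j^2) / sum(p_j^2),
 *   b_j = sum(p_j^2) / sum(p_{j-1}^2)   (b_0 = 0),
 * and writes the fitted values at x[0..n-1] into fitted[].
 * Returns 0 on success, -1 if memory cannot be allocated. */
int forsythe_fit(int n, const double x[], const double y[],
                 int degree, double fitted[])
{
    double *p_prev = calloc(n, sizeof *p_prev); /* p_{j-1}(x_i) */
    double *p_cur  = malloc(n * sizeof *p_cur); /* p_j(x_i)     */
    double norm_prev = 1.0;
    int i, j;

    if (p_prev == NULL || p_cur == NULL) {
        free(p_prev);
        free(p_cur);
        return -1;
    }
    for (i = 0; i < n; i++) {
        p_cur[i] = 1.0;
        fitted[i] = 0.0;
    }
    for (j = 0; j <= degree; j++) {
        double norm = 0.0, xpp = 0.0, ypp = 0.0, a, b, c;

        for (i = 0; i < n; i++) {
            norm += p_cur[i] * p_cur[i];
            xpp  += x[i] * p_cur[i] * p_cur[i];
            ypp  += y[i] * p_cur[i];
        }
        if (norm == 0.0)   /* too few distinct x values */
            break;
        c = ypp / norm;    /* coefficient of p_j in the fit */
        for (i = 0; i < n; i++)
            fitted[i] += c * p_cur[i];

        a = xpp / norm;
        b = (j == 0) ? 0.0 : norm / norm_prev;
        for (i = 0; i < n; i++) {
            double next = (x[i] - a) * p_cur[i] - b * p_prev[i];
            p_prev[i] = p_cur[i];
            p_cur[i] = next;
        }
        norm_prev = norm;
    }
    free(p_prev);
    free(p_cur);
    return 0;
}
```

Because each new polynomial is orthogonal to the earlier ones over the data, raising the degree adds one term to the fit without changing the coefficients already computed.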
A polynomial model is fitted to data discussed by Neter and Wasserman (1974, pp. 279-285). The data set contains the response variable y measuring coffee sales (in hundred gallons) and the number of self-service coffee dispensers. Responses for 14 similar cafeterias are in the data set. A graph of the results is also given.
#include <imsl.h>

#define DEGREE 2
#define NOBS 14

int main()
{
    float *coefficients;
    float x[] = {0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 4.0,
                 4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0};
    float y[] = {508.1, 498.4, 568.2, 577.3, 651.7, 657.0, 755.3,
                 758.9, 787.6, 792.1, 841.4, 831.8, 854.7, 871.4};

    /* Fit a quadratic; the result holds DEGREE + 1 coefficients. */
    coefficients = imsl_f_poly_regression (NOBS, x, y, DEGREE, 0);

    /* Print the coefficients as a column, numbering rows from 0. */
    imsl_f_write_matrix("Least-Squares Polynomial Coefficients",
                        DEGREE + 1, 1, coefficients,
                        IMSL_ROW_NUMBER_ZERO,
                        0);
    return 0;
}
Least-Squares Polynomial Coefficients
0    503.3
1     78.9
2     -4.0

Figure 10-1 A Polynomial Fit
This example is a continuation of the initial example. Here, many optional arguments are used.
#include <stdio.h>
#include <imsl.h>

#define DEGREE 2
#define NOBS 14

int main()
{
    int iset = 1, dfpe;
    float *coefficients, *anova, sspe, *sspoly, *sslof;
    float x[] = {0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 4.0,
                 4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0};
    float y[] = {508.1, 498.4, 568.2, 577.3, 651.7, 657.0, 755.3,
                 758.9, 787.6, 792.1, 841.4, 831.8, 854.7, 871.4};
    char *coef_rlab[2];
    char *coef_clab[] = {" ", "intercept", "linear", "quadratic"};
    char *stat_clab[] = {" ", "Degrees of\nFreedom",
                         "Sum of\nSquares",
                         "\nF-Statistic",
                         "\np-value"};
    char *anova_rlab[] = {
        "degrees of freedom for regression",
        "degrees of freedom for error",
        "total (corrected) degrees of freedom",
        "sum of squares for regression",
        "sum of squares for error",
        "total (corrected) sum of squares",
        "regression mean square",
        "error mean square",
        "F-statistic",
        "p-value", "R-squared (in percent)",
        "adjusted R-squared (in percent)",
        "est. standard deviation of model error",
        "overall mean of y",
        "coefficient of variation (in percent)"};

    /* Fit the quadratic and request the optional statistics. */
    coefficients = imsl_f_poly_regression (NOBS, x, y, DEGREE,
                                           IMSL_SSQ_POLY, &sspoly,
                                           IMSL_SSQ_LOF, &sslof,
                                           IMSL_ANOVA_TABLE, &anova,
                                           IMSL_DF_PURE_ERROR, &dfpe,
                                           IMSL_SSQ_PURE_ERROR, &sspe,
                                           0);

    imsl_write_options(-1, &iset);
    imsl_f_write_matrix("Least-Squares Polynomial Coefficients",
                        1, DEGREE + 1, coefficients,
                        IMSL_COL_LABELS, coef_clab,
                        0);

    /* Reuse the "linear" and "quadratic" labels as row labels. */
    coef_rlab[0] = coef_clab[2];
    coef_rlab[1] = coef_clab[3];
    imsl_f_write_matrix("Sequential Statistics", DEGREE, 4, sspoly,
                        IMSL_COL_LABELS, stat_clab,
                        IMSL_ROW_LABELS, coef_rlab,
                        IMSL_WRITE_FORMAT, "%3.1f%8.1f%6.1f%6.4f",
                        0);
    imsl_f_write_matrix("Lack-of-Fit Statistics", DEGREE, 4, sslof,
                        IMSL_COL_LABELS, stat_clab,
                        IMSL_ROW_LABELS, coef_rlab,
                        IMSL_WRITE_FORMAT, "%3.1f%8.1f%6.1f%6.4f",
                        0);
    imsl_f_write_matrix("* * * Analysis of Variance * * *\n", 15, 1, anova,
                        IMSL_ROW_LABELS, anova_rlab,
                        IMSL_WRITE_FORMAT, "%9.2f",
                        0);
    return 0;
}
Least-Squares Polynomial Coefficients
intercept     linear  quadratic
    503.3       78.9       -4.0

              Sequential Statistics
           Degrees of    Sum of
              Freedom   Squares  F-Statistic  p-value
linear            1.0  220644.2       3415.8   0.0000
quadratic         1.0    4387.7         67.9   0.0000

              Lack-of-Fit Statistics
           Degrees of    Sum of
              Freedom   Squares  F-Statistic  p-value
linear            5.0    4793.7         22.0   0.0004
quadratic         4.0     405.9          2.3   0.1548

* * * Analysis of Variance * * *

degrees of freedom for regression            2.00
degrees of freedom for error                11.00
total (corrected) degrees of freedom        13.00
sum of squares for regression           225031.94
sum of squares for error                   710.55
total (corrected) sum of squares        225742.48
regression mean square                  112515.97
error mean square                           64.60
F-statistic                               1741.86
p-value                                      0.00
R-squared (in percent)                      99.69
adjusted R-squared (in percent)             99.63
est. standard deviation of model error       8.04
overall mean of y                          710.99
coefficient of variation (in percent)        1.13
IMSL_CONSTANT_YVALUES The y values are constant. A zero-order polynomial is fit. High order coefficients are set to zero.
IMSL_FEW_DISTINCT_XVALUES There are too few distinct x values to fit the desired degree polynomial. High order coefficients are set to zero.
IMSL_PERFECT_FIT A perfect fit was obtained with a polynomial of degree less than degree. High order coefficients are set to zero.
IMSL_NONNEG_WEIGHT_REQUEST_2 All weights must be nonnegative.
IMSL_ALL_OBSERVATIONS_MISSING Each (x, y) point contains NaN (not a number). There are no valid data.
IMSL_CONSTANT_XVALUES The x values are constant.