polyRegression

Performs a polynomial least-squares regression.

Synopsis

polyRegression(x, y, degree)

Required Arguments

float x[] (Input)
Array of length nObservations containing the independent variable.
float y[] (Input)
Array of length nObservations containing the dependent variable.
int degree (Input)
Degree of the polynomial.

Return Value

An array of size degree + 1 containing the coefficients of the fitted polynomial, where element i holds the coefficient of \(x^i\). If a fit cannot be computed, None is returned.
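
Because element i of the returned array multiplies \(x^i\), the fit can be evaluated directly with NumPy's polynomial helpers. A minimal sketch with made-up data:

from numpy import array
from numpy.polynomial import polynomial as P
from pyimsl.stat.polyRegression import polyRegression

# Toy data for illustration only
x = array([0.0, 1.0, 2.0, 3.0, 4.0])
y = array([1.0, 2.1, 4.3, 8.8, 16.2])

coef = polyRegression(x, y, 2)
yhat = P.polyval(x, coef)   # coef[i] multiplies x**i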

Optional Arguments

weights, float[] (Input)

Array of length nObservations containing the weights for the observations.

Default: all weights equal 1 (unweighted least squares)

ssqPoly (Output)
The array containing the sequential sums of squares and other statistics. Row i corresponds to \(x^{i+1}\), i = 0, …, degree − 1, and the columns are described as follows:
Column Description
0 degrees of freedom
1 sums of squares
2 F-statistic
3 p-value
ssqLof (Output)
The array containing the lack-of-fit statistics. Row i corresponds to \(x^{i+1}\), i = 0, …, degree − 1, and the columns are described in the following table:
Column Description
0 degrees of freedom
1 lack-of-fit sums of squares
2 F-statistic for testing lack-of-fit for a polynomial model of degree i + 1
3 p-value for the test
xMean (Output)
Mean of x.
xVariance (Output)
Variance of x.
anovaTable (Output)

The array containing the analysis of variance table.

Column Description
0 degrees of freedom for the model
1 degrees of freedom for error
2 total (corrected) degrees of freedom
3 sum of squares for the model
4 sum of squares for error
5 total (corrected) sum of squares
6 model mean square
7 error mean square
8 overall F-statistic
9 p-value
10 \(R^2\) (in percent)
11 adjusted \(R^2\) (in percent)
12 estimate of the standard deviation of the model error
13 overall mean of y
14 coefficient of variation (in percent)

Note that the p-value is returned as 0.0 when the value is so small that all significant digits have been lost. A sketch showing how to retrieve individual entries from this table follows the argument list below.

dfPureError (Output)
If specified, the degrees of freedom for pure error are returned in dfPureError.
ssqPureError (Output)
If specified, the sums of squares for pure error are returned in ssqPureError.
residual (Output)
The array containing the residuals.
polyRegressionInfo (Output)
A structure containing information about the polynomial fit, required as input to the PyIMSL function polyPrediction.
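
All optional outputs follow the PyIMSL convention of passing an empty list that the function populates in place. The following sketch uses made-up data; the index positions are an assumption based on the populated anovaTable list holding the 15 statistics in the column order tabulated above.

from numpy import array
from pyimsl.stat.polyRegression import polyRegression

# Toy data for illustration only
x = array([0.0, 1.0, 2.0, 3.0, 4.0])
y = array([1.0, 2.1, 4.3, 8.8, 16.2])

anova = []   # receives the 15 ANOVA statistics listed above
info = []    # receives the fit structure consumed by polyPrediction
coef = polyRegression(x, y, 2, anovaTable=anova, polyRegressionInfo=info)

r_squared = anova[10]   # assumption: column 10 above, R^2 in percent
error_ms = anova[7]     # assumption: column 7 above, error mean square
# The populated info structure can then be supplied to polyPrediction;
# see that function's reference page for the exact calling sequence.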

Description

Function polyRegression computes estimates of the regression coefficients in a polynomial (curvilinear) regression model. In addition to the computation of the fit, polyRegression computes some summary statistics. Sequential sums of squares attributable to each power of the independent variable (stored in ssqPoly) are computed. These are useful in assessing the importance of the higher order powers in the fit. Draper and Smith (1981, pp. 101−102) and Neter and Wasserman (1974, pp. 278−287) discuss the interpretation of the sequential sums of squares. The statistic \(R^2\) is the percentage of the sum of squares of y about its mean explained by the polynomial curve. Specifically,

\[R^2 = \frac{\sum w_i \left(\hat{y}_i - \overline{y}\right)^2}{\sum w_i \left(y_i - \overline{y}\right)^2} \times 100\%\]

where \(\hat{y}_i\) is the fitted y value at \(x_i\) and \(\overline{y}\) is the mean of y. This statistic is useful in assessing the overall fit of the curve to the data. \(R^2\) must be between 0 and 100 percent, inclusive. \(R^2\) = 100 percent indicates a perfect fit to the data.
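
As a quick numeric check of this formula, the following sketch uses made-up, unweighted data (all \(w_i = 1\)) and a plain NumPy fit in place of polyRegression:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 4.2, 8.8, 16.3])

c = np.polynomial.polynomial.polyfit(x, y, 2)     # ascending powers
yhat = np.polynomial.polynomial.polyval(x, c)

# With unit weights the formula reduces to the usual R^2 (in percent)
r2 = 100.0 * np.sum((yhat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)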

Estimates of the regression coefficients in a polynomial model are computed using orthogonal polynomials as the regressor variables. This reparameterization of the polynomial model in terms of orthogonal polynomials has the advantage that the loss of accuracy resulting from forming powers of the x-values is avoided. All results are returned to the user for the original model (power form).
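
The loss of accuracy is easy to demonstrate: on raw x-values that are large or tightly clustered, the power-basis (Vandermonde) design matrix is nearly singular, whereas a recentred basis behaves well. A short illustrative sketch:

import numpy as np

x = np.linspace(1000.0, 1010.0, 50)

# Power basis on the raw x-values: the columns 1, x, ..., x^5 are
# nearly parallel, so the design matrix is almost singular
print(np.linalg.cond(np.vander(x, 6)))    # astronomically large

# Standardizing x already restores usable conditioning; orthogonal
# polynomials carry this idea to its conclusion
t = (x - x.mean()) / x.std()
print(np.linalg.cond(np.vander(t, 6)))    # modest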

Function polyRegression is based on the algorithm of Forsythe (1957). A modification to Forsythe’s algorithm suggested by Shampine (1975) is used for computing the polynomial coefficients. A discussion of Forsythe’s algorithm and Shampine’s modification appears in Kennedy and Gentle (1980, pp. 342−347).
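
For readers curious about the mechanics, here is a minimal sketch of Forsythe-style fitting with discrete orthogonal polynomials via the classical three-term recurrence. It returns fitted values only; converting the orthogonal-basis coefficients back to power form is what Shampine's modification addresses and is not reproduced here. This is an illustration, not the PyIMSL implementation.

import numpy as np

def orthopoly_fitted(x, y, degree, w=None):
    """Fitted values of a weighted least-squares polynomial fit,
    built on discrete orthogonal polynomials over the data points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    p_prev = np.zeros_like(x)   # p_{k-1} evaluated at the data points
    p_curr = np.ones_like(x)    # p_k, starting from p_0 = 1
    norm_prev = 1.0             # <p_{k-1}, p_{k-1}>, unused while k = 0
    fitted = np.zeros_like(x)
    for k in range(degree + 1):
        norm = np.sum(w * p_curr ** 2)                 # <p_k, p_k>
        fitted += (np.sum(w * y * p_curr) / norm) * p_curr
        if k < degree:
            # Forsythe's three-term recurrence:
            #   p_{k+1}(x) = (x - alpha) * p_k(x) - beta * p_{k-1}(x)
            alpha = np.sum(w * x * p_curr ** 2) / norm
            beta = norm / norm_prev if k > 0 else 0.0
            p_prev, p_curr = p_curr, (x - alpha) * p_curr - beta * p_prev
            norm_prev = norm
    return fitted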

Examples

Example 1

A polynomial model is fitted to data discussed by Neter and Wasserman (1974, pp. 279−285). The data set contains the response variable y, measuring coffee sales (in hundreds of gallons), and the independent variable x, the number of self-service coffee dispensers. Responses for 14 similar cafeterias are in the data set. A graph of the results is also given.

from numpy import array
from pyimsl.stat.polyRegression import polyRegression
from pyimsl.stat.writeMatrix import writeMatrix


x = array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 4.0,
           4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0])
y = array([508.1, 498.4, 568.2, 577.3, 651.7, 657.0, 755.3,
           758.9, 787.6, 792.1, 841.4, 831.8, 854.7, 871.4])
degree = 2

coefficients = polyRegression(x, y, degree)

writeMatrix("Least Squares Polynomial Coefficients", coefficients, column=True)

Output

 
Least Squares Polynomial Coefficients
           1        503.3
           2         78.9
           3         -4.0
[Figure 2.1: A Polynomial Fit]

Example 2

This example is a continuation of Example 1. Here, several optional arguments are used to retrieve additional statistics about the fit.

from numpy import array
from pyimsl.stat.polyRegression import polyRegression
from pyimsl.stat.writeMatrix import writeMatrix

x = array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 4.0,
           4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0])
y = array([508.1, 498.4, 568.2, 577.3, 651.7, 657.0, 755.3,
           758.9, 787.6, 792.1, 841.4, 831.8, 854.7, 871.4])
degree = 2
coef_clab = [" ", "intercept", "linear", "quadratic"]
stat_clab = [" ", "Degrees of\nFreedom",
             "Sum of\nSquares",
             "\nF-Statistic", "\np-value"]
anova_rlab = \
    ["degrees of freedom for regression",
     "degrees of freedom for error",
     "total (corrected) degrees of freedom",
     "sum of squares for regression",
     "sum of squares for error",
     "total (corrected) sum of squares",
     "regression mean square",
     "error mean square", "F-statistic",
     "p-value", "R-squared (in percent)",
     "adjusted R-squared (in percent)",
     "est. standard deviation of model error",
     "overall mean of y",
     "coefficient of variation (in percent)"]
# Output arguments are passed as empty lists, which
# polyRegression populates in place
ssqpoly = []
ssqlof = []
anova_table = []
dfpe = []
sspe = []

coefficients = polyRegression(x, y, degree,
                              ssqPoly=ssqpoly,
                              ssqLof=ssqlof,
                              anovaTable=anova_table,
                              dfPureError=dfpe,
                              ssqPureError=sspe)

writeMatrix("Least Squares Polynomial Coefficients",
            coefficients, colLabels=coef_clab)
coef_rlab = [coef_clab[2], coef_clab[3]]
writeMatrix("Sequential Statistics", ssqpoly,
            colLabels=stat_clab, rowLabels=coef_rlab,
            writeFormat="%3.1f%8.1f%6.1f%6.4f")
writeMatrix("Lack-of-Fit Statistics", ssqlof,
            colLabels=stat_clab, rowLabels=coef_rlab,
            writeFormat="%3.1f%8.1f%6.1f%6.4f")
writeMatrix("* * * Analysis of Variance * * *\n",
            anova_table, rowLabels=anova_rlab, writeFormat="%9.2f", column=True)

Output

 
  Least Squares Polynomial Coefficients
     intercept       linear    quadratic
         503.3         78.9         -4.0
 
                Sequential Statistics
           Degrees of    Sum of                      
              Freedom   Squares  F-Statistic  p-value
linear            1.0  220644.1       3410.2   0.0000
quadratic         1.0    4387.7         67.8   0.0000
 
               Lack-of-Fit Statistics
           Degrees of    Sum of                      
              Freedom   Squares  F-Statistic  p-value
linear            5.0    4794.8         22.0   0.0004
quadratic         4.0     407.1          2.3   0.1540
 
        * * * Analysis of Variance * * *

degrees of freedom for regression            2.00
degrees of freedom for error                11.00
total (corrected) degrees of freedom        13.00
sum of squares for regression           225031.81
sum of squares for error                   711.70
total (corrected) sum of squares        225743.51
regression mean square                  112515.90
error mean square                           64.70
F-statistic                               1739.03
p-value                                      0.00
R-squared (in percent)                      99.68
adjusted R-squared (in percent)             99.63
est. standard deviation of model error       8.04
overall mean of y                          710.99
coefficient of variation (in percent)        1.13

Warning Errors

IMSLS_CONSTANT_YVALUES The y values are constant. A zero-order polynomial is fit. High order coefficients are set to zero.
IMSLS_FEW_DISTINCT_XVALUES There are too few distinct x values to fit the desired degree polynomial. High order coefficients are set to zero.
IMSLS_PERFECT_FIT A perfect fit was obtained with a polynomial of degree less than degree. High order coefficients are set to zero.

Fatal Errors

IMSLS_NONNEG_WEIGHT_REQUEST_2 All weights must be nonnegative.
IMSLS_ALL_OBSERVATIONS_MISSING Each (x, y) point contains NaN. There are no valid data.
IMSLS_CONSTANT_XVALUES The x values are constant.