RPOLY

FNLStat : Regression : RPOLY

RPOLY

Required Arguments

Analyzes a polynomial regression model.

Required Arguments

X — NOBS by NCOL matrix containing the data. (Input)

IRSP — Column number IRSP of X contains the data for the response (dependent) variable. (Input)

IND — Column number IND of X contains the data for the independent (explanatory) variable. (Input)

MAXDEG — Maximum degree of polynomial to be fit. (Input)

NDEG — Degree of final polynomial regression. (Output)

COEF — NDEG + 1 by 4 matrix containing statistics relating to the coefficients of the polynomial model. (Output)
Row 1 corresponds to the intercept. Row 1 + i corresponds to the coefficient of xi. The columns are described as follows:

Col.	Description
1	Estimated coefficient
2	Estimated standard error of the estimated coefficient
3	t-statistic for the test the coefficient is zero
4	p‑value for the two-sided t test

Optional Arguments

NOBS — Number of observations. (Input)
Default: NOBS = size (X,1).

NCOL — Number of columns in X. (Input)
Default: NCOL = size (X,2).

LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).

IFRQ — Frequency option. (Input)
IFRQ = 0 means that all frequencies are 1.0. For positive IFRQ, column number IFRQ of X contains the frequencies. If X(i, IFRQ) = 0.0, none of the remaining elements of row i of X are referenced, and updating of statistics is skipped for row i.
Default: IFRQ = 0.

IWT — Weighting option. (Input)
IWT = 0 means that all weights are 1.0. For positive IWT, column number IWT of X contains the weights, and the computed prediction interval uses AOV (8) = X(i, IWT) for the estimated variance of a future response.
Default: IWT = 0.

IPRED — Prediction interval option. (Input)
IPRED = 0 means that prediction intervals are desired for a single future response. For positive IPRED, column number IPRED of X contains the number of future responses for which a prediction interval is desired on the average of the future responses.
Default: IPRED = 0.

CONPCM — Confidence level for two-sided interval estimates on the mean in percent. (Input)
Default: CONPCM = 95.0.

CONPCP — Confidence level for two-sided prediction intervals in percent. (Input)
Default: CONPCP = 95.0.

ICRIT — Criterion option. (Input)
Default: ICRIT = 0.

ICRIT	Meaning
0	Fit a MAXDEG-th degree polynomial.
1	Fit the lowest degree polynomial with an R2 (in percent) of at least CRIT.
2	Fit the lowest degree polynomial with a lack-of-fit F test not significant at level CRIT percent.

CRIT — Criterion in percent. (Input, if ICRIT = 1 or ICRIT = 2, not referenced if
ICRIT = 0)
Default: CRIT = 95.0.

CRIT	Meaning of CRIT
1	R2 (in percent) that the fitted polynomial must achieve. A common choice is 95.0.
2	Significance level (in percent) for the lack-of-fit test that the fitted polynomial must not exceed. A common choice is 5.0.

LOF — Lack of fit option. (Input)
If ICRIT = 2, LOF must equal 1.
Default: LOF = 0.

LOF	Action
0	TLOF is not computed.
1	TLOF is computed.

IPRINT — Printing option. (Input)
Default: IPRINT = 0.

IPRINT	Action
0	No printing is performed.
1	AOV, SQSS, COEF, TLOF are printed.
2	AOV, SQSS, COEF, TLOF, unusual cases in CASE and plots of the data,and the fitted polynomial are printed.
3	AOV, SQSS, COEF, TLOF, CASE, plots of the data, the fitted polynomial, and the residuals are printed.

AOV — Vector of length 15 that contains statistics relating to the analysis of variance. (Output)

I	AOV(I)
1	Degrees of freedom for regression
2	Degrees of freedom for error
3	Total degrees of freedom
4	Sum of squares for regression
5	Sum of squares for error
6	Total sum of squares
7	Regression mean square
8	Error mean square
9	F-statistic
10	p‑value
11	R2 (in percent)
12	Adjusted R2 (in percent)
13	Estimated standard deviation of the model error
14	Mean of the response (dependent) variable
15	Coefficient of variation (in percent)

SQSS — NDEG by 4 matrix containing sequential statistics for the polynomial model. (Output)
Row i corresponds to xi(i = 1, 2, …, NDEG). The columns are described as follows:

Col.	Description
1	Degrees of freedom
2	Sum of squares
3	F-statistic
4	p‑value

LDSQSS — Leading dimension of SQSS exactly as specified in the dimension statement in the calling program. (Input)
Default: LDSQSS = size (SQSS,1).

LDCOEF — Leading dimension of COEF exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCOEF = size (COEF,1).

TLOF — NDEG by 4 matrix containing tests of lack of fit for each degree of the polynomial. (Output, if LOF = 1)
Row i corresponds to xi(i = 1, 2, …, NDEG). The columns are described as follows:

Col.	Description
1	Degrees of freedom
2	Lack-of-fit sum of squares
3	F test for lack of fit of the polynomial model of degree i
4	p‑value for the F test

If LOF = 0, TLOF is not referenced and can be a 1 by 1 array.

LDTLOF — Leading dimension of TLOF exactly as specified in the dimension statement in the calling program. (Input)
Default: LDTLOF = size (TLOF,1).

CASE — NOBS by 12 matrix containing the case statistics. (Output)
Columns 1 through 12 contain the following:

Col.	Description
1	Observed response
2	Predicted response
3	Residual
4	Leverage
5	Standardized residual
6	Jackknife residual
7	Cook’s distance
8	DFFITS
9, 10	Confidence interval on the mean
11, 12	Prediction interval

LDCASE — Leading dimension of CASE exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCASE = size (CASE,1).

NRMISS — Number of rows of CASE containing NaN (not a number). (Output)

FORTRAN 90 Interface

Generic: CALL RPOLY (X, IRSP, IND, MAXDEG, NDEG, COEF [, …])

Specific: The specific interface names are S_RPOLY and D_RPOLY.

FORTRAN 77 Interface

Single: CALL RPOLY (NOBS, NCOL, X, LDX, IRSP, IND, IFRQ, IWT, IPRED, CONPCM, CONPCP, MAXDEG, ICRIT, CRIT, LOF, IPRINT, NDEG, AOV, SQSS, LDSQSS, COEF, LDCOEF, TLOF, LDTLOF, CASE, LDCASE, NRMISS)

Double: The double precision name is DRPOLY.

Description

Routine RPOLY computes estimates of the regression coefficients in a polynomial (curvilinear) regression model. The degree of the polynomial can be specified, or the degree of the polynomial can be determined by RPOLY under one of two criteria:

1. If some of the x settings are repeated, the lowest degree polynomial can be fit whose lack of fit is not significant at a specified level.

2. The lowest degree polynomial can be fitted with an R2 that meets a specified lower bound.

In addition to the computation of the fit, RPOLY computes and prints summary statistics (analysis of variance, sequential sums of squares, t tests for the coefficients, tests for lack of fit), case statistics (diagnostics for individual cases, confidence and prediction intervals), and plots (data, fitted data, and residuals).

Routine RPOLY computes estimates of the regression coefficients in a polynomial regression model using orthogonal polynomials. The reparameterization of the polynomial model in terms of orthogonal polynomials has the advantage that the loss of accuracy resulting from forming powers of the x settings is avoided. All results are returned to the user for the original model.

Often a predicted value and a confidence interval are desired for a setting of the independent variable not used in computing the regression fit. This is accomplished by including an extra row in the data matrix with the desired setting of the independent variable and with the response set equal to NaN (not a number). NaN can be retrieved by AMACH(6), which is documented in the Reference Material. The row of the data matrix containing NaN will be omitted from the computations for determining the regression fit, and a prediction and a confidence interval for the missing response will be computed from the given setting of the independent variable.

Routine RPOLY is based on the algorithm of Forsythe (1957). A modification to Forsythe’s algorithm suggested by Shampine (1975) is used for computing the polynomial coefficients. A discussion of Forsythe’s algorithm and Shampine’s modification appears in Kennedy and Gentle (1980, pages 342‑347). A modification to Forsythe’s algorithm is made for the inclusion of weights (Kelly 1967, page 68).

Comments

1. Workspace may be explicitly provided, if desired, by use of R2OLY/DR2OLY. The reference is:

CALL R2OLY (NOBS, NCOL, X, LDX, IRSP, IND, IFRQ, IWT, IPRED,CONPCM, CONPCP, MAXDEG, ICRIT, CRIT, LOF, IPRINT, NDEG, AOV, SQSS, LDSQSS, COEF, LDCOEF, TLOF, LDTLOF, CASE, LDCASE, NRMISS, WK, IWK)

The additional arguments are as follows:

WK — Work vector of length MAXDEG2 + 8 * MAXDEG + 8 * NOBS + 5

IWK — Work vector of length NOBS.

2. Informational errors

Type	Code	Description
4	1	An invalid weight is encountered. Weights must be nonnegative.
4	2	An invalid frequency is encountered. Frequencies must be nonnegative.
4	7	The independent variable is constant. At least two distinct settings of the independent variable are needed.
4	8	The number of future observations for a prediction interval must be positive.
3	4	The response is constant. A zero degree polynomial is fit.
3	5	There are too few observations to fit the desired degree polynomial. NDEG is set to one less than the number of valid observations.
3	6	A perfect fit to the data was obtained with a polynomial of lower degree than MAXDEG.

Example

A polynomial model is fitted to data discussed by Neter and Wasserman (1974, pages 279‑285). The data set contains the response variable y measuring coffee sales (in hundred gallons) and the number of self-service coffee dispensers. Responses for fourteen similar cafeterias are in the data set. Some of the cafeterias have the same number of dispensers so that lack of fit of the model can be assessed.

USE RPOLY_INT

IMPLICIT NONE

INTEGER LDCASE, LDCOEF, LDSQSS, LDTLOF, LDX, MAXDEG, NCOL, &

NOBS, J

PARAMETER (MAXDEG=2, NCOL=2, NOBS=14, LDCASE=NOBS, &

LDCOEF=MAXDEG+1, LDSQSS=MAXDEG, LDTLOF=MAXDEG, &

LDX=NOBS)

INTEGER ICRIT, IFRQ, IND, IPRED, IPRINT, IRSP, IWT, LOF, &

NDEG, NRMISS

REAL AOV(15), CASE(LDCASE,12), COEF(LDCOEF,4), CONPCM, &

CONPCP, CRIT, SQSS(LDSQSS,4), TLOF(LDTLOF,4), &

X(LDX,NCOL)

DATA (X(1,J),J=1,2) /0.0, 508.1/

DATA (X(2,J),J=1,2) /5.0, 787.6/

DATA (X(3,J),J=1,2) /0.0, 498.4/

DATA (X(4,J),J=1,2) /1.0, 568.2/

DATA (X(5,J),J=1,2) /2.0, 651.7/

DATA (X(6,J),J=1,2) /7.0, 854.7/

DATA (X(7,J),J=1,2) /2.0, 657.0/

DATA (X(8,J),J=1,2) /4.0, 755.3/

DATA (X(9,J),J=1,2) /6.0, 831.8/

DATA (X(10,J),J=1,2) /4.0, 758.9/

DATA (X(11,J),J=1,2) /5.0, 792.1/

DATA (X(12,J),J=1,2) /6.0, 841.4/

DATA (X(13,J),J=1,2) /7.0, 871.4/

DATA (X(14,J),J=1,2) /1.0, 577.3/

IRSP = 2

IND = 1

LOF = 1

IPRINT = 1

CALL RPOLY (X, IRSP, IND, MAXDEG, NDEG, COEF, lof=lof, IPRINT=IPRINT,&

aov=aov,sqss=sqss, tlof=tlof, case=case, nrmiss=nrmiss)

END

Output

R-squared Adjusted Est. Std. Dev. Coefficient of

(percent) R-squared of Model Error Mean Var. (percent)

99.685 99.628 8.037 711.0 1.13

* * * Analysis of Variance * * *

Sum of Mean Prob. of

Source DF Squares Square Overall F Larger F

Regression 2 225031.9 112515.9 1741.748 0.0000

Residual 11 710.6 64.6

Corrected Total 13 225742.5

* * * Inference on Coefficients * * *

Standard Prob. of

Coef. Estimate Error t-statistic Larger |t|

1 503.3 4.791 105.054 0.0000

2 78.9 3.453 22.865 0.0000

3 -4.0 0.482 -8.242 0.0000

* * * Sequential Statistics * * *

Degree of Degrees of Sum of Prob. of

Polynomial Freedom Squares F-statistic Larger F

1 1 220644.1 3415.574 0.0000

2 1 4387.7 67.922 0.0000

* * * Tests of Lack of Fit * * *

Degree of Degrees of Sum of Prob. of

Polynomial Freedom Squares F-statistic Larger F

1 5 4793.7 22.031 0.0004

2 4 406.0 2.332 0.1547