RFORP

FNLStat : Regression : RFORP

RFORP

Required Arguments

Fits an orthogonal polynomial regression model.

Required Arguments

X — NOBS by NCOL matrix containing the data. (Input)

IRSP — Column number IRSP of X contains the data for the response (dependent) variable. (Input)

IND — Column number IND of X contains the data for the independent (explanatory) variable. (Input)

MAXDEG — Maximum degree of polynomial to be fit. (Input)

NDEG — Degree of final polynomial regression. (Output)

A — Vector of length MAXDEG containing constants used to generate orthogonal polynomials. (Output)
Only the first NDEG elements of A are referenced.

B — Vector of length MAXDEG containing constants used to generate orthogonal polynomials. (Output)
Only the first NDEG elements of B are referenced.

SCOEF — Vector of length 1 + MAXDEG containing the regression coefficients α of the fitted model using the scaled version of x(z). (Output)
Only the first 1 + NDEG elements of SCOEF are referenced.

is the estimated intercept and equals the response mean.

contains the estimated coefficient for the i-th order orthogonal polynomial using the scaled version of x(z).

D — Vector of length MAXDEG + 1 containing the diagonal elements of the (diagonal) sums of squares and crossproducts matrix. (Output)
The sum of squares due to the i-th degree orthogonal polynomial is given by

Only the first NDEG + 1 elements of D are referenced.

Optional Arguments

NOBS — Number of observations. (Input)
Default: NOBS = size (X,1).

NCOL — Number of columns in X. (Input)
Default: NCOL = size (X,2).

LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).

IFRQ — Frequency option. (Input)
IFRQ = 0 means that all frequencies are 1.0. For positive IFRQ, column number IFRQ of X contains the frequencies. If X(i, IFRQ) = 0.0, none of the remaining elements of row i of X are referenced, and updating of statistics is skipped for row i.
Default: IFRQ = 0.

IWT — Weighting option. (Input)
IWT = 0 means that all weights are 1.0. For positive IWT, column number IWT of X contains the weights.
Default: IWT = 0.

ICRIT — Criterion option. (Input)
Default: ICRIT = 0.

ICRIT	Meaning
0	Fit a MAXDEG-th degree polynomial.
1	Fit the lowest degree polynomial with an R2 (in percent) of at least CRIT.
2	Fit the lowest degree polynomial with a lack-of-fit F test not significant at level CRIT percent.

CRIT — Criterion in percent. (Input, if ICRIT = 1 or ICRIT = 2)
Default: CRIT = 95.0.

ICRIT	Meaning of CRIT
1	R2 (in percent) that the fitted polynomial must achieve. A common choice is 95.0.
2	Significance level (in percent) for the lack-of-fit test that the fitted polynomial must not exceed. A common choice is 5.0.

LOF — Lack-of-fit option. (Input)
If ICRIT = 2, LOF must equal 1.
Default: LOF = 0.

LOF	Action
0	DFPE and SSPE are not computed.
1	DFPE and SSPE are computed.

SMULTC — Multiplicative constant used to compute a scaled version of x, say z, on the interval ‑2 to 2, inclusive. (Output)

SADDC — Additive constant used to compute a scaled version of x(z) on the interval ‑2 to 2, inclusive. (Output)

DFE — Degrees of freedom for error. (Output)

SSE — Sum of squares for error. (Output)

DFPE — Degrees of freedom for pure error. (Output, if LOF = 1)

SSPE — Sum of squares for pure error. (Output, if LOF = 1)

NRMISS — Number of rows of data encountered that contain any missing values for the independent, response, weight, or frequency variables. (Output)
NaN (not a number) is used as the missing value code. Any row of X containing NaN as a value of the independent, response, weight, or frequency variables is omitted from the fit.

FORTRAN 90 Interface

Generic: CALL RFORP (X, IRSP, IND, MAXDEG, NDEG, A, B, SCOEF, D [, …])

Specific: The specific interface names are S_RFORP and D_RFORP.

FORTRAN 77 Interface

Single: CALL RFORP (NOBS, NCOL, X, LDX, IRSP, IND, IFRQ, IWT, MAXDEG, ICRIT, CRIT, LOF, NDEG, SMULTC, SADDC, A, B, SCOEF, D, DFE, SSE, DFPE, SSPE, NRMISS)

Double: The double precision name is DRFORP.

Description

Routine RFORP computes estimates of the regression coefficients in a polynomial regression model using orthogonal polynomials. The reparameterization of the polynomial model in terms of orthogonal polynomials has the advantage that the loss of accuracy resulting from forming powers of the x values is avoided. The design of RFORP assumes that further computations such as summary statistics or case statistics are needed. For this reason, the results returned by RFORP are for the reparameterized model in terms of orthogonal polynomials. This enables computational accuracy to be maintained for the subsequent computations. Routine RSTAP can be used to compute summary statistics for the original polynomial model given the results from RFORP. Routine RCASP can be used to compute case statistics for the original polynomial model given the results from RFORP.

The degree of the polynomial can be specified, or the degree of the polynomial can be determined by RFORP under one of two criteria:

1. If some of the x values are repeated, the lowest degree polynomial can be fitted whose lack of fit is not significant at a specified level.

2. The lowest degree polynomial can be fitted with an R2 that meets a specified lower bound.

Routine RFORP is based on the algorithm of Forsythe (1957). A modification to Forsythe’s algorithm is made for the inclusion of weights (Kelly 1967, page 68).

Let xi be a value of the independent variable. The xi’s are scaled to the interval [‑2, 2] for computational accuracy. The scaled version of the independent variable is computed by the formula zi = mxi + c. The multiplicative scaling constant m (stored in SMULTC) is

The additive constant c (stored in SADDC) is

Orthogonal polynomials are evaluated using the three-term recurrence relationship

beginning with the initial polynomials

The aj’s and bj’s (stored in A and B) are computed to make the pj(z)’s orthogonal with respect to the the set of weights wi, and over the set zi.

The fitted model is

The

(stored in SCOEF) are computed (Shampine 1975) by

where ei = yi ‑ pj−1(zi) and

The dj’s (stored in D) can be used to compute the sum of squares due to the j-th orthogonal polynomial by

A more complete description of Forsythe’s algorithm and the modification of Shampine appears in Kennedy and Gentle (1980, pages 342‑347).

Comments

1. Workspace may be explicitly provided, if desired, by use of R2ORP/DR2ORP. The reference is:

CALL R2ORP (NOBS, NCOL, X, LDX, IRSP, IND, IFRQ, IWT, MAXDEG, ICRIT, CRIT, LOF, NDEG, SMULTC, SADDC, A, B, SCOEF, D, DFE, SSE, DFPE, SSPE, NRMISS, WK, IWK)

The additional arguments are as follows:

WK — Work vector of length 8 * NOBS.

IWK — Work vector of length NOBS.

2. Informational errors

Type	Code	Description
3	4	The response variable is constant. A zero order polynomial is fit. High order coefficients are set to zero.
3	5	There are too few observations to fit the desired degree polynomial. High order coefficients are set to zero.
3	6	A perfect fit is obtained with a polynomial of lower degree than MAXDEG.
4	1	An invalid weight is encountered.
4	2	An invalid frequency is encountered.
4	3	Each row of X contains a missing value.
4	7	The independent variable is constant. At least two distinct settings of the independent variable are needed.

3. The orthogonal polynomials evaluated at each scaled x value (z) are computed from A and B as follows:

POLY(I, 1) = Z(I) ‑ A(1)

POLY(I, 2) = (Z(I) ‑ A(2)) * POLY(I, 1) ‑ B(2)

POLY (I, J) = (Z(I) ‑ A(J)) * POLY(I, J ‑ 1) ‑ B(J) * POLY(I, J ‑ 2)

for J = 3 through NDEG.

Example

A polynomial model is fitted to data discussed by Neter and Wasserman (1974, pages 279‑285). The data set contains the response variable y measuring coffee sales (in hundred gallons) and the number of self-service coffee dispensers. Responses for fourteen similar cafeterias are in the data set, some of the cafeterias have the same number of dispensers so that lack of fit of the model can be assessed.

USE RFORP_INT

USE UMACH_INT

USE WRRRN_INT

IMPLICIT NONE

INTEGER LDX, MAXDEG, NCOL, NOBS, J

PARAMETER (MAXDEG=2, NCOL=2, NOBS=14, LDX=NOBS)

INTEGER IND, IRSP, LOF, NDEG, NOUT, NRMISS

REAL A(MAXDEG), B(MAXDEG), CRIT, D(MAXDEG+1), DFE, DFPE, &

SADDC, SCOEF(MAXDEG+1), SMULTC, SSE, SSPE, X(LDX,NCOL)

DATA (X(1,J),J=1,2) /0.0, 508.1/

DATA (X(2,J),J=1,2) /5.0, 787.6/

DATA (X(3,J),J=1,2) /0.0, 498.4/

DATA (X(4,J),J=1,2) /1.0, 568.2/

DATA (X(5,J),J=1,2) /2.0, 651.7/

DATA (X(6,J),J=1,2) /7.0, 854.7/

DATA (X(7,J),J=1,2) /2.0, 657.0/

DATA (X(8,J),J=1,2) /4.0, 755.3/

DATA (X(9,J),J=1,2) /6.0, 831.8/

DATA (X(10,J),J=1,2) /4.0, 758.9/

DATA (X(11,J),J=1,2) /5.0, 792.1/

DATA (X(12,J),J=1,2) /6.0, 841.4/

DATA (X(13,J),J=1,2) /7.0, 871.4/

DATA (X(14,J),J=1,2) /1.0, 577.3/

IRSP = 2

IND = 1

LOF = 1

CALL RFORP (X, IRSP, IND, MAXDEG, NDEG, A, B, SCOEF, D, lof=lof, &

smultc=smultc, saddc=saddc, dfe=dfe, sse=sse, &

dfpe=dfpe, sspe=sspe, nrmiss=nrmiss)

CALL UMACH (2, NOUT)

WRITE (NOUT,*) 'NDEG = ', NDEG

CALL WRRRN ('A', A, 1, NDEG, 1)

CALL WRRRN ('B', B, 1, NDEG, 1)

WRITE (NOUT,*) 'SMULTC = ', SMULTC

WRITE (NOUT,*) 'SADDC = ', SADDC

CALL WRRRN ('SCOEF', SCOEF, 1, NDEG+1, 1)

CALL WRRRN ('D', D, 1, NDEG+1, 1)

WRITE (NOUT,*) 'DFE = ', DFE

WRITE (NOUT,*) 'SSE = ', SSE

WRITE (NOUT,*) 'DFPE = ', DFPE

WRITE (NOUT,*) 'SSPE = ', SSPE

WRITE (NOUT,*) 'NRMISS = ', NRMISS

END

Output

NDEG = 2

1 2

0.04082 -0.07996

1 2

0.000 1.946

SMULTC = 0.571429

SADDC = -2.00000

SCOEF

1 2 3

711.0 90.0 -12.2

1 2 3

14.00 27.24 29.69

DFE = 11.0000

SSE = 710.594

DFPE = 7.00000

SSPE = 304.626

NRMISS = 0