RFORP
Fits an orthogonal polynomial regression model.
Required Arguments
X — NOBS by NCOL matrix containing the data. (Input)
IRSP — Column number IRSP of X contains the data for the response (dependent) variable. (Input)
IND — Column number IND of X contains the data for the independent (explanatory) variable. (Input)
MAXDEG — Maximum degree of polynomial to be fit. (Input)
NDEG — Degree of final polynomial regression. (Output)
A — Vector of length MAXDEG containing constants used to generate orthogonal polynomials. (Output)
Only the first NDEG elements of A are referenced.
B — Vector of length MAXDEG containing constants used to generate orthogonal polynomials. (Output)
Only the first NDEG elements of B are referenced.
SCOEF — Vector of length 1 + MAXDEG containing the regression coefficients α of the fitted model using the scaled version of x(z). (Output)
Only the first 1 + NDEG elements of SCOEF are referenced.
is the estimated intercept and equals the response mean.
contains the estimated coefficient for the i-th order orthogonal polynomial using the scaled version of x(z).
D — Vector of length MAXDEG + 1 containing the diagonal elements of the (diagonal) sums of squares and crossproducts matrix. (Output)
The sum of squares due to the i-th degree orthogonal polynomial is given by
Only the first NDEG + 1 elements of D are referenced.
Optional Arguments
NOBS — Number of observations. (Input)
Default: NOBS = size (X,1).
NCOL — Number of columns in X. (Input)
Default: NCOL = size (X,2).
LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).
IFRQ — Frequency option. (Input)
IFRQ = 0 means that all frequencies are 1.0. For positive IFRQ, column number IFRQ of X contains the frequencies. If X(i, IFRQ) = 0.0, none of the remaining elements of row i of X are referenced, and updating of statistics is skipped for row i.
Default: IFRQ = 0.
IWT — Weighting option. (Input)
IWT = 0 means that all weights are 1.0. For positive IWT, column number IWT of X contains the weights.
Default: IWT = 0.
ICRIT — Criterion option. (Input)
Default: ICRIT = 0.
ICRIT |
Meaning |
0 |
Fit a MAXDEG-th degree polynomial. |
1 |
Fit the lowest degree polynomial with an R2 (in percent) of at least CRIT. |
2 |
Fit the lowest degree polynomial with a lack-of-fit F test not significant at level CRIT percent. |
CRIT — Criterion in percent. (Input, if ICRIT = 1 or ICRIT = 2)
Default: CRIT = 95.0.
ICRIT |
Meaning of CRIT |
1 |
R2 (in percent) that the fitted polynomial must achieve. A common choice is 95.0. |
2 |
Significance level (in percent) for the lack-of-fit test that the fitted polynomial must not exceed. A common choice is 5.0. |
LOF — Lack-of-fit option. (Input)
If ICRIT = 2, LOF must equal 1.
Default: LOF = 0.
LOF |
Action |
0 |
DFPE and SSPE are not computed. |
1 |
DFPE and SSPE are computed. |
SMULTC — Multiplicative constant used to compute a scaled version of x, say z, on the interval ‑2 to 2, inclusive. (Output)
SADDC — Additive constant used to compute a scaled version of x(z) on the interval ‑2 to 2, inclusive. (Output)
DFE — Degrees of freedom for error. (Output)
SSE — Sum of squares for error. (Output)
DFPE — Degrees of freedom for pure error. (Output, if LOF = 1)
SSPE — Sum of squares for pure error. (Output, if LOF = 1)
NRMISS — Number of rows of data encountered that contain any missing values for the independent, response, weight, or frequency variables. (Output)
NaN (not a number) is used as the missing value code. Any row of X containing NaN as a value of the independent, response, weight, or frequency variables is omitted from the fit.
FORTRAN 90 Interface
Generic: CALL RFORP (X, IRSP, IND, MAXDEG, NDEG, A, B, SCOEF, D [, …])
Specific: The specific interface names are S_RFORP and D_RFORP.
FORTRAN 77 Interface
Single: CALL RFORP (NOBS, NCOL, X, LDX, IRSP, IND, IFRQ, IWT, MAXDEG, ICRIT, CRIT, LOF, NDEG, SMULTC, SADDC, A, B, SCOEF, D, DFE, SSE, DFPE, SSPE, NRMISS)
Double: The double precision name is DRFORP.
Description
Routine RFORP computes estimates of the regression coefficients in a polynomial regression model using orthogonal polynomials. The reparameterization of the polynomial model in terms of orthogonal polynomials has the advantage that the loss of accuracy resulting from forming powers of the x values is avoided. The design of RFORP assumes that further computations such as summary statistics or case statistics are needed. For this reason, the results returned by RFORP are for the reparameterized model in terms of orthogonal polynomials. This enables computational accuracy to be maintained for the subsequent computations. Routine RSTAP can be used to compute summary statistics for the original polynomial model given the results from RFORP. Routine RCASP can be used to compute case statistics for the original polynomial model given the results from RFORP.
The degree of the polynomial can be specified, or the degree of the polynomial can be determined by RFORP under one of two criteria:
1. If some of the x values are repeated, the lowest degree polynomial can be fitted whose lack of fit is not significant at a specified level.
2. The lowest degree polynomial can be fitted with an R2 that meets a specified lower bound.
Routine RFORP is based on the algorithm of Forsythe (1957). A modification to Forsythe’s algorithm is made for the inclusion of weights (Kelly 1967, page 68).
Let xi be a value of the independent variable. The xi’s are scaled to the interval [‑2, 2] for computational accuracy. The scaled version of the independent variable is computed by the formula zi = mxi + c. The multiplicative scaling constant m (stored in SMULTC) is
The additive constant c (stored in SADDC) is
Orthogonal polynomials are evaluated using the three-term recurrence relationship
beginning with the initial polynomials
The aj’s and bj’s (stored in A and B) are computed to make the pj(z)’s orthogonal with respect to the the set of weights wi, and over the set zi.
The fitted model is
The
(stored in SCOEF) are computed (Shampine 1975) by
where ei = yi ‑ pj−1(zi) and
The dj’s (stored in D) can be used to compute the sum of squares due to the j-th orthogonal polynomial by
A more complete description of Forsythe’s algorithm and the modification of Shampine appears in Kennedy and Gentle (1980, pages 342‑347).
Comments
1. Workspace may be explicitly provided, if desired, by use of R2ORP/DR2ORP. The reference is:
CALL R2ORP (NOBS, NCOL, X, LDX, IRSP, IND, IFRQ, IWT, MAXDEG, ICRIT, CRIT, LOF, NDEG, SMULTC, SADDC, A, B, SCOEF, D, DFE, SSE, DFPE, SSPE, NRMISS, WK, IWK)
The additional arguments are as follows:
WK — Work vector of length 8 * NOBS.
IWK — Work vector of length NOBS.
2. Informational errors
Type |
Code |
Description |
3 |
4 |
The response variable is constant. A zero order polynomial is fit. High order coefficients are set to zero. |
3 |
5 |
There are too few observations to fit the desired degree polynomial. High order coefficients are set to zero. |
3 |
6 |
A perfect fit is obtained with a polynomial of lower degree than MAXDEG. |
4 |
1 |
An invalid weight is encountered. |
4 |
2 |
An invalid frequency is encountered. |
4 |
3 |
Each row of X contains a missing value. |
4 |
7 |
The independent variable is constant. At least two distinct settings of the independent variable are needed. |
3. The orthogonal polynomials evaluated at each scaled x value (z) are computed from A and B as follows:
POLY(I, 1) = Z(I) ‑ A(1)
POLY(I, 2) = (Z(I) ‑ A(2)) * POLY(I, 1) ‑ B(2)
POLY (I, J) = (Z(I) ‑ A(J)) * POLY(I, J ‑ 1) ‑ B(J) * POLY(I, J ‑ 2)
for J = 3 through NDEG.
Example
A polynomial model is fitted to data discussed by Neter and Wasserman (1974, pages 279‑285). The data set contains the response variable y measuring coffee sales (in hundred gallons) and the number of self-service coffee dispensers. Responses for fourteen similar cafeterias are in the data set, some of the cafeterias have the same number of dispensers so that lack of fit of the model can be assessed.
USE RFORP_INT
USE UMACH_INT
USE WRRRN_INT
IMPLICIT NONE
INTEGER LDX, MAXDEG, NCOL, NOBS, J
PARAMETER (MAXDEG=2, NCOL=2, NOBS=14, LDX=NOBS)
!
INTEGER IND, IRSP, LOF, NDEG, NOUT, NRMISS
REAL A(MAXDEG), B(MAXDEG), CRIT, D(MAXDEG+1), DFE, DFPE, &
SADDC, SCOEF(MAXDEG+1), SMULTC, SSE, SSPE, X(LDX,NCOL)
!
DATA (X(1,J),J=1,2) /0.0, 508.1/
DATA (X(2,J),J=1,2) /5.0, 787.6/
DATA (X(3,J),J=1,2) /0.0, 498.4/
DATA (X(4,J),J=1,2) /1.0, 568.2/
DATA (X(5,J),J=1,2) /2.0, 651.7/
DATA (X(6,J),J=1,2) /7.0, 854.7/
DATA (X(7,J),J=1,2) /2.0, 657.0/
DATA (X(8,J),J=1,2) /4.0, 755.3/
DATA (X(9,J),J=1,2) /6.0, 831.8/
DATA (X(10,J),J=1,2) /4.0, 758.9/
DATA (X(11,J),J=1,2) /5.0, 792.1/
DATA (X(12,J),J=1,2) /6.0, 841.4/
DATA (X(13,J),J=1,2) /7.0, 871.4/
DATA (X(14,J),J=1,2) /1.0, 577.3/
!
IRSP = 2
IND = 1
LOF = 1
CALL RFORP (X, IRSP, IND, MAXDEG, NDEG, A, B, SCOEF, D, lof=lof, &
smultc=smultc, saddc=saddc, dfe=dfe, sse=sse, &
dfpe=dfpe, sspe=sspe, nrmiss=nrmiss)
!
CALL UMACH (2, NOUT)
WRITE (NOUT,*) 'NDEG = ', NDEG
CALL WRRRN ('A', A, 1, NDEG, 1)
CALL WRRRN ('B', B, 1, NDEG, 1)
WRITE (NOUT,*) 'SMULTC = ', SMULTC
WRITE (NOUT,*) 'SADDC = ', SADDC
CALL WRRRN ('SCOEF', SCOEF, 1, NDEG+1, 1)
CALL WRRRN ('D', D, 1, NDEG+1, 1)
WRITE (NOUT,*) 'DFE = ', DFE
WRITE (NOUT,*) 'SSE = ', SSE
WRITE (NOUT,*) 'DFPE = ', DFPE
WRITE (NOUT,*) 'SSPE = ', SSPE
WRITE (NOUT,*) 'NRMISS = ', NRMISS
END
Output
NDEG = 2
A
1 2
0.04082 -0.07996
B
1 2
0.000 1.946
SMULTC = 0.571429
SADDC = -2.00000
SCOEF
1 2 3
711.0 90.0 -12.2
D
1 2 3
14.00 27.24 29.69
DFE = 11.0000
SSE = 710.594
DFPE = 7.00000
SSPE = 304.626
NRMISS = 0