RLOFN
Computes a lack of fit test based on near replicates for a fitted regression model.
Required Arguments
X — NOBS by NCOL matrix containing the data. (Input)
IIND — Independent variable option. (Input)
| IIND | Meaning |
| --- | --- |
| < 0 | The first ‑IIND columns of X contain the independent (explanatory) variables. |
| > 0 | The IIND independent variables are specified by the column numbers in INDIND. |
| = 0 | There are no independent variables. |
There are NCOEF = INTCEP + ∣IIND∣ regressors—the intercept (if INTCEP = 1) and the independent variables.
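The IIND convention can be sketched in a few lines of standard Fortran. This is only an illustration of the column-selection rule stated above, not part of the library; the module and function names (`regressor_cols`, `independent_columns`) are hypothetical.

```fortran
module regressor_cols
  implicit none
contains
  ! Return the columns of X used as independent variables,
  ! following the IIND convention described in the text.
  function independent_columns(iind, indind) result(cols)
    integer, intent(in) :: iind
    integer, intent(in) :: indind(:)
    integer, allocatable :: cols(:)
    integer :: i
    if (iind > 0) then
      cols = indind(1:iind)        ! columns listed explicitly in INDIND
    else
      cols = [(i, i = 1, -iind)]   ! first -IIND columns (empty when IIND = 0)
    end if
  end function independent_columns
end module regressor_cols

program demo_regressor_cols
  use regressor_cols
  implicit none
  integer, parameter :: intcep = 1
  integer, allocatable :: cols(:)

  cols = independent_columns(3, [1, 3, 4])
  print '(a,*(1x,i0))', 'IIND =  3 uses columns', cols
  print '(a,i0)', 'NCOEF = ', intcep + size(cols)

  cols = independent_columns(-3, [0])   ! INDIND is not referenced here
  print '(a,*(1x,i0))', 'IIND = -3 uses columns', cols
end program demo_regressor_cols
```

With INTCEP = 1 and three independent variables, NCOEF = 4, which matches the dimensions used for B and R in the examples.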
INDIND — Index vector of length IIND containing the column numbers of X that are the independent variables. (Input, if IIND is positive)
If IIND is not positive, INDIND is not referenced and can be a vector of length one.
IRSP — Column number IRSP of X contains data for the response (dependent) variable. (Input)
B — Vector of length NCOEF containing a least-squares solution for the regression coefficients. (Input)
R — NCOEF by NCOEF upper triangular matrix containing the R matrix. (Input)
The R matrix can come from a regression fit based on a QR decomposition of the matrix of regressors or based on a Cholesky factorization $R^T R$ of the matrix of sums of squares and crossproducts of the regressors. Elements to the right of a diagonal element of R that is zero must also be zero. A zero row indicates a nonfull rank model. For an R matrix that comes from a regression fit with linear equality restrictions on the parameters, each row of R corresponding to a restriction must have a corresponding diagonal element that is negative. The remaining rows of R must have positive diagonal elements. Only the upper triangle of R is referenced.
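The structural conditions on R can be checked before calling the routine. The following is a minimal sketch in standard Fortran; the names `r_structure_ok` and `count_restrictions` are hypothetical helpers, not library routines, and only the upper triangle is inspected.

```fortran
module r_check
  implicit none
contains
  ! .true. when every zero diagonal element of R has only zeros to its right.
  logical function r_structure_ok(r) result(ok)
    real, intent(in) :: r(:,:)
    integer :: i, n
    n = size(r, 1)
    ok = .true.
    do i = 1, n
      if (r(i,i) == 0.0) then
        if (any(r(i, i+1:n) /= 0.0)) ok = .false.
      end if
    end do
  end function r_structure_ok

  ! Number of rows with a negative diagonal element (equality restrictions).
  integer function count_restrictions(r) result(nr)
    real, intent(in) :: r(:,:)
    integer :: i
    nr = 0
    do i = 1, size(r, 1)
      if (r(i,i) < 0.0) nr = nr + 1
    end do
  end function count_restrictions
end module r_check

program demo_r_check
  use r_check
  implicit none
  real :: r(3,3)
  r = 0.0
  r(1,1) = 2.0; r(1,2) = 1.0; r(1,3) = 0.5
  r(3,3) = 1.5                      ! row 2 is a zero row: nonfull rank model
  print *, 'structure ok:  ', r_structure_ok(r)
  print *, 'restrictions:  ', count_restrictions(r)
end program demo_r_check
```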
DFE — Degrees of freedom for error from the fitted regression. (Input)
SSE — Sum of squares for error from the fitted regression. (Input)
NGROUP — Number of groups. (Input)
A cluster analysis based on NGROUP groups is performed. A good choice for NGROUP is the number of groups of near replicates in the data set.
IGROUP — Vector of length NOBS specifying group numbers. (Input, if ICLUST = 0; Output, if ICLUST ≥ 1)
IGROUP(I) = J means row I of X is in the J-th group of near replicates (J = 0, 1, 2, …, NGROUP). Here, J = 0 indicates the group of observations not used in the analysis because NaN (not a number) was input for one or more of the values of the response, independent, frequency, or weight variables.
TESTLF — Vector of length 10 containing statistics relating to the test for lack of fit of the model. (Output)
| Elem | Description |
| --- | --- |
| 1 | Degrees of freedom for lack of fit. |
| 2 | Degrees of freedom for error from the expanded model (one-way analysis of covariance model using clusters of near replicates as the groups). |
| 3 | Degrees of freedom for error (DFE = TESTLF(1) + TESTLF(2)). |
| 4 | Sum of squares for lack of fit. |
| 5 | Sum of squares for error from the expanded model. |
| 6 | Sum of squares for error (SSE = TESTLF(4) + TESTLF(5)). |
| 7 | Mean square for lack of fit. |
| 8 | Mean square for error from the expanded model. |
| 9 | F statistic. |
| 10 | p‑value. |
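The relationships among the first nine elements can be written out directly. The sketch below is an illustration only: `fill_testlf` is a hypothetical helper, the p-value (element 10) is not computed, and the inputs are the rounded values printed for NGROUP = 6 in Example 1.

```fortran
module testlf_layout
  implicit none
contains
  ! Elements 1-9 of a TESTLF-style vector, derived from the lack-of-fit and
  ! expanded-model degrees of freedom and sums of squares.
  function fill_testlf(df_lof, df_exp, ss_lof, ss_exp) result(t)
    real, intent(in) :: df_lof, df_exp, ss_lof, ss_exp
    real :: t(9)
    t(1) = df_lof
    t(2) = df_exp
    t(3) = t(1) + t(2)      ! DFE of the original model
    t(4) = ss_lof
    t(5) = ss_exp
    t(6) = t(4) + t(5)      ! SSE of the original model
    t(7) = t(4) / t(1)      ! mean square for lack of fit
    t(8) = t(5) / t(2)      ! mean square for error, expanded model
    t(9) = t(7) / t(8)      ! F statistic
  end function fill_testlf
end module testlf_layout

program demo_testlf
  use testlf_layout
  implicit none
  real :: t(9)
  t = fill_testlf(2.0, 14.0, 20.5, 143.4)   ! rounded values from Example 1
  print '(a,f6.1)', 'DFE = ', t(3)
  print '(a,f7.1)', 'SSE = ', t(6)
  print '(a,f7.3)', 'F   = ', t(9)
end program demo_testlf
```

Because the inputs are rounded, the derived values agree with the printed output only to the displayed precision.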
Optional Arguments
NOBS — Number of observations. (Input)
Default: NOBS = size (X,1).
NCOL — Number of columns in X. (Input)
Default: NCOL = size (X,2).
LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).
INTCEP — Intercept option. (Input)
Default: INTCEP = 1.
| INTCEP | Action |
| --- | --- |
| 0 | An intercept is not in the model. |
| 1 | An intercept is in the model. |
IFRQ — Frequency option. (Input)
IFRQ = 0 means that all frequencies are 1.0. For positive IFRQ, column number IFRQ of X contains the frequencies.
Default: IFRQ = 0.
IWT — Weighting option. (Input)
IWT = 0 means that all weights are 1.0. For positive IWT, column number IWT of X contains the weights.
Default: IWT = 0.
LDR — Leading dimension of R exactly as specified in the dimension statement in the calling program. (Input)
Default: LDR = size (R,1).
ICLUST — Clustering option. (Input)
Default: ICLUST = 1.
| ICLUST | Meaning |
| --- | --- |
| 0 | Cluster groups are input in IGROUP. |
| 1 | Cluster groups are obtained using Euclidean distance. |
| 2 | Cluster groups are obtained using Mahalanobis distance. |
MAXIT — Maximum number of iterations for the cluster analysis to determine near replicates. (Input, if ICLUST is positive; otherwise, MAXIT is not referenced)
MAXIT = 30 is usually sufficient for convergence.
Default: MAXIT = 30.
TOL — Tolerance used in determining linear dependence for the one-way analysis of covariance model using clusters as the groups. (Input)
TOL = EPS^(2/3) is a good choice. For RLOFN, EPS = AMACH(4). See documentation for AMACH in Reference Material.
Default: TOL = 2.4e-5 for single precision and 3.6d-11 for double precision.
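The suggested tolerance can be reproduced without the library under the assumption that AMACH(4) coincides with the Fortran intrinsic `epsilon` for the corresponding real kind. The function names below are hypothetical.

```fortran
module tol_sketch
  implicit none
  integer, parameter :: sp = kind(1.0), dp = kind(1.0d0)
contains
  ! EPS**(2/3) in single precision (assumes AMACH(4) = epsilon(1.0)).
  real(sp) function default_tol_sp() result(tol)
    tol = epsilon(1.0_sp)**(2.0_sp / 3.0_sp)
  end function default_tol_sp

  ! EPS**(2/3) in double precision (assumes DMACH(4) = epsilon(1.0d0)).
  real(dp) function default_tol_dp() result(tol)
    tol = epsilon(1.0_dp)**(2.0_dp / 3.0_dp)
  end function default_tol_dp
end module tol_sketch

program demo_tol
  use tol_sketch
  implicit none
  print '(a,es10.2)', 'single precision TOL: ', default_tol_sp()
  print '(a,es10.2)', 'double precision TOL: ', default_tol_dp()
end program demo_tol
```

On IEEE hardware the two values come out close to the stated defaults, roughly 2.4e-5 and 3.7e-11.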
FORTRAN 90 Interface
Generic: CALL RLOFN (X, IIND, INDIND, IRSP, B, R, DFE, SSE, NGROUP, IGROUP, TESTLF [, …])
Specific: The specific interface names are S_RLOFN and D_RLOFN.
FORTRAN 77 Interface
Single: CALL RLOFN (NOBS, NCOL, X, LDX, INTCEP, IIND, INDIND, IRSP, IFRQ, IWT, B, R, LDR, DFE, SSE, ICLUST, MAXIT, TOL, NGROUP, IGROUP, TESTLF)
Double: The double precision name is DRLOFN.
Description
Routine RLOFN computes a lack of fit test based on near replicates for a fitted regression model. The data need not be sorted prior to invoking RLOFN. The column indices of X for determining near replicates must correspond to the independent variables in the original fitted model. If the groups of near replicates are known prior to invoking RLOFN, the option ICLUST = 0 allows RLOFN to bypass the computation of the groups.
The data can contain missing values indicated by NaN. (NaN is AMACH(6). Routine AMACH is described in the section “Machine-Dependent Constants” in the Reference Material.) For ICLUST equal to 1 or 2, any row of X containing NaN as a value for the response, weight, frequency, or independent variables is omitted from the analysis. For ICLUST equal to 0, if the i-th row of X contains NaN for one of the variables in the analysis, the i-th element of IGROUP must be 0 on input.
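When ICLUST = 0, the caller is responsible for zeroing the IGROUP entries of rows with missing values. A minimal sketch of that preprocessing step follows. It assumes that the library's NaN is an IEEE NaN that `ieee_is_nan` recognizes; the function `zero_missing_groups` and the argument `used` (the columns taking part in the analysis) are hypothetical.

```fortran
module nan_groups
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_value, ieee_quiet_nan
  implicit none
contains
  ! Copy IGROUP, setting the entry to 0 for every row of X that has a NaN
  ! in one of the columns listed in USED.
  function zero_missing_groups(x, used, igroup) result(g)
    real, intent(in) :: x(:,:)
    integer, intent(in) :: used(:), igroup(:)
    integer :: g(size(igroup))
    integer :: i
    g = igroup
    do i = 1, size(x, 1)
      if (any(ieee_is_nan(x(i, used)))) g(i) = 0
    end do
  end function zero_missing_groups
end module nan_groups

program demo_nan_groups
  use nan_groups
  implicit none
  real :: x(4,3)
  x = reshape([1.0, 0.0, 1.0, 0.0,  0.0, 1.0, 1.0, 0.0,  5.0, 6.0, 7.0, 8.0], [4, 3])
  x(2,3) = ieee_value(1.0, ieee_quiet_nan)   ! response missing in row 2
  print '(*(i0,1x))', zero_missing_groups(x, [1, 2, 3], [1, 2, 1, 2])
end program demo_nan_groups
```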
Routine KMEAN (see Chapter 11, “Cluster Analysis”) is used to compute k clusters or groups of near replicates. Prior to invoking KMEAN, a detached sort of the independent variables in the regression model is performed using routine SROWR (see Chapter 19, “Utilities”). If there are fewer than NGROUP distinct observations, a warning message is issued and k is set equal to the number of distinct observations. Otherwise, k equals NGROUP. For purposes of the cluster analysis, ICLUST = 1 specifies Euclidean distance and ICLUST = 2 specifies Mahalanobis distance. For Mahalanobis distance, the data are transformed before invoking KMEAN so that the Euclidean metric applied by KMEAN for the transformed data is equivalent to the sample Mahalanobis distance for the original (untransformed) data.
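The rule for choosing k can be illustrated on the independent variables of Example 1 (columns 1, 3, and 4 of X). The sketch below counts distinct rows by pairwise comparison rather than by the detached sort the routine uses, so it shows the rule only, not the library's method. The name `count_distinct` and the requested NGROUP = 8 are hypothetical.

```fortran
module distinct_rows
  implicit none
contains
  ! Number of distinct rows of X, found by comparing each row with the
  ! rows that precede it.
  integer function count_distinct(x) result(nd)
    real, intent(in) :: x(:,:)
    integer :: i, j
    logical :: seen
    nd = 0
    do i = 1, size(x, 1)
      seen = .false.
      do j = 1, i - 1
        if (all(x(i,:) == x(j,:))) then
          seen = .true.
          exit
        end if
      end do
      if (.not. seen) nd = nd + 1
    end do
  end function count_distinct
end module distinct_rows

program demo_distinct
  use distinct_rows
  implicit none
  integer, parameter :: ngroup = 8          ! hypothetical request
  real :: x(20,3)
  integer :: nd, k
  ! Columns 1, 3, 4 of the Example 1 data, one observation per triple.
  x = transpose(real(reshape([ &
        1,1,0,  1,1,0,  1,1,0,  0,1,1,  1,0,0, &
        0,1,1,  1,0,0,  1,1,0,  0,0,1,  0,1,1, &
        0,1,1,  0,1,1,  1,1,0,  1,1,0,  1,1,0, &
        0,1,1,  0,0,0,  0,1,1,  0,1,1,  0,1,0], [3, 20])))
  nd = count_distinct(x)
  k = min(ngroup, nd)                        ! k is capped at the distinct count
  print '(a,i0,a,i0)', 'distinct observations = ', nd, ', k = ', k
end program demo_distinct
```

The data contain six distinct combinations of the independent variables, so any NGROUP above 6 would be reduced to k = 6.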
Let X be the n × p matrix of regressors, and let R be the upper triangular matrix computed from the fitted regression model. The matrix R can be computed by routines RGLM, RGIVN, or RLEQU for fitting the regression model. A linear equality restriction on the regression parameters corresponds to a row of R with a negative diagonal element. Let D be a p × p diagonal matrix with diagonal elements
$$d_{ii} = \begin{cases} 1 & \text{if } r_{ii} > 0 \\ 0 & \text{otherwise} \end{cases}$$
Let $x_i^T$ be the i-th row of X, and let $t_i = D s_i$ where $s_i$ satisfies
$$R^T s_i = x_i$$
Then, the (squared) Mahalanobis distance from $x_i$ to $x_j$ equals the (squared) Euclidean distance from $t_i$ to $t_j$ because
$$(x_i - x_j)^T (R^T R)^{-} (x_i - x_j) = (s_i - s_j)^T D (s_i - s_j) = (t_i - t_j)^T (t_i - t_j)$$
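As a numerical illustration of this equivalence, the sketch below treats a small full-rank model with an intercept and one regressor, so that R has a positive diagonal and the transformed point is simply the solution s of Rᵀs = x. The data and all names (`solve_rt`, `chol2`) are hypothetical. The Mahalanobis side uses an explicit 2 × 2 inverse so that the two computations are independent.

```fortran
module mahal_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Solve R^T s = x by forward substitution (R upper triangular,
  ! positive diagonal assumed).
  function solve_rt(r, x) result(s)
    real(dp), intent(in) :: r(:,:), x(:)
    real(dp) :: s(size(x))
    integer :: k
    do k = 1, size(x)
      s(k) = (x(k) - dot_product(r(1:k-1, k), s(1:k-1))) / r(k, k)
    end do
  end function solve_rt

  ! Upper triangular Cholesky factor of a 2 x 2 matrix: R^T R = A.
  function chol2(a) result(r)
    real(dp), intent(in) :: a(2,2)
    real(dp) :: r(2,2)
    r = 0.0_dp
    r(1,1) = sqrt(a(1,1))
    r(1,2) = a(1,2) / r(1,1)
    r(2,2) = sqrt(a(2,2) - r(1,2)**2)
  end function chol2
end module mahal_sketch

program demo_mahal
  use mahal_sketch
  implicit none
  real(dp) :: x(5,2), a(2,2), ainv(2,2), r(2,2), d(2), det, dm, de
  x(:,1) = 1.0_dp                                   ! intercept column
  x(:,2) = [1.0_dp, 2.0_dp, 4.0_dp, 7.0_dp, 11.0_dp]
  a = matmul(transpose(x), x)                       ! X^T X
  r = chol2(a)
  det = a(1,1)*a(2,2) - a(1,2)*a(2,1)
  ainv = reshape([a(2,2), -a(2,1), -a(1,2), a(1,1)], [2, 2]) / det
  d = x(1,:) - x(4,:)
  dm = dot_product(d, matmul(ainv, d))              ! squared Mahalanobis distance
  de = sum((solve_rt(r, x(1,:)) - solve_rt(r, x(4,:)))**2)  ! squared Euclidean
  print '(a,f10.6)', 'Mahalanobis (squared): ', dm
  print '(a,f10.6)', 'Euclidean   (squared): ', de
end program demo_mahal
```

The two printed values agree up to rounding, which is the property that lets KMEAN work with the ordinary Euclidean metric on the transformed data.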
Once the clusters are identified by KMEAN, an expanded regression model (a one-way analysis of covariance model) is fitted to the original (untransformed) data. Denote the original model by y = X β + ɛ and the expanded model by y = X β + Z γ + ɛ. The added regressors that are contained in the n × k matrix Z in the expanded model are indicator variables specifying cluster membership. The lack of fit test that is computed is an exact test of the hypothesis that γ = 0 in the expanded model. This test was proposed as a lack of fit test by Christensen (1989).
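The indicator matrix Z follows directly from the group numbers. The sketch below builds it from the IGROUP vector used in Example 2. The function name `indicator_matrix` is hypothetical, and rows with group number 0 are given an all-zero row here only to mark them as unused.

```fortran
module z_sketch
  implicit none
contains
  ! n x k matrix of cluster-membership indicators: Z(i,j) = 1 when
  ! observation i belongs to group j.
  function indicator_matrix(igroup, k) result(z)
    integer, intent(in) :: igroup(:), k
    real :: z(size(igroup), k)
    integer :: i
    z = 0.0
    do i = 1, size(igroup)
      if (igroup(i) >= 1 .and. igroup(i) <= k) z(i, igroup(i)) = 1.0
    end do
  end function indicator_matrix
end module z_sketch

program demo_z
  use z_sketch
  implicit none
  integer :: igroup(20)
  igroup = [4, 4, 4, 4, 2, 4, 2, 4, 2, 4, 4, 4, 4, 4, 4, 4, 1, 4, 4, 3]
  ! Column sums give the number of observations in each cluster.
  print '(a,*(f5.1))', 'cluster sizes:', sum(indicator_matrix(igroup, 4), dim=1)
end program demo_z
```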
Let SSE(X, Z) be the error sum of squares from the fit of the expanded model and let SSE(X) be the error sum of squares from the fit of the original model. The lack of fit sum of squares is SSE(X) - SSE(X, Z) and the lack of fit degrees of freedom are DFE(X) - DFE(X, Z). The F statistic for the test of the null hypothesis of no lack of fit is
$$F = \frac{\left[\mathrm{SSE}(X) - \mathrm{SSE}(X, Z)\right] / \left[\mathrm{DFE}(X) - \mathrm{DFE}(X, Z)\right]}{\mathrm{SSE}(X, Z) / \mathrm{DFE}(X, Z)}$$
Under the hypothesis of no lack of fit, the computed F has an F distribution with numerator and denominator degrees of freedom DFE(X) ‑ DFE(X, Z) and DFE(X, Z), respectively. The p‑value for the test is computed as the probability that a random variable with this distribution is greater than or equal to the computed F statistic.
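The F statistic and, in one special case, the p-value can be checked by hand. The sketch below uses the rounded sums of squares printed for NGROUP = 6 in Example 1. The p-value formula is the closed form of the upper tail of an F distribution with numerator degrees of freedom equal to 2, so it does not apply to other numerator degrees of freedom and is not the method used by the routine. All names are hypothetical.

```fortran
module lof_f_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Lack-of-fit F statistic from the error sums of squares and degrees of
  ! freedom of the original (X) and expanded (X, Z) models.
  pure function lof_f(sse_x, dfe_x, sse_xz, dfe_xz) result(f)
    real(dp), intent(in) :: sse_x, dfe_x, sse_xz, dfe_xz
    real(dp) :: f
    f = ((sse_x - sse_xz) / (dfe_x - dfe_xz)) / (sse_xz / dfe_xz)
  end function lof_f

  ! P(F >= f) for an F(2, df2) variable; valid only for numerator df = 2.
  pure function p_upper_df1_2(f, df2) result(p)
    real(dp), intent(in) :: f, df2
    real(dp) :: p
    p = (1.0_dp + 2.0_dp * f / df2)**(-df2 / 2.0_dp)
  end function p_upper_df1_2
end module lof_f_sketch

program demo_lof_f
  use lof_f_sketch
  implicit none
  real(dp) :: f
  f = lof_f(163.9_dp, 16.0_dp, 143.4_dp, 14.0_dp)   ! rounded Example 1 values
  print '(a,f7.3)', 'F       = ', f
  print '(a,f7.3)', 'p-value = ', p_upper_df1_2(f, 14.0_dp)
end program demo_lof_f
```

Since the inputs are rounded to one decimal place, the results match the printed F and p-value only approximately.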
The error degrees of freedom and error sum of squares from the fit of the expanded model are computed as the error degrees of freedom and sum of squares from the reduced model where Z and y have been adjusted for X. Routine RCOV is used to fit the reduced model. Let e be the vector of residuals from the original fitted model, and let W be the diagonal matrix whose i-th diagonal element is the product of the weight and frequency for the i-th observation. The sum of squares and crossproducts matrix for the adjusted Z and y in the reduced model, which is input into RCOV, is
$$\begin{pmatrix} Z^T W Z - A^T A & Z^T W e \\ e^T W Z & e^T W e \end{pmatrix}$$
where A is a solution of $R^T A = D X^T W Z$.
Comments
1. Workspace may be explicitly provided, if desired, by use of R2OFN/DR2OFN. The reference is:
CALL R2OFN (NOBS, NCOL, X, LDX, INTCEP, IIND, INDIND, IRSP, IFRQ, IWT, B, R, LDR, DFE, SSE, ICLUST, MAXIT, TOL, NGROUP, IGROUP, TESTLF, IWK, WK)
The additional arguments are as follows:
IWK — Work array of length 3 * NOBS + ∣IIND∣ + NGROUP + 3 + max{m + 2.8854 * ln(m) + 2, 3 * NGROUP, NCOEF}, if ICLUST is positive. If ICLUST = 0, IWK can be an array of length 1.
WK — Work array of length LWK.
2. Informational errors
| Type | Code | Description |
| --- | --- | --- |
| 3 | 1 | Convergence did not occur in the cluster analysis for the lack of fit test within MAXIT iterations. Better results may be obtained by increasing MAXIT. |
| 4 | 2 | An invalid weight or frequency is encountered. Weights and frequencies must be nonnegative. |
| 3 | 3 | The matrix of sum of squares and crossproducts computed for the within cluster model for testing lack of fit is not nonnegative definite within the tolerance defined by TOL. |
| 4 | 4 | At least one element in the columns containing the independent variables, IRSP, IFRQ, or IWT of X contains NaN (not a number), but the corresponding element in IGROUP is not zero. When ICLUST = 0, missing values in a row of X are indicated by setting the corresponding element of IGROUP to zero. |
Examples
Example 1
This example uses data from Draper and Smith (1981, page 374), which is input in X. A multiple linear regression of column 6 of X on an intercept and columns 1, 3, and 4 is computed using routine RGIVN. Tests for lack of fit are computed for choices of NGROUP equal to 4 and 6 using routine RLOFN. Note that for NGROUP equal to 6 the results are exactly the same as for routine RLOFE. (If there are exact replicates in the data and the number of clusters used by RLOFN equals the number of distinct cases of the independent variables, then RLOFN and RLOFE produce the same output.)
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER LDB, LDR, LDSCPE, LDX, NCOEF, NCOL, NDEP, &
NIND, NOBS, J, INTCEP
PARAMETER (INTCEP=1, NCOL=6, NDEP=1, NIND=3, NOBS=20, &
LDSCPE=NDEP, LDX=NOBS, NCOEF=INTCEP+NIND, LDB=NCOEF, &
LDR=NCOEF)
!
INTEGER ICLUST, IDEP, IGROUP(NOBS), IIND, INDDEP(NDEP), &
INDIND(NIND), IRSP, NGROUP, NOUT, NRMISS, NROW
REAL B(LDB,NDEP), DFE, R(LDR,NCOEF), SCPE(LDSCPE,NDEP), &
SSE, TESTLF(10), X(LDX,NCOL)
!
DATA (X(1,J),J=1,6)/1.0, 1.0, 1.0, 0.0, 1.0, 246.0/
DATA (X(2,J),J=1,6)/1.0, 0.0, 1.0, 0.0, 1.0, 252.0/
DATA (X(3,J),J=1,6)/1.0, 1.0, 1.0, 0.0, 1.0, 253.0/
DATA (X(4,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 164.0/
DATA (X(5,J),J=1,6)/1.0, 1.0, 0.0, 0.0, 1.0, 203.0/
DATA (X(6,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 173.0/
DATA (X(7,J),J=1,6)/1.0, 1.0, 0.0, 0.0, 1.0, 210.0/
DATA (X(8,J),J=1,6)/1.0, 0.0, 1.0, 0.0, 1.0, 247.0/
DATA (X(9,J),J=1,6)/0.0, 1.0, 0.0, 1.0, 0.0, 120.0/
DATA (X(10,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 171.0/
DATA (X(11,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 167.0/
DATA (X(12,J),J=1,6)/0.0, 0.0, 1.0, 1.0, 0.0, 172.0/
DATA (X(13,J),J=1,6)/1.0, 1.0, 1.0, 0.0, 1.0, 247.0/
DATA (X(14,J),J=1,6)/1.0, 1.0, 1.0, 0.0, 1.0, 252.0/
DATA (X(15,J),J=1,6)/1.0, 0.0, 1.0, 0.0, 1.0, 248.0/
DATA (X(16,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 169.0/
DATA (X(17,J),J=1,6)/0.0, 1.0, 0.0, 0.0, 0.0, 104.0/
DATA (X(18,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 166.0/
DATA (X(19,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 168.0/
DATA (X(20,J),J=1,6)/0.0, 1.0, 1.0, 0.0, 0.0, 148.0/
DATA INDIND/1, 3, 4/, INDDEP/6/
!
NROW = NOBS
IIND = NIND
IDEP = NDEP
CALL RGIVN (X, IIND, INDIND, IDEP, INDDEP, B, R=R, DFE=DFE, SCPE=SCPE)
SSE = SCPE(1,1)
IRSP = 6
ICLUST = 2
DO 10 NGROUP=4, 6, 2
CALL RLOFN (X, IIND, INDIND, IRSP, B(1:, 1), R, DFE, SSE, NGROUP, &
IGROUP, TESTLF, ICLUST=ICLUST)
CALL UMACH (2, NOUT)
WRITE (NOUT,*) ' '
WRITE (NOUT,*) 'NGROUP = ', NGROUP
CALL WRIRN ('IGROUP', IGROUP, 1, NOBS, 1)
WRITE (NOUT,*) ' '
WRITE (NOUT,99999) ' Test for Lack of '// &
'Fit'
WRITE (NOUT,99999) ' Sum of Mean '// &
' Prob. of'
WRITE (NOUT,99999) ' Source of Error DF Squares Square '// &
' F Larger F'
WRITE (NOUT,99999) ' Lack of Fit ', TESTLF(1), TESTLF(4), &
TESTLF(7), TESTLF(9), TESTLF(10)
WRITE (NOUT,99999) ' Expanded model ', TESTLF(2), TESTLF(5), &
TESTLF(8)
WRITE (NOUT,99999) ' Original model ', TESTLF(3), TESTLF(6)
10 CONTINUE
99999 FORMAT (A, F5.1, F9.1, F8.2, F7.3, F10.3)
END
Output
NGROUP = 4
IGROUP
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
4 4 4 4 2 4 2 4 2 4 4 4 4 4 4 4 1 4 4 3
Test for Lack of Fit
Sum of Mean Prob. of
Source of Error DF Squares Square F Larger F
Lack of Fit 1.0 0.4 0.38 0.035 0.855
Expanded model 15.0 163.6 10.90
Original model 16.0 163.9
NGROUP = 6
IGROUP
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
6 6 6 4 5 4 5 6 2 4 4 4 6 6 6 4 1 4 4 3
Test for Lack of Fit
Sum of Mean Prob. of
Source of Error DF Squares Square F Larger F
Lack of Fit 2.0 20.5 10.25 1.001 0.393
Expanded model 14.0 143.4 10.24
Original model 16.0 163.9
Example 2
This example uses the same data and model as Example 1. Here, ICLUST = 0 is specified so that the group numbers for the lack of fit test are supplied in IGROUP on input.
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER LDB, LDR, LDSCPE, LDX, NCOEF, NCOL, NDEP, &
NIND, NOBS, J, INTCEP
PARAMETER (INTCEP=1, NCOL=6, NDEP=1, NIND=3, NOBS=20, &
LDSCPE=NDEP, LDX=NOBS, NCOEF=INTCEP+NIND, LDB=NCOEF, &
LDR=NCOEF)
!
INTEGER ICLUST, IDEP, IGROUP(NOBS), IIND, &
INDDEP(NDEP), INDIND(NIND), IRSP, &
NGROUP, NOUT
REAL B(LDB,NDEP), DFE, R(LDR,NCOEF), SCPE(LDSCPE,NDEP), &
SSE, TESTLF(10), TOL, X(LDX,NCOL), &
XMAX(NCOEF), XMIN(NCOEF)
!
DATA (X(1,J),J=1,6)/1.0, 1.0, 1.0, 0.0, 1.0, 246.0/
DATA (X(2,J),J=1,6)/1.0, 0.0, 1.0, 0.0, 1.0, 252.0/
DATA (X(3,J),J=1,6)/1.0, 1.0, 1.0, 0.0, 1.0, 253.0/
DATA (X(4,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 164.0/
DATA (X(5,J),J=1,6)/1.0, 1.0, 0.0, 0.0, 1.0, 203.0/
DATA (X(6,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 173.0/
DATA (X(7,J),J=1,6)/1.0, 1.0, 0.0, 0.0, 1.0, 210.0/
DATA (X(8,J),J=1,6)/1.0, 0.0, 1.0, 0.0, 1.0, 247.0/
DATA (X(9,J),J=1,6)/0.0, 1.0, 0.0, 1.0, 0.0, 120.0/
DATA (X(10,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 171.0/
DATA (X(11,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 167.0/
DATA (X(12,J),J=1,6)/0.0, 0.0, 1.0, 1.0, 0.0, 172.0/
DATA (X(13,J),J=1,6)/1.0, 1.0, 1.0, 0.0, 1.0, 247.0/
DATA (X(14,J),J=1,6)/1.0, 1.0, 1.0, 0.0, 1.0, 252.0/
DATA (X(15,J),J=1,6)/1.0, 0.0, 1.0, 0.0, 1.0, 248.0/
DATA (X(16,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 169.0/
DATA (X(17,J),J=1,6)/0.0, 1.0, 0.0, 0.0, 0.0, 104.0/
DATA (X(18,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 166.0/
DATA (X(19,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 168.0/
DATA (X(20,J),J=1,6)/0.0, 1.0, 1.0, 0.0, 0.0, 148.0/
DATA INDIND/1, 3, 4/, INDDEP/6/
DATA IGROUP/4*4, 2, 4, 2, 4, 2, 7*4, 1, 2*4, 3/
!
IIND = NIND
IDEP = NDEP
CALL RGIVN (X, IIND, INDIND, IDEP, INDDEP, B, R=R, DFE=DFE, SCPE=SCPE)
SSE = SCPE(1,1)
IRSP = 6
ICLUST = 0
NGROUP = 4
CALL RLOFN (X, IIND, INDIND, IRSP, B(1:, 1), R, DFE, SSE, NGROUP, &
IGROUP, TESTLF, iclust=iclust)
CALL UMACH (2, NOUT)
WRITE (NOUT,*) ' '
WRITE (NOUT,*) 'NGROUP = ', NGROUP
CALL WRIRN ('IGROUP', IGROUP, 1, NOBS, 1)
WRITE (NOUT,*) ' '
WRITE (NOUT,99999) ' Test for Lack of '// &
'Fit'
WRITE (NOUT,99999) ' Sum of Mean '// &
' Prob. of'
WRITE (NOUT,99999) ' Source of Error DF Squares Square '// &
' F Larger F'
WRITE (NOUT,99999) ' Lack of Fit ', TESTLF(1), TESTLF(4),&
TESTLF(7), TESTLF(9), TESTLF(10)
WRITE (NOUT,99999) ' Expanded model ', TESTLF(2), TESTLF(5),&
TESTLF(8)
WRITE (NOUT,99999) ' Original model ', TESTLF(3), TESTLF(6)
99999 FORMAT (A, F5.1, F9.1, F8.2, F7.3, F10.3)
END
Output
NGROUP = 4
IGROUP
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
4 4 4 4 2 4 2 4 2 4 4 4 4 4 4 4 1 4 4 3
Test for Lack of Fit
Sum of Mean Prob. of
Source of Error DF Squares Square F Larger F
Lack of Fit 1.0 0.4 0.38 0.035 0.855
Expanded model 15.0 163.6 10.90
Original model 16.0 163.9