ROTIN
Computes diagnostics for detection of outliers and influential data points given residuals and the R matrix for a fitted general linear model.
Required Arguments
X — NRX by NCOL matrix containing the data. (Input)
IIND — Independent variable option. (Input)
The absolute value of IIND is the number of independent (explanatory) variables. The sign of IIND specifies the following options:
IIND |
Meaning |
< 0 |
The data for the ‑IIND independent variables are given in the first ‑IIND columns of X. |
> 0 |
The data for the IIND independent variables are in the columns of X whose column numbers are given by the elements of INDIND. |
= 0 |
There are no independent variables. |
The regressors are the constant regressor (if INTCEP = 1) and the independent variables.
INDIND — Index vector of length IIND containing the column numbers of X that are the independent (explanatory) variables. (Input, if IIND is positive)
If IIND is nonpositive, INDIND is not referenced and can be a vector of length one.
R — INTCEP + ∣IIND∣ by INTCEP + ∣IIND∣ upper triangular matrix containing the R matrix. (Input)
The R matrix can come from a regression fit based on a QR decomposition of the matrix of regressors or based on a Cholesky factorization RTR of the matrix of sums of squares and crossproducts of the regressors. Elements to the right of a diagonal element of R that is zero must also be zero. A zero row indicates a nonfull rank model. For an R matrix that comes from a regression fit with linear equality restrictions on the parameters, each row of R corresponding to a restriction must have a corresponding diagonal element that is negative. The remaining rows of R must have positive diagonal elements.
DFE — Degrees of freedom for error. (Input)
SSE — Sum of squares for error. (Input)
E — Vector of length NRX with the residuals. (Input)
If a residual is not known, e.g., the value for the dependent (response) variable was missing, the input value of the corresponding element of E should equal NaN (not a number).
OTIN — NRX by 6 matrix containing diagnostics for detection of outliers and influential cases. (Output)
The columns of OTIN contain the following:
Col. |
Description |
1 |
Residual |
2 |
Leverage (diagonal element of the ‘Hat’ matrix) |
3 |
Standardized residual |
4 |
Jackknife (deleted) residual |
5 |
Cook’s Distance |
6 |
DFFITS |
Optional Arguments
NRX — Number of rows of data. (Input)
Default: NRX = size (X,1).
NCOL — Number of columns in X. (Input)
Default: NCOL = size (X,2).
LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).
INTCEP — Intercept option. (Input)
Default: INTCEP = 1.
INTCEP |
Action |
0 |
An intercept is not in the model. |
1 |
An intercept is in the model. |
IWT — Weighting option. (Input)
IWT = 0 means that all weights are 1.0. For positive IWT, column number IWT of X contains the weights.
Default: IWT = 0.
LDR — Leading dimension of R exactly as specified in the dimension statement in the calling program. (Input)
Default: LDR = size (R,1).
LDOTIN — Leading dimension of OTIN exactly as specified in the dimension statement in the calling program. (Input)
Default: LDOTIN = size (OTIN,1).
NRMISS — Number of rows of OTIN containing NaN (not a number). (Output)
If any row of data contains NaN as a value of the independent variable or weight, elements in columns 2 thru 6 of the corresponding row in OTIN are set to NaN. If the residual is missing, elements in columns 3 thru 6 are set to NaN.
FORTRAN 90 Interface
Generic: CALL ROTIN (X, IIND, INDIND, R, DFE, SSE, E, OTIN [, …])
Specific: The specific interface names are S_ROTIN and D_ROTIN.
FORTRAN 77 Interface
Single: CALL ROTIN (NRX, NCOL, X, LDX, INTCEP, IIND, INDIND, IWT, R, LDR, DFE, SSE, E, OTIN, LDOTIN, NRMISS)
Double: The double precision name is DROTIN.
Description
The multiple regression model used by routine ROTIN is
y = X β + ɛ
where y is the n × 1 vector of responses, X is the n × p matrix of regressors, β is the p × 1 vector of regression coefficients, and ɛ is the n × 1 vector of errors whose elements are independently normally distributed with mean 0 and variance σ2/wi. The model used by ROTIN also permits linear equality restrictions on β. From a multiple regression model fit using the wi’s as the weights, routine ROTIN computes diagnostics for outliers and influential cases. Let xi be a column vector containing elements of the i-th row of X. Let W = diag(w1, w2, …, wn). The leverage is defined as
(In the case of linear equality restrictions on β, the leverage is defined in terms of the reduced model.) Put D = diag(d1, d2, …, dp) with dj = 1 if the j-th diagonal element of R is positive and 0 otherwise. The leverage is computed as hi = (aTDa)wi where a is a solution to RTa = xi. The computation of the remainder of the case diagnostics follows easily from their definitions. See the Diagnostics for Individual Cases section in the chapter introduction for definitions of the case diagnostics.
The type 3 informational errors can occur if the input variables X, R, E and SSE are not consistent with each other or if excessive rounding has occurred in their computation.The type 3 error message with error code 2 arises when X contains a row not in the space spanned by the rows of R. An examination of the model that was fitted and the X for which diagnostics are to be computed is required in order to insure that only linear combinations of the regression coefficients that can be estimated from the fitted model are specified in X. For further details, see the discussion of estimable functions given by Maindonald (1984, pages 166‑168) and Searle (1971, pages 180‑188).
Comments
1. Workspace may be explicitly provided, if desired, by use of R2TIN/DR2TIN. The reference is:
CALL R2TIN (NRX, NCOL, X, LDX, INTCEP, IIND, INDIND, IWT, R, LDR, DFE, SSE, E, OTIN, LDOTIN, NRMISS, WK)
The additional argument is:
WK — Work vector of length INTCEP + ∣IIND∣.
2. Informational errors
Type |
Code |
Description |
3 |
2 |
The linear combination of the regression coefficients specified is not estimable within the preset tolerance. |
3 |
3 |
A leverage much greater than 1.0 was computed. It is set to 1.0. |
3 |
4 |
A deleted residual mean square much less than 0.0 was computed. It is set to 0.0. |
4 |
1 |
A weight is negative. Weights must be nonnegative. |
Examples
Example 1
A multiple linear regression model is fit and case statistics computed for data discussed by Cook and Weisberg (1982, page 103). The fitted model is
Some of the statistics in row 6 of the output matrix OTIN are undefined (0.0/0.0) and are set to NaN (not a number). Some statistics in row 4 of OTIN are infinite and are set to machine infinity. The values of NaN and machine infinity can be retrieved by routine AMACH (or DMACH when using double precision), which is documented in Reference Material.
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER INTCEP, LDB, LDOTIN, LDR, LDSCPE, LDX, NCOEF, NCOL, &
NDEP, NIND, NROW, J
PARAMETER (INTCEP=1, NCOL=3, NDEP=1, NIND=2, NROW=7, &
LDOTIN=NROW, LDSCPE=NDEP, LDX=NROW, &
NCOEF=INTCEP+NIND, LDB=NCOEF, LDR=NCOEF)
!
INTEGER I, IDEP, IIND, INDDEP(1), INDIND(1), NOUT, NRMISS
REAL B(LDB,NDEP), D(NCOEF), DFE, E(NROW), &
OTIN(LDOTIN,6), R(LDR,NCOEF), SCPE(LDSCPE,NDEP), &
SSE, X(LDX,NCOL), XMAX(NCOEF), XMIN(NCOEF)
CHARACTER CLABEL(7)*10, RLABEL(1)*6
!
DATA CLABEL/'Obs.', 'Residual', 'Leverage', 'Std. Res.', &
'Jack. Res.', 'Cook''s D', 'DFFITS'/
DATA RLABEL/'NUMBER'/
!
DATA (X(1,J),J=1,NIND+NDEP) /1.0, 1.0, 3.0/
DATA (X(2,J),J=1,NIND+NDEP) /1.0, 2.0, 4.0/
DATA (X(3,J),J=1,NIND+NDEP) /1.0, 3.0, 5.0/
DATA (X(4,J),J=1,NIND+NDEP) /1.0, 4.0, 7.0/
DATA (X(5,J),J=1,NIND+NDEP) /1.0, 5.0, 7.0/
DATA (X(6,J),J=1,NIND+NDEP) /0.0, 6.0, 8.0/
DATA (X(7,J),J=1,NIND+NDEP) /1.0, 7.0, 9.0/
!
IIND = -NIND
IDEP = -NDEP
CALL RGIVN (X, IIND, INDIND, IDEP, INDDEP, B, R=R, DFE=DFE, SCPE=SCPE)
SSE = SCPE(1,1)
! Compute residuals.
DO 10 I=1, NROW
E(I) = X(I,NCOL) - B(1,1) - SDOT(NIND, B((INTCEP+1): ,1), &
1, X(I:, 1), LDX)
10 CONTINUE
!
CALL ROTIN (X, IIND, INDIND, R, DFE, SSE, E, OTIN, &
NRMISS=NRMISS)
!
CALL WRRRL ('OTIN', OTIN, RLABEL, CLABEL, FMT='(F10.3)')
CALL UMACH (2, NOUT)
WRITE (NOUT,*) 'NRMISS = ', NRMISS
!
END
Output
OTIN
Obs. Residual Leverage Std. Res. Jack. Res. Cook’s D DFFITS
1 -0.129 0.471 -0.389 -0.343 0.045 -0.324
2 -0.143 0.286 -0.371 -0.327 0.018 -0.207
3 -0.157 0.186 -0.383 -0.338 0.011 -0.161
4 0.829 0.171 2.000 Inf 0.276 Inf
5 -0.186 0.243 -0.469 -0.418 0.024 -0.237
6 0.000 1.000 NaN NaN NaN NaN
7 -0.214 0.643 -0.788 -0.742 0.372 -0.996
NRMISS = 1
Example 2
In this example, routine RNLIN is first invoked to fit the following nonlinear regression model discussed by Neter, Wasserman, and Kutner (1983, pages 475‑478):
Then, ROTIN is used to compute case diagnostics. In addition, the leverage output by ROTIN is used to construct asymptotic confidence intervals on the mean of the nonlinear regression function evaluated at xi. The asymptotic 95% confidence intervals are computed using the formula:
where hi is the computed leverage, t.975,DFE is the 97.5 percentile of the t distribution with DFE degrees of freedom as computed by routine TIN (see Chapter 17, “Probability Distribution Functions and Inverses”), and s2 equals SSE/DFE.
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER LDOTIN, LDR, NOBS, NPARM, NRX
PARAMETER (NOBS=15, NPARM=2, NRX=1, LDOTIN=NRX, LDR=NPARM)
!
INTEGER IDERIV, IDUMMY(1), IEND, IOBS, IRANK, J, NOUT, NRMISS
REAL A, DE(NPARM, 1), DFE, E(1), FRQ, OTIN(LDOTIN,6), &
R(LDR,NPARM), SQRT, SSE, THETA(NPARM), WT, Y, &
YHAT
INTRINSIC SQRT
EXTERNAL EXAMPL
!
DATA THETA/60.0, -0.03/
!
CALL UMACH (2, NOUT)
!
IDERIV = 1
CALL RNLIN (EXAMPL, THETA, R=R, DFE=DFE, SSE=SSE)
!
WRITE (NOUT,*) ' Obs. Pred. Res. Lev. St Res Del Res Cook '// &
'D DFFIT Conf Interval'
DO 10 IOBS=1, NOBS
CALL EXAMPL (NPARM, THETA, 0, IOBS, FRQ, WT, E, DE, IEND)
CALL EXAMPL (NPARM, THETA, 1, IOBS, FRQ, WT, E, DE, IEND)
CALL EXAMPL (NPARM, THETA, 2, IOBS, FRQ, WT, Y, DE, IEND)
YHAT = Y - E(1)
CALL ROTIN (DE, -NPARM, IDUMMY, R, DFE, SSE, E, OTIN, &
NRX=NRX, LDX=1, INTCEP=0)
A = TIN(0.975,DFE)*SQRT((SSE/DFE)*OTIN(1,2))
WRITE (NOUT,'(F5.1,10F7.2)') Y, YHAT, (OTIN(1,J),J=1,6), &
YHAT - A, YHAT + A
10 CONTINUE
END
!
SUBROUTINE EXAMPL (NPARM, THETA, IOPT, IOBS, FRQ, WT, E, DE, &
IEND)
INTEGER NPARM, IOPT, IOBS, IEND
REAL THETA(NPARM), FRQ, WT, E(1), DE(NPARM, 1)
!
INTEGER NOBS
PARAMETER (NOBS=15)
!
REAL EXP, XDATA(NOBS), YDATA(NOBS)
INTRINSIC EXP
!
DATA YDATA/54.0, 50.0, 45.0, 37.0, 35.0, 25.0, 20.0, 16.0, 18.0, &
13.0, 8.0, 11.0, 8.0, 4.0, 6.0/
DATA XDATA/2.0, 5.0, 7.0, 10.0, 14.0, 19.0, 26.0, 31.0, 34.0, &
38.0, 45.0, 52.0, 53.0, 60.0, 65.0/
!
IF (IOBS .LE. NOBS) THEN
WT = 1.0E0
FRQ = 1.0E0
IEND = 0
IF (IOPT .EQ. 0) THEN
E(1) = YDATA(IOBS) - THETA(1)*EXP(THETA(2)*XDATA(IOBS))
ELSE IF (IOPT .EQ. 1) THEN
DE(1, 1) = -EXP(THETA(2)*XDATA(IOBS))
DE(2, 1) = -THETA(1)*XDATA(IOBS)*EXP(THETA(2)*XDATA(IOBS))
ELSE IF (IOPT .EQ. 2) THEN
E(1) = YDATA(IOBS)
END IF
ELSE
IEND = 1
END IF
RETURN
END
Output
Obs. Pred. Res. Lev. St Res Del Res Cook D DFFIT Conf Interval
54.0 54.15 -0.14 0.40 -0.09 -0.09 0.00 -0.07 51.19 56.53
50.0 48.08 1.92 0.24 1.13 1.14 0.21 0.65 49.84 54.00
45.0 44.42 0.58 0.18 0.33 0.32 0.01 0.15 43.79 47.37
37.0 39.45 -2.45 0.13 -1.34 -1.39 0.13 -0.54 33.04 36.07
35.0 33.67 1.33 0.11 0.72 0.71 0.03 0.24 34.96 37.70
25.0 27.62 -2.63 0.11 -1.42 -1.49 0.12 -0.52 21.00 23.75
20.0 20.94 -0.94 0.12 -0.51 -0.50 0.02 -0.18 17.61 20.51
16.0 17.18 -1.18 0.12 -0.65 -0.63 0.03 -0.23 13.35 16.29
18.0 15.26 2.74 0.12 1.50 1.58 0.15 0.58 19.29 22.20
13.0 13.02 -0.02 0.11 -0.01 -0.01 0.00 0.00 11.56 14.40
8.0 9.87 -1.87 0.10 -1.01 -1.01 0.06 -0.33 4.81 7.45
11.0 7.48 3.52 0.08 1.88 2.12 0.15 0.62 13.33 15.70
8.0 7.19 0.81 0.08 0.43 0.42 0.01 0.12 7.64 9.97
4.0 5.45 -1.45 0.06 -0.77 -0.75 0.02 -0.19 1.53 3.57
6.0 4.47 1.53 0.05 0.80 0.79 0.02 0.18 6.61 8.45