ROTIN

Computes diagnostics for detection of outliers and influential data points given residuals and the R matrix for a fitted general linear model.

Required Arguments

XNRX by NCOL matrix containing the data. (Input)

IIND — Independent variable option. (Input)
The absolute value of IIND is the number of independent (explanatory) variables. The sign of IIND specifies the following options:

 

IIND

Meaning

< 0

The data for the ‑IIND independent variables are given in the first ‑IIND columns of X.

> 0

The data for the IIND independent variables are in the columns of X whose column numbers are given by the elements of INDIND.

= 0

There are no independent variables.

The regressors are the constant regressor (if INTCEP = 1) and the independent variables.

INDIND — Index vector of length IIND containing the column numbers of X that are the independent (explanatory) variables. (Input, if IIND is positive)
If IIND is nonpositive, INDIND is not referenced and can be a vector of length one.

RINTCEP + IIND by INTCEP + IIND upper triangular matrix containing the R matrix. (Input)
The R matrix can come from a regression fit based on a QR decomposition of the matrix of regressors or based on a Cholesky factorization RTR of the matrix of sums of squares and crossproducts of the regressors. Elements to the right of a diagonal element of R that is zero must also be zero. A zero row indicates a nonfull rank model. For an R matrix that comes from a regression fit with linear equality restrictions on the parameters, each row of R corresponding to a restriction must have a corresponding diagonal element that is negative. The remaining rows of R must have positive diagonal elements.

DFE — Degrees of freedom for error. (Input)

SSE — Sum of squares for error. (Input)

E — Vector of length NRX with the residuals. (Input)
If a residual is not known, e.g., the value for the dependent (response) variable was missing, the input value of the corresponding element of E should equal NaN (not a number).

OTINNRX by 6 matrix containing diagnostics for detection of outliers and influential cases. (Output)
The columns of OTIN contain the following:

 

Col.

Description

1

Residual

2

Leverage (diagonal element of the ‘Hat’ matrix)

3

Standardized residual

4

Jackknife (deleted) residual

5

Cook’s Distance

6

DFFITS

Optional Arguments

NRX — Number of rows of data. (Input)
Default: NRX = size (X,1).

NCOL — Number of columns in X. (Input)
Default: NCOL = size (X,2).

LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).

INTCEP — Intercept option. (Input)
Default: INTCEP = 1.

 

INTCEP

Action

0

An intercept is not in the model.

1

An intercept is in the model.

IWT — Weighting option. (Input)
IWT = 0 means that all weights are 1.0. For positive IWT, column number IWT of X contains the weights.
Default: IWT = 0.

LDR — Leading dimension of R exactly as specified in the dimension statement in the calling program. (Input)
Default: LDR = size (R,1).

LDOTIN — Leading dimension of OTIN exactly as specified in the dimension statement in the calling program. (Input)
Default: LDOTIN = size (OTIN,1).

NRMISS — Number of rows of OTIN containing NaN (not a number). (Output)
If any row of data contains NaN as a value of the independent variable or weight, elements in columns 2 thru 6 of the corresponding row in OTIN are set to NaN. If the residual is missing, elements in columns 3 thru 6 are set to NaN.

FORTRAN 90 Interface

Generic: CALL ROTIN (X, IIND, INDIND, R, DFE, SSE, E, OTIN [])

Specific: The specific interface names are S_ROTIN and D_ROTIN.

FORTRAN 77 Interface

Single: CALL ROTIN (NRX, NCOL, X, LDX, INTCEP, IIND, INDIND, IWT, R, LDR, DFE, SSE, E, OTIN, LDOTIN, NRMISS)

Double: The double precision name is DROTIN.

Description

The multiple regression model used by routine ROTIN is

y = X β + ɛ

where y is the n × 1 vector of responses, X is the n × p matrix of regressors, β is the p × 1 vector of regression coefficients, and ɛ is the n × 1 vector of errors whose elements are independently normally distributed with mean 0 and variance σ2/wi. The model used by ROTIN also permits linear equality restrictions on β. From a multiple regression model fit using the wi’s as the weights, routine ROTIN computes diagnostics for outliers and influential cases. Let xi be a column vector containing elements of the i-th row of X. Let W = diag(w1w2wn). The leverage is defined as

 

(In the case of linear equality restrictions on β, the leverage is defined in terms of the reduced model.) Put D = diag(d1d2dp) with dj = 1 if the j-th diagonal element of R is positive and 0 otherwise. The leverage is computed as hi = (aTDa)wi where a is a solution to RTa = xi. The computation of the remainder of the case diagnostics follows easily from their definitions. See the Diagnostics for Individual Cases section in the chapter introduction for definitions of the case diagnostics.

The type 3 informational errors can occur if the input variables X, R, E and SSE are not consistent with each other or if excessive rounding has occurred in their computation.The type 3 error message with error code 2 arises when X contains a row not in the space spanned by the rows of R. An examination of the model that was fitted and the X for which diagnostics are to be computed is required in order to insure that only linear combinations of the regression coefficients that can be estimated from the fitted model are specified in X. For further details, see the discussion of estimable functions given by Maindonald (1984, pages 166168) and Searle (1971, pages 180188).

Comments

1. Workspace may be explicitly provided, if desired, by use of R2TIN/DR2TIN. The reference is:

CALL R2TIN (NRX, NCOL, X, LDX, INTCEP, IIND, INDIND, IWT, R, LDR, DFE, SSE, E, OTIN, LDOTIN, NRMISS, WK)

The additional argument is:

WK — Work vector of length INTCEP + IIND.

2. Informational errors

 

Type

Code

Description

3

2

The linear combination of the regression coefficients specified is not estimable within the preset tolerance.

3

3

A leverage much greater than 1.0 was computed. It is set to 1.0.

3

4

A deleted residual mean square much less than 0.0 was computed. It is set to 0.0.

4

1

A weight is negative. Weights must be nonnegative.

Examples

Example 1

A multiple linear regression model is fit and case statistics computed for data discussed by Cook and Weisberg (1982, page 103). The fitted model is

 

Some of the statistics in row 6 of the output matrix OTIN are undefined (0.0/0.0) and are set to NaN (not a number). Some statistics in row 4 of OTIN are infinite and are set to machine infinity. The values of NaN and machine infinity can be retrieved by routine AMACH (or DMACH when using double precision), which is documented in Reference Material.

 

USE IMSL_LIBRARIES

 

IMPLICIT NONE

INTEGER INTCEP, LDB, LDOTIN, LDR, LDSCPE, LDX, NCOEF, NCOL, &

NDEP, NIND, NROW, J

PARAMETER (INTCEP=1, NCOL=3, NDEP=1, NIND=2, NROW=7, &

LDOTIN=NROW, LDSCPE=NDEP, LDX=NROW, &

NCOEF=INTCEP+NIND, LDB=NCOEF, LDR=NCOEF)

!

INTEGER I, IDEP, IIND, INDDEP(1), INDIND(1), NOUT, NRMISS

REAL B(LDB,NDEP), D(NCOEF), DFE, E(NROW), &

OTIN(LDOTIN,6), R(LDR,NCOEF), SCPE(LDSCPE,NDEP), &

SSE, X(LDX,NCOL), XMAX(NCOEF), XMIN(NCOEF)

CHARACTER CLABEL(7)*10, RLABEL(1)*6

!

DATA CLABEL/'Obs.', 'Residual', 'Leverage', 'Std. Res.', &

'Jack. Res.', 'Cook''s D', 'DFFITS'/

DATA RLABEL/'NUMBER'/

!

DATA (X(1,J),J=1,NIND+NDEP) /1.0, 1.0, 3.0/

DATA (X(2,J),J=1,NIND+NDEP) /1.0, 2.0, 4.0/

DATA (X(3,J),J=1,NIND+NDEP) /1.0, 3.0, 5.0/

DATA (X(4,J),J=1,NIND+NDEP) /1.0, 4.0, 7.0/

DATA (X(5,J),J=1,NIND+NDEP) /1.0, 5.0, 7.0/

DATA (X(6,J),J=1,NIND+NDEP) /0.0, 6.0, 8.0/

DATA (X(7,J),J=1,NIND+NDEP) /1.0, 7.0, 9.0/

!

IIND = -NIND

IDEP = -NDEP

CALL RGIVN (X, IIND, INDIND, IDEP, INDDEP, B, R=R, DFE=DFE, SCPE=SCPE)

SSE = SCPE(1,1)

! Compute residuals.

DO 10 I=1, NROW

E(I) = X(I,NCOL) - B(1,1) - SDOT(NIND, B((INTCEP+1): ,1), &

1, X(I:, 1), LDX)

10 CONTINUE

!

CALL ROTIN (X, IIND, INDIND, R, DFE, SSE, E, OTIN, &

NRMISS=NRMISS)

!

CALL WRRRL ('OTIN', OTIN, RLABEL, CLABEL, FMT='(F10.3)')

CALL UMACH (2, NOUT)

WRITE (NOUT,*) 'NRMISS = ', NRMISS

!

END

Output

 

OTIN

Obs. Residual Leverage Std. Res. Jack. Res. Cook’s D DFFITS

1 -0.129 0.471 -0.389 -0.343 0.045 -0.324

2 -0.143 0.286 -0.371 -0.327 0.018 -0.207

3 -0.157 0.186 -0.383 -0.338 0.011 -0.161

4 0.829 0.171 2.000 Inf 0.276 Inf

5 -0.186 0.243 -0.469 -0.418 0.024 -0.237

6 0.000 1.000 NaN NaN NaN NaN

7 -0.214 0.643 -0.788 -0.742 0.372 -0.996

NRMISS = 1

Example 2

In this example, routine RNLIN is first invoked to fit the following nonlinear regression model discussed by Neter, Wasserman, and Kutner (1983, pages 475478):

 

Then, ROTIN is used to compute case diagnostics. In addition, the leverage output by ROTIN is used to construct asymptotic confidence intervals on the mean of the nonlinear regression function evaluated at xi. The asymptotic 95% confidence intervals are computed using the formula:

 

where hi is the computed leverage, t.975,DFE is the 97.5 percentile of the t distribution with DFE degrees of freedom as computed by routine TIN (see Chapter 17, “Probability Distribution Functions and Inverses”), and s2 equals SSE/DFE.

 

USE IMSL_LIBRARIES

 

IMPLICIT NONE

INTEGER LDOTIN, LDR, NOBS, NPARM, NRX

PARAMETER (NOBS=15, NPARM=2, NRX=1, LDOTIN=NRX, LDR=NPARM)

!

INTEGER IDERIV, IDUMMY(1), IEND, IOBS, IRANK, J, NOUT, NRMISS

REAL A, DE(NPARM, 1), DFE, E(1), FRQ, OTIN(LDOTIN,6), &

R(LDR,NPARM), SQRT, SSE, THETA(NPARM), WT, Y, &

YHAT

INTRINSIC SQRT

EXTERNAL EXAMPL

!

DATA THETA/60.0, -0.03/

!

CALL UMACH (2, NOUT)

!

IDERIV = 1

CALL RNLIN (EXAMPL, THETA, R=R, DFE=DFE, SSE=SSE)

!

WRITE (NOUT,*) ' Obs. Pred. Res. Lev. St Res Del Res Cook '// &

'D DFFIT Conf Interval'

DO 10 IOBS=1, NOBS

CALL EXAMPL (NPARM, THETA, 0, IOBS, FRQ, WT, E, DE, IEND)

CALL EXAMPL (NPARM, THETA, 1, IOBS, FRQ, WT, E, DE, IEND)

CALL EXAMPL (NPARM, THETA, 2, IOBS, FRQ, WT, Y, DE, IEND)

YHAT = Y - E(1)

CALL ROTIN (DE, -NPARM, IDUMMY, R, DFE, SSE, E, OTIN, &

NRX=NRX, LDX=1, INTCEP=0)

A = TIN(0.975,DFE)*SQRT((SSE/DFE)*OTIN(1,2))

WRITE (NOUT,'(F5.1,10F7.2)') Y, YHAT, (OTIN(1,J),J=1,6), &

YHAT - A, YHAT + A

10 CONTINUE

END

!

SUBROUTINE EXAMPL (NPARM, THETA, IOPT, IOBS, FRQ, WT, E, DE, &

IEND)

INTEGER NPARM, IOPT, IOBS, IEND

REAL THETA(NPARM), FRQ, WT, E(1), DE(NPARM, 1)

!

INTEGER NOBS

PARAMETER (NOBS=15)

!

REAL EXP, XDATA(NOBS), YDATA(NOBS)

INTRINSIC EXP

!

DATA YDATA/54.0, 50.0, 45.0, 37.0, 35.0, 25.0, 20.0, 16.0, 18.0, &

13.0, 8.0, 11.0, 8.0, 4.0, 6.0/

DATA XDATA/2.0, 5.0, 7.0, 10.0, 14.0, 19.0, 26.0, 31.0, 34.0, &

38.0, 45.0, 52.0, 53.0, 60.0, 65.0/

 

!

IF (IOBS .LE. NOBS) THEN

WT = 1.0E0

FRQ = 1.0E0

IEND = 0

IF (IOPT .EQ. 0) THEN

E(1) = YDATA(IOBS) - THETA(1)*EXP(THETA(2)*XDATA(IOBS))

ELSE IF (IOPT .EQ. 1) THEN

DE(1, 1) = -EXP(THETA(2)*XDATA(IOBS))

DE(2, 1) = -THETA(1)*XDATA(IOBS)*EXP(THETA(2)*XDATA(IOBS))

ELSE IF (IOPT .EQ. 2) THEN

E(1) = YDATA(IOBS)

END IF

ELSE

IEND = 1

END IF

RETURN

END

Output

 

Obs. Pred. Res. Lev. St Res Del Res Cook D DFFIT Conf Interval

54.0 54.15 -0.14 0.40 -0.09 -0.09 0.00 -0.07 51.19 56.53

50.0 48.08 1.92 0.24 1.13 1.14 0.21 0.65 49.84 54.00

45.0 44.42 0.58 0.18 0.33 0.32 0.01 0.15 43.79 47.37

37.0 39.45 -2.45 0.13 -1.34 -1.39 0.13 -0.54 33.04 36.07

35.0 33.67 1.33 0.11 0.72 0.71 0.03 0.24 34.96 37.70

25.0 27.62 -2.63 0.11 -1.42 -1.49 0.12 -0.52 21.00 23.75

20.0 20.94 -0.94 0.12 -0.51 -0.50 0.02 -0.18 17.61 20.51

16.0 17.18 -1.18 0.12 -0.65 -0.63 0.03 -0.23 13.35 16.29

18.0 15.26 2.74 0.12 1.50 1.58 0.15 0.58 19.29 22.20

13.0 13.02 -0.02 0.11 -0.01 -0.01 0.00 0.00 11.56 14.40

8.0 9.87 -1.87 0.10 -1.01 -1.01 0.06 -0.33 4.81 7.45

11.0 7.48 3.52 0.08 1.88 2.12 0.15 0.62 13.33 15.70

8.0 7.19 0.81 0.08 0.43 0.42 0.01 0.12 7.64 9.97

4.0 5.45 -1.45 0.06 -0.77 -0.75 0.02 -0.19 1.53 3.57

6.0 4.47 1.53 0.05 0.80 0.79 0.02 0.18 6.61 8.45