RGIVN
Fits a multivariate linear regression model via fast Givens transformations.
Required Arguments
X — ∣NROW∣ by NCOL matrix containing the data. (Input)
IIND — Independent variable option. (Input)
IIND |
Meaning |
< 0 |
The first ‑IIND columns of X contain the independent (explanatory) variables. |
> 0 |
The IIND independent variables are specified by the column numbers in INDIND. |
= 0 |
There are no independent variables. |
The regressors are the intercept (if INTCEP = 1) and the independent variables. There are INTCEP + ∣IIND∣ regression coefficients for each dependent variable.
INDIND — Index vector of length IIND containing the column numbers of X that are the independent variables. (Input, if IIND is positive)
If IIND is nonpositive, INDIND is not referenced and can be a vector of length one.
IDEP — Dependent variable option. (Input)
IDEP |
Meaning |
< 0 |
The last ‑IDEP columns of X contain the dependent (response) variables. That is, columns NCOL + IDEP + 1, NCOL + IDEP + 2, …, NCOL contain the dependent variables. |
> 0 |
The IDEP dependent (response) variables are specified by the column numbers in INDDEP.. |
= 0 |
There are no dependent variables. (Generally, this option is not used. The R matrix from a QR decomposition of a matrix of regressors is computed.) |
INDDEP — Index vector of length IDEP containing the column numbers of X that are the dependent variables. (Input, if IDEP is positive)
If IDEP is nonpositive, INDDEP is not referenced and can be a vector of length one.
B — INTCEP + ∣IIND∣ by ∣IDEP∣ matrix containing a least-squares solution for the regression coefficients on return from the final invocation of this routine. (Output, if IDO = 0 or 1; Input/Output, if IDO = 2 or 3)
If INTCEP = 1, row 1 is for the intercept. Row INTCEP + I is for the I-th independent variable. Column j is for the j-th dependent variable.
IDO |
Action |
1 or 2 |
A current least-squares solution is given by a solution x to the equation Rx = B. |
0 or 3 |
A least-squares solution for the regression coefficients is returned in B. Elements of the appropriate row(s) of B are set to 0.0 if linear dependence of the regressors is declared. |
If IDEP = 0, B is not referenced and can be a vector of length 1.
Optional Arguments
IDO — Processing option. (Input)
Default: IDO = 0.
IDO |
Action |
0 |
This is the only invocation of RGIVN for this data set, and all the data are input at once. |
1 |
This is the first invocation, and additional calls to RGIVN will be made. Initialization and updating for the data in X are performed. |
2 |
This is an intermediate invocation of RGIVN, and updating for the data in X is performed. |
3 |
This is the final invocation of this routine. Updating for the data in X and wrap-up computations are performed. |
NROW — The absolute value of NROW is the number of rows of data currently input in X. (Input)
NROW may be positive, zero, or negative. Negative NROW means that the ‑NROW rows of data are to be deleted from some aspects of the analysis, and this should be done only if IDO is 2 or 3 and the wrap-up computations have not been performed. When a negative value is input for NROW, it is assumed that each of the ‑NROW rows of X has been input (with positive NROW) in previous invocations of RGIVN. Use of negative values of NROW should be made with care and with the understanding that XMIN and XMAX cannot be updated properly in this case. It is also possible that a constant variable in the remaining data will not be recognized as such.
Default: NROW = size (X,1).
NCOL — Number of columns in X. (Input)
Default: NCOL = size (X,2).
LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).
INTCEP — Intercept option. (Input)
Default: INTCEP = 1.
INTCEP |
Action |
0 |
An intercept is not in the model. |
1 |
An intercept is in the model. |
IFRQ — Frequency option. (Input)
IFRQ = 0 means that all frequencies are 1.0. For positive IFRQ, column number IFRQ of X contains the frequencies. If X(I, IFRQ) = 0.0, none of the remaining elements of row I of X are referenced, and updating of statistics is skipped for row I.
Default: IFRQ = 0.
IWT — Weighting option. (Input)
IWT = 0 means that all weights are 1.0. For positive IWT, column number IWT of X contains the weights.
Default: IWT = 0.
ICEN — Data centering option. (Input)
If INTCEP = 0, ICEN must equal 0.
Default: ICEN = 1.
ICEN |
Action |
0 |
No centering. This option should be used when (1) the data are already centered; (2) there is no intercept in the model; or (3) the independent variables for a large percentage of the data are zero, and sparsity of the problem needs to be preserved in order that the Givens rotations are performed quickly. |
1 |
Variables are centered using the method of provisional means for improved accuracy of the computations. The final estimate for the intercept and the R matrix are given for the uncentered data. This option is generally recommended. |
TOL — Tolerance used in determining linear dependence. (Input)
For RGIVN, TOL = 100 * AMACH(4) is a common choice. See the documentation for routine AMACH in Reference Material.
Default: TOL = 1.e-5 for single precision and 2.D-14 for double precision.
LDB — Leading dimension of B exactly as specified in the dimension statement in the calling program. (Input)
Default: LDB = size (B,1).
R — INTCEP + ∣IIND∣ by INTCEP + ∣IIND∣ upper triangular matrix containing the R matrix from a QR decomposition of the matrix of regressors on return from the final invocation of this routine. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
IDO |
Action |
1 or 2 |
The current matrix of raw sums of squares and crossproducts for the regressors can be found as RT · diag(D) · R where diag(D) is the diagonal matrix whose diagonal elements are the elements of the vector D. |
0 or 3 |
The matrix of raw sums of squares and crossproducts for the regressors can be found as RT R. Elements of the appropriate row(s) of R are set to 0.0 if linear dependence of the regressors is declared. |
LDR — Leading dimension of R exactly as specified in the dimension statement in the calling program. (Input)
Default: LDR = size (R,1).
D — Vector of length INTCEP + ∣IIND∣ containing scale factors for fast Givens transformations. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
IDO |
Action |
1 or 2 |
D contains the current scale factors associated with the fast Givens transformations. |
0 or 3 |
Each element of D is set to 1.0. |
IRANK — Rank of R. (Output, if IDO = 0 or 3)
IRANK less than INTCEP + ∣IIND∣ indicates linear dependence of the regressors was declared.
DFE — Degrees of freedom for error on return from the final invocation of this routine. (Output, if IDO = 0 or 1; Input/Output, if IDO = 2 or 3)
Prior to the final invocation of RGIVN, DFE is the sum of the frequencies.
SCPE — ∣IDEP∣ by ∣IDEP∣ matrix containing error (residual) sums of squares and crossproducts. (Output, if IDO = 0 or 1; Input/Output, if IDO = 2 or 3)
SCPE(m, n) contains the current sum of crossproducts of residuals for the m-th and n-th dependent variables. If IDEP = 0, SCPE is not referenced and can be a 1 by 1 array.
LDSCPE — Leading dimension of SCPE exactly as specified in the dimension statement in the calling program. (Input)
Default: LDSCPE = size (SCPE,1).
NRMISS — Number of rows of data encountered in calls to RGIVN that contain any missing values for the independent, dependent, weight, or frequency variables. (Output, if IDO = 0 or 1; Input/Output, if IDO = 2 or 3)
NaN (not a number) is used as the missing value code. Any row of X containing NaN as a value of the independent, dependent, weight, or frequency variables is omitted from the analysis.
XMIN — Vector of length INTCEP + ∣IIND∣ containing the minimum values for each of the regressors. (Output, if IDO = 0 or 1; Input/Output, if IDO = 2 or 3)
XMAX — Vector of length INTCEP + ∣IIND∣ containing the maximum values for each of the regressors. (Output, if IDO = 0 or 1; Input/Output, if IDO = 2 or 3)
FORTRAN 90 Interface
Generic: CALL RGIVN (X, IIND, INDIND, IDEP, INDDEP, B [, …])
Specific: The specific interface names are S_RGIVN and D_RGIVN.
FORTRAN 77 Interface
Single: CALL RGIVN (IDO, NROW, NCOL, X, LDX, INTCEP, IIND, INDIND, IDEP, INDDEP, IFRQ, IWT, ICEN, TOL, B, LDB, R, LDR, D, IRANK, DFE, SCPE, LDSCPE, NRMISS, XMIN, XMAX)
Double: The double precision name is DRGIVN.
Description
Routine RGIVN fits a multivariate linear regression model. (See the chapter introduction for a description of the multivariate linear regression model.) The routine RGIVN is designed so that multiple invocations can be made. In this case, zero, one, or several rows of the data set can be input for each invocation of RGIVN (with IDO = 1, 2, 2, …, 2, 3). Alternatively, one invocation of RGIVN (with IDO = 0) can be made with the entire data set contained in X. Routine RSTAT can be invoked after the wrap-up computations are performed by RGIVN to compute and print summary statistics related to the fitted regression.
Routine RGIVN performs an orthogonal reduction of the matrix of regressors to upper triangular form. The reduction is based on fast Givens transformations. (See routines SROTMG and SROTM, Golub and Van Loan 1983, pages 156-162, Gentleman 1974.) This method has two main advantages: (1) the loss of accuracy resulting from forming the crossproduct matrix used in the normal equations is avoided, (2) data can be conveniently added or deleted to take advantage of the previous computations performed.
With ICEN = 1, the current means of the independent and dependent variables are used to center the data for improved accuracy. Let xi be a column vector containing the i-th row of data for the independent variables. Let represent the mean vector for the independent variables given the data for observations 1, 2, …, i. The mean vector is defined to be
where the wj’s and fj’s are the weights and frequencies, respectively. The i-th row of data has subtracted from it, and then wifi is multiplied by the factor ai/ai-1 where
Although a crossproduct matrix is not computed, the validity of this centering operation can be seen from the following formula for the sum of squares and crossproducts matrix:
An orthogonal reduction on the centered matrix is computed. When wrap-up computations
(IDO = 3 or IDO = 0) are performed, the first rows of R and B are updated so that they reflect the statistics for the original (uncentered) data. This means that the estimate of the intercept and the R matrix are for the uncentered data.
If the i-th regressor is a linear combination of the first i ‑ 1 regressors, the i-th diagonal element of R will be close to zero (exactly zero if infinite precision arithmetic could be used) prior to the wrap-up computations. When performing the wrap-up computations, RGIVN checks sequentially for linear dependent regressors. Linear dependence of the regressors is declared if any of the following three conditions is satisfied:
-
A regressor equals zero, as determined from XMIN and XMAX.
-
Two or more regressors are constant, as determined from XMIN and XMAX.
-
is less than or equal to TOL. Here, Ri⋅1,2,…,i−1 is the multiple correlation coefficient of the i-th independent variable with the first i ‑ 1 independent variables. If no intercept is in the model (INTCEP = 0) the “multiple correlation” coefficient is computed without adjusting for the mean.
When a dependence is declared, R is changed in the wrap-up computations so as to reflect the deletion of the i-th regressor from the model. On completion of the wrap-up computations, if the i-th regressor is declared to be dependent upon the previous i ‑ 1 regressors, then the R and matrices will have all elements in their i-th rows set to zero.
Comments
1. Workspace may be explicitly provided, if desired, by use of R2IVN/DR2IVN. The reference is:
CALL R2IVN (IDO, NROW, NCOL, X, LDX, INTCEP, IIND, INDIND, IDEP, INDDEP, IFRQ, IWT, ICEN, TOL, B, LDB, R, LDR, D, IRANK, DFE, SCPE, LDSCPE, NRMISS, XMIN, XMAX, WK)
The additional argument is:
WK — Work vector of length INTCEP + ∣IIND∣ + ∣IDEP∣
2. Informational errors
Type |
Code |
Description |
4 |
1 |
Negative weight encountered. |
4 |
2 |
Negative frequency encountered. |
Examples
Example 1
The first example uses a data set from Draper and Smith (1981, pages 629-630). This data set is put into the matrix X by routine GDATA in Chapter 19, “Utilities”. There is 1 dependent variable and 4 independent variables. RGIVN is invoked to fit the regression model with the IDO = 0 option, so all computations are performed in one call.
USE RGIVN_INT
USE GDATA_INT
USE WRRRN_INT
USE UMACH_INT
IMPLICIT NONE
INTEGER LDB, LDCOEF, LDR, LDSCPE, LDX, NCOEF, NCOL, NDEP, NRX
PARAMETER (LDSCPE=1, NCOEF=5, NCOL=5, NDEP=1, NRX=13, &
LDB=NCOEF, LDCOEF=NCOEF, LDR=NCOEF, LDX=NRX)
!
INTEGER I, IDEP, IIND, INDDEP(1), INDIND(1), &
IRANK, NOBS, NOUT, NRMISS, NVAR
REAL B(LDB,NDEP), D(NCOEF), DFE, R(LDR,NCOEF), &
SCPE(LDSCPE,NDEP), X(LDX,NCOL), XMAX(NCOEF), &
XMIN(NCOEF)
!
CALL GDATA (5, X, NOBS, NVAR)
!
IIND = -4
IDEP = -1
CALL RGIVN (X, IIND, INDIND, IDEP, INDDEP, B, r=r, d=d, &
irank=irank, dfe=dfe, scpe=scpe, nrmiss=nrmiss, &
xmin=xmin, xmax=xmax)
!
CALL WRRRN ('B', B)
CALL WRRRN ('R', R)
CALL UMACH (2, NOUT)
WRITE (NOUT,*)
WRITE (NOUT,*) 'Regressor XMIN XMAX'
DO 10 I=1, NCOEF
WRITE (NOUT,'(1X,I5,2X,2F9.1)') I, XMIN(I), XMAX(I)
10 CONTINUE
WRITE (NOUT,*) ' '
WRITE (NOUT,*) 'IRANK = ', IRANK
WRITE (NOUT,*) 'DFE = ', DFE, ' SCPE(1,1) = ', SCPE(1,1)
WRITE (NOUT,*) 'NRMISS = ', NRMISS
END
Output
B
1 62.41
2 1.55
3 0.51
4 0.10
5 -0.14
R
1 2 3 4 5
1 3.6 26.9 173.6 42.4 108.2
2 0.0 20.4 12.3 -18.3 -14.2
3 0.0 0.0 52.5 1.1 -54.6
4 0.0 0.0 0.0 12.5 -12.9
5 0.0 0.0 0.0 0.0 3.4
Regressor XMIN XMAX
1 1.0 1.0
2 1.0 21.0
3 26.0 71.0
4 4.0 23.0
5 6.0 60.0
IRANK = 5
DFE = 8.00000 SCPE(1,1) = 47.8637
NRMISS = 0
Example 2
The data for the second example are taken from Maindonald (1984, pages 203‑204). The data are saved in the matrix X. Here, the data are input into RGIVN a row at a time. The data set is small for clarity. However, the approach is generally useful when the data set is large and the entire data set cannot be stored in X. A multivariate regression model containing two dependent variables and three independent variables is fit.
USE RGIVN_INT
USE WRRRN_INT
USE UMACH_INT
IMPLICIT NONE
INTEGER INTCEP, LDB, LDR, LDSCPE, LDX, NCOEF, NCOL, NDEP, &
NIND, NOBS, J
PARAMETER (INTCEP=1, NCOL=5, NDEP=2, NIND=3, NOBS=9, &
LDSCPE=NDEP, LDX=NOBS, NCOEF=INTCEP+NIND, LDB=NCOEF, &
LDR=NCOEF)
!
INTEGER I, IDEP, IDO, IIND, INDDEP(1), INDIND(1), IRANK, &
NOUT, NRMISS, NROW
REAL B(LDB,NDEP), D(NCOEF), DFE, R(LDR,NCOEF), &
SCPE(LDSCPE,NDEP), TOL, X(LDX,NCOL), XMAX(NCOEF), &
XMIN(NCOEF)
!
DATA (X(1,J),J=1,NCOL)/7.0, 5.0, 6.0, 7.0, 1.0/
DATA (X(2,J),J=1,NCOL)/2.0, -1.0, 6.0, -5.0, 4.0/
DATA (X(3,J),J=1,NCOL)/7.0, 3.0, 5.0, 6.0, 10.0/
DATA (X(4,J),J=1,NCOL)/-3.0, 1.0, 4.0, 5.0, 5.0/
DATA (X(5,J),J=1,NCOL)/2.0, -1.0, 0.0, 5.0, -2.0/
DATA (X(6,J),J=1,NCOL)/2.0, 1.0, 7.0, -2.0, 4.0/
DATA (X(7,J),J=1,NCOL)/-3.0, -1.0, 3.0, 0.0, -6.0/
DATA (X(8,J),J=1,NCOL)/2.0, 1.0, 1.0, 8.0, 2.0/
DATA (X(9,J),J=1,NCOL)/2.0, 1.0, 4.0, 3.0, 0.0/
!
NROW = 1
IIND = -NIND
IDEP = -NDEP
DO 10 I=1, 9
IF (I .EQ. 1) THEN
IDO = 1
ELSE IF (I .EQ. 9) THEN
IDO = 3
ELSE
IDO = 2
END IF
CALL RGIVN (X(I:I, 1:NCOL), IIND, INDIND, IDEP, INDDEP, &
B, IDO=IDO, R=R, D=D,IRANK=IRANK, DFE=DFE,&
SCPE=SCPE,NRMISS=NRMISS, xmin=xmin, xmax=xmax)
10 CONTINUE
!
CALL WRRRN ('B', B)
CALL WRRRN ('R', R)
CALL WRRRN ('SCPE', SCPE)
CALL UMACH (2, NOUT)
WRITE (NOUT,*)
WRITE (NOUT,*) 'Regressor XMIN XMAX'
DO 20 I=1, NCOEF
WRITE (NOUT,'(1X,I5,2X,2F9.1)') I, XMIN(I), XMAX(I)
20 CONTINUE
WRITE (NOUT,*)
WRITE (NOUT,*) 'IRANK = ', IRANK
WRITE (NOUT,*) 'DFE = ', DFE
WRITE (NOUT,*) 'NRMISS = ', NRMISS
END
Output
B
1 2
1 7.733 -1.633
2 -0.200 0.400
3 2.333 0.167
4 -1.667 0.667
R
1 2 3 4
1 3.00 6.00 3.00 12.00
2 0.00 10.00 4.00 2.00
3 0.00 0.00 4.00 2.00
4 0.00 0.00 0.00 6.00
SCPE
1 2
1 4.0 20.0
2 20.0 110.0
Regressor XMIN XMAX
1 1.0 1.0
2 -3.0 7.0
3 -1.0 5.0
4 0.0 7.0
IRANK = 4
DFE = 5.00000
NRMISS = 0
Example 3
The data for the third example are taken from Maindonald (1984, pages 104‑106). The constant regressor and the independent variables X1, X2, and X3 are linearly dependent
USE RGIVN_INT
USE WRRRN_INT
USE UMACH_INT
INTEGER INTCEP, LDB, LDR, LDSCPE, LDX, NCOEF, NCOL, NDEP, &
NIND, NOBS
PARAMETER (INTCEP=1, NCOL=5, NDEP=1, NIND=4, NOBS=9, &
LDSCPE=NDEP, LDX=NOBS, NCOEF=INTCEP+NIND, LDB=NCOEF,&
LDR=NCOEF)
!
INTEGER I, IDEP, IIND, INDDEP(1), INDIND(1), &
IRANK, NOUT, NRMISS, NROW
REAL B(LDB,NDEP), D(NCOEF), DFE, R(LDR,NCOEF), &
SCPE(LDSCPE,NDEP), TOL, X(LDX,NCOL), XMAX(NCOEF), &
XMIN(NCOEF)
!
DATA (X(1,J),J=1,NCOL)/-1.0, 0.0, -0.5, 1.0, 0.0/
DATA (X(2,J),J=1,NCOL)/3.0, 0.0, 3.5, 1.0, 0.0/
DATA (X(3,J),J=1,NCOL)/2.0, -2.0, 3.5, -2.0, -2.0/
DATA (X(4,J),J=1,NCOL)/-2.0, -1.0, -1.0, 1.0, 1.0/
DATA (X(5,J),J=1,NCOL)/-1.0, 1.0, -1.0, -1.0, -1.0/
DATA (X(6,J),J=1,NCOL)/3.0, 3.0, 2.0, 1.0, 3.0/
DATA (X(7,J),J=1,NCOL)/2.0, 2.0, 1.5, 2.0, 4.0/
DATA (X(8,J),J=1,NCOL)/-2.0, -1.0, -1.0, -1.0, -2.0/
DATA (X(9,J),J=1,NCOL)/2.0, 1.0, 2.0, 1.0, 3.0/
!
NROW = NOBS
IIND = -NIND
IDEP = -NDEP
CALL RGIVN (X, IIND, INDIND, IDEP, INDDEP, B, r=r, d=d, &
irank=irank, dfe=dfe, scpe=scpe, nrmiss=nrmiss, &
xmin=xmin, xmax=xmax)
!
CALL WRRRN ('B', B)
CALL WRRRN ('R', R)
CALL UMACH (2, NOUT)
WRITE (NOUT,*)
WRITE (NOUT,*) 'Regressor Minimum Maximum'
DO 10 I=1, NCOEF
WRITE (NOUT,'(1X,I5,2X,2F9.1)') I, XMIN(I), XMAX(I)
10 CONTINUE
WRITE (NOUT,*)
WRITE (NOUT,*) 'IRANK = ', IRANK
WRITE (NOUT,*) 'DFE = ', DFE, ' SCPE(1,1) = ', SCPE(1,1)
WRITE (NOUT,*) 'NRMISS = ', NRMISS
END
Output
B
1 0.056
2 0.167
3 0.500
4 0.000
5 1.000
R
1 2 3 4 5
1 3.000 2.000 1.000 3.000 1.000
2 0.000 6.000 2.000 5.000 1.000
3 0.000 0.000 4.000 -2.000 2.000
4 0.000 0.000 0.000 0.000 0.000
5 0.000 0.000 0.000 0.000 3.000
Regressor Minimum Maximum
1 1.0 1.0
2 -2.0 3.0
3 -2.0 3.0
4 -1.0 3.5
5 -2.0 2.0
IRANK = 4
DFE = 5.00000 SCPE(1,1) = 6.00000
NMISS = 0