RLAV
Fits a multiple linear regression model using the least absolute values criterion.
Required Arguments
X — NOBS by NCOL matrix containing the data. (Input)
IIND — Independent variable option. (Input)
The absolute value of IIND is the number of independent (explanatory) variables. The sign of IIND specifies the following options:
IIND | Meaning
< 0 | The data for the -IIND independent variables are given in the first -IIND columns of X.
> 0 | The data for the IIND independent variables are in the columns of X whose column numbers are given by the elements of INDIND.
= 0 | There are no independent variables.
The regressors are the constant regressor (if INTCEP = 1) and the independent variables.
INDIND — Index vector of length IIND containing the column numbers of X that are the independent (explanatory) variables. (Input, if IIND is positive)
If IIND is negative, INDIND is not referenced and can be a vector of length one.
IRSP — Column number IRSP of X contains the data for the response (dependent) variable. (Input)
B — Vector of length INTCEP + ∣IIND∣ containing a LAV solution for the regression coefficients. (Output)
If INTCEP = 1, B(1) contains the intercept estimate. B(INTCEP + I) contains the coefficient estimate for the I-th independent variable.
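The IIND and INDIND conventions described above can be illustrated with a small stand-alone sketch. The helper `regressor_columns` is hypothetical and is not part of the library; it only mirrors the documented rule for which columns of X are treated as independent variables, and it does not call RLAV.

```fortran
program iind_columns_demo
  implicit none
  integer :: cols(3)
  integer :: ncols

  ! IIND < 0: the first -IIND columns of X are used; INDIND is not referenced.
  call regressor_columns(-2, [0], cols, ncols)
  print '(a,*(1x,i0))', 'IIND = -2, columns:', cols(1:ncols)

  ! IIND > 0: the columns listed in INDIND are used, in the order given.
  call regressor_columns(2, [3, 1], cols, ncols)
  print '(a,*(1x,i0))', 'IIND =  2, columns:', cols(1:ncols)

  ! IIND = 0: there are no independent variables.
  call regressor_columns(0, [0], cols, ncols)
  print '(a,i0)', 'IIND =  0, number of independent variables: ', ncols

contains

  ! Hypothetical helper (not an IMSL routine): returns the column numbers
  ! of X that the documented IIND/INDIND rule selects.
  subroutine regressor_columns(iind, indind, cols, ncols)
    integer, intent(in)  :: iind
    integer, intent(in)  :: indind(:)
    integer, intent(out) :: cols(:)
    integer, intent(out) :: ncols
    integer :: i

    ncols = abs(iind)
    cols = 0
    if (iind < 0) then
      do i = 1, ncols
        cols(i) = i
      end do
    else if (iind > 0) then
      cols(1:ncols) = indind(1:ncols)
    end if
  end subroutine regressor_columns
end program iind_columns_demo
```

With IIND = 2 and INDIND = (3, 1), the sketch selects columns 3 and 1. Presumably B(INTCEP + 1) then refers to column 3 and B(INTCEP + 2) to column 1, although the ordering of B relative to INDIND is an assumption here.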
Optional Arguments
NOBS — Number of observations. (Input)
Default: NOBS = size (X,1).
NCOL — Number of columns in X. (Input)
Default: NCOL = size (X,2).
LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).
INTCEP — Intercept option. (Input)
Default: INTCEP = 1.
INTCEP | Action
0 | An intercept is not in the model.
1 | An intercept is in the model.
IRANK — Rank of the matrix of regressors. (Output)
If IRANK is less than INTCEP + ∣IIND∣, linear dependence of the regressors was declared.
SAE — Sum of the absolute values of the errors. (Output)
ITER — Number of iterations performed. (Output)
NRMISS — Number of rows of data containing NaN (not a number) for the dependent or independent variables. (Output)
If a row of data contains NaN for any of these variables, that row is excluded from the computations.
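The row-exclusion rule for NRMISS can be mimicked in a few lines of standard Fortran. This is only a sketch of the counting rule, not the library's code. The data are made up for illustration, and every column of `x` is assumed to be either the response or an independent variable.

```fortran
program nrmiss_demo
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_value, &
                                           ieee_quiet_nan
  implicit none
  integer, parameter :: nobs = 4, ncol = 2
  real :: x(nobs, ncol)
  logical :: row_ok(nobs)
  integer :: i, nrmiss

  ! Illustrative data (not from the documentation); one value is missing.
  x(:, 1) = [1.0, 2.0, 3.0, 4.0]
  x(:, 2) = [1.0, 0.0, 1.5, 5.0]
  x(3, 2) = ieee_value(x(3, 2), ieee_quiet_nan)

  ! A row is usable only if none of its variables is NaN.
  do i = 1, nobs
    row_ok(i) = .not. any(ieee_is_nan(x(i, :)))
  end do

  nrmiss = count(.not. row_ok)
  print '(a,i0)', 'rows containing NaN: ', nrmiss
  print '(a,i0)', 'rows used in fit:    ', count(row_ok)
end program nrmiss_demo
```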
FORTRAN 90 Interface
Generic: CALL RLAV (X, IIND, INDIND, IRSP, B [, …])
Specific: The specific interface names are S_RLAV and D_RLAV.
FORTRAN 77 Interface
Single: CALL RLAV (NOBS, NCOL, X, LDX, INTCEP, IIND, INDIND, IRSP, B, IRANK, SAE, ITER, NRMISS)
Double: The double precision name is DRLAV.
Description
Routine RLAV computes estimates of the regression coefficients in a multiple linear regression model. The criterion satisfied is the minimization of the sum of the absolute values of the deviations of the observed response $y_i$ from the fitted response $\hat{y}_i$
for a set of $n$ observations. Under this criterion, known as the L1 or LAV (least absolute value) criterion, the regression coefficient estimates minimize
$$\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
The estimation problem can be posed as a linear programming problem. The special nature of the problem, however, allows for considerable gains in efficiency by the modification of the usual simplex algorithm for linear programming. These modifications are described in detail by Barrodale and Roberts (1973, 1974).
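For reference, one common way to write the LAV problem as a linear program is sketched below. The text does not state which formulation RLAV uses, so this is the textbook form rather than a description of the routine. The symbols are assumptions introduced here: $x_i$ is the vector of regressors for observation $i$, $\beta$ is the coefficient vector (so the fitted response is $x_i^{T}\beta$), and $u_i$, $v_i$ are the positive and negative parts of the $i$-th residual.

```latex
\begin{aligned}
\min_{\beta,\,u,\,v}\quad & \sum_{i=1}^{n} (u_i + v_i) \\
\text{subject to}\quad & y_i - x_i^{T}\beta = u_i - v_i, \quad i = 1, \dots, n, \\
& u_i \ge 0,\; v_i \ge 0 .
\end{aligned}
```

At an optimum, at most one of $u_i$ and $v_i$ is nonzero for each $i$, so $u_i + v_i$ equals the absolute residual $|y_i - x_i^{T}\beta|$ and the objective equals the sum of absolute deviations.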
In many cases, the algorithm can be made faster by computing a least-squares solution prior to the invocation of RLAV. This is particularly useful when a least-squares solution has already been computed. The procedure is as follows:
1. Fit the model using least squares and compute the residuals from this fit.
2. Fit the residuals from Step 1 on the regressor variables in the model using RLAV.
3. Add the two estimated regression coefficient vectors from Steps 1 and 2. The result is an L1 solution.
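The three steps can be traced on the straight-line data from the Example section. The procedure works because the residuals of the combined fit are the Step 1 residuals minus the Step 2 fit, so minimizing the sum of absolute deviations over the correction is the same problem as minimizing it over the full coefficient vector. The sketch below is self-contained and does not call the library. Step 1 uses the closed-form least-squares line. For Step 2, the hypothetical routine `lav_line` stands in for RLAV and is not the Barrodale and Roberts algorithm: it tries the line through every pair of points with distinct x, which is enough for a straight-line fit because an LAV line can always be chosen to pass through two data points.

```fortran
program ls_then_lav_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 8
  real(dp) :: x(n), y(n), r(n)
  real(dp) :: b_ls(2), d(2), b(2)
  real(dp) :: xbar, ybar

  ! Data from the Example section: column 1 of X is x, column 2 is y.
  x = [1.0_dp, 4.0_dp, 2.0_dp, 2.0_dp, 3.0_dp, 3.0_dp, 4.0_dp, 5.0_dp]
  y = [1.0_dp, 5.0_dp, 0.0_dp, 2.0_dp, 1.5_dp, 2.5_dp, 2.0_dp, 3.0_dp]

  ! Step 1: least-squares line (intercept, slope) and its residuals.
  xbar = sum(x) / n
  ybar = sum(y) / n
  b_ls(2) = sum((x - xbar) * (y - ybar)) / sum((x - xbar)**2)
  b_ls(1) = ybar - b_ls(2) * xbar
  r = y - (b_ls(1) + b_ls(2) * x)

  ! Step 2: LAV fit of the residuals on the same regressors.
  call lav_line(x, r, d)

  ! Step 3: add the two coefficient vectors.
  b = b_ls + d

  print '(a,2f8.4)', 'least squares b = ', b_ls
  print '(a,2f8.4)', 'combined b      = ', b
  print '(a,f8.4)',  'SAE             = ', sum(abs(y - (b(1) + b(2) * x)))

contains

  ! Brute-force stand-in for RLAV (straight-line case only): evaluate the
  ! line through each pair of points and keep the one with the smallest
  ! sum of absolute errors.
  subroutine lav_line(x, y, coef)
    real(dp), intent(in)  :: x(:), y(:)
    real(dp), intent(out) :: coef(2)
    real(dp) :: slope, icept, sae, best
    integer :: i, j

    best = huge(best)
    coef = 0.0_dp
    do i = 1, size(x) - 1
      do j = i + 1, size(x)
        if (x(i) == x(j)) cycle      ! vertical line: skip
        slope = (y(j) - y(i)) / (x(j) - x(i))
        icept = y(i) - slope * x(i)
        sae = sum(abs(y - (icept + slope * x)))
        if (sae < best) then
          best = sae
          coef = [icept, slope]
        end if
      end do
    end do
  end subroutine lav_line
end program ls_then_lav_demo
```

For this data set the least-squares line is y = -0.125 + 0.75x, and the combined coefficients come out as 0.5 and 0.5 with a sum of absolute errors of 6.0, which agrees with the Output section.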
When multiple solutions exist for a given problem, routine RLAV may yield different estimates of the regression coefficients on different computers; however, the sum of the absolute values of the residuals should be the same (within rounding differences). The informational error indicating nonunique solutions may result from rounding accumulation. Conversely, because of rounding, the error may not be issued even when the problem does have multiple solutions.
Comments
1. Workspace may be explicitly provided, if desired, by use of R2AV/DR2AV. The reference is:
CALL R2AV (NOBS, NCOL, X, LDX, INTCEP, IIND, INDIND,
IRSP, B, IRANK, SAE, ITER, NRMISS, IWK, WK)
The additional arguments are as follows:
IWK — Work vector of length NOBS
WK — Work vector of length NOBS * (∣IIND∣ + 5) + 2 * ∣IIND∣ + 4
2. Informational error
Type | Code | Description
3 | 1 | The solution may not be unique.
4 | 1 | Calculations terminated prematurely due to rounding. This occurs only when rounding errors cause a pivot to be encountered whose magnitude is less than AMACH(4) and is indicative of a large ill-conditioned problem.
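The workspace lengths given in Comment 1 for R2AV can be computed before allocating IWK and WK. The sketch below only evaluates the documented length formulas for the sizes used in the Example (NOBS = 8, IIND = -1) and does not call R2AV. Declaring IWK as INTEGER and WK as REAL is an assumption based on the names, since the text describes both only as work vectors.

```fortran
program r2av_workspace_demo
  implicit none
  integer, parameter :: nobs = 8, iind = -1
  integer :: liwk, lwk
  integer, allocatable :: iwk(:)   ! assumed INTEGER work vector
  real, allocatable :: wk(:)       ! assumed REAL work vector

  ! Lengths as documented: NOBS and NOBS*(|IIND| + 5) + 2*|IIND| + 4.
  liwk = nobs
  lwk = nobs * (abs(iind) + 5) + 2 * abs(iind) + 4

  allocate (iwk(liwk), wk(lwk))
  print '(a,i0)', 'length of IWK = ', size(iwk)
  print '(a,i0)', 'length of WK  = ', size(wk)
end program r2av_workspace_demo
```

For these sizes the formulas give 8 elements for IWK and 8 * 6 + 2 + 4 = 54 elements for WK.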
Example
A straight line fit to a data set is computed under the LAV criterion.
USE RLAV_INT
USE UMACH_INT
USE WRRRL_INT
IMPLICIT NONE
INTEGER LDX, NCOEF, NCOL, NOBS, J
PARAMETER (NCOEF=2, NCOL=2, NOBS=8, LDX=NOBS)
!
INTEGER IIND, INDIND(1), IRANK, IRSP, ITER, NOUT, &
NRMISS
REAL B(NCOEF), SAE, X(LDX,NCOL)
CHARACTER CLABEL(1)*4, RLABEL(1)*4
!
DATA (X(1,J),J=1,NCOL) /1.0, 1.0/
DATA (X(2,J),J=1,NCOL) /4.0, 5.0/
DATA (X(3,J),J=1,NCOL) /2.0, 0.0/
DATA (X(4,J),J=1,NCOL) /2.0, 2.0/
DATA (X(5,J),J=1,NCOL) /3.0, 1.5/
DATA (X(6,J),J=1,NCOL) /3.0, 2.5/
DATA (X(7,J),J=1,NCOL) /4.0, 2.0/
DATA (X(8,J),J=1,NCOL) /5.0, 3.0/
!
IIND = -1
IRSP = 2
!
CALL RLAV (X, IIND, INDIND, IRSP, B, irank=irank, sae=sae, &
iter=iter, nrmiss=nrmiss)
!
CALL UMACH (2, NOUT)
RLABEL(1) = 'B ='
CLABEL(1) = 'NONE'
CALL WRRRL (' ', B, RLABEL, CLABEL, 1, NCOEF, 1, FMT='(F6.2)')
WRITE (NOUT,*) 'IRANK = ', IRANK
WRITE (NOUT,*) 'SAE = ', SAE
WRITE (NOUT,*) 'ITER = ', ITER
WRITE (NOUT,*) 'NRMISS = ', NRMISS
END
Output
B = 0.50 0.50
IRANK = 2
SAE = 6.00000
ITER = 2
NRMISS = 0
Figure 1, Least Squares and Least Absolute Value Fitted Lines