RLAV

Fits a multiple linear regression model using the least absolute values criterion.

Required Arguments

X — NOBS by NCOL matrix containing the data. (Input)

IIND — Independent variable option. (Input)
The absolute value of IIND is the number of independent (explanatory) variables. The sign of IIND specifies the following options:

IIND	Meaning
< 0	The data for the ‑IIND independent variables are given in the first ‑IIND columns of X.
> 0	The data for the IIND independent variables are in the columns of X whose column numbers are given by the elements of INDIND.
= 0	There are no independent variables.

The regressors are the constant regressor (if INTCEP = 1) and the independent variables.

INDIND — Index vector of length IIND containing the column numbers of X that are the independent (explanatory) variables. (Input, if IIND is positive)
If IIND is negative, INDIND is not referenced and can be a vector of length one.
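
As a minimal sketch of the positive-IIND form, the program below assumes a made-up 6 by 4 data matrix in which columns 1 and 3 hold the explanatory variables and column 2 holds the response. The data values and the program name are hypothetical; only the argument conventions are taken from the descriptions above.

```fortran
      PROGRAM RLAV_INDIND_SKETCH
      USE RLAV_INT
      IMPLICIT NONE
      INTEGER, PARAMETER :: NOBS=6, NCOL=4
      INTEGER :: IIND, INDIND(2), IRSP
      REAL :: X(NOBS,NCOL), B(3)
!     Hypothetical data, assigned column by column
      X(:,1) = (/1.0, 2.0, 3.0, 4.0, 5.0, 6.0/)
      X(:,2) = (/2.1, 2.9, 4.2, 4.8, 6.1, 7.0/)
      X(:,3) = (/0.5, 0.1, 0.9, 0.3, 0.7, 0.2/)
      X(:,4) = 0.0
!     Two independent variables, located in columns 1 and 3 of X
      IIND = 2
      INDIND = (/1, 3/)
!     The response is in column 2
      IRSP = 2
!     B has length INTCEP + |IIND| = 1 + 2 (INTCEP defaults to 1)
      CALL RLAV (X, IIND, INDIND, IRSP, B)
      WRITE (*,*) 'B = ', B
      END PROGRAM RLAV_INDIND_SKETCH
```

With IIND = -2 instead, INDIND would not be referenced and columns 1 and 2 of X would be taken as the independent variables, so the response would have to sit in a different column.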

IRSP — Column number IRSP of X contains the data for the response (dependent) variable. (Input)

B — Vector of length INTCEP + ∣IIND∣ containing a LAV solution for the regression coefficients. (Output)
If INTCEP = 1, B(1) contains the intercept estimate. B(INTCEP + I) contains the coefficient estimate for the I-th independent variable.

Optional Arguments

NOBS — Number of observations. (Input)
Default: NOBS = size (X,1).

NCOL — Number of columns in X. (Input)
Default: NCOL = size (X,2).

LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).

INTCEP — Intercept option. (Input)
Default: INTCEP = 1.

INTCEP	Action
0	An intercept is not in the model.
1	An intercept is in the model.

IRANK — Rank of the matrix of regressors. (Output)
If IRANK is less than INTCEP + ∣IIND∣, linear dependence of the regressors was declared.

SAE — Sum of the absolute values of the errors. (Output)

ITER — Number of iterations performed. (Output)

NRMISS — Number of rows of data containing NaN (not a number) for the dependent or independent variables. (Output)
If a row of data contains NaN for any of these variables, that row is excluded from the computations.

FORTRAN 90 Interface

Generic: CALL RLAV (X, IIND, INDIND, IRSP, B [, …])

Specific: The specific interface names are S_RLAV and D_RLAV.

FORTRAN 77 Interface

Single: CALL RLAV (NOBS, NCOL, X, LDX, INTCEP, IIND, INDIND, IRSP, B, IRANK, SAE, ITER, NRMISS)

Double: The double precision name is DRLAV.

Description

Routine RLAV computes estimates of the regression coefficients in a multiple linear regression model. The criterion satisfied is the minimization of the sum of the absolute values of the deviations of the observed response $y_i$ from the fitted response $\hat{y}_i$ for a set of $n$ observations. Under this criterion, known as the L1 or LAV (least absolute value) criterion, the regression coefficient estimates minimize

$$\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

The estimation problem can be posed as a linear programming problem. The special nature of the problem, however, allows for considerable gains in efficiency by the modification of the usual simplex algorithm for linear programming. These modifications are described in detail by Barrodale and Roberts (1973, 1974).
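
One standard way to write the LAV problem as a linear program splits each residual into two nonnegative parts. This is given as a sketch of the idea, not necessarily the exact form used inside RLAV. The symbols $x_{ij}$ (value of regressor $j$ for observation $i$), $\beta_j$, $u_i$, $v_i$, and $p$ (number of regressors) are notation introduced here for illustration.

```latex
\begin{aligned}
\min_{\beta,\,u,\,v}\quad & \sum_{i=1}^{n} (u_i + v_i) \\
\text{subject to}\quad & y_i - \sum_{j=1}^{p} x_{ij}\,\beta_j = u_i - v_i,
  \qquad i = 1, \ldots, n, \\
& u_i \ge 0, \quad v_i \ge 0 .
\end{aligned}
```

At an optimum, at least one of $u_i$ and $v_i$ is zero for each $i$, so $u_i + v_i$ equals the absolute residual for observation $i$ and the objective equals the sum of the absolute errors.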

In many cases, the algorithm can be made faster by computing a least-squares solution prior to the invocation of RLAV. This is particularly useful when a least-squares solution has already been computed. The procedure is as follows:

1. Fit the model using least squares and compute the residuals from this fit.

2. Fit the residuals from Step 1 on the regressor variables in the model using RLAV.

3. Add the two estimated regression coefficient vectors from Steps 1 and 2. The result is an L1 solution.
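
Why the three steps above yield an L1 solution can be sketched in one line. The notation is introduced here for illustration: $x_i$ is the vector of regressors for observation $i$, $b^{LS}$ is the least-squares coefficient vector from Step 1, $r_i = y_i - x_i^{T} b^{LS}$ are its residuals, and $d$ is the coefficient vector returned by RLAV in Step 2.

```latex
\sum_{i=1}^{n} \left| r_i - x_i^{T} d \right|
  = \sum_{i=1}^{n} \left| y_i - x_i^{T} \left( b^{LS} + d \right) \right|
```

Because $d$ minimizes the left-hand side over all coefficient vectors, $b^{LS} + d$ minimizes the LAV criterion for the original response.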

When multiple solutions exist for a given problem, routine RLAV may yield different estimates of the regression coefficients on different computers; however, the sum of the absolute values of the residuals should be the same (within rounding differences). The informational error indicating nonunique solutions may result from rounding accumulation. Conversely, because of rounding, the error may not be issued even when the problem does have multiple solutions.

Comments

1. Workspace may be explicitly provided, if desired, by use of R2AV/DR2AV. The reference is:

CALL R2AV (NOBS, NCOL, X, LDX, INTCEP, IIND, INDIND,
IRSP, B, IRANK, SAE, ITER, NRMISS, IWK, WK)

The additional arguments are as follows:

IWK — Work vector of length NOBS

WK — Work vector of length NOBS * (∣IIND∣ + 5) + 2 * ∣IIND∣ + 4
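
The workspace lengths can be computed at compile time from the formulas above. The following is a sketch only, reusing the data from the Example section. The program name and the constant LWK are hypothetical, and declaring IWK as INTEGER and WK as REAL is an assumption based on the naming convention, not something stated above.

```fortran
      PROGRAM R2AV_WORKSPACE_SKETCH
      IMPLICIT NONE
      INTEGER, PARAMETER :: NOBS=8, NCOL=2, LDX=NOBS
      INTEGER, PARAMETER :: INTCEP=1, IIND=-1
!     Length of WK: NOBS*(|IIND|+5) + 2*|IIND| + 4 = 8*6 + 2 + 4 = 54
      INTEGER, PARAMETER :: LWK = NOBS*(ABS(IIND)+5) + 2*ABS(IIND) + 4
      INTEGER :: INDIND(1), IRSP, IRANK, ITER, NRMISS
!     Assumed types for the work vectors (see the note above)
      INTEGER :: IWK(NOBS)
      REAL :: X(LDX,NCOL), B(INTCEP+ABS(IIND)), SAE, WK(LWK)
!     Data from the Example section: column 1 is x, column 2 is y
      X(:,1) = (/1.0, 4.0, 2.0, 2.0, 3.0, 3.0, 4.0, 5.0/)
      X(:,2) = (/1.0, 5.0, 0.0, 2.0, 1.5, 2.5, 2.0, 3.0/)
      IRSP = 2
!     INDIND is not referenced because IIND is negative
      INDIND(1) = 0
      CALL R2AV (NOBS, NCOL, X, LDX, INTCEP, IIND, INDIND, IRSP, B, &
                 IRANK, SAE, ITER, NRMISS, IWK, WK)
      WRITE (*,*) 'B = ', B, '  SAE = ', SAE
      END PROGRAM R2AV_WORKSPACE_SKETCH
```

For double precision data, the same pattern would apply with DR2AV and DOUBLE PRECISION declarations for X, B, SAE, and WK.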

2. Informational errors

Type	Code	Description
3	1	The solution may not be unique.
4	1	Calculations terminated prematurely due to rounding. This occurs only when rounding errors cause a pivot to be encountered whose magnitude is less than AMACH(4) and is indicative of a large ill-conditioned problem.

Example

A straight line fit to a data set is computed under the LAV criterion.

      USE RLAV_INT
      USE UMACH_INT
      USE WRRRL_INT

      IMPLICIT   NONE
      INTEGER    LDX, NCOEF, NCOL, NOBS, J
      PARAMETER  (NCOEF=2, NCOL=2, NOBS=8, LDX=NOBS)
!
      INTEGER    IIND, INDIND(1), IRANK, IRSP, ITER, NOUT, &
                 NRMISS
      REAL       B(NCOEF), SAE, X(LDX,NCOL)
      CHARACTER  CLABEL(1)*4, RLABEL(1)*4
!                                 Each row of X is one observation:
!                                 column 1 is x, column 2 is the response
      DATA (X(1,J),J=1,NCOL)/1.0, 1.0/
      DATA (X(2,J),J=1,NCOL)/4.0, 5.0/
      DATA (X(3,J),J=1,NCOL)/2.0, 0.0/
      DATA (X(4,J),J=1,NCOL)/2.0, 2.0/
      DATA (X(5,J),J=1,NCOL)/3.0, 1.5/
      DATA (X(6,J),J=1,NCOL)/3.0, 2.5/
      DATA (X(7,J),J=1,NCOL)/4.0, 2.0/
      DATA (X(8,J),J=1,NCOL)/5.0, 3.0/
!                                 One independent variable in column 1
!                                 (IIND < 0, so INDIND is not referenced)
      IIND = -1
      IRSP = 2
!                                 Fit the line; the intercept is included
!                                 by default (INTCEP = 1)
      CALL RLAV (X, IIND, INDIND, IRSP, B, irank=irank, sae=sae, &
                 iter=iter, nrmiss=nrmiss)
!                                 Print the results
      CALL UMACH (2, NOUT)
      RLABEL(1) = 'B ='
      CLABEL(1) = 'NONE'
      CALL WRRRL (' ', B, RLABEL, CLABEL, 1, NCOEF, 1, FMT='(F6.2)')
      WRITE (NOUT,*) 'IRANK = ', IRANK
      WRITE (NOUT,*) 'SAE = ', SAE
      WRITE (NOUT,*) 'ITER = ', ITER
      WRITE (NOUT,*) 'NRMISS = ', NRMISS
      END

Output

B = 0.50 0.50
IRANK = 2
SAE = 6.00000
ITER = 2
NRMISS = 0
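
The reported SAE can be checked directly from the definition of the criterion. The following stand-alone sketch uses only standard Fortran (no library calls) to evaluate the sum of the absolute errors for the printed coefficients on the example data. The program and variable names are chosen here for illustration.

```fortran
      PROGRAM CHECK_SAE
      IMPLICIT NONE
      INTEGER, PARAMETER :: NOBS = 8
!     Example data: XV is column 1 of X, YV is column 2 (the response)
      REAL, PARAMETER :: XV(NOBS) = &
         (/1.0, 4.0, 2.0, 2.0, 3.0, 3.0, 4.0, 5.0/)
      REAL, PARAMETER :: YV(NOBS) = &
         (/1.0, 5.0, 0.0, 2.0, 1.5, 2.5, 2.0, 3.0/)
      REAL :: B(2), SAE
!     Coefficients printed in the Output: intercept, then slope
      B = (/0.5, 0.5/)
!     Sum of absolute deviations of the response from the fitted line
      SAE = SUM(ABS(YV - (B(1) + B(2)*XV)))
      WRITE (*,*) 'SAE = ', SAE
      END PROGRAM CHECK_SAE
```

The eight absolute residuals are 0, 2.5, 1.5, 0.5, 0.5, 0.5, 0.5, and 0, which sum to 6.0 and agree with the SAE reported by RLAV. The exact formatting of the printed value depends on the compiler.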

Figure 11, Least Squares and Least Absolute Value Fitted Lines