RLSE

Fits a multiple linear regression model using least squares.

Required Arguments

Y — Vector of length NOBS containing the dependent (response) variable. (Input)

X — NOBS by NIND matrix containing the independent (explanatory) variables. (Input)

B — Vector of length INTCEP + NIND containing a least-squares solution for the regression coefficients. (Output)

For INTCEP = 0, the fitted value for observation I is
B(1) * X(I, 1) + B(2) * X(I, 2) + … + B(NIND) * X(I, NIND).

For INTCEP = 1, the fitted value for observation I is
B(1) + B(2) * X(I, 1) + … + B(NIND + 1) * X(I, NIND).

Optional Arguments

NOBS — Number of observations. (Input)
Default: NOBS = size (Y,1).

NIND — Number of independent (explanatory) variables. (Input)
Default: NIND = size (X,2).

LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).

INTCEP — Intercept option. (Input)
Default: INTCEP = 1.

INTCEP	Action
0	An intercept is not in the model.
1	An intercept is in the model.

SST — Total sum of squares. (Output)
If INTCEP = 1, the total sum of squares is corrected for the mean.

SSE — Sum of squares for error. (Output)

FORTRAN 90 Interface

Generic: CALL RLSE (Y, X, B [, …])

Specific: The specific interface names are S_RLSE and D_RLSE.

FORTRAN 77 Interface

Single: CALL RLSE (NOBS, Y, NIND, X, LDX, INTCEP, B, SST, SSE)

Double: The double precision name is DRLSE.

Description

Routine RLSE fits a multiple linear regression model with or without an intercept. If INTCEP = 1, the multiple linear regression model is

where the observed values of the yi’s (input in Y) constitute the responses or values of the dependent variable, the xi1’s, xi2’s, …, xik’s (input in X) are the settings of the k (input in NIND) independent variables, β0, β1, …, βk are the regression coefficients whose estimated values are output in B, and the ɛ i’s are independently distributed normal errors each with mean zero and variance σ2. Here, n is the number of valid rows in the augmented matrix (X, Y), i.e. n equals NOBS ‑ NRMISS (the number of rows that do not contain NaN). If INTCEP = 0, β0 is not included in the model.

Routine RLSE computes estimates of the regression coefficients by minimizing the sum of squares of the deviations of the observed response yi from the fitted response

for the n observations. This minimum sum of squares (the error sum of squares) is output and denoted by

In addition, the total sum of squares is output. For the case, INTCEP = 1, the total sum of squares is the sum of squares of the deviations of yi from its mean

— the so-called corrected total sum of squares; it is denoted by

For the case INTCEP = 0, the total sum of squares is the sum of squares of yi — the so-called uncorrected total sum of squares; it is denoted by

See Draper and Smith (1981) for a good general treatment of the multiple linear regression model, its analysis, and many examples.

In order to compute a least-squares solution, RLSE performs an orthogonal reduction of the matrix of regressors to upper triangular form. If the user needs the upper triangular matrix output for subsequent computing, the routine R2SE can be invoked in place of RLSE. (See the description of R in Comment 1). The reduction is based on one pass through the rows of the augmented matrix (X, Y) using fast Givens transformations. (See routines SROTMG and SROTM Golub and Van Loan, 1983, pages 156-162, Gentleman, 1974.) This method has the advantage that the loss of accuracy resulting from forming the crossproduct matrix used in the normal equations is avoided.

With INTCEP = 1, the current means of the dependent and independent variables are used to internally center the data for improved accuracy. Let xj be a column vector containing the j-th row of data for the independent variables. Let represent the mean vector for the independent variables given the data for rows 1, 2, …, i. The current mean vector is defined to be

The i-th row of data has subtracted from it and is then weighted by i/(i ‑ 1). Although a crossproduct matrix is not computed, the validity of this centering operation can be seen from the following formula for the sum of squares and crossproducts matrix:

An orthogonal reduction on the centered matrix is computed. When the final computations are performed, the first row of R and the first element of B are updated so that they reflect the statistics for the original (uncentered) data. This means that the estimate of the intercept and the R matrix are for the uncentered data.

As part of the final computations, RLSE checks for linearly dependent regressors. If the i-th regressor is a linear combination of the first i ‑ 1 regressors, the i-th diagonal element of R is close to zero (exactly zero if infinite precision arithmetic could be used) prior to the final computations. In particular, linear dependence of the regressors is declared if any of the following three conditions is satisfied:

A regressor equals zero.
Two or more regressors are constant.
The result of

is less than or equal to 100 × ɛ where ɛ is the machine epsilon. (For RLSE, ɛ = AMACH(4) and for DRLSE, ɛ = DMACH(4). See routines AMACH and DMACH in Reference Material).

Here, Ri⋅1, 2, …, i−1 is the multiple correlation coefficient of the i-th independent variable with the first i ‑ 1 independent variables. If no intercept is in the model (INTCEP = 0), the “multiple correlation” coefficient is computed without adjusting for the mean.

On completion of the final computations, if the i-th regressor is declared to be linearly dependent upon the previous i ‑ 1 regressors, then the i-th element of B and all elements in the i-th row of R are set to zero. Finally, if a linear dependence is declared, an informational (error) message, code 1, is issued indicating the model is not full rank.

Comments

1. Workspace may be explicitly provided, if desired, by use of R2SE/DR2SE. The reference is:

CALL R2SE (NOBS, Y, NIND, X, LDX, INTCEP, B, SST, SSE, R, LDR, DFE, NRMISS, WK)

The additional arguments are as follows:

R — INTCEP + NIND by INTCEP + NIND upper triangular matrix containing the R matrix from a QR decomposition of the matrix of regressors. (Output)
All of the diagonal element of R are taken to be nonnegative. The rank of the matrix of regressors is the number of positive diagonal elements, which equals NOBS ‑ NRMISS ‑ DFE.

LDR — Leading dimension of R exactly as specified in the dimension statement in the calling program. (Input)

DFE — Degrees of freedom for error. (Output)

NRMISS — Number of rows in the augmented matrix (X, Y) containing NaN (not a number). (Output)
If a row contains NaN, that row is excluded from all other computations.

WK — Work vector of length 5 * NIND + 4 * INTCEP + 2.

2. Informational error

Type	Code	Description
3	1	The model is not full rank. There is not a unique least-squares solution. If the I-th diagonal element of R is zero, B(I) is set to zero in order to compute a solution.

Examples

Example 1

A regression model

yi= β0 + β1xi1 + β2xi2 + β3xi3 + ɛi i = 1,2, …, 9

is fitted to data taken from Maindonald (1984, pages 203‑204).

USE RLSE_INT

USE WRRRN_INT

USE UMACH_INT

IMPLICIT NONE

INTEGER INTCEP, LDX, NCOEF, NIND, NOBS, J

PARAMETER (INTCEP=1, NIND=3, NOBS=9, LDX=NOBS, &

NCOEF=INTCEP+NIND)

INTEGER NOUT

REAL B(NCOEF), SSE, SST, X(LDX,NIND), Y(NOBS)

DATA (X(1,J),J=1,NIND)/ 7.0, 5.0, 6.0/, Y(1)/ 7.0/

DATA (X(2,J),J=1,NIND)/ 2.0, -1.0, 6.0/, Y(2)/-5.0/

DATA (X(3,J),J=1,NIND)/ 7.0, 3.0, 5.0/, Y(3)/ 6.0/

DATA (X(4,J),J=1,NIND)/-3.0, 1.0, 4.0/, Y(4)/ 5.0/

DATA (X(5,J),J=1,NIND)/ 2.0, -1.0, 0.0/, Y(5)/ 5.0/

DATA (X(6,J),J=1,NIND)/ 2.0, 1.0, 7.0/, Y(6)/-2.0/

DATA (X(7,J),J=1,NIND)/-3.0, -1.0, 3.0/, Y(7)/ 0.0/

DATA (X(8,J),J=1,NIND)/ 2.0, 1.0, 1.0/, Y(8)/ 8.0/

DATA (X(9,J),J=1,NIND)/ 2.0, 1.0, 4.0/, Y(9)/ 3.0/

CALL RLSE (Y, X, B, SST=SST, SSE=SSE)

CALL WRRRN ('B', B)

CALL UMACH (2, NOUT)

WRITE (NOUT,*)

WRITE (NOUT,99999) 'SST = ', SST, ' SSE = ', SSE

99999 FORMAT (A7, F7.2, A7, F7.2)

END

Output

1 7.733

2 -0.200

3 2.333

4 -1.667

SST = 156.00 SSE = 4.00

Example 2

A weighted least-squares fit is computed using the model

yi= β0 + β1xi1 + β2xi2 + ɛii = 1, 2, …, 4

and weights 1/i2 discussed by Maindonald (1984, pages 67 - 68). In order to compute the weighted least-squares fit, using an ordinary least squares routine (RLSE), the regressors (including the column of ones for the intercept term as well as the independent variables) and the responses must be transformed prior to invocation of RLSE. The transformed regressors and responses can be computed by using routine SHPROD (IMSL MATH/LIBRARY). For the i-th case the corresponding response and regressors are multiplied by a square root of the i-th weight. Because the column of ones corresponding to the intercept term in the untransformed model, is transformed by the weights, this transformed column of ones must be input to the least squares subroutine as an additional independent variable along with the option INTCEP = 0.

In terms of the original, untransformed regressors and responses, the minimum sum of squares for error output in SSE is

where here the weight wi = 1/i2. Also, since INTCEP = 0, the uncorrected total sum of squares is output in SST. In terms of the original untransformed responses,

USE RLSE_INT

USE SHPROD_INT

USE WRRRN_INT

USE UMACH_INT

IMPLICIT NONE

INTEGER INTCEP, LDX, NCOEF, NIND, NOBS, J

PARAMETER (INTCEP=0, NIND=3, NOBS=4, LDX=NOBS, &

NCOEF=INTCEP+NIND)

INTEGER I, NOUT

REAL B(NCOEF), SQRT, SSE, SST, W(NOBS), X(LDX,NIND), &

Y(NOBS)

INTRINSIC SQRT

DATA (X(1,J),J=1,NIND)/1.0, -2.0, 0.0/, Y(1)/-3.0/

DATA (X(2,J),J=1,NIND)/1.0, -1.0, 2.0/, Y(2)/ 1.0/

DATA (X(3,J),J=1,NIND)/1.0, 2.0, 5.0/, Y(3)/ 2.0/

DATA (X(4,J),J=1,NIND)/1.0, 7.0, 3.0/, Y(4)/ 6.0/

DO 10 I=1, NOBS

! Assign weights

W(I) = 1.0/I**2

! Store square roots of weights

W(I) = SQRT(W(I))

10 CONTINUE

! Transform regressors

DO 20 J=1, NIND

CALL SHPROD (NOBS, W, 1, X(:,J), 1, X(:,J), 1)

20 CONTINUE

! Transform response

CALL SHPROD (NOBS, W, 1, Y, 1, Y, 1)

CALL RLSE (Y, X, B, INTCEP=INTCEP, SST=SST, SSE=SSE)

CALL WRRRN ('B', B)

CALL UMACH (2, NOUT)

WRITE (NOUT,*)

WRITE (NOUT,99999) 'SST = ', SST, ' SSE = ', SSE

99999 FORMAT (A7, F7.2, A7, F7.2)

END

Output

1 -1.431

2 0.658

3 0.748

SST = 11.94 SSE = 1.01