GRGLM

Generates regressors for a general linear model.

Required Arguments

X — NROW by NCOL matrix containing the data. (Input)

INDCL — Index vector of length NCLVAR containing the column numbers of X that are the classification variables. (Input)

NCLVAL — Vector of length NCLVAR containing the number of values taken on by each classification variable. (Input)
NCLVAL(I) is the number of distinct values for the I-th classification variable.

CLVAL — Vector of length NCLVAL(1) + NCLVAL(2) + … + NCLVAL(NCLVAR) containing the values of the classification variables. (Input)
The first NCLVAL(1) elements contain the values of the first classification variable; the next NCLVAL(2) elements contain the values of the second classification variable; and so on. The last NCLVAL(NCLVAR) elements contain the values of the last classification variable.

NVEF — Vector of length NEF containing the number of variables associated with each effect in the model. (Input)

INDEF — Index vector of length NVEF(1) + NVEF(2) + … + NVEF(NEF). (Input)
The first NVEF(1) elements give the column numbers of X for each variable in the first effect; the next NVEF(2) elements give the column numbers for each variable in the second effect; and so on. The last NVEF(NEF) elements give the column numbers for each variable in the last effect.

NREG — Number of columns in REG. (Output)

REG — NROW by NREG matrix containing the regressor variables generated from the matrix X. (Output, if IDUMMY > 0)
Since, in general, NREG will not be known in advance, the user may need to invoke GRGLM first with IDUMMY < 0 to compute NREG, dimension REG accordingly, and then invoke GRGLM again with IDUMMY > 0.

Optional Arguments

NROW — Number of rows of data in X. (Input)
Default: NROW = size (X,1).

NCOL — Number of columns in X. (Input)
Default: NCOL = size (X,2).

LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).

NCLVAR — Number of classification variables. (Input)
Default: NCLVAR = size (INDCL,1).

NEF — Number of effects (sources of variation) in the model. (Input)
Default: NEF = size (NVEF,1).

IDUMMY — Dummy variable option. (Input)
Default: IDUMMY = 1.
Indicator variables are defined for the I-th classification variable as follows. Let J = NCLVAL(1) + NCLVAL(2) + … + NCLVAL(I − 1). NCLVAL(I) indicator variables are defined such that, for K = 1, 2, …, NCLVAL(I), the K-th indicator variable for observation number IOBS takes the value 1.0 if X(IOBS, INDCL(I)) = CLVAL(J + K) and equals 0.0 otherwise. Dummy variables are generated from these indicator variables in one of the following three ways:

IDUMMY    Method
-1, 1     The NCLVAL(I) indicator variables are the dummy variables.
-2, 2     The first NCLVAL(I) − 1 indicator variables are the dummy variables. The last indicator variable is omitted.
-3, 3     The K-th indicator variable minus the NCLVAL(I)-th indicator variable is the K-th dummy variable (K = 1, 2, …, NCLVAL(I) − 1).

If IDUMMY < 0, only NREG is computed; and X, CLVAL, and REG are not referenced.
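Because NREG depends only on INDCL, NCLVAL, NVEF, INDEF, and IDUMMY, the value returned by the IDUMMY < 0 pass can be reproduced by hand. The Fortran module below is a minimal sketch, not IMSL code: the names nreg_sketch and count_regressors are hypothetical. It assumes each effect contributes the product of the dummy counts of its classification variables (NCLVAL(I) for |IDUMMY| = 1, NCLVAL(I) − 1 otherwise), with each continuous variable contributing a factor of 1.

```fortran
! Hypothetical helper (not part of IMSL): counts the regressors that the
! rules above would generate, without touching X, CLVAL, or REG.
module nreg_sketch
  implicit none
contains
  integer function count_regressors(indcl, nclval, nvef, indef, idummy) result(nreg)
    integer, intent(in) :: indcl(:), nclval(:), nvef(:), indef(:), idummy
    integer :: ief, iv, pos, icol, icl, ncols

    nreg = 0
    pos = 0
    do ief = 1, size(nvef)
      ncols = 1                       ! regressors contributed by this effect
      do iv = 1, nvef(ief)
        pos = pos + 1
        icol = indef(pos)
        ! Is this column one of the classification variables?
        do icl = 1, size(indcl)
          if (indcl(icl) == icol) then
            if (abs(idummy) == 1) then
              ncols = ncols * nclval(icl)        ! all n indicators kept
            else
              ncols = ncols * (nclval(icl) - 1)  ! n - 1 dummies
            end if
            exit
          end if
        end do
        ! A continuous column leaves ncols unchanged (factor of 1).
      end do
      nreg = nreg + ncols
    end do
  end function count_regressors
end module nreg_sketch
```

For the model in the Comments section this gives 23 regressors for IDUMMY = 1 and 11 for IDUMMY = 2 or 3, which agrees with the regressor lists given there.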

LDREG — Leading dimension of REG exactly as specified in the dimension statement in the calling program. (Input)
Default: LDREG = size (REG,1).

NRMISS — Number of rows of REG containing NaN (not a number). (Output)
A row of REG contains NaN for a regressor when any of the variables involved in generating the regressor equals NaN, or when a value of one of the classification variables in the model is not among the values given in CLVAL.
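The counting rule for NRMISS can be sketched as follows. This hypothetical module (count_nan_rows is not an IMSL routine) only shows how rows of a filled REG that contain at least one NaN might be counted; it does not model how GRGLM decides to store NaN. It uses the standard ieee_arithmetic intrinsic module.

```fortran
! Hypothetical sketch: count rows of a regressor matrix that contain NaN.
module nrmiss_sketch
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
contains
  integer function count_nan_rows(reg) result(nrmiss)
    real, intent(in) :: reg(:, :)
    integer :: i
    nrmiss = 0
    do i = 1, size(reg, 1)
      ! ieee_is_nan is elemental, so it can test a whole row at once.
      if (any(ieee_is_nan(reg(i, :)))) nrmiss = nrmiss + 1
    end do
  end function count_nan_rows
end module nrmiss_sketch
```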

FORTRAN 90 Interface

Generic: CALL GRGLM (X, INDCL, NCLVAL, CLVAL, NVEF, INDEF, NREG, REG [, …])

Specific: The specific interface names are S_GRGLM and D_GRGLM.

FORTRAN 77 Interface

Single: CALL GRGLM (NROW, NCOL, X, LDX, NCLVAR, INDCL, NCLVAL, CLVAL, NEF, NVEF, INDEF, IDUMMY, NREG, REG, LDREG, NRMISS)

Double: The double precision name is DGRGLM.

Description

Routine GRGLM generates regressors for a general linear model from a data matrix. The data matrix can contain classification variables as well as continuous variables.

Regressors for effects composed solely of continuous variables are generated as powers and crossproducts. Consider a data matrix containing continuous variables as columns 3 and 4. The effect indices (3, 3) (stored in INDEF) generate a regressor whose i-th value is the square of the i-th value in column 3. The effect indices (3, 4) generate a regressor whose i-th value is the product of the i-th value in column 3 and the i-th value in column 4.
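A minimal sketch of the power and crossproduct rule, assuming the regressor for a purely continuous effect is the element-wise product of the listed columns of X. The module and function names are hypothetical and are not part of IMSL.

```fortran
! Hypothetical sketch: regressor for an effect made only of continuous columns.
module continuous_sketch
  implicit none
contains
  function continuous_regressor(x, cols) result(r)
    real, intent(in) :: x(:, :)      ! data matrix, one observation per row
    integer, intent(in) :: cols(:)   ! effect indices, e.g. [3, 3] or [3, 4]
    real :: r(size(x, 1))
    integer :: k
    r = 1.0
    do k = 1, size(cols)
      r = r * x(:, cols(k))          ! multiply in one column per index
    end do
  end function continuous_regressor
end module continuous_sketch
```

With cols = [3, 3] the result is the square of column 3; with cols = [3, 4] it is the product of columns 3 and 4.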

Regressors for an effect (source of variation) composed of a single classification variable are generated using indicator variables. Let the classification variable A take on values $a_1, a_2, \ldots, a_n$ (stored in CLVAL). From this classification variable, GRGLM creates n indicator variables. For $k = 1, 2, \ldots, n$ we have

$$I_k = \begin{cases} 1 & \text{if } A = a_k \\ 0 & \text{otherwise} \end{cases}$$

For each classification variable, another set of variables is created from the indicator variables. We call these new variables dummy variables. Dummy variables are generated from the indicator variables in one of three manners:

1. the dummies are the n indicator variables

2. the dummies are the first n − 1 indicator variables

3. the n − 1 dummies are defined in terms of the indicator variables so that, for balanced data, the usual summation restrictions are imposed on the regression coefficients

In particular, for IDUMMY = 1, the dummy variables are $A_k = I_k$ ($k = 1, 2, \ldots, n$). For IDUMMY = 2, the dummy variables are $A_k = I_k$ ($k = 1, 2, \ldots, n - 1$). For IDUMMY = 3, the dummy variables are $A_k = I_k - I_n$ ($k = 1, 2, \ldots, n - 1$). The regressors generated for an effect composed of a single classification variable are the associated dummy variables.
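The indicator and dummy definitions above translate directly into array code. The sketch below is an illustration, not the GRGLM implementation; indicators and dummies are hypothetical names, and values are matched by exact floating-point equality, as the comparison X(IOBS, INDCL(I)) = CLVAL(J + K) in the IDUMMY description suggests.

```fortran
! Hypothetical sketch of indicator and dummy coding for one classification variable.
module dummy_sketch
  implicit none
contains
  ! ind(i, k) = 1.0 if a(i) equals levels(k), else 0.0
  function indicators(a, levels) result(ind)
    real, intent(in) :: a(:), levels(:)
    real :: ind(size(a), size(levels))
    integer :: k
    do k = 1, size(levels)
      ind(:, k) = merge(1.0, 0.0, a == levels(k))
    end do
  end function indicators

  ! Dummy variables for IDUMMY = +-1, +-2, +-3 built from the n indicators
  function dummies(ind, idummy) result(d)
    real, intent(in) :: ind(:, :)
    integer, intent(in) :: idummy
    real, allocatable :: d(:, :)
    integer :: n, k
    n = size(ind, 2)
    select case (abs(idummy))
    case (1)
      d = ind                              ! A_k = I_k,       k = 1..n
    case (2)
      d = ind(:, 1:n-1)                    ! A_k = I_k,       k = 1..n-1
    case default                           ! IDUMMY = +-3
      allocate (d(size(ind, 1), n - 1))
      do k = 1, n - 1
        d(:, k) = ind(:, k) - ind(:, n)    ! A_k = I_k - I_n, k = 1..n-1
      end do
    end select
  end function dummies
end module dummy_sketch
```

Under IDUMMY = 3, an observation at the last level gets −1 in every dummy, which is what produces the summation restrictions for balanced data.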

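Let $m_j$ be the number of dummies generated for the j-th classification variable. Suppose there are two classification variables A and B with dummies

$$A_1, A_2, \ldots, A_{m_1} \quad \text{and} \quad B_1, B_2, \ldots, B_{m_2},$$

respectively. The regressors generated for an effect composed of two classification variables A and B are

$$A \otimes B = (A_1, A_2, \ldots, A_{m_1}) \otimes (B_1, B_2, \ldots, B_{m_2}) = (A_1B_1, A_1B_2, \ldots, A_1B_{m_2}, A_2B_1, A_2B_2, \ldots, A_2B_{m_2}, \ldots, A_{m_1}B_1, A_{m_1}B_2, \ldots, A_{m_1}B_{m_2})$$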

More generally, the regressors generated for an effect composed of several classification variables and several continuous variables are given by the Kronecker products of variables, where the order of the variables is specified in INDEF. Consider a data matrix containing classification variables in columns 1 and 2 and continuous variables in columns 3 and 4. Label these four columns A, B, $X_1$, and $X_2$. The regressors generated by the effect indices (1, 2, 3, 3, 4) are $A \otimes B \otimes X_1 X_1 X_2$.
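The observation-by-observation Kronecker product can be sketched as below, assuming each factor is supplied as an NROW by m matrix of dummies (a continuous variable is a one-column matrix). The function row_kron is hypothetical; nesting calls in INDEF order builds $A \otimes B \otimes X_1 X_1 X_2$, with the columns of the later factor varying fastest.

```fortran
! Hypothetical sketch: row-wise Kronecker product of two regressor blocks.
module kron_sketch
  implicit none
contains
  ! Row i of the result is kron(p(i,:), q(i,:)); columns of q vary fastest.
  function row_kron(p, q) result(r)
    real, intent(in) :: p(:, :), q(:, :)
    real :: r(size(p, 1), size(p, 2) * size(q, 2))
    integer :: j, k, nq
    nq = size(q, 2)
    do j = 1, size(p, 2)
      do k = 1, nq
        r(:, (j - 1) * nq + k) = p(:, j) * q(:, k)
      end do
    end do
  end function row_kron
end module kron_sketch
```

For example, row_kron(row_kron(row_kron(row_kron(a, b), x1), x1), x2) gives the effect (1, 2, 3, 3, 4) when a and b hold the dummies of A and B and x1, x2 are one-column matrices.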

Comments

Let the data matrix X = (A, B, X1) where A and B are classification variables, and X1 is a continuous variable. The model containing the effects A, B, AB, X1, AX1, BX1 and ABX1 is specified as follows: NCLVAR = 2, INDCL = (1, 2), NEF = 7, NVEF = (1, 1, 2, 1, 2, 2, 3), and
INDEF = (1, 2, 1, 2, 3, 1, 3, 2, 3, 1, 2, 3).

For this model, suppose NCLVAL(1) = 2, NCLVAL(2) = 3, and CLVAL = (1.0, 2.0, 1.0, 2.0, 3.0). Let A1, A2, B1, B2, and B3 be the associated indicator variables. Given below, for each IDUMMY option, are the regressors in their order of appearance in REG.

IDUMMY   REG
1        A1, A2, B1, B2, B3, A1B1, A1B2, A1B3, A2B1, A2B2, A2B3, X1, A1X1, A2X1, B1X1, B2X1, B3X1, A1B1X1, A1B2X1, A1B3X1, A2B1X1, A2B2X1, A2B3X1
2        A1, B1, B2, A1B1, A1B2, X1, A1X1, B1X1, B2X1, A1B1X1, A1B2X1
3        A1 − A2, B1 − B3, B2 − B3, (A1 − A2)(B1 − B3), (A1 − A2)(B2 − B3), X1, (A1 − A2)X1, (B1 − B3)X1, (B2 − B3)X1, (A1 − A2)(B1 − B3)X1, (A1 − A2)(B2 − B3)X1

Within a group of regressors corresponding to an interaction effect, the indicator variables composing the regressors vary most rapidly for the last classification variable, vary next most rapidly for the next to last classification variable, etc.
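The ordering rule for interaction regressors can be checked by generating column labels. This is a hypothetical sketch (interaction_labels is not an IMSL routine) limited to two classification variables with at most nine dummies each, since it writes each index as a single digit.

```fortran
! Hypothetical sketch: labels for the A x B interaction, last variable fastest.
module label_sketch
  implicit none
contains
  function interaction_labels(pa, na, pb, nb) result(lab)
    character(len=1), intent(in) :: pa, pb   ! variable names, e.g. 'A', 'B'
    integer, intent(in) :: na, nb            ! number of dummies for each (1..9)
    character(len=4) :: lab(na * nb)
    integer :: j, k
    do j = 1, na                             ! first variable varies slowest
      do k = 1, nb                           ! last variable varies fastest
        write (lab((j - 1) * nb + k), '(a1,i1,a1,i1)') pa, j, pb, k
      end do
    end do
  end function interaction_labels
end module label_sketch
```

interaction_labels('A', 2, 'B', 3) yields A1B1, A1B2, A1B3, A2B1, A2B2, A2B3, the A × B group in the IDUMMY = 1 row of the table; with (2 − 1) and (3 − 1) dummies it yields the A1B1, A1B2 group of the IDUMMY = 2 row.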

Example

In this example, regressors are generated for a two-way analysis-of-covariance model containing all the interaction terms. The model could be fitted by a subsequent invocation of routine RGIVN with INTCEP = 1. The regressors generated with the option IDUMMY = 2 are for the model whose mean function is

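$$\mu + \alpha_i + \beta_j + \gamma_{ij} + \delta x_{ij} + \zeta_i x_{ij} + \eta_j x_{ij} + \theta_{ij} x_{ij}, \qquad i = 1, 2;\ j = 1, 2, 3$$

where $\alpha_2 = \beta_3 = \gamma_{13} = \gamma_{21} = \gamma_{22} = \gamma_{23} = \zeta_2 = \eta_3 = \theta_{13} = \theta_{21} = \theta_{22} = \theta_{23} = 0$.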
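As a worked check of these constraints, substituting them into the mean function gives the means for two observations of the example data: observation 2 (i = 1, j = 2) and observation 6 (i = 2, j = 3).

$$\begin{aligned}
\text{observation 2:}\quad & \mu + \alpha_1 + \beta_2 + \gamma_{12} + (\delta + \zeta_1 + \eta_2 + \theta_{12})\, x_{12} \\
\text{observation 6:}\quad & \mu + \delta\, x_{23}
\end{aligned}$$

The nonzero entries in rows 2 and 6 of the printed REG matrix correspond to these terms: ALPHA1, BETA2, GAMMA12, DELTA, ZETA1, ETA2, and THETA12 for row 2, and only DELTA for row 6.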

      USE GRGLM_INT
      USE UMACH_INT
      USE WRRRL_INT

      IMPLICIT NONE
      INTEGER    LDREG, LDX, LINDEF, MAXCL, NCLVAR, NCOL, NDREG, NEF, &
                 NROW
      PARAMETER  (LINDEF=12, MAXCL=5, NCLVAR=2, NCOL=3, NDREG=20, &
                 NEF=7, NROW=6, LDREG=NROW, LDX=NROW)
!
      INTEGER    IDUMMY, INDCL(NCLVAR), INDEF(LINDEF), J, &
                 NCLVAL(NCLVAR), NOUT, NREG, NRMISS, NVEF(NEF)
      REAL       CLVAL(MAXCL), REG(LDREG,NDREG), X(LDX,NCOL)
      CHARACTER  CLABEL(12)*7, RLABEL(1)*7
!
!                                 Columns 1 and 2 of X are the
!                                 classification variables A and B
      DATA INDCL/1, 2/, NCLVAL/2, 3/, CLVAL/1.0, 2.0, 1.0, 2.0, 3.0/
!                                 Effects A, B, AB, X1, AX1, BX1, ABX1
      DATA NVEF/1, 1, 2, 1, 2, 2, 3/, INDEF/1, 2, 1, 2, 3, 1, 3, 2, 3, &
           1, 2, 3/
      DATA (X(1,J),J=1,NCOL)/1.0, 1.0, 1.11/
      DATA (X(2,J),J=1,NCOL)/1.0, 2.0, 2.22/
      DATA (X(3,J),J=1,NCOL)/1.0, 3.0, 3.33/
      DATA (X(4,J),J=1,NCOL)/2.0, 1.0, 4.44/
      DATA (X(5,J),J=1,NCOL)/2.0, 2.0, 5.55/
      DATA (X(6,J),J=1,NCOL)/2.0, 3.0, 6.66/
      DATA RLABEL/'NUMBER'/, CLABEL/' ', 'ALPHA1', 'BETA1', &
           'BETA2', 'GAMMA11', 'GAMMA12', 'DELTA', 'ZETA1', &
           'ETA1', 'ETA2', 'THETA11', 'THETA12'/
!
!                                 Generate the regressors, omitting the
!                                 last indicator of each class variable
      IDUMMY = 2
      CALL GRGLM (X, INDCL, NCLVAL, CLVAL, NVEF, INDEF, NREG, REG, &
                  IDUMMY=IDUMMY, NRMISS=NRMISS)
!                                 Print NREG, NRMISS, and the labeled REG
      CALL UMACH (2, NOUT)
      WRITE (NOUT,*) 'NREG = ', NREG, ' NRMISS = ', NRMISS
      CALL WRRRL ('%/REG', REG, RLABEL, CLABEL, NROW, NREG, FMT='(F7.2)')
      END

Output

 NREG = 11 NRMISS = 0

                                   REG
      ALPHA1    BETA1    BETA2  GAMMA11  GAMMA12    DELTA    ZETA1     ETA1
  1     1.00     1.00     0.00     1.00     0.00     1.11     1.11     1.11
  2     1.00     0.00     1.00     0.00     1.00     2.22     2.22     0.00
  3     1.00     0.00     0.00     0.00     0.00     3.33     3.33     0.00
  4     0.00     1.00     0.00     0.00     0.00     4.44     0.00     4.44
  5     0.00     0.00     1.00     0.00     0.00     5.55     0.00     0.00
  6     0.00     0.00     0.00     0.00     0.00     6.66     0.00     0.00

        ETA2  THETA11  THETA12
  1     0.00     1.11     0.00
  2     2.22     0.00     2.22
  3     0.00     0.00     0.00
  4     0.00     0.00     0.00
  5     5.55     0.00     0.00
  6     0.00     0.00     0.00
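As a cross-check of the printed REG matrix, the sketch below rebuilds the same 11 columns without calling IMSL. It hard-codes this particular model (column 1 of X is A with values 1, 2; column 2 is B with values 1, 2, 3; column 3 is x) and the IDUMMY = 2 coding; the module name example_sketch and the function example_reg are hypothetical.

```fortran
! Hypothetical sketch: the IDUMMY = 2 regressors of the example, built directly.
module example_sketch
  implicit none
contains
  function example_reg(x) result(reg)
    real, intent(in) :: x(:, :)
    real :: reg(size(x, 1), 11)
    real :: a1(size(x, 1)), b1(size(x, 1)), b2(size(x, 1)), xc(size(x, 1))
    a1 = merge(1.0, 0.0, x(:, 1) == 1.0)   ! A: levels 1, 2; last one dropped
    b1 = merge(1.0, 0.0, x(:, 2) == 1.0)   ! B: levels 1, 2, 3; last one dropped
    b2 = merge(1.0, 0.0, x(:, 2) == 2.0)
    xc = x(:, 3)                           ! continuous covariate
    reg(:, 1) = a1                         ! ALPHA1
    reg(:, 2) = b1                         ! BETA1
    reg(:, 3) = b2                         ! BETA2
    reg(:, 4) = a1 * b1                    ! GAMMA11
    reg(:, 5) = a1 * b2                    ! GAMMA12
    reg(:, 6) = xc                         ! DELTA
    reg(:, 7) = a1 * xc                    ! ZETA1
    reg(:, 8) = b1 * xc                    ! ETA1
    reg(:, 9) = b2 * xc                    ! ETA2
    reg(:, 10) = a1 * b1 * xc              ! THETA11
    reg(:, 11) = a1 * b2 * xc              ! THETA12
  end function example_reg
end module example_sketch
```

The column order follows the effect order in NVEF and INDEF, so the result can be compared row by row with the output above.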