com.imsl.stat.RegressorsForGLM

All Implemented Interfaces: Serializable

public class RegressorsForGLM extends Object implements Serializable

Generates regressors for a general linear model.

Class RegressorsForGLM generates regressors for a general linear model from a data matrix. The data matrix can contain classification variables as well as continuous variables. Regressors for effects composed solely of continuous variables are generated as powers and crossproducts. Consider a data matrix containing continuous variables in Columns 3 and 4. The effect indices (3, 3) generate a regressor whose i-th value is the square of the i-th value in Column 3. The effect indices (3, 4) generate a regressor whose i-th value is the product of the i-th value in Column 3 with the i-th value in Column 4.
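
As an illustration of the rule above, the following plain-Java sketch multiplies the columns named by a set of effect indices, row by row. It does not use or reproduce the RegressorsForGLM implementation; the class name ContinuousEffectSketch and the method regressor are hypothetical names chosen for this example, and the data values are made up.

```java
import java.util.Arrays;

// Illustrative sketch only; not the RegressorsForGLM implementation.
final class ContinuousEffectSketch {

    // Returns one regressor column: for each row i, the product of
    // x[i][col] over every column index listed in the effect.
    static double[] regressor(double[][] x, int[] effect) {
        double[] r = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double product = 1.0;
            for (int col : effect) {
                product *= x[i][col];
            }
            r[i] = product;
        }
        return r;
    }

    public static void main(String[] args) {
        // Made-up data: continuous variables sit in Columns 3 and 4.
        double[][] x = {
            {0, 0, 0, 2.0, 5.0},
            {0, 0, 0, 3.0, 7.0}
        };
        // Effect (3, 3): square of Column 3.
        System.out.println(Arrays.toString(regressor(x, new int[] {3, 3}))); // [4.0, 9.0]
        // Effect (3, 4): product of Column 3 and Column 4.
        System.out.println(Arrays.toString(regressor(x, new int[] {3, 4}))); // [10.0, 21.0]
    }
}
```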

Regressors for an effect (source of variation) composed of a single classification variable are generated using indicator variables. Let the classification variable A take on values $a_1, a_2, \ldots, a_n$. From this classification variable, RegressorsForGLM creates n indicator variables. For $k = 1, 2, \ldots, n$, we have $$ I_k = \left\{ \begin{array}{rl} 1 & \mbox{if } A = a_k \\ 0 & \mbox{otherwise} \end{array} \right. $$ For each classification variable, another set of variables is created from the indicator variables. These new variables are called dummy variables. Dummy variables are generated from the indicator variables in one of three manners:

The dummies are the n indicator variables.
The dummies are the first $n-1$ indicator variables.
The $n-1$ dummies are defined in terms of the indicator variables so that for balanced data, the usual summation restrictions are imposed on the regression coefficients.

In particular, for dummy method ALL, the dummy variables are $A_k = I_k \: (k = 1, 2, \ldots, n)$. For dummy method LEAVE_OUT_LAST, the dummy variables are $A_k = I_k \: (k = 1, 2, \ldots, n - 1)$. For dummy method SUM_TO_ZERO, the dummy variables are $A_k = I_k - I_n \: (k = 1, 2, \ldots, n - 1)$. The regressors generated for an effect composed of a single classification variable are the associated dummy variables.
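
The three definitions can be sketched in plain Java as follows. This is a minimal illustration of the formulas, not the library's code: the class DummySketch, the enum Method, and the method dummies are hypothetical, and the enum merely stands in for the class's int constants, whose numeric values are not shown here. The sketch also assumes the distinct levels $a_1, \ldots, a_n$ are passed in explicitly.

```java
import java.util.Arrays;

// Illustrative sketch only; not the RegressorsForGLM implementation.
final class DummySketch {

    // Hypothetical stand-in for the ALL, LEAVE_OUT_LAST and SUM_TO_ZERO constants.
    enum Method { ALL, LEAVE_OUT_LAST, SUM_TO_ZERO }

    // Builds the dummy variables for one observed value of a classification
    // variable whose distinct levels a_1..a_n are given in 'levels'.
    static double[] dummies(double value, double[] levels, Method method) {
        int n = levels.length;
        double[] indicator = new double[n];
        for (int k = 0; k < n; k++) {
            indicator[k] = (value == levels[k]) ? 1.0 : 0.0; // I_k
        }
        if (method == Method.ALL) {
            return indicator; // A_k = I_k, k = 1..n
        }
        double[] d = new double[n - 1];
        for (int k = 0; k < n - 1; k++) {
            d[k] = (method == Method.SUM_TO_ZERO)
                    ? indicator[k] - indicator[n - 1] // A_k = I_k - I_n
                    : indicator[k];                   // A_k = I_k, k = 1..n-1
        }
        return d;
    }

    public static void main(String[] args) {
        double[] levels = {1.0, 2.0, 3.0};
        // An observation at the last level, a_3:
        System.out.println(Arrays.toString(dummies(3.0, levels, Method.ALL)));            // [0.0, 0.0, 1.0]
        System.out.println(Arrays.toString(dummies(3.0, levels, Method.LEAVE_OUT_LAST))); // [0.0, 0.0]
        System.out.println(Arrays.toString(dummies(3.0, levels, Method.SUM_TO_ZERO)));    // [-1.0, -1.0]
    }
}
```

With SUM_TO_ZERO, an observation at the last level is coded as -1 in every dummy, which is what imposes the summation restriction for balanced data.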

Let $m_j$ be the number of dummies generated for the j-th classification variable. Suppose there are two classification variables A and B with dummies $$A_1, A_2, \ldots, A_{m_1} $$ and $$B_1, B_2, \ldots, B_{m_2} $$ The regressors generated for an effect composed of two classification variables A and B are $$ \begin{array}{rl} A \otimes B = & (A_1, A_2, \ldots, A_{m_1}) \otimes (B_1, B_2, \ldots, B_{m_2}) \\ = & (A_1 B_1, A_1 B_2, \ldots, A_1 B_{m_2}, A_2 B_1, A_2 B_2, \ldots, \\ & A_2 B_{m_2}, \ldots, A_{m_1} B_1, A_{m_1} B_2, \ldots, A_{m_1} B_{m_2}) \end{array} $$
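
For a single observation, this product can be sketched in plain Java as below. The sketch only illustrates the ordering in the formula (the index of B varies fastest); KroneckerSketch and kron are hypothetical names, not part of the library.

```java
import java.util.Arrays;

// Illustrative sketch only; not the RegressorsForGLM implementation.
final class KroneckerSketch {

    // Kronecker product of two dummy vectors for one observation:
    // (a_1 b_1, a_1 b_2, ..., a_1 b_m2, a_2 b_1, ..., a_m1 b_m2).
    static double[] kron(double[] a, double[] b) {
        double[] out = new double[a.length * b.length];
        int p = 0;
        for (double ai : a) {
            for (double bj : b) { // the index of b varies fastest
                out[p++] = ai * bj;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 0.0};      // dummies of A for an observation at level 1 of 2
        double[] b = {0.0, 0.0, 1.0}; // dummies of B for an observation at level 3 of 3
        System.out.println(Arrays.toString(kron(a, b))); // [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    }
}
```

Only the entry for $A_1 B_3$ is nonzero, as expected for an observation with A at its first level and B at its third.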

More generally, the regressors generated for an effect composed of several classification variables and several continuous variables are given by the Kronecker products of variables, where the order of the variables is specified in setEffects. Consider a data matrix containing classification variables in Columns 0 and 1 and continuous variables in Columns 2 and 3. Label these four columns $A$, $B$, $X_1$, and $X_2$. The regressors generated by the effect indices $(0, 1, 2, 2, 3)$ are $A \otimes B \otimes X_1 X_1 X_2$.

Remarks

Let the data matrix $\mathtt{x} = (A, B, X_1)$, where $A$ and $B$ are classification variables and $X_1$ is a continuous variable. The model containing the effects $A$, $B$, $AB$, $X_1$, $A X_1$, $B X_1$, and $A B X_1$ is specified by setting nClassVariables=2 in the constructor and calling setEffects(effects), with

 int[][] effects = { {0}, {1}, {0, 1}, {2}, {0, 2}, {1, 2}, {0, 1, 2} };

For this model, suppose that variable A has two levels, $A_1$ and $A_2$, and that variable B has three levels, $B_1$, $B_2$, and $B_3$. For each dummy method option, the regressors in their order of appearance in regressors are given below.

`dummyMethod`	Regressors
`ALL`	$A_1$, $A_2$, $B_1$, $B_2$, $B_3$, $A_1 B_1$, $A_1 B_2$, $A_1 B_3$, $A_2 B_1$, $A_2 B_2$, $A_2 B_3$, $X_1$, $A_1 X_1$, $A_2 X_1$, $B_1 X_1$, $B_2 X_1$, $B_3 X_1$, $A_1 B_1 X_1$, $A_1 B_2 X_1$, $A_1 B_3 X_1$, $A_2 B_1 X_1$, $A_2 B_2 X_1$, $A_2 B_3 X_1$
`LEAVE_OUT_LAST`	$A_1$, $B_1$, $B_2$, $A_1 B_1$, $A_1 B_2$, $X_1$, $A_1 X_1$, $B_1 X_1$, $B_2 X_1$, $A_1 B_1 X_1$, $A_1 B_2 X_1$
`SUM_TO_ZERO`	$A_1 - A_2$, $B_1 - B_3$, $B_2 - B_3$, $(A_1 - A_2) (B_1 - B_3)$, $(A_1 - A_2) (B_2 - B_3)$, $X_1$, $(A_1 - A_2) X_1$, $(B_1 - B_3) X_1$, $(B_2 - B_3) X_1$, $(A_1 - A_2) (B_1 - B_3) X_1$, $(A_1 - A_2) (B_2 - B_3) X_1$

Within a group of regressors corresponding to an interaction effect, the indicator variables composing the regressors vary most rapidly for the last classification variable, next most rapidly for the next to last classification variable, etc.
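
The number of regressors each effect contributes in the table above can be checked with a short plain-Java sketch. It assumes, as a simplification, that the class variables occupy the first columns of x and that each continuous variable contributes a single factor; RegressorCountSketch and its methods are hypothetical names and are not part of the library.

```java
import java.util.Arrays;

// Illustrative sketch only; not the RegressorsForGLM implementation.
final class RegressorCountSketch {

    // For each effect, multiplies the dummy counts of the class variables it
    // contains; continuous variables contribute a factor of 1.
    // Assumes class variables are columns 0..dummiesPerClassVar.length-1.
    static int[] countsPerEffect(int[][] effects, int[] dummiesPerClassVar) {
        int nClass = dummiesPerClassVar.length;
        int[] counts = new int[effects.length];
        for (int e = 0; e < effects.length; e++) {
            int count = 1;
            for (int col : effects[e]) {
                if (col < nClass) {
                    count *= dummiesPerClassVar[col];
                }
            }
            counts[e] = count;
        }
        return counts;
    }

    static int total(int[] counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] effects = { {0}, {1}, {0, 1}, {2}, {0, 2}, {1, 2}, {0, 1, 2} };
        // ALL: A has 2 dummies, B has 3.
        int[] all = countsPerEffect(effects, new int[] {2, 3});
        System.out.println(Arrays.toString(all) + " total " + total(all)); // [2, 3, 6, 1, 2, 3, 6] total 23
        // LEAVE_OUT_LAST or SUM_TO_ZERO: A has 1 dummy, B has 2.
        int[] reduced = countsPerEffect(effects, new int[] {1, 2});
        System.out.println(Arrays.toString(reduced) + " total " + total(reduced)); // [1, 2, 2, 1, 1, 2, 2] total 11
    }
}
```

The totals, 23 and 11, match the number of entries in the ALL row and in the LEAVE_OUT_LAST and SUM_TO_ZERO rows of the table.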

By default, RegressorsForGLM internally generates values for effects which correspond to a first order model with nEffects = nContinuousVariables + nClassVariables, where nContinuousVariables is the number of continuous variables and nClassVariables is the number of classification variables. The variables then are used to create the regressor variables. The effects are ordered such that the first effect corresponds to the first column of x, the second effect corresponds to the second column of x, etc. A second order model corresponding to the columns (variables) of x is generated if setModelOrder(2) is used.

The effects array for a first or second order model can be obtained by first using setModelOrder followed by getEffects. This array can then be modified and used as the argument to setEffects. This may be an easier way of setting the effects for an almost linear or quadratic model than creating the effects array from scratch.

For a second order model, there are $$ \mathtt{nEffects} = \mathtt{nClassVariables} + 2\,\mathtt{nContinuousVariables} + \frac{\mathtt{nVar} (\mathtt{nVar} - 1)}{2} $$ effects, where nVar = nClassVariables + nContinuousVariables. The first nVar effects correspond to the columns of x: the first effect corresponds to the first column of x, the second effect to the second column of x, ..., and the nVar-th effect to the nVar-th column of x (i.e., column index nVar-1). The next nContinuousVariables effects correspond to squares of the continuous variables. The last $\mathtt{nVar} (\mathtt{nVar} - 1) / 2$ effects correspond to the two-variable interactions.

Let the data matrix $\mathtt{x} = (A, B, X_1)$, where A and B are classification variables and $X_1$ is a continuous variable. The effects generated, in order of appearance, are $$ A,\: B,\: X_1,\: X_1^2,\: A B,\: A X_1,\: B X_1 $$
Let the data matrix $\mathtt{x} = (A, X_1, X_2)$, where A is a classification variable and $X_1$ and $X_2$ are continuous variables. The effects generated, in order of appearance, are $$ A,\: X_1,\: X_2,\: X_1^2,\: X_2^2,\: A X_1,\: A X_2,\: X_1 X_2 $$
Let the data matrix $\mathtt{x} = (X_1, A, X_2)$, where A is a classification variable and $X_1$ and $X_2$ are continuous variables. The effects generated, in order of appearance, are $$ X_1,\: A,\: X_2,\: X_1^2,\: X_2^2,\: X_1 A,\: X_1 X_2,\: A X_2 $$
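
The ordering shown in these three examples can be reproduced with a small plain-Java sketch that builds an effects array in the same jagged int[][] form accepted by setEffects. It is inferred from the examples above rather than taken from the library; SecondOrderEffectsSketch and secondOrderEffects are hypothetical names, and the isClass flags are an assumption of this sketch about how class columns are identified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not the RegressorsForGLM implementation.
final class SecondOrderEffectsSketch {

    // isClass[j] is true when column j of x is a classification variable.
    static int[][] secondOrderEffects(boolean[] isClass) {
        int nVar = isClass.length;
        List<int[]> effects = new ArrayList<>();
        // 1. One main effect per column, in column order.
        for (int i = 0; i < nVar; i++) {
            effects.add(new int[] {i});
        }
        // 2. Squares of the continuous variables only.
        for (int i = 0; i < nVar; i++) {
            if (!isClass[i]) {
                effects.add(new int[] {i, i});
            }
        }
        // 3. All two-variable interactions (i, j) with i < j.
        for (int i = 0; i < nVar; i++) {
            for (int j = i + 1; j < nVar; j++) {
                effects.add(new int[] {i, j});
            }
        }
        return effects.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        // x = (X_1, A, X_2): column 1 is the classification variable.
        boolean[] isClass = {false, true, false};
        System.out.println(Arrays.deepToString(secondOrderEffects(isClass)));
        // prints [[0], [1], [2], [0, 0], [2, 2], [0, 1], [0, 2], [1, 2]]
    }
}
```

The printed rows correspond to $X_1, A, X_2, X_1^2, X_2^2, X_1 A, X_1 X_2, A X_2$, the order given in the third example.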

Higher-order and more complicated models can be specified using setEffects.

Author:

brophy

Field Summary

Fields

Modifier and Type	Field	Description
static final int	ALL	The n indicator variables are the dummy variables.
static final int	LEAVE_OUT_LAST	The dummies are the first n-1 indicator variables.
static final int	SUM_TO_ZERO	The $n-1$ dummies are defined in terms of the indicator variables so that for balanced data, the usual summation restrictions are imposed on the regression coefficients.

Constructor Summary

Constructors

Constructor	Description
RegressorsForGLM(double[][] x, int nClassVariables)	Constructor where the class columns are the first columns.
RegressorsForGLM(double[][] x, int[] classColumns)	Constructor with an explicit set of class column indices.

Method Summary

Modifier and Type	Method	Description
int	getDummyMethod()	Returns the dummy method.
int[][]	getEffects()	Returns the effects.
int[][]	getEffectsColumns()	Returns a mapping of effects to regressor columns.
int	getNumberOfMissingRows()	Returns the number of rows in the regressors matrix containing NaN (not a number).
int	getNumberOfRegressors()	Returns the number of regressors.
double[][]	getRegressors()	Returns the regressor array.
void	setDummyMethod(int dummyMethod)	Sets the dummy method.
void	setEffects(int[][] effects)	Sets the effects.
void	setModelOrder(int modelOrder)	Sets the order of the model.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- ALL
  
  public static final int ALL
  
  The n indicator variables are the dummy variables.
  See Also:
  
  Constant Field Values
- LEAVE_OUT_LAST
  
  public static final int LEAVE_OUT_LAST
  
  The dummies are the first n-1 indicator variables.
  See Also:
  
  Constant Field Values
- SUM_TO_ZERO
  
  public static final int SUM_TO_ZERO
  
  The $n-1$ dummies are defined in terms of the indicator variables so that for balanced data, the usual summation restrictions are imposed on the regression coefficients.
  See Also:
  
  Constant Field Values
Constructor Details
- RegressorsForGLM
  
  public RegressorsForGLM(double[][] x, int nClassVariables)
  
  Constructor where the class columns are the first columns.
  
  Parameters:
  
  x - is an nObservations by nClassVariables+nContinuousVariables array containing the data, where nObservations is the number of observations. The columns must be ordered such that the first nClassVariables columns contain the class variables and the next nContinuousVariables columns contain the continuous variables.
  
  nClassVariables - is the number of class variables. The number of continuous variables is assumed to be the number of columns in x minus nClassVariables.
- RegressorsForGLM
  
  public RegressorsForGLM(double[][] x, int[] classColumns)
  
  Constructor with an explicit set of class column indices.
  
  Parameters:
  
  x - is an nObservations by nClassVariables+nContinuousVariables array containing the data. The columns containing the class variables are specified by classColumns.
  
  classColumns - is an array containing the column indices, in x, of the class variables.
Method Details
- setDummyMethod
  
  public void setDummyMethod(int dummyMethod)
  
  Sets the dummy method.
  
  Parameters:
  
  dummyMethod - must be one of ALL (the default), LEAVE_OUT_LAST or SUM_TO_ZERO.
- getDummyMethod
  
  public int getDummyMethod()
  
  Returns the dummy method.
  
  Returns:
  
  One of ALL (the default), LEAVE_OUT_LAST or SUM_TO_ZERO.
- getNumberOfMissingRows
  
  public int getNumberOfMissingRows()
  
  Returns the number of rows in the regressors matrix containing NaN (not a number). A row of the regressors matrix contains NaN for a regressor when any of the variables involved in generation of the regressor equals NaN or if a value of one of the classification variables in the model is not given by effects.
  
  Returns:
  
  The number of rows in the data matrix having missing data.
- setModelOrder
  
  public void setModelOrder(int modelOrder)
  
  Sets the order of the model. Model order can be specified as 1 or 2. Use setEffects to specify more complicated models. This overrides previously set effects.
  
  Parameters:
  
  modelOrder - is one or two. The default effects are equivalent to a model order of one.
- setEffects
  
  public void setEffects(int[][] effects)
  
  Sets the effects. This overrides any previously set model order.
  
  Parameters:
  
  effects - is a jagged array. The number of rows in the matrix is the number of effects. For each row, the values are the 0-based column numbers of x.
- getEffects
  
  public int[][] getEffects()
  
  Returns the effects.
  
  Returns:
  
  a jagged array containing the effects. The number of rows in the matrix is the number of effects. For each row, the values are the 0-based column numbers of x.
- getEffectsColumns
  
  public int[][] getEffectsColumns()
  
  Returns a mapping of effects to regressor columns.
  
  Returns:
  
  A jagged int array. The number of rows is equal to the number of effects. Each row contains the column numbers of the regressor matrix into which the corresponding effect is mapped.
- getNumberOfRegressors
  
  public int getNumberOfRegressors()
  
  Returns the number of regressors.
  
  Returns:
  
  The number of regressors. This is the number of columns in the regressor matrix.
- getRegressors
  
  public double[][] getRegressors()
  
  Returns the regressor array.
  
  Returns:
  
  An array of size number of observations by number of regressors.
