public class RegressorsForGLM extends Object implements Serializable
Class RegressorsForGLM generates regressors for a general linear model from a data matrix.
The data matrix can contain classification variables as well as continuous variables.
Regressors for effects composed solely of continuous variables are generated as powers and cross products.
Consider a data matrix containing continuous variables in Columns 3 and 4.
The effect indices (3, 3) generate a regressor whose ith value is the square of the ith value in Column 3.
The effect indices (3, 4) generate a regressor whose ith value is the product of the ith value in Column 3 with the ith value in Column 4.
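The power and cross product rule for continuous-only effects can be sketched in plain Java. This is a minimal illustration, not the library's implementation: the class ContinuousEffectSketch and its method regressor are hypothetical names introduced here, and the sketch assumes every index in the effect names a continuous column of x.

```java
// Sketch only: ContinuousEffectSketch is a hypothetical helper, not part of RegressorsForGLM.
class ContinuousEffectSketch {
    /**
     * Builds one regressor from an effect made only of continuous columns.
     * Each entry of effect is a 0-based column of x; a repeated index yields a power.
     */
    static double[] regressor(double[][] x, int[] effect) {
        double[] r = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            double product = 1.0;
            for (int col : effect) {
                product *= x[i][col]; // multiply the ith values of the named columns
            }
            r[i] = product;
        }
        return r;
    }
}
```

Calling regressor(x, new int[] {3, 3}) squares Column 3 row by row, and regressor(x, new int[] {3, 4}) multiplies Column 3 by Column 4.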
Regressors for an effect (source of variation) composed of a single classification variable
are generated using indicator variables.
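Let the classification variable A take on values $a_1, a_2, \ldots, a_n$.
From this classification variable, RegressorsForGLM creates n indicator variables.
For $k = 1, 2, \ldots, n$, we have
$$I_k = \begin{cases} 1 & \text{if } A = a_k \\ 0 & \text{otherwise} \end{cases}$$
For dummy method ALL, the dummy variables are
$$A_k = I_k, \quad k = 1, 2, \ldots, n$$
For dummy method LEAVE_OUT_LAST, the dummy variables are
$$A_k = I_k, \quad k = 1, 2, \ldots, n - 1$$
For dummy method SUM_TO_ZERO, the dummy variables are
$$A_k = I_k - I_n, \quad k = 1, 2, \ldots, n - 1$$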
The regressors generated for an effect composed of a single classification variable are the associated dummy variables.
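The three dummy methods can be sketched for a single observation as follows. This is an illustration under stated assumptions, not the library's code: DummySketch, its Method enum, and dummies are hypothetical names (the real class exposes int constants), and SUM_TO_ZERO is assumed to be the usual effect coding in which each of the first n - 1 indicators has the last indicator subtracted from it.

```java
// Sketch only: DummySketch and its Method enum are hypothetical stand-ins.
class DummySketch {
    enum Method { ALL, LEAVE_OUT_LAST, SUM_TO_ZERO }

    /**
     * Returns the dummy values for one observation whose class value is the
     * level at 0-based position levelIndex among n distinct levels.
     */
    static double[] dummies(int levelIndex, int n, Method method) {
        double[] indicator = new double[n];
        indicator[levelIndex] = 1.0;          // I_k = 1 only for the observed level
        if (method == Method.ALL) {
            return indicator;                 // all n indicators are the dummies
        }
        double[] d = new double[n - 1];
        for (int k = 0; k < n - 1; k++) {
            // Assumed effect coding for SUM_TO_ZERO: I_k - I_n
            d[k] = (method == Method.SUM_TO_ZERO)
                    ? indicator[k] - indicator[n - 1]
                    : indicator[k];
        }
        return d;
    }
}
```

For an observation at the last of three levels, this sketch gives (0, 0, 1) under ALL, (0, 0) under LEAVE_OUT_LAST, and (-1, -1) under SUM_TO_ZERO.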
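Let $m_j$ be the number of dummies generated for the jth classification variable. Suppose there are two classification variables A and B with dummies $A_1, A_2, \ldots, A_{m_1}$ and $B_1, B_2, \ldots, B_{m_2}$. The regressors generated for an effect composed of two classification variables A and B are
$$A \otimes B = (A_1, A_2, \ldots, A_{m_1}) \otimes (B_1, B_2, \ldots, B_{m_2}) = (A_1B_1, A_1B_2, \ldots, A_1B_{m_2}, A_2B_1, A_2B_2, \ldots, A_2B_{m_2}, \ldots, A_{m_1}B_1, A_{m_1}B_2, \ldots, A_{m_1}B_{m_2})$$
More generally, the regressors generated for an effect composed of several classification variables and several continuous variables are given by the Kronecker products of variables, where the order of the variables is specified in setEffects.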
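A Kronecker product of per-variable blocks can be sketched for one row of data as below. The names KroneckerSketch, kron, and effectRow are hypothetical, and the ordering (first factor varying slowest) is the standard Kronecker convention, assumed here rather than confirmed against the library.

```java
// Sketch only: KroneckerSketch is a hypothetical helper, not part of RegressorsForGLM.
class KroneckerSketch {
    /** Kronecker product of two row vectors: (a1*b1, a1*b2, ..., a1*bm, a2*b1, ...). */
    static double[] kron(double[] a, double[] b) {
        double[] out = new double[a.length * b.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < b.length; j++) {
                out[i * b.length + j] = a[i] * b[j];
            }
        }
        return out;
    }

    /**
     * Combines the factors of one effect, left to right. Each factor is the
     * dummy vector of a class variable or a one-element array holding a
     * continuous value.
     */
    static double[] effectRow(double[][] factors) {
        double[] acc = {1.0};
        for (double[] f : factors) {
            acc = kron(acc, f);
        }
        return acc;
    }
}
```

With A dummies (1, 0), B dummies (0, 1, 0), and a continuous value 2.5, effectRow returns six values in which only the second, corresponding to the first A dummy times the second B dummy, is nonzero.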
Consider a data matrix containing classification variables in Columns 0 and 1 and continuous variables in Columns 2 and 3. Label these four columns $A$, $B$, $X_1$, and $X_2$.
The regressors generated by the effect indices $(0, 1, 2, 2, 3)$ are
$$A \otimes B \otimes X_1 X_1 X_2$$
Let the data matrix $x = (A, B, X_1)$, where A and B are classification variables and $X_1$ is a continuous variable.
The model containing the effects $A$, $B$, $AB$, $X_1$, $AX_1$, $BX_1$, and $ABX_1$ is specified by setting nClassVariables = 2 in the constructor and calling setEffects(effects), with
int effects[][] = { {0}, {1}, {0, 1}, {2}, {0, 2}, {1, 2}, {0, 1, 2} };
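For this model, suppose that variable A has two levels, $A_1$ and $A_2$, and that variable B has three levels, $B_1$, $B_2$, and $B_3$. For each dummy method option, the regressors in their order of appearance in regressors are given below.
dummyMethod  Regressors
ALL  $A_1, A_2, B_1, B_2, B_3, A_1B_1, A_1B_2, A_1B_3, A_2B_1, A_2B_2, A_2B_3, X_1, A_1X_1, A_2X_1, B_1X_1, B_2X_1, B_3X_1, A_1B_1X_1, A_1B_2X_1, A_1B_3X_1, A_2B_1X_1, A_2B_2X_1, A_2B_3X_1$
LEAVE_OUT_LAST  $A_1, B_1, B_2, A_1B_1, A_1B_2, X_1, A_1X_1, B_1X_1, B_2X_1, A_1B_1X_1, A_1B_2X_1$
SUM_TO_ZERO  $A_1 - A_2, B_1 - B_3, B_2 - B_3, (A_1 - A_2)(B_1 - B_3), (A_1 - A_2)(B_2 - B_3), X_1, (A_1 - A_2)X_1, (B_1 - B_3)X_1, (B_2 - B_3)X_1, (A_1 - A_2)(B_1 - B_3)X_1, (A_1 - A_2)(B_2 - B_3)X_1$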
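The number of regressors each dummy method produces for this model can be checked with a short count. The sketch below assumes each effect contributes the product of the dummy counts of its variables, with a continuous variable counting as one; RegressorCountSketch and count are hypothetical names, not library API.

```java
// Sketch only: RegressorCountSketch is a hypothetical helper, not part of RegressorsForGLM.
class RegressorCountSketch {
    /**
     * Counts the regressors of a model.
     * dummyCounts[v] is the number of dummies for class variable v,
     * or 1 when column v is continuous.
     */
    static int count(int[][] effects, int[] dummyCounts) {
        int total = 0;
        for (int[] effect : effects) {
            int n = 1;
            for (int v : effect) {
                n *= dummyCounts[v]; // a Kronecker product multiplies block widths
            }
            total += n;
        }
        return total;
    }
}
```

For the seven effects above, with A at two levels and B at three, passing dummy counts {2, 3, 1} (ALL) gives 2 + 3 + 6 + 1 + 2 + 3 + 6 = 23 regressors, and passing {1, 2, 1} (LEAVE_OUT_LAST or SUM_TO_ZERO) gives 1 + 2 + 2 + 1 + 1 + 2 + 2 = 11.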
By default, RegressorsForGLM internally generates values for effects which correspond to a first order model with nEffects = nContinuousVariables + nClassVariables, where nContinuousVariables is the number of continuous variables and nClassVariables is the number of classification variables. The variables are then used to create the regressor variables. The effects are ordered such that the first effect corresponds to the first column of x, the second effect corresponds to the second column of x, etc. A second order model corresponding to the columns (variables) of x is generated if setModelOrder(2) is used.
The effects array for a first or second order model can be obtained by first using setModelOrder followed by getEffects. This array can then be modified and used as the argument to setEffects. This may be an easier way of setting the effects for an almost linear or quadratic model than creating the effects array from scratch.
In a second order model, there are $\mathrm{nVar} + \mathrm{nContinuousVariables} + \binom{\mathrm{nVar}}{2}$ effects, where nVar = nClassVariables + nContinuousVariables.
The first nVar effects correspond to the columns of x, such that the first effect corresponds to the first column of x, the second effect corresponds to the second column of x, ..., the nVar-th effect corresponds to the nVar-th column of x (i.e. x[nVar-1]).
The next nContinuousVariables effects correspond to squares of the continuous variables.
The last $\binom{\mathrm{nVar}}{2}$ effects correspond to the two-variable interactions.
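The layout of a second order effects array can be sketched as follows. This is not the library's code: SecondOrderEffectsSketch and effects are hypothetical names, and the sketch assumes that an interaction is formed for every pair of distinct columns and that pairs are listed in increasing column order. Neither assumption is confirmed by the text above.

```java
// Sketch only: SecondOrderEffectsSketch is a hypothetical helper, not part of RegressorsForGLM.
class SecondOrderEffectsSketch {
    /**
     * Builds an effects array for nVar columns of x, where continuousColumns
     * lists the 0-based indices of the continuous columns.
     */
    static int[][] effects(int nVar, int[] continuousColumns) {
        int nPairs = nVar * (nVar - 1) / 2; // assumed: all pairs of distinct columns
        int[][] effects = new int[nVar + continuousColumns.length + nPairs][];
        int e = 0;
        for (int j = 0; j < nVar; j++) {
            effects[e++] = new int[] {j};       // one effect per column of x
        }
        for (int c : continuousColumns) {
            effects[e++] = new int[] {c, c};    // squares of continuous variables
        }
        for (int j = 0; j < nVar; j++) {
            for (int k = j + 1; k < nVar; k++) {
                effects[e++] = new int[] {j, k}; // two-variable interactions
            }
        }
        return effects;
    }
}
```

For two class columns (0 and 1) and one continuous column (2), the sketch yields {0}, {1}, {2}, {2, 2}, {0, 1}, {0, 2}, {1, 2}. An array of this shape could then be edited, for example by dropping a row, before being passed to setEffects.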
setEffects.
Modifier and Type  Field and Description 

static int 
ALL
The n indicator variables are the dummy variables.

static int 
LEAVE_OUT_LAST
The dummies are the first n - 1 indicator variables.

static int 
SUM_TO_ZERO
The dummies are defined in terms of the indicator variables
so that for balanced data, the usual summation restrictions are
imposed on the regression coefficients.

Constructor and Description 

RegressorsForGLM(double[][] x,
int nClassVariables)
Constructor where the class columns are the first columns.

RegressorsForGLM(double[][] x,
int[] classColumns)
Constructor with an explicit set of class column indices.

Modifier and Type  Method and Description 

int 
getDummyMethod()
Returns the dummy method.

int[][] 
getEffects()
Returns the effects.

int[][] 
getEffectsColumns()
Returns a mapping of effects to regressor columns.

int 
getNumberOfMissingRows()
Returns the number of rows in the regressors matrix containing
NaN (not a number). 
int 
getNumberOfRegressors()
Returns the number of regressors.

double[][] 
getRegressors()
Returns the regressor array.

void 
setDummyMethod(int dummyMethod)
Sets the dummy method.

void 
setEffects(int[][] effects)
Sets the effects.

void 
setModelOrder(int modelOrder)
Sets the order of the model.

public static final int ALL
public static final int LEAVE_OUT_LAST
public static final int SUM_TO_ZERO
public RegressorsForGLM(double[][] x, int nClassVariables)
x is an nObservations by nClassVariables + nContinuousVariables array containing the data, where nObservations is the number of observations. The columns must be ordered such that the first nClassVariables columns contain the class variables and the next nContinuousVariables columns contain the continuous variables.
nClassVariables is the number of class variables. The number of continuous variables is assumed to be the number of columns in x minus nClassVariables.
public RegressorsForGLM(double[][] x, int[] classColumns)
x is an nObservations by nClassVariables + nContinuousVariables array containing the data. The columns containing the class variables are specified by classColumns.
classColumns is an array containing the column indices, in x, of the class variables.
public int getDummyMethod()
Returns the dummy method: ALL (the default), LEAVE_OUT_LAST, or SUM_TO_ZERO.
public int[][] getEffects()
Returns the effects. For each row, the values are the 0-based column numbers of x.
public int[][] getEffectsColumns()
Returns an int array. The number of rows is equal to the number of effects. Each row contains the column numbers of the regressor matrix into which the corresponding effect is mapped.
public int getNumberOfMissingRows()
Returns the number of rows in the regressors matrix containing NaN (not a number). A row of the regressors matrix contains NaN for a regressor when any of the variables involved in generation of the regressor equals NaN or if a value of one of the classification variables in the model is not given by effects.
public int getNumberOfRegressors()
public double[][] getRegressors()
public void setDummyMethod(int dummyMethod)
dummyMethod must be one of ALL (the default), LEAVE_OUT_LAST, or SUM_TO_ZERO.
public void setEffects(int[][] effects)
effects is a jagged array. The number of rows in the matrix is the number of effects. For each row, the values are the 0-based column numbers of x.
public void setModelOrder(int modelOrder)
Sets the order of the model. Use setEffects to specify more complicated models. This overrides previously set effects.
modelOrder is one or two. The default effects are equivalent to modelOrder equal to one.
Copyright © 1970-2015 Rogue Wave Software
Built October 13 2015.