JMSL™ Numerical Library 5.0.1
java.lang.Object com.imsl.stat.StepwiseRegression
public class StepwiseRegression
Builds multiple linear regression models using forward selection, backward selection, or stepwise selection.
Class StepwiseRegression builds a multiple linear regression model using forward selection, backward selection, or forward stepwise (with a backward glance) selection.
Levels of priority can be assigned to the candidate independent variables using the setLevels(int[]) method. All variables with a priority level of 1 must enter the model before variables with a priority level of 2. Similarly, variables with a level of 2 must enter before variables with a level of 3, and so on. Variables can also be forced into the model with setForce(int). Note that specifying "force" without also specifying the levels will result in all variables being forced into the model.
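The interaction between levels and force can be sketched without the library. The class ForcedVariables and its method below are hypothetical names used only for illustration; they encode the rule stated for setForce(int), namely that variables with levels 1, 2, ..., force are forced, and they assume the default levels of 1 for every candidate and -1 for the dependent variable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JMSL API.
public class ForcedVariables {
    /** Returns the indices whose level lies in 1..force (levels 0 and -1 are never forced). */
    public static List<Integer> forcedIndices(int[] levels, int force) {
        List<Integer> forced = new ArrayList<>();
        for (int i = 0; i < levels.length; i++) {
            if (levels[i] >= 1 && levels[i] <= force) {
                forced.add(i);
            }
        }
        return forced;
    }

    public static void main(String[] args) {
        // Default levels: every candidate is level 1, so force = 1 forces all of them.
        int[] defaults = {1, 1, 1, -1};
        System.out.println(forcedIndices(defaults, 1)); // prints [0, 1, 2]

        // Custom levels: only variables at levels 1 and 2 are forced when force = 2.
        int[] custom = {2, 1, 3, 2, -1};
        System.out.println(forcedIndices(custom, 2)); // prints [0, 1, 3]
    }
}
```

The first call shows why setting force without setting levels forces every candidate: with the default levels, each candidate already sits at level 1.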
Typically, the intercept is forced into all models and is not a candidate variable. In this case, a sum-of-squares and crossproducts matrix for the independent and dependent variables, corrected for the mean, is required. Other possibilities are as follows:
1. The intercept is not in the model. A raw (uncorrected) sum-of-squares and crossproducts matrix for the independent and dependent variables is required as input in cov. Argument nObservations must be set to one greater than the number of observations.
2. An intercept is a candidate variable. A raw (uncorrected) sum-of-squares and crossproducts matrix for the constant regressor (= 1), the independent variables, and the dependent variable is required for cov. In this case, cov contains one additional row and column corresponding to the constant regressor. This row/column contains the sum-of-squares and crossproducts of the constant regressor with the independent and dependent variables. The remaining elements in cov are the same as in the previous case. Argument nObservations must be set to one greater than the number of observations.
The stepwise regression algorithm is due to Efroymson (1960).
StepwiseRegression uses sweeps of the covariance matrix (input in cov, if the covariance matrix is specified, or generated internally) to move variables in and out of the model (Hemmerle 1967, Chapter 3). The SWEEP operator discussed in Goodnight (1979) is used. A description of the stepwise algorithm is also given by Kennedy and Gentle (1980, pp. 335-340). The advantage of stepwise model building over all possible regressions (SelectionRegression) is that it is less demanding computationally when the number of candidate independent variables is very large. However, there is no guarantee that the model selected will be the best model (highest $R^2$) for any subset size of independent variables.
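To make the role of sweeping concrete, the following is a minimal, library-independent sketch. It builds a mean-corrected sum-of-squares and crossproducts matrix (dependent variable in the last column) and sweeps it on the columns of the variables in the model. The class SweepSketch and its methods are hypothetical, and the sign convention shown is one common form of the SWEEP operator; the convention used internally by StepwiseRegression may differ.

```java
// Hypothetical illustration, not part of the JMSL API.
public class SweepSketch {
    /** Mean-corrected sum-of-squares and crossproducts of the columns of data. */
    public static double[][] correctedSscp(double[][] data) {
        int n = data.length;
        int m = data[0].length;
        double[] mean = new double[m];
        for (double[] row : data) {
            for (int j = 0; j < m; j++) {
                mean[j] += row[j] / n;
            }
        }
        double[][] s = new double[m][m];
        for (double[] row : data) {
            for (int i = 0; i < m; i++) {
                for (int j = 0; j < m; j++) {
                    s[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]);
                }
            }
        }
        return s;
    }

    /** Sweeps the square matrix a in place on pivot k. */
    public static void sweep(double[][] a, int k) {
        int m = a.length;
        double d = a[k][k];
        for (int j = 0; j < m; j++) {
            a[k][j] /= d;                 // scale the pivot row
        }
        for (int i = 0; i < m; i++) {
            if (i == k) {
                continue;
            }
            double b = a[i][k];
            for (int j = 0; j < m; j++) {
                a[i][j] -= b * a[k][j];   // eliminate the pivot column from row i
            }
            a[i][k] = -b / d;
        }
        a[k][k] = 1.0 / d;
    }

    public static void main(String[] args) {
        // Columns: x1, x2, y, with y = 1 + 2*x1 + 3*x2 exactly.
        double[][] data = {{1, 1, 6}, {2, 0, 5}, {3, 2, 13}, {4, 1, 12}};
        double[][] s = correctedSscp(data);
        sweep(s, 0);  // bring x1 into the model
        sweep(s, 1);  // bring x2 into the model
        System.out.println("slope for x1: " + s[0][2]);
        System.out.println("slope for x2: " + s[1][2]);
        System.out.println("error sum of squares: " + s[2][2]);
    }
}
```

After sweeping on the columns of the variables in the model, the last column holds their slope estimates and the bottom-right element holds the error sum of squares. Sweeping the same pivot a second time removes the variable again, which is what makes the operator convenient for stepwise selection.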
Nested Class Summary | |
---|---|
class | StepwiseRegression.CoefficientTTests: CoefficientTTests contains statistics related to the Student's t-test for each regression coefficient. |
static class | StepwiseRegression.CyclingIsOccurringException: Cycling is occurring. |
static class | StepwiseRegression.NoVariablesEnteredException: No variables can enter the model. |
Field Summary | |
---|---|
static int | BACKWARD_REGRESSION: Indicates backward regression. |
static int | FORWARD_REGRESSION: Indicates forward regression. |
static int | STEPWISE_REGRESSION: Indicates stepwise regression. |
Constructor Summary | |
---|---|
StepwiseRegression(double[][] x, double[] y) | Creates a new instance of StepwiseRegression. |
StepwiseRegression(double[][] x, double[] y, double[] weights) | Creates a new instance of weighted StepwiseRegression. |
StepwiseRegression(double[][] x, double[] y, double[] weights, double[] frequencies) | Creates a new instance of weighted StepwiseRegression using observation frequencies. |
StepwiseRegression(double[][] cov, int nObservations) | Creates a new instance of StepwiseRegression from a user-supplied variance-covariance matrix. |
Method Summary | |
---|---|
void | compute(): Builds the multiple linear regression models using forward selection, backward selection, or stepwise selection. |
ANOVA | getANOVA(): Returns an analysis of variance table and related statistics. |
StepwiseRegression.CoefficientTTests | getCoefficientTTests(): Returns the Student's t-test statistics for the regression coefficients. |
double[] | getCoefficientVIF(): Returns the variance inflation factors for the final model in this invocation. |
double[][] | getCovariancesSwept(): Returns the results after cov has been swept for the columns corresponding to the variables in the model. |
double[] | getHistory(): Returns the stepwise regression history for the independent variables. |
double[] | getSwept(): Returns an array containing information indicating whether or not a particular variable is in the model. |
void | setForce(int force): Forces independent variables into the model based on their level assigned from setLevels. |
void | setLevels(int[] levels): Sets the levels of priority for variables entering and leaving the regression. |
void | setMethod(int method): Specifies the stepwise selection method: forward, backward, or stepwise regression. |
void | setPValueIn(double pValueIn): Defines the largest p-value for variables entering the model. |
void | setPValueOut(double pValueOut): Defines the smallest p-value for removing variables. |
void | setTolerance(double tolerance): Sets the tolerance used to detect linear dependence among the independent variables. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
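public static final int BACKWARD_REGRESSION
Indicates backward regression. A variable is removed from the model if its p-value exceeds pValueOut. During initialization, all candidate independent variables enter the model.
public static final int FORWARD_REGRESSION
Indicates forward regression. A variable is added to the model if its p-value is less than pValueIn. During initialization, only forced variables enter the model.
public static final int STEPWISE_REGRESSION
Indicates stepwise regression.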
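The entry and removal rules tied to pValueIn and pValueOut can be written out as simple predicates. This is only a sketch: PValueRules is a hypothetical class, the strictness of the inequalities is an assumption, and the threshold check mirrors the documented requirement that pValueOut be greater than or equal to pValueIn.

```java
// Hypothetical illustration, not part of the JMSL API.
public class PValueRules {
    /** A candidate may enter when its p-value is below pValueIn (strictness assumed). */
    public static boolean mayEnter(double pValue, double pValueIn) {
        return pValue < pValueIn;
    }

    /** A variable in the model may leave when its p-value is above pValueOut (strictness assumed). */
    public static boolean mayLeave(double pValue, double pValueOut) {
        return pValue > pValueOut;
    }

    /** Rejects threshold pairs that violate pValueOut >= pValueIn. */
    public static void checkThresholds(double pValueIn, double pValueOut) {
        if (pValueOut < pValueIn) {
            throw new IllegalArgumentException("pValueOut must be >= pValueIn");
        }
    }

    public static void main(String[] args) {
        double pValueIn = 0.05;   // documented default
        double pValueOut = 0.10;  // documented default
        checkThresholds(pValueIn, pValueOut);
        System.out.println(mayEnter(0.03, pValueIn));  // prints true
        System.out.println(mayLeave(0.03, pValueOut)); // prints false
        System.out.println(mayLeave(0.20, pValueOut)); // prints true
    }
}
```

Keeping pValueOut at or above pValueIn leaves a band of p-values in which a variable neither enters nor leaves, which helps avoid a variable being added and removed repeatedly.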
Constructor Detail |
---|
public StepwiseRegression(double[][] x, double[] y) throws Covariances.TooManyObsDeletedException, Covariances.MoreObsDelThanEnteredException, Covariances.DiffObsDeletedException
Creates a new instance of StepwiseRegression.
Parameters:
x - A double matrix of nObs by nVars, where nObs is the number of observations and nVars is the number of independent variables.
y - A double array containing the observations of the dependent variable.
Throws:
Covariances.TooManyObsDeletedException - is thrown if more observations have been deleted than were originally entered, i.e., the sum of frequencies has become negative.
Covariances.MoreObsDelThanEnteredException - is thrown if more observations are being deleted from the "variance-covariance" matrix than were originally entered. The corresponding row, column of the incidence matrix is less than zero.
Covariances.DiffObsDeletedException - is thrown if different observations are being deleted than were originally entered.
public StepwiseRegression(double[][] x, double[] y, double[] weights) throws Covariances.NonnegativeWeightException, Covariances.TooManyObsDeletedException, Covariances.MoreObsDelThanEnteredException, Covariances.DiffObsDeletedException
Creates a new instance of weighted StepwiseRegression.
Parameters:
x - A double matrix of nObs by nVars, where nObs is the number of observations and nVars is the number of independent variables.
y - A double array containing the observations of the dependent variable.
weights - A double array containing the weight for each observation of x.
Throws:
Covariances.NonnegativeWeightException - is thrown if the weights are negative.
Covariances.TooManyObsDeletedException - is thrown if more observations have been deleted than were originally entered, i.e., the sum of frequencies has become negative.
Covariances.MoreObsDelThanEnteredException - is thrown if more observations are being deleted from the "variance-covariance" matrix than were originally entered. The corresponding row, column of the incidence matrix is less than zero.
Covariances.DiffObsDeletedException - is thrown if different observations are being deleted than were originally entered.
public StepwiseRegression(double[][] x, double[] y, double[] weights, double[] frequencies) throws Covariances.NonnegativeFreqException, Covariances.NonnegativeWeightException, Covariances.TooManyObsDeletedException, Covariances.MoreObsDelThanEnteredException, Covariances.DiffObsDeletedException
Creates a new instance of weighted StepwiseRegression using observation frequencies.
Parameters:
x - A double matrix of nObs by nVars, where nObs is the number of observations and nVars is the number of independent variables.
y - A double array containing the observations of the dependent variable.
weights - A double array containing the weight for each observation of x.
frequencies - A double array containing the frequency for each row of x.
Throws:
Covariances.NonnegativeFreqException - is thrown if the frequencies are negative.
Covariances.NonnegativeWeightException - is thrown if the weights are negative.
Covariances.TooManyObsDeletedException - is thrown if more observations have been deleted than were originally entered, i.e., the sum of frequencies has become negative.
Covariances.MoreObsDelThanEnteredException - is thrown if more observations are being deleted from the "variance-covariance" matrix than were originally entered. The corresponding row, column of the incidence matrix is less than zero.
Covariances.DiffObsDeletedException - is thrown if different observations are being deleted than were originally entered.
public StepwiseRegression(double[][] cov, int nObservations)
Creates a new instance of StepwiseRegression from a user-supplied variance-covariance matrix.
Parameters:
cov - A double matrix containing a variance-covariance or sum-of-squares and crossproducts matrix, in which the last column must correspond to the dependent variable. cov can be computed using the Covariances class.
nObservations - An int containing the number of observations associated with cov.
Method Detail |
---|
public void compute() throws StepwiseRegression.NoVariablesEnteredException, StepwiseRegression.CyclingIsOccurringException
Builds the multiple linear regression models using forward selection, backward selection, or stepwise selection.
Throws:
StepwiseRegression.NoVariablesEnteredException - is thrown if no variables entered the model. All elements of the ANOVA table are set to NaN.
StepwiseRegression.CyclingIsOccurringException - is thrown if cycling occurs.
public ANOVA getANOVA() throws StepwiseRegression.NoVariablesEnteredException, StepwiseRegression.CyclingIsOccurringException
Returns an analysis of variance table and related statistics.
Returns:
An ANOVA table and related statistics.
Throws:
StepwiseRegression.NoVariablesEnteredException
StepwiseRegression.CyclingIsOccurringException
public StepwiseRegression.CoefficientTTests getCoefficientTTests() throws StepwiseRegression.NoVariablesEnteredException, StepwiseRegression.CyclingIsOccurringException
Returns the Student's t-test statistics for the regression coefficients.
Returns:
A StepwiseRegression.CoefficientTTests object containing statistics relating to the regression coefficients.
Throws:
StepwiseRegression.NoVariablesEnteredException
StepwiseRegression.CyclingIsOccurringException
public double[] getCoefficientVIF() throws StepwiseRegression.NoVariablesEnteredException, StepwiseRegression.CyclingIsOccurringException
Returns the variance inflation factors for the final model in this invocation. The elements returned are in the same order as the independent variables in x (or, if the covariance matrix is specified, the elements are in the same order as the variables in cov). Each element corresponding to a variable not in the model contains statistics for a model which includes the variables of the final model and the variables corresponding to the element in question.
The square of the multiple correlation coefficient for the i-th regressor after all others can be obtained from the i-th element of the returned array by the following formula:
$$R_i^2 = 1.0 - \frac{1.0}{\mathrm{VIF}_i}$$
Returns:
A double array containing the variance inflation factors for the final model in this invocation.
Throws:
StepwiseRegression.NoVariablesEnteredException
StepwiseRegression.CyclingIsOccurringException
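As a small worked example, the squared multiple correlation can be recovered from a variance inflation factor using the standard relation VIF_i = 1 / (1 - R_i^2). The class VifToRSquared is a hypothetical helper, and the input values are made up for illustration rather than taken from getCoefficientVIF().

```java
// Hypothetical illustration, not part of the JMSL API.
public class VifToRSquared {
    /** Inverts VIF = 1 / (1 - R^2) to give R^2 = 1 - 1 / VIF. */
    public static double rSquaredFromVif(double vif) {
        return 1.0 - 1.0 / vif;
    }

    public static void main(String[] args) {
        // A VIF of 1 means the regressor is uncorrelated with the others (R^2 = 0).
        System.out.println(rSquaredFromVif(1.0)); // prints 0.0
        // A VIF of 4 means the other regressors explain 75% of its variance.
        System.out.println(rSquaredFromVif(4.0)); // prints 0.75
    }
}
```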
public double[][] getCovariancesSwept() throws StepwiseRegression.NoVariablesEnteredException, StepwiseRegression.CyclingIsOccurringException
Returns the results after cov has been swept for the columns corresponding to the variables in the model.
Returns:
A double matrix containing the results after cov has been swept on the columns corresponding to the variables in the model. The estimated variance-covariance matrix of the estimated regression coefficients in the final model can be obtained by extracting the rows and columns corresponding to the independent variables in the final model and multiplying the elements of this matrix by the error mean square.
Throws:
StepwiseRegression.NoVariablesEnteredException
StepwiseRegression.CyclingIsOccurringException
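The extraction described for the swept matrix can be sketched as follows. CoefficientCovariance is a hypothetical helper, and the matrix, index list, and error mean square in main are made-up values. The sketch assumes the rows and columns of the selected variables can be used as they are, so any sign convention applied by the library to the swept matrix should be checked first.

```java
// Hypothetical illustration, not part of the JMSL API.
public class CoefficientCovariance {
    /**
     * Extracts the rows and columns listed in inModel from the swept matrix
     * and multiplies each element by the error mean square.
     */
    public static double[][] fromSwept(double[][] swept, int[] inModel, double errorMeanSquare) {
        int p = inModel.length;
        double[][] cov = new double[p][p];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                cov[i][j] = swept[inModel[i]][inModel[j]] * errorMeanSquare;
            }
        }
        return cov;
    }

    public static void main(String[] args) {
        // Made-up swept matrix for three candidates plus the dependent variable.
        double[][] swept = {
            {0.2, 9.0, -0.1, 2.0},
            {9.0, 9.0, 9.0, 9.0},   // variable 1 is not in the model
            {-0.1, 9.0, 0.5, 3.0},
            {-2.0, 9.0, -3.0, 4.0}
        };
        double[][] cov = fromSwept(swept, new int[]{0, 2}, 2.0);
        System.out.println(cov[0][0] + " " + cov[0][1]);
        System.out.println(cov[1][0] + " " + cov[1][1]);
    }
}
```

The square roots of the diagonal of the result are the standard errors of the coefficients in the final model.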
public double[] getHistory() throws StepwiseRegression.NoVariablesEnteredException, StepwiseRegression.CyclingIsOccurringException
Returns the stepwise regression history for the independent variables.
Returns:
A double array containing the recent history of the independent variables. The last element corresponds to the dependent variable.
history[i] | Status of i-th Variable |
---|---|
0.0 | This variable has never been added to the model. |
0.5 | This variable was added into the model during initialization. |
k > 0.0 | This variable was added to the model during the k-th step. |
k < 0.0 | This variable was deleted from the model during the (-k)-th step. |
Throws:
StepwiseRegression.NoVariablesEnteredException
StepwiseRegression.CyclingIsOccurringException
See Also:
setLevels(int[])
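A history value can be turned into a readable status with a few comparisons. HistoryDecoder is a hypothetical helper; it assumes the special values are exactly 0.0 and 0.5 and that every other entry is a whole step number whose sign marks addition (positive) or deletion (negative).

```java
// Hypothetical illustration, not part of the JMSL API.
public class HistoryDecoder {
    /** Maps one history value to a short description of the variable's status. */
    public static String describe(double h) {
        if (h == 0.0) {
            return "never added";
        }
        if (h == 0.5) {
            return "added during initialization";
        }
        long step = Math.round(Math.abs(h));
        return h > 0.0 ? "added at step " + step : "deleted at step " + step;
    }

    public static void main(String[] args) {
        // Made-up history; the last element belongs to the dependent variable and is skipped.
        double[] history = {0.5, 0.0, 3.0, -2.0, 0.0};
        for (int i = 0; i < history.length - 1; i++) {
            System.out.println("variable " + i + ": " + describe(history[i]));
        }
    }
}
```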
public double[] getSwept() throws StepwiseRegression.NoVariablesEnteredException, StepwiseRegression.CyclingIsOccurringException
Returns an array containing information indicating whether or not a particular variable is in the model.
Returns:
A double array with information to indicate the independent variables in the model. The last element corresponds to the dependent variable. A +1 in the i-th position indicates that the variable is in the selected model. A -1 indicates that the variable is not in the selected model.
Throws:
StepwiseRegression.NoVariablesEnteredException
StepwiseRegression.CyclingIsOccurringException
See Also:
setLevels(int[])
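The +1/-1 indicator array is easy to convert into a list of selected variable indices. SweptIndicator is a hypothetical helper, and the input array is made up; the last element is skipped because it corresponds to the dependent variable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of the JMSL API.
public class SweptIndicator {
    /** Returns the indices of independent variables marked +1 (in the model). */
    public static List<Integer> variablesInModel(double[] swept) {
        List<Integer> inModel = new ArrayList<>();
        // Stop before the last element, which belongs to the dependent variable.
        for (int i = 0; i < swept.length - 1; i++) {
            if (swept[i] > 0.0) {
                inModel.add(i);
            }
        }
        return inModel;
    }

    public static void main(String[] args) {
        double[] swept = {1.0, -1.0, 1.0, -1.0};
        System.out.println(variablesInModel(swept)); // prints [0, 2]
    }
}
```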
public void setForce(int force)
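Forces independent variables into the model based on their level assigned from setLevels.
Parameters:
force - An int specifying the upper bound on the variables forced into the model. Variables with levels 1, 2, ..., force are forced into the model as independent variables.
See Also:
setLevels(int[])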
public void setLevels(int[] levels)
Sets the levels of priority for variables entering and leaving the regression. Argument levels[i]=0 means the i-th variable never enters the model. Argument levels[i]=-1 means the i-th variable is the dependent variable. The last element in levels must correspond to the dependent variable, except when the variance-covariance or sum-of-squares and crossproducts matrix is supplied.
Parameters:
levels - An int array containing the levels of entry into the model for each variable. Default: 1, 1, ..., 1, -1 where -1 corresponds to the dependent variable.
See Also:
setForce(int)
public void setMethod(int method)
Specifies the stepwise selection method: forward, backward, or stepwise regression.
Parameters:
method - An int value between -1 and 1 specifying the stepwise selection method. Fields FORWARD_REGRESSION, BACKWARD_REGRESSION, and STEPWISE_REGRESSION should be used. Default: STEPWISE_REGRESSION.
See Also:
FORWARD_REGRESSION, BACKWARD_REGRESSION, STEPWISE_REGRESSION
public void setPValueIn(double pValueIn)
Defines the largest p-value for variables entering the model. Variables with a p-value less than pValueIn may enter the model. Backward regression does not use this value.
Parameters:
pValueIn - A double containing the largest p-value for variables entering the model. Default: pValueIn = 0.05.
public void setPValueOut(double pValueOut)
Defines the smallest p-value for removing variables. Variables with a p-value greater than pValueOut may leave the model. pValueOut must be greater than or equal to pValueIn. A common choice for pValueOut is 2*pValueIn. Forward regression does not use this value.
Parameters:
pValueOut - A double containing the smallest p-value for removing variables from the model. Default: pValueOut = 0.10.
public void setTolerance(double tolerance)
Sets the tolerance used to detect linear dependence among the independent variables.
Parameters:
tolerance - A double containing the tolerance used for detecting linear dependence. Default: tolerance = 2.2204460492503e-16.