JMSL™ Numerical Library 6.1
java.lang.Object com.imsl.stat.SelectionRegression
public class SelectionRegression
Selects the best multiple linear regression models.
Class SelectionRegression finds the best subset regressions for a regression problem with three or more independent variables. Typically, the intercept is forced into all models and is not a candidate variable. In this case, a sum of squares and crossproducts matrix for the independent and dependent variables corrected for the mean is computed internally. Optionally, SelectionRegression supports user-calculated sum-of-squares and crossproducts matrices; see the description of the compute method.
"Best" is defined by using one of the following three criteria:
- $R^2$ (in percent): $R^2 = 100\left(1 - \frac{SSE_p}{SST}\right)$
- $R_a^2$ (adjusted $R^2$, in percent): $R_a^2 = 100\left[1 - \left(\frac{n-1}{n-p}\right)\frac{SSE_p}{SST}\right]$
- Mallows' $C_p$ statistic: $C_p = \frac{SSE_p}{s_k^2} + 2p - n$
Here, $n$ is equal to the sum of the frequencies (or the number of rows in x if frequencies are not specified in the compute method), and $SST$ is the total sum of squares. $k$ is the number of candidate or independent variables, represented as the nCandidate argument in the SelectionRegression constructor. $SSE_p$ is the error sum of squares in a model containing $p$ regression parameters including the intercept $\beta_0$ (or $p - 1$ of the $k$ candidate variables). Variable $s_k^2$ is the error mean square from the model with all $k$ candidate variables.
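The three criteria can be evaluated by hand for a single candidate model, which helps when checking reported values. The sketch below is plain Java and does not call SelectionRegression. It assumes the usual textbook forms of $R^2$ and adjusted $R^2$ (scaled to percent) and of Mallows' $C_p$, and it assumes the full-model error mean square is $SSE_k/(n-k-1)$ with an intercept in the model. The class name, method names, and all numbers are hypothetical.

```java
class SubsetCriteria {
    /** R-squared in percent: 100 * (1 - SSE_p / SST). */
    static double rSquared(double sseP, double sst) {
        return 100.0 * (1.0 - sseP / sst);
    }

    /** Adjusted R-squared in percent; p counts the intercept as a parameter. */
    static double adjustedRSquared(double sseP, double sst, double n, int p) {
        return 100.0 * (1.0 - ((n - 1.0) / (n - p)) * (sseP / sst));
    }

    /** Mallows' C_p: SSE_p / s_k^2 + 2p - n, with s_k^2 the full-model error mean square. */
    static double mallowsCp(double sseP, double sK2, double n, int p) {
        return sseP / sK2 + 2.0 * p - n;
    }

    public static void main(String[] args) {
        // Hypothetical values chosen only for illustration.
        double n = 20.0;        // sum of frequencies, or number of rows in x
        int k = 4;              // number of candidate variables
        double sst = 100.0;     // total sum of squares
        double sseFull = 16.0;  // error sum of squares of the model with all k variables
        double sK2 = sseFull / (n - (k + 1)); // assumed full-model error mean square

        double sseP = 20.0;     // error sum of squares of a submodel
        int p = 3;              // intercept plus two candidate variables

        System.out.printf("R^2          = %.3f%n", rSquared(sseP, sst));
        System.out.printf("adjusted R^2 = %.3f%n", adjustedRSquared(sseP, sst, n, p));
        System.out.printf("Mallows' C_p = %.3f%n", mallowsCp(sseP, sK2, n, p));
    }
}
```

A useful sanity check under these assumptions: for the full model, where $p = k + 1$, the $C_p$ value reduces to $p$.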
Class SelectionRegression is based on the algorithm of Furnival and Wilson (1974). This algorithm finds the maximum number of good saved candidate regressions for each possible subset size. For more details, see method setMaximumGoodSaved(int). These regressions are used to identify a set of best regressions. In large problems, many regressions are not computed. They may be rejected without computation based on results for other subsets; this yields an efficient technique for considering all possible regressions.
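To give a sense of the search space involved, the sketch below counts the non-empty subsets of $k$ candidate variables by subset size using a bitmask enumeration; the total is $2^k - 1$. It is plain Java, does not use SelectionRegression, and does not implement the Furnival and Wilson algorithm. The class and method names are hypothetical.

```java
class SubsetCount {
    /**
     * Counts the non-empty subsets of k candidate variables, grouped by size.
     * counts[s] is the number of subsets containing exactly s variables.
     * Intended for small k only, since the loop visits every subset.
     */
    static long[] countBySize(int k) {
        long[] counts = new long[k + 1];
        for (long mask = 1; mask < (1L << k); mask++) {
            counts[Long.bitCount(mask)]++; // each set bit marks one included variable
        }
        return counts;
    }

    public static void main(String[] args) {
        int k = 4; // hypothetical number of candidate variables
        long[] counts = countBySize(k);
        long total = 0;
        for (int size = 1; size <= k; size++) {
            System.out.println("subsets of size " + size + ": " + counts[size]);
            total += counts[size];
        }
        System.out.println("total candidate models: " + total);
    }
}
```

Because the count doubles with each added variable, rejecting subsets without fitting them matters most when the number of candidates is large.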
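There are cases when the user may want to input the variance-covariance matrix rather than allow it to be calculated. This can be accomplished using the appropriate compute method. Three situations in which the user may want to do this are as follows:
1. The intercept is not in the model. A raw (uncorrected) sum of squares and crossproducts matrix for the independent and dependent variables is required as input in cov. Argument nObservations must be set to 1 greater than the number of observations. Form $A^T A$, where $A = [X, Y]$ is the matrix of independent variables with the dependent variable appended as the last column, to compute the raw sum of squares and crossproducts matrix.
2. An intercept is a candidate variable. A raw (uncorrected) sum of squares and crossproducts matrix for the constant regressor, independent variables, and dependent variable is required for cov. In this case, cov contains one additional row and column corresponding to the constant regressor. This row and column contain the sum of squares and crossproducts of the constant regressor with the independent and dependent variables. The remaining elements in cov are the same as in the previous case. Argument nObservations must be set to 1 greater than the number of observations.
3. There are m variables to be forced into the models. A sum of squares and crossproducts matrix adjusted for the m variables is required. Argument nObservations must be set to m less than the number of observations.
SelectionRegression can save considerable CPU time over explicitly computing all possible regressions. However, the class has some limitations that can cause unexpected results for users who are unaware of them.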
1. When the number of candidate variables is sufficiently large, SelectionRegression can produce incorrect results.
2. SelectionRegression eliminates some subsets of candidate variables by obtaining lower bounds on the error sum of squares from fitting larger models. First, the full model containing all independent variables is fit sequentially using a forward stepwise procedure in which one variable enters the model at a time, and criterion values and model numbers for all the candidate variables that can enter at each step are stored. If linearly dependent variables are removed from the full model, a "VariablesDeleted" warning is issued. In this case, some submodels that contain variables removed from the full model because of linear dependency can be overlooked if they have not already been identified during the initial forward stepwise procedure. If this warning is issued and you want the variables that were removed from the full model to be considered in smaller models, you can rerun the program with a set of linearly independent variables.
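For the case described above in which the raw (uncorrected) sum of squares and crossproducts matrix is supplied by the user, the matrix can be formed as $A^T A$ with the dependent variable as the last column of $A$. The following is a minimal sketch in plain Java under that reading; the class and method names and the data are hypothetical, and the sketch does not call SelectionRegression.

```java
class RawSscp {
    /**
     * Forms the raw sum of squares and crossproducts matrix A^T A,
     * where A is x with y appended as its last column.
     * The result is (k + 1) x (k + 1) for k independent variables.
     */
    static double[][] rawSscp(double[][] x, double[] y) {
        int n = x.length;
        int k = x[0].length;
        double[][] cov = new double[k + 1][k + 1];
        double[] row = new double[k + 1];
        for (int r = 0; r < n; r++) {
            // Build one row of A: the independent variables, then the response.
            System.arraycopy(x[r], 0, row, 0, k);
            row[k] = y[r];
            // Accumulate the outer product of this row into A^T A.
            for (int i = 0; i <= k; i++) {
                for (int j = 0; j <= k; j++) {
                    cov[i][j] += row[i] * row[j];
                }
            }
        }
        return cov;
    }

    public static void main(String[] args) {
        // Hypothetical data: three observations of three independent variables.
        double[][] x = {{1, 2, 0}, {3, 4, 1}, {5, 6, 1}};
        double[] y = {1, 0, 2};
        for (double[] covRow : rawSscp(x, y)) {
            System.out.println(java.util.Arrays.toString(covRow));
        }
    }
}
```

A matrix built this way would be passed as cov, with nObservations chosen according to the rules stated for each situation.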
Nested Class Summary | |
---|---|
static class | SelectionRegression.NoVariablesException - No variables can enter the model. |
class | SelectionRegression.Statistics - Statistics contains statistics related to the regression coefficients. |
Field Summary | |
---|---|
static int | ADJUSTED_R_SQUARED_CRITERION - Indicates $R_a^2$ (adjusted $R^2$) criterion regression. |
static int | MALLOWS_CP_CRITERION - Indicates Mallows' $C_p$ criterion regression. |
static int | R_SQUARED_CRITERION - Indicates $R^2$ criterion regression. |
Constructor Summary |
---|
SelectionRegression(int nCandidate) - Constructs a new SelectionRegression object. |
Method Summary | |
---|---|
void | compute(double[][] x, double[] y) - Computes the best multiple linear regression models. |
void | compute(double[][] x, double[] y, double[] weights) - Computes the best weighted multiple linear regression models. |
void | compute(double[][] x, double[] y, double[] weights, double[] frequencies) - Computes the best weighted multiple linear regression models using frequencies for each observation. |
void | compute(double[][] cov, int nObservations) - Computes the best multiple linear regression models using a user-supplied covariance matrix. |
int | getCriterionOption() - Returns the criterion option used to calculate the regression estimates. |
SelectionRegression.Statistics | getStatistics() - Returns a new Statistics object. |
void | setCriterionOption(int criterionOption) - Sets the criterion to be used. |
void | setMaximumBestFound(int maxFound) - Sets the maximum number of best regressions to be found. |
void | setMaximumGoodSaved(int maxSaved) - Sets the maximum number of good regressions for each subset size saved. |
void | setMaximumSubsetSize(int maxSubset) - Sets the maximum subset size if the $R^2$ criterion is used. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int ADJUSTED_R_SQUARED_CRITERION
public static final int MALLOWS_CP_CRITERION
public static final int R_SQUARED_CRITERION
Constructor Detail |
---|
public SelectionRegression(int nCandidate)
Constructs a new SelectionRegression object.
Parameters:
nCandidate - An int containing the number of candidate variables (independent variables). nCandidate must be greater than 2.
Method Detail |
---|
public void compute(double[][] x, double[] y) throws SelectionRegression.NoVariablesException, com.imsl.stat.Covariances.TooManyObsDeletedException, com.imsl.stat.Covariances.MoreObsDelThanEnteredException, com.imsl.stat.Covariances.DiffObsDeletedException
Computes the best multiple linear regression models.
Parameters:
x - A double matrix containing the observations of the candidate (independent) variables. The number of columns in x must be equal to the number of variables set in the constructor.
y - A double array containing the observations of the dependent variable.
Throws:
SelectionRegression.NoVariablesException - if no variables can enter any model
com.imsl.stat.Covariances.TooManyObsDeletedException - more observations have been deleted than were originally entered
com.imsl.stat.Covariances.MoreObsDelThanEnteredException - more observations are being deleted from the output covariance matrix than were originally entered
com.imsl.stat.Covariances.DiffObsDeletedException - different observations are being deleted from the return matrix than were originally entered
public void compute(double[][] x, double[] y, double[] weights) throws SelectionRegression.NoVariablesException, Covariances.NonnegativeWeightException, com.imsl.stat.Covariances.TooManyObsDeletedException, com.imsl.stat.Covariances.MoreObsDelThanEnteredException, com.imsl.stat.Covariances.DiffObsDeletedException
Computes the best weighted multiple linear regression models.
Parameters:
x - A double matrix containing the observations of the candidate (independent) variables. The number of columns in x must be equal to the number of variables set in the constructor.
y - A double array containing the observations of the dependent variable.
weights - A double array containing the weight for each of the observations.
Throws:
SelectionRegression.NoVariablesException - if no variables can enter any model
Covariances.NonnegativeWeightException - weights must be nonnegative
com.imsl.stat.Covariances.TooManyObsDeletedException - more observations have been deleted than were originally entered
com.imsl.stat.Covariances.MoreObsDelThanEnteredException - more observations are being deleted from the output covariance matrix than were originally entered
com.imsl.stat.Covariances.DiffObsDeletedException - different observations are being deleted from the return matrix than were originally entered
public void compute(double[][] x, double[] y, double[] weights, double[] frequencies) throws SelectionRegression.NoVariablesException, Covariances.NonnegativeFreqException, Covariances.NonnegativeWeightException, com.imsl.stat.Covariances.TooManyObsDeletedException, com.imsl.stat.Covariances.MoreObsDelThanEnteredException, com.imsl.stat.Covariances.DiffObsDeletedException
Computes the best weighted multiple linear regression models using frequencies for each observation.
Parameters:
x - A double matrix containing the observations of the candidate (independent) variables. The number of columns in x must be equal to the number of variables set in the constructor.
y - A double array containing the observations of the dependent variable.
weights - A double array containing the weight for each of the observations.
frequencies - A double array containing the frequency for each of the observations of x.
Throws:
SelectionRegression.NoVariablesException - if no variables can enter any model
Covariances.NonnegativeFreqException - frequencies must be nonnegative
Covariances.NonnegativeWeightException - weights must be nonnegative
com.imsl.stat.Covariances.TooManyObsDeletedException - more observations have been deleted than were originally entered
com.imsl.stat.Covariances.MoreObsDelThanEnteredException - more observations are being deleted from the output covariance matrix than were originally entered
com.imsl.stat.Covariances.DiffObsDeletedException - different observations are being deleted from the return matrix than were originally entered
public void compute(double[][] cov, int nObservations) throws SelectionRegression.NoVariablesException
Computes the best multiple linear regression models using a user-supplied covariance matrix.
Parameters:
cov - A double matrix containing a variance-covariance or sum of squares and crossproducts matrix, in which the last column must correspond to the dependent variable. cov can be computed using the Covariances class.
nObservations - An int containing the number of observations used to compute cov.
Throws:
SelectionRegression.NoVariablesException - if no variables can enter any model
public int getCriterionOption()
Returns the criterion option used to calculate the regression estimates.
Returns:
an int containing the criterion option.
See Also:
R_SQUARED_CRITERION, ADJUSTED_R_SQUARED_CRITERION, MALLOWS_CP_CRITERION
public SelectionRegression.Statistics getStatistics()
Returns a new Statistics object.
Returns:
a Statistics object containing the coefficient statistics.
public void setCriterionOption(int criterionOption)
Sets the criterion to be used. By default, subset sizes 1, 2, ..., nCandidate are considered. However, for $R^2$ the maximum subset size can be restricted to maxSubset with the setMaximumSubsetSize(int) method.
Criterion Option | Description |
---|---|
R_SQUARED_CRITERION | For $R^2$, subset sizes 1, 2, ..., maxSubset are examined. This is the default with maxSubset = nCandidate. |
ADJUSTED_R_SQUARED_CRITERION | For adjusted $R^2$, subset sizes 1, 2, ..., nCandidate are examined. |
MALLOWS_CP_CRITERION | For Mallows' $C_p$, subset sizes 1, 2, ..., nCandidate are examined. |
Parameters:
criterionOption - An int containing the criterion option used for the best subset regression selection.
See Also:
R_SQUARED_CRITERION, ADJUSTED_R_SQUARED_CRITERION, MALLOWS_CP_CRITERION
public void setMaximumBestFound(int maxFound)
Sets the maximum number of best regressions to be found. If the $R^2$ criterion option is selected, the maxFound best regressions for each subset size examined are reported. If the adjusted $R^2$ or Mallows' $C_p$ criterion is selected, the maxFound best regressions among all possible regressions are found.
Parameters:
maxFound - An int containing the maximum number of best regressions to be reported. Default: maxFound = 1.
See Also:
R_SQUARED_CRITERION, ADJUSTED_R_SQUARED_CRITERION, MALLOWS_CP_CRITERION
public void setMaximumGoodSaved(int maxSaved)
Sets the maximum number of good regressions for each subset size saved. Argument maxSaved must be greater than or equal to maxFound. Normally, maxSaved should be less than or equal to 10. It should never need to be larger than maxSubset, the maximum number of subsets for any subset size. Computing time required is inversely related to maxSaved.
Parameters:
maxSaved - An int containing the maximum number of good regressions saved for each subset size. Default: maxSaved = maximum(10, maxSubset).
public void setMaximumSubsetSize(int maxSubset)
Sets the maximum subset size if the $R^2$ criterion is used.
Parameters:
maxSubset - An int containing the maximum subset size when the $R^2$ criterion is used. Default: maxSubset = nCandidate.
See Also:
R_SQUARED_CRITERION, ADJUSTED_R_SQUARED_CRITERION, MALLOWS_CP_CRITERION