Class StepwiseRegression
- All Implemented Interfaces:
Serializable, Cloneable
Class StepwiseRegression builds a multiple linear regression
model using forward selection, backward selection, or forward stepwise (with
a backward glance) selection.
Levels of priority can be assigned to the candidate independent variables
using the setLevels(int[]) method. All variables with a priority level of
1 must enter the model before variables with a priority level of 2.
Similarly, variables with a level of 2 must enter before variables with a
level of 3, etc. Variables also can be forced into the model (setForce(int)). Note that specifying "force" without also specifying the levels
will result in all variables being forced into the model.
Typically, the intercept is forced into all models and is not a candidate variable. In this case, a sum-of-squares and crossproducts matrix for the independent and dependent variables corrected for the mean is required. Other possibilities are as follows:
- The intercept is not in the model. A raw (uncorrected) sum-of-squares and crossproducts matrix for the independent and dependent variables is required as input in cov. Argument nObservations must be set to one greater than the number of observations.
- An intercept is a candidate variable. A raw (uncorrected) sum-of-squares and crossproducts matrix for the constant regressor (=1), independent, and dependent variables is required for cov. In this case, cov contains one additional row and column corresponding to the constant regressor. This row/column contains the sum-of-squares and crossproducts of the constant regressor with the independent and dependent variables. The remaining elements in cov are the same as in the previous case. Argument nObservations must be set to one greater than the number of observations.
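To make the distinction between the mean-corrected and raw (uncorrected) matrices concrete, here is a minimal sketch that builds both from a small data set. The class and method names (SscpSketch, correctedSscp, rawSscp) are hypothetical helpers, not part of the library, and the layout assumes the dependent variable occupies the last row and column.

```java
/** Hypothetical helper (not part of the library): builds SSCP matrices for [x | y]. */
final class SscpSketch {

    /** Returns column j of the augmented data [x | y] for observation r. */
    private static double value(double[][] x, double[] y, int r, int j) {
        return j < x[r].length ? x[r][j] : y[r];
    }

    /** Raw (uncorrected) SSCP: s[i][j] is the sum over observations of v_i * v_j. */
    static double[][] rawSscp(double[][] x, double[] y) {
        return sscp(x, y, false);
    }

    /** Mean-corrected SSCP: each variable is centered on its mean first. */
    static double[][] correctedSscp(double[][] x, double[] y) {
        return sscp(x, y, true);
    }

    private static double[][] sscp(double[][] x, double[] y, boolean center) {
        int nObs = x.length;
        int p = x[0].length + 1; // independent variables plus the dependent variable
        double[] mean = new double[p]; // stays all zeros when center is false
        if (center) {
            for (int r = 0; r < nObs; r++) {
                for (int j = 0; j < p; j++) {
                    mean[j] += value(x, y, r, j);
                }
            }
            for (int j = 0; j < p; j++) {
                mean[j] /= nObs;
            }
        }
        double[][] s = new double[p][p];
        for (int r = 0; r < nObs; r++) {
            for (int i = 0; i < p; i++) {
                for (int j = 0; j < p; j++) {
                    s[i][j] += (value(x, y, r, i) - mean[i]) * (value(x, y, r, j) - mean[j]);
                }
            }
        }
        return s;
    }
}
```

For x = {1, 2, 3} and y = {1, 2, 4}, the corrected matrix has 2 and 3 in its first row, while the raw matrix has 14 and 17.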
The stepwise regression algorithm is due to Efroymson (1960).
StepwiseRegression uses sweeps of the covariance matrix (input
in cov, if the covariance matrix is specified, or generated
internally) to move variables in and out of the model (Hemmerle 1967,
Chapter 3). The SWEEP operator discussed in Goodnight (1979) is used. A
description of the stepwise algorithm is also given by Kennedy and Gentle
(1980, pp. 335-340). The advantage of stepwise model building over all
possible regression (SelectionRegression) is that it is less
demanding computationally when the number of candidate independent variables
is very large. However, there is no guarantee that the model selected will
be the best model (highest \(R^2\)) for any subset size of
independent variables.
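The sweep step mentioned above can be illustrated with a short sketch. It follows one common sign convention for the SWEEP operator; the class name SweepSketch is hypothetical, and the library's internal implementation (including its tolerance checks and sign conventions) may differ.

```java
/** Hypothetical illustration of a SWEEP step; not the library's internal code. */
final class SweepSketch {

    /**
     * Sweeps the square matrix a in place on pivot k.
     * Assumes a[k][k] is nonzero; a real implementation would compare the
     * pivot against a tolerance to detect linear dependence.
     */
    static void sweep(double[][] a, int k) {
        int n = a.length;
        double d = a[k][k];
        // Divide the pivot row by the pivot.
        for (int j = 0; j < n; j++) {
            a[k][j] /= d;
        }
        // Eliminate the pivot column from every other row.
        for (int i = 0; i < n; i++) {
            if (i == k) {
                continue;
            }
            double b = a[i][k];
            for (int j = 0; j < n; j++) {
                a[i][j] -= b * a[k][j];
            }
            a[i][k] = -b / d;
        }
        a[k][k] = 1.0 / d;
    }
}
```

Sweeping the matrix {{2, 3}, {3, 5}} on pivot 0 gives {{0.5, 1.5}, {-1.5, 0.5}}. If the input is a corrected sum-of-squares and crossproducts matrix with the dependent variable last, the top-right entry is the regression coefficient (1.5) and the bottom-right entry is the residual sum of squares (0.5).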
-
Nested Class Summary
Nested Classes
- class StepwiseRegression.CoefficientTTests: Contains statistics related to the Student's t-test for each regression coefficient.
- static class StepwiseRegression.CyclingIsOccurringException: Cycling is occurring.
- static class StepwiseRegression.NoVariablesEnteredException: No variables can enter the model.
-
Field Summary
Fields
- static final int BACKWARD_REGRESSION: Indicates backward regression.
- static final int FORWARD_REGRESSION: Indicates forward regression.
- static final int STEPWISE_REGRESSION: Indicates stepwise regression.
-
Constructor Summary
Constructors
- StepwiseRegression(double[][] x, double[] y): Creates a new instance of StepwiseRegression.
- StepwiseRegression(double[][] x, double[] y, double[] weights): Creates a new instance of weighted StepwiseRegression.
- StepwiseRegression(double[][] x, double[] y, double[] weights, double[] frequencies): Creates a new instance of weighted StepwiseRegression using observation frequencies.
- StepwiseRegression(double[][] cov, int nObservations): Creates a new instance of StepwiseRegression from a user-supplied variance-covariance matrix.
-
Method Summary
Methods
- void compute(): Builds the multiple linear regression models using forward selection, backward selection, or stepwise selection.
- ANOVA getANOVA(): Gets an analysis of variance table and related statistics.
- StepwiseRegression.CoefficientTTests getCoefficientTTests(): Returns the Student's t-test statistics for the regression coefficients.
- double[] getCoefficientVIF(): Returns the variance inflation factors for the final model in this invocation.
- double[][] getCovariancesSwept(): Returns the results after cov has been swept for the columns corresponding to the variables in the model.
- double[] getHistory(): Returns the stepwise regression history for the independent variables.
- double getIntercept(): Returns the intercept.
- double[] getSwept(): Returns an array containing information indicating whether or not a particular variable is in the model.
- void setForce(int force): Forces independent variables into the model based on their level assigned from setLevels(int[]).
- void setLevels(int[] levels): Sets the levels of priority for variables entering and leaving the regression.
- void setMeans(double[] means): Sets the means of the variables.
- void setMethod(int method): Specifies the stepwise selection method: forward, backward, or stepwise regression.
- void setPValueIn(double pValueIn): Defines the largest p-value for variables entering the model.
- void setPValueOut(double pValueOut): Defines the smallest p-value for removing variables.
- void setTolerance(double tolerance): Sets the tolerance used to detect linear dependence among the independent variables.
-
Field Details
-
FORWARD_REGRESSION
public static final int FORWARD_REGRESSION
Indicates forward regression. An attempt is made to add a variable to the model. A variable is added if its p-value is less than pValueIn. During initialization, only forced variables enter the model.
-
BACKWARD_REGRESSION
public static final int BACKWARD_REGRESSION
Indicates backward regression. An attempt is made to remove a variable from the model. A variable is removed if its p-value is greater than pValueOut. During initialization, all candidate independent variables enter the model.
-
STEPWISE_REGRESSION
public static final int STEPWISE_REGRESSION
Indicates stepwise regression. Each stepwise step consists of an attempted backward step followed by an attempted forward step. Any forced variables enter the model during initialization.
-
-
Constructor Details
-
StepwiseRegression
public StepwiseRegression(double[][] x, double[] y) throws Covariances.TooManyObsDeletedException, Covariances.MoreObsDelThanEnteredException, Covariances.DiffObsDeletedException
Creates a new instance of StepwiseRegression.
- Parameters:
x - a double matrix of nObs by nVars, where nObs is the number of observations and nVars is the number of independent variables
y - a double array containing the observations of the dependent variable
- Throws:
Covariances.TooManyObsDeletedException - is thrown if more observations have been deleted than were originally entered, i.e. the sum of frequencies has become negative
Covariances.MoreObsDelThanEnteredException - is thrown if more observations are being deleted from the "variance-covariance" matrix than were originally entered. The corresponding row, column of the incidence matrix is less than zero.
Covariances.DiffObsDeletedException - is thrown if different observations are being deleted than were originally entered
-
StepwiseRegression
public StepwiseRegression(double[][] x, double[] y, double[] weights) throws Covariances.NonnegativeWeightException, Covariances.TooManyObsDeletedException, Covariances.MoreObsDelThanEnteredException, Covariances.DiffObsDeletedException
Creates a new instance of weighted StepwiseRegression.
- Parameters:
x - a double matrix of nObs by nVars, where nObs is the number of observations and nVars is the number of independent variables
y - a double array containing the observations of the dependent variable
weights - a double array containing the weight for each observation of x
- Throws:
Covariances.NonnegativeWeightException - is thrown if the weights are negative
Covariances.TooManyObsDeletedException - is thrown if more observations have been deleted than were originally entered, i.e. the sum of frequencies has become negative
Covariances.MoreObsDelThanEnteredException - is thrown if more observations are being deleted from the "variance-covariance" matrix than were originally entered. The corresponding row, column of the incidence matrix is less than zero.
Covariances.DiffObsDeletedException - is thrown if different observations are being deleted than were originally entered
-
StepwiseRegression
public StepwiseRegression(double[][] x, double[] y, double[] weights, double[] frequencies) throws Covariances.NonnegativeFreqException, Covariances.NonnegativeWeightException, Covariances.TooManyObsDeletedException, Covariances.MoreObsDelThanEnteredException, Covariances.DiffObsDeletedException
Creates a new instance of weighted StepwiseRegression using observation frequencies.
- Parameters:
x - a double matrix of nObs by nVars, where nObs is the number of observations and nVars is the number of independent variables
y - a double array containing the observations of the dependent variable
weights - a double array containing the weight for each observation of x
frequencies - a double array containing the frequency for each row of x
- Throws:
Covariances.NonnegativeFreqException - is thrown if the frequencies are negative
Covariances.NonnegativeWeightException - is thrown if the weights are negative
Covariances.TooManyObsDeletedException - is thrown if more observations have been deleted than were originally entered, i.e. the sum of frequencies has become negative
Covariances.MoreObsDelThanEnteredException - is thrown if more observations are being deleted from the "variance-covariance" matrix than were originally entered. The corresponding row, column of the incidence matrix is less than zero.
Covariances.DiffObsDeletedException - is thrown if different observations are being deleted than were originally entered
-
StepwiseRegression
public StepwiseRegression(double[][] cov, int nObservations)
Creates a new instance of StepwiseRegression from a user-supplied variance-covariance matrix.
- Parameters:
cov - a double matrix containing a variance-covariance or sum of squares and crossproducts matrix, in which the last column must correspond to the dependent variable. cov can be computed using the Covariances class.
nObservations - an int containing the number of observations associated with cov.
-
-
Method Details
-
compute
public void compute() throws StepwiseRegression.NoVariablesEnteredException, StepwiseRegression.CyclingIsOccurringException
Builds the multiple linear regression models using forward selection, backward selection, or stepwise selection.
- Throws:
StepwiseRegression.NoVariablesEnteredException - is thrown if no variables entered the model. All elements of the ANOVA table are set to NaN.
StepwiseRegression.CyclingIsOccurringException - is thrown if cycling occurs
-
setPValueIn
public void setPValueIn(double pValueIn)
Defines the largest p-value for variables entering the model.
Variables with a p-value less than pValueIn may enter the model. Backward regression does not use this value.
- Parameters:
pValueIn - a double containing the largest p-value for variables entering the model
Default: pValueIn = 0.05
-
setPValueOut
public void setPValueOut(double pValueOut)
Defines the smallest p-value for removing variables.
Variables with p-values greater than pValueOut may leave the model. pValueOut must be greater than or equal to pValueIn. A common choice for pValueOut is 2*pValueIn. Forward regression does not use this value.
- Parameters:
pValueOut - a double containing the smallest p-value for removing variables from the model
Default: pValueOut = 0.10
-
setTolerance
public void setTolerance(double tolerance)
Sets the tolerance used to detect linear dependence among the independent variables.
- Parameters:
tolerance - a double containing the tolerance used for detecting linear dependence
Default: tolerance = 2.2204460492503e-16
-
getCoefficientTTests
public StepwiseRegression.CoefficientTTests getCoefficientTTests() throws StepwiseRegression.NoVariablesEnteredException, StepwiseRegression.CyclingIsOccurringException
Returns the Student's t-test statistics for the regression coefficients.
Each row corresponding to a variable not in the model contains statistics for a model which includes the variables of the final model and the variable corresponding to the row in question.
- Returns:
a StepwiseRegression.CoefficientTTests object containing statistics relating to the regression coefficients
- Throws:
StepwiseRegression.NoVariablesEnteredException
StepwiseRegression.CyclingIsOccurringException
-
getANOVA
public ANOVA getANOVA() throws StepwiseRegression.NoVariablesEnteredException, StepwiseRegression.CyclingIsOccurringException
Gets an analysis of variance table and related statistics.
- Returns:
an ANOVA table and related statistics
- Throws:
StepwiseRegression.NoVariablesEnteredException
StepwiseRegression.CyclingIsOccurringException
-
setLevels
public void setLevels(int[] levels)
Sets the levels of priority for variables entering and leaving the regression.
Each variable is assigned a positive value which indicates its level of entry into the model. A variable can enter the model only after all variables with smaller nonzero levels of entry have entered. Similarly, a variable can only leave the model after all variables with higher levels of entry have left. Variables with the same level of entry compete for entry (deletion) at each step. Argument levels[i]=0 means the i-th variable never enters the model. Argument levels[i]=-1 means the i-th variable is the dependent variable. The last element in levels must correspond to the dependent variable, except when the variance-covariance or sum of squares and crossproducts matrix is supplied.
- Parameters:
levels - an int array containing the levels of entry into the model for each variable
Default: 1, 1, ..., 1, -1 where -1 corresponds to the dependent variable.
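The entry rule for levels can be sketched as a small predicate. This is only an illustration of the rule as described for setLevels; the class LevelsSketch and its method are hypothetical and not part of the library.

```java
/** Hypothetical illustration of the level-of-entry rule; not part of the library. */
final class LevelsSketch {

    /**
     * Returns, for each variable, whether it may compete for entry right now:
     * its level must be positive, it must not already be in the model, and every
     * variable with a smaller positive level must already be in the model.
     */
    static boolean[] eligibleToEnter(int[] levels, boolean[] inModel) {
        boolean[] eligible = new boolean[levels.length];
        for (int i = 0; i < levels.length; i++) {
            if (levels[i] <= 0 || inModel[i]) {
                continue; // 0: never enters; -1: dependent variable
            }
            boolean lowerLevelsEntered = true;
            for (int j = 0; j < levels.length; j++) {
                if (levels[j] > 0 && levels[j] < levels[i] && !inModel[j]) {
                    lowerLevelsEntered = false;
                    break;
                }
            }
            eligible[i] = lowerLevelsEntered;
        }
        return eligible;
    }
}
```

With levels {1, 1, 2, 0, -1} and only the first variable in the model, the second variable is the only candidate; the level-2 variable becomes a candidate once both level-1 variables have entered.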
-
setForce
public void setForce(int force)
Forces independent variables into the model based on their level assigned from setLevels(int[]).
- Parameters:
force - an int specifying the upper bound on the levels of the variables forced into the model
Variables with levels 1, 2, ..., force are forced into the model as independent variables.
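The relationship between levels and force can be sketched as follows. ForceSketch is a hypothetical helper used only to illustrate the rule stated above; it is not part of the library.

```java
/** Hypothetical illustration of the force rule; not part of the library. */
final class ForceSketch {

    /** A variable is forced when its level lies in the range 1..force. */
    static boolean[] forcedVariables(int[] levels, int force) {
        boolean[] forced = new boolean[levels.length];
        for (int i = 0; i < levels.length; i++) {
            forced[i] = levels[i] >= 1 && levels[i] <= force;
        }
        return forced;
    }
}
```

With the default levels 1, 1, ..., 1, -1 and force = 1, every independent variable is marked as forced, which is why specifying force without also specifying levels forces all variables into the model.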
-
setMethod
public void setMethod(int method)
Specifies the stepwise selection method: forward, backward, or stepwise regression.
- Parameters:
method - an int value between -1 and 1 specifying the stepwise selection method
Fields FORWARD_REGRESSION, BACKWARD_REGRESSION, and STEPWISE_REGRESSION should be used. Default: STEPWISE_REGRESSION.
-
setMeans
public void setMeans(double[] means)
Sets the means of the variables.
This is required when the covariance array is input and the intercept (getIntercept()) is requested. Otherwise, it is not used.
- Parameters:
means - a double array of length nVars+1, where nVars is the number of independent variables. means[0] through means[nVars-1] are the means of the independent variables and means[nVars] is the mean of the dependent variable.
-
getCoefficientVIF
public double[] getCoefficientVIF() throws StepwiseRegression.NoVariablesEnteredException, StepwiseRegression.CyclingIsOccurringException
Returns the variance inflation factors for the final model in this invocation.
The elements are in the same order as the independent variables in x (or, if the covariance matrix is specified, the elements are in the same order as the variables in cov). Each element corresponding to a variable not in the model contains statistics for a model which includes the variables of the final model and the variable corresponding to the element in question.
The square of the multiple correlation coefficient for the i-th regressor after all others can be obtained from the i-th element of the returned array by the following formula:
$$1.0-\frac{1.0}{VIF}$$
- Returns:
a double array containing the variance inflation factors for the final model in this invocation
- Throws:
StepwiseRegression.NoVariablesEnteredException
StepwiseRegression.CyclingIsOccurringException
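As a small worked illustration of the formula above, the following sketch converts an array of variance inflation factors into squared multiple correlation coefficients. VifSketch is a hypothetical helper, not part of the library, and the input array stands in for the values returned by getCoefficientVIF().

```java
/** Hypothetical helper (not part of the library): applies R^2 = 1 - 1/VIF elementwise. */
final class VifSketch {

    static double[] rSquaredFromVif(double[] vif) {
        double[] rSquared = new double[vif.length];
        for (int i = 0; i < vif.length; i++) {
            rSquared[i] = 1.0 - 1.0 / vif[i];
        }
        return rSquared;
    }
}
```

A VIF of 1 corresponds to a squared multiple correlation of 0, and a VIF of 4 corresponds to 0.75.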
-
getSwept
public double[] getSwept() throws StepwiseRegression.NoVariablesEnteredException, StepwiseRegression.CyclingIsOccurringException
Returns an array containing information indicating whether or not a particular variable is in the model.
- Returns:
a double array with information to indicate the independent variables in the model
The last element corresponds to the dependent variable. A +1 in the i-th position indicates that the variable is in the selected model. A -1 indicates that the variable is not in the selected model.
- Throws:
StepwiseRegression.NoVariablesEnteredException
StepwiseRegression.CyclingIsOccurringException
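The returned indicator array can be turned into a list of selected variable indices with a few lines of code. SweptSketch below is a hypothetical helper, not part of the library; it relies only on the +1/-1 encoding described above and skips the last element because it belongs to the dependent variable.

```java
/** Hypothetical helper (not part of the library): decodes a getSwept()-style array. */
final class SweptSketch {

    /** Returns the zero-based indices of the independent variables marked +1. */
    static int[] selectedVariables(double[] swept) {
        int last = swept.length - 1; // last element: dependent variable
        int count = 0;
        for (int i = 0; i < last; i++) {
            if (swept[i] > 0.0) {
                count++;
            }
        }
        int[] selected = new int[count];
        int next = 0;
        for (int i = 0; i < last; i++) {
            if (swept[i] > 0.0) {
                selected[next++] = i;
            }
        }
        return selected;
    }
}
```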
-
getHistory
public double[] getHistory() throws StepwiseRegression.NoVariablesEnteredException, StepwiseRegression.CyclingIsOccurringException
Returns the stepwise regression history for the independent variables.
- Returns:
a double array containing the recent history of the independent variables. The last element corresponds to the dependent variable.
history[i]    Status of i-th Variable
0.0           This variable has never been added to the model.
0.5           This variable was added to the model during initialization.
k > 0.0       This variable was added to the model during the k-th step.
k < 0.0       This variable was deleted from the model during the k-th step.
- Throws:
StepwiseRegression.NoVariablesEnteredException
StepwiseRegression.CyclingIsOccurringException
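The history encoding can be decoded into readable text with a short helper. HistorySketch is hypothetical and not part of the library. It assumes that a negative entry k refers to step |k|, which is an interpretation of the table above rather than something the documentation states explicitly.

```java
/** Hypothetical helper (not part of the library): describes one getHistory()-style entry. */
final class HistorySketch {

    static String describe(double h) {
        if (h == 0.0) {
            return "never added";
        }
        if (h == 0.5) {
            return "added during initialization";
        }
        // Assumption: the magnitude of the entry is the step number.
        long step = Math.round(Math.abs(h));
        return (h > 0.0 ? "added" : "deleted") + " during step " + step;
    }
}
```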
-
getIntercept
public double getIntercept() throws StepwiseRegression.NoVariablesEnteredException, StepwiseRegression.CyclingIsOccurringException
Returns the intercept.
The intercept is computed as follows:
$$ \beta_0 = \bar{y} - \sum_{i=1}^{n} \beta_i \bar{x}_i $$
where \(\bar{y}\) is the mean of the dependent variable y, \(\beta_i\) are the coefficients, and \(\bar{x}_i\) are the mean values for each independent variable \(x_i\) in the final model. If the covariance matrix is used for input, use method setMeans(double[]) to specify the means of the variables. If x and y are used for input, the means are computed internally and do not need to be specified.
- Returns:
a double containing the intercept
- Throws:
StepwiseRegression.NoVariablesEnteredException
StepwiseRegression.CyclingIsOccurringException
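The intercept formula above can be checked by hand with a minimal sketch. InterceptSketch is a hypothetical helper, not part of the library. It assumes a means array laid out as described for setMeans (independent variables first, dependent variable last) and a coefficient array in which variables that are not in the final model carry a coefficient of 0.

```java
/** Hypothetical helper (not part of the library): evaluates the intercept formula. */
final class InterceptSketch {

    /**
     * means has length nVars + 1 with the dependent-variable mean last;
     * coefficients has length nVars, with 0.0 for variables not in the model.
     */
    static double intercept(double[] means, double[] coefficients) {
        int nVars = coefficients.length;
        double b0 = means[nVars]; // start from the mean of the dependent variable
        for (int i = 0; i < nVars; i++) {
            b0 -= coefficients[i] * means[i];
        }
        return b0;
    }
}
```

For means {2, 3, 10} and coefficients {1.5, 0}, the intercept is 10 - 1.5 * 2 = 7.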
-
getCovariancesSwept
public double[][] getCovariancesSwept() throws StepwiseRegression.NoVariablesEnteredException, StepwiseRegression.CyclingIsOccurringException
Returns the results after cov has been swept for the columns corresponding to the variables in the model.
- Returns:
a double matrix containing the results after cov has been swept on the columns corresponding to the variables in the model
The estimated variance-covariance matrix of the estimated regression coefficients in the final model can be obtained by extracting the rows and columns corresponding to the independent variables in the final model and multiplying the elements of this matrix by the error mean square.
- Throws:
StepwiseRegression.NoVariablesEnteredException
StepwiseRegression.CyclingIsOccurringException
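The extraction described above can be sketched as follows. CoefCovSketch is a hypothetical helper, not part of the library. The swept matrix, the indices of the variables in the model, and the error mean square are all assumed to be supplied by the caller, for example from getCovariancesSwept(), getSwept(), and the ANOVA table.

```java
/** Hypothetical helper (not part of the library): coefficient covariance from a swept matrix. */
final class CoefCovSketch {

    /**
     * Extracts the rows and columns listed in inModel from swept and scales
     * every element by the error mean square.
     */
    static double[][] coefficientCovariance(double[][] swept, int[] inModel,
                                            double errorMeanSquare) {
        int m = inModel.length;
        double[][] v = new double[m][m];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < m; j++) {
                v[i][j] = swept[inModel[i]][inModel[j]] * errorMeanSquare;
            }
        }
        return v;
    }
}
```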
-