Package com.imsl.stat

Class StepwiseRegression

java.lang.Object
com.imsl.stat.StepwiseRegression
All Implemented Interfaces:
Serializable, Cloneable

public class StepwiseRegression extends Object implements Serializable, Cloneable
Builds multiple linear regression models using forward selection, backward selection, or stepwise selection.

Class StepwiseRegression builds a multiple linear regression model using forward selection, backward selection, or forward stepwise (with a backward glance) selection.

Levels of priority can be assigned to the candidate independent variables using the setLevels(int[]) method. All variables with a priority level of 1 must enter the model before variables with a priority level of 2. Similarly, variables with a level of 2 must enter before variables with a level of 3, etc. Variables also can be forced into the model (setForce(int)). Note that specifying "force" without also specifying the levels will result in all variables being forced into the model.

Typically, the intercept is forced into all models and is not a candidate variable. In this case, a sum-of-squares and crossproducts matrix for the independent and dependent variables corrected for the mean is required. Other possibilities are as follows:

  1. The intercept is not in the model. A raw (uncorrected) sum-of-squares and crossproducts matrix for the independent and dependent variables is required as input in cov. Argument nObservations must be set to one greater than the number of observations.
  2. An intercept is a candidate variable. A raw (uncorrected) sum-of-squares and crossproducts matrix for the constant regressor (=1), independent, and dependent variables is required as input in cov. In this case, cov contains one additional row and column corresponding to the constant regressor. This row/column contains the sum-of-squares and crossproducts of the constant regressor with the independent and dependent variables. The remaining elements in cov are the same as in the previous case. Argument nObservations must be set to one greater than the number of observations.
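
As a rough illustration of the two kinds of input matrix described above, the following sketch builds a mean-corrected and a raw sum-of-squares and crossproducts matrix from plain arrays. The class SscpSketch and its methods are hypothetical helpers, not part of com.imsl.stat, and the column ordering (independent variables first, dependent variable last) is an assumption.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of com.imsl.stat. */
final class SscpSketch {

    /** Appends y as the last column of x, giving an n-by-(p+1) data matrix. */
    static double[][] join(double[][] x, double[] y) {
        int n = x.length, p = x[0].length;
        double[][] z = new double[n][p + 1];
        for (int r = 0; r < n; r++) {
            System.arraycopy(x[r], 0, z[r], 0, p);
            z[r][p] = y[r];
        }
        return z;
    }

    /** Column means of the joined data matrix. */
    static double[] means(double[][] z) {
        double[] m = new double[z[0].length];
        for (double[] row : z)
            for (int j = 0; j < m.length; j++) m[j] += row[j] / z.length;
        return m;
    }

    /** Sums of squares and crossproducts after subtracting center[j] from column j. */
    static double[][] sscp(double[][] z, double[] center) {
        int k = z[0].length;
        double[][] s = new double[k][k];
        for (double[] row : z)
            for (int i = 0; i < k; i++)
                for (int j = 0; j < k; j++)
                    s[i][j] += (row[i] - center[i]) * (row[j] - center[j]);
        return s;
    }

    /** Matrix corrected for the mean (intercept forced into the model). */
    static double[][] correctedSscp(double[][] x, double[] y) {
        double[][] z = join(x, y);
        return sscp(z, means(z));
    }

    /** Raw (uncorrected) matrix: the centering vector is all zeros. */
    static double[][] rawSscp(double[][] x, double[] y) {
        double[][] z = join(x, y);
        return sscp(z, new double[z[0].length]);
    }

    public static void main(String[] args) {
        double[][] x = {{1.0}, {2.0}, {3.0}};
        double[] y = {1.0, 3.0, 2.0};
        System.out.println(Arrays.deepToString(correctedSscp(x, y)));
        System.out.println(Arrays.deepToString(rawSscp(x, y)));
    }
}
```

Adding a column of ones to x before calling rawSscp would produce the extra row and column for a constant regressor. Where that row and column must sit in cov is not stated here, so check it before passing such a matrix to the constructor.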

The stepwise regression algorithm is due to Efroymson (1960). StepwiseRegression uses sweeps of the covariance matrix (input in cov, if the covariance matrix is specified, or generated internally) to move variables in and out of the model (Hemmerle 1967, Chapter 3). The SWEEP operator discussed in Goodnight (1979) is used. A description of the stepwise algorithm is also given by Kennedy and Gentle (1980, pp. 335-340). The advantage of stepwise model building over all possible regressions (SelectionRegression) is that it is less demanding computationally when the number of candidate independent variables is very large. However, there is no guarantee that the selected model will be the best model (highest \(R^2\)) for any subset size of independent variables.
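
To make the idea of sweeping concrete, here is a minimal sketch of a SWEEP step in the style described by Goodnight. It is not the library's internal code: SweepSketch is a hypothetical class, sign conventions for the operator vary between references, and no tolerance check for linear dependence is included.

```java
/** Hypothetical illustration of a SWEEP step; not the library's implementation. */
final class SweepSketch {

    /** Sweeps the square matrix a in place on pivot k. */
    static void sweep(double[][] a, int k) {
        double d = a[k][k];
        if (d == 0.0) throw new ArithmeticException("zero pivot at " + k);
        int n = a.length;
        // Scale the pivot row by the pivot.
        for (int j = 0; j < n; j++) a[k][j] /= d;
        // Eliminate column k from every other row.
        for (int i = 0; i < n; i++) {
            if (i == k) continue;
            double b = a[i][k];
            for (int j = 0; j < n; j++) a[i][j] -= b * a[k][j];
            a[i][k] = -b / d;
        }
        a[k][k] = 1.0 / d;
    }

    public static void main(String[] args) {
        // Mean-corrected SSCP for x = {1, 2, 3}, y = {1, 3, 2}; the last row/column is y.
        double[][] s = {{2.0, 1.0}, {1.0, 2.0}};
        sweep(s, 0);
        System.out.println("slope = " + s[0][1] + ", SSE = " + s[1][1]);
    }
}
```

With this convention, sweeping on the column of x leaves the slope (0.5) in the upper-right entry and the residual sum of squares (1.5) in the lower-right entry. Sweeping again on the same pivot restores the original matrix, which is what lets a variable be moved back out of the model.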

  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
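    class
    StepwiseRegression.CoefficientTTests
    Contains statistics related to the Student's t-test for each regression coefficient.
    static class
    StepwiseRegression.CyclingIsOccurringException
    Cycling is occurring.
    static class
    StepwiseRegression.NoVariablesEnteredException
    No variables can enter the model.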
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final int
    BACKWARD_REGRESSION
    Indicates backward regression.
    static final int
    FORWARD_REGRESSION
    Indicates forward regression.
    static final int
    STEPWISE_REGRESSION
    Indicates stepwise regression.
  • Constructor Summary

    Constructors
    Constructor
    Description
    StepwiseRegression(double[][] x, double[] y)
    Creates a new instance of StepwiseRegression.
    StepwiseRegression(double[][] x, double[] y, double[] weights)
    Creates a new instance of weighted StepwiseRegression.
    StepwiseRegression(double[][] x, double[] y, double[] weights, double[] frequencies)
    Creates a new instance of weighted StepwiseRegression using observation frequencies.
    StepwiseRegression(double[][] cov, int nObservations)
    Creates a new instance of StepwiseRegression from a user-supplied variance-covariance matrix.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    compute()
    Builds the multiple linear regression models using forward selection, backward selection, or stepwise selection.
    ANOVA
    getANOVA()
    Gets an analysis of variance table and related statistics.
    StepwiseRegression.CoefficientTTests
    getCoefficientTTests()
    Returns the Student's t-test statistics for the regression coefficients.
    double[]
    getCoefficientVIF()
    Returns the variance inflation factors for the final model in this invocation.
    double[][]
    getCovariancesSwept()
    Returns the results after cov has been swept for the columns corresponding to the variables in the model.
    double[]
    getHistory()
    Returns the stepwise regression history for the independent variables.
    double
    getIntercept()
    Returns the intercept.
    double[]
    getSwept()
    Returns an array containing information indicating whether or not a particular variable is in the model.
    void
    setForce(int force)
    Forces independent variables into the model based on their level assigned from setLevels(int[]).
    void
    setLevels(int[] levels)
    Sets the levels of priority for variables entering and leaving the regression.
    void
    setMeans(double[] means)
    Sets the means of the variables.
    void
    setMethod(int method)
    Specifies the selection method: forward, backward, or stepwise regression.
    void
    setPValueIn(double pValueIn)
    Defines the largest p-value for variables entering the model.
    void
    setPValueOut(double pValueOut)
    Defines the smallest p-value for removing variables.
    void
    setTolerance(double tolerance)
    Sets the tolerance used to detect linear dependence among the independent variables.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • FORWARD_REGRESSION

      public static final int FORWARD_REGRESSION
      Indicates forward regression. An attempt is made to add a variable to the model. A variable is added if its p-value is less than pValueIn. During initialization, only forced variables enter the model.
    • BACKWARD_REGRESSION

      public static final int BACKWARD_REGRESSION
      Indicates backward regression. An attempt is made to remove a variable from the model. A variable is removed if its p-value is greater than pValueOut. During initialization, all candidate independent variables enter the model.
    • STEPWISE_REGRESSION

      public static final int STEPWISE_REGRESSION
      Indicates stepwise regression. A backward step is attempted. After the backward step, a forward step is attempted. This is a stepwise step. Any forced variables enter the model during initialization.
  • Constructor Details

  • Method Details

    • compute

      Builds the multiple linear regression models using forward selection, backward selection, or stepwise selection.
      Throws:
      StepwiseRegression.NoVariablesEnteredException - thrown if no variables entered the model. All elements of the ANOVA table are set to NaN.
      StepwiseRegression.CyclingIsOccurringException - thrown if cycling occurs.
    • setPValueIn

      public void setPValueIn(double pValueIn)
      Defines the largest p-value for variables entering the model.

      Variables with p-value less than pValueIn may enter the model. Backward regression does not use this value.

      Parameters:
      pValueIn - a double containing the largest p-value for variables entering the model

      Default: pValueIn = 0.05

    • setPValueOut

      public void setPValueOut(double pValueOut)
      Defines the smallest p-value for removing variables.

      Variables with p-values greater than pValueOut may leave the model. pValueOut must be greater than or equal to pValueIn. A common choice for pValueOut is 2*pValueIn. Forward regression does not use this value.

      Parameters:
      pValueOut - a double containing the smallest p-value for removing variables from the model

      Default: pValueOut = 0.10
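
      The entry and removal thresholds can be pictured with the sketch below. It assumes the caller already has a p-value per variable, and it assumes that a forward step takes the candidate with the smallest qualifying p-value and a backward step drops the variable with the largest one. That tie-breaking rule is an assumption, not something stated here. StepRuleSketch is a hypothetical class, and priority levels and forced variables are ignored.

```java
/** Hypothetical sketch of the entry and removal tests; not part of com.imsl.stat. */
final class StepRuleSketch {

    /** Index of the candidate with the smallest p-value below pValueIn, or -1 if none. */
    static int pickEntering(double[] pValues, boolean[] inModel, double pValueIn) {
        int best = -1;
        for (int i = 0; i < pValues.length; i++) {
            if (inModel[i] || pValues[i] >= pValueIn) continue;
            if (best < 0 || pValues[i] < pValues[best]) best = i;
        }
        return best;
    }

    /** Index of the model variable with the largest p-value above pValueOut, or -1 if none. */
    static int pickLeaving(double[] pValues, boolean[] inModel, double pValueOut) {
        int worst = -1;
        for (int i = 0; i < pValues.length; i++) {
            if (!inModel[i] || pValues[i] <= pValueOut) continue;
            if (worst < 0 || pValues[i] > pValues[worst]) worst = i;
        }
        return worst;
    }

    public static void main(String[] args) {
        double[] p = {0.20, 0.01, 0.03, 0.50};
        boolean[] in = {true, false, false, false};
        System.out.println("enter: " + pickEntering(p, in, 0.05));
        System.out.println("leave: " + pickLeaving(p, in, 0.10));
    }
}
```

      Keeping pValueOut at or above pValueIn, as required above, avoids a variable being admitted and then immediately removed by the same pair of thresholds.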

    • setTolerance

      public void setTolerance(double tolerance)
      Sets the tolerance used to detect linear dependence among the independent variables.
      Parameters:
      tolerance - a double containing the tolerance used for detecting linear dependence

      Default: tolerance = 2.2204460492503e-16

    • getCoefficientTTests

      Returns the Student's t-test statistics for the regression coefficients.

      Each row corresponding to a variable not in the model contains statistics for a model which includes the variables of the final model and the variable corresponding to the row in question.

      Returns:
      a StepwiseRegression.CoefficientTTests object containing statistics relating to the regression coefficients
      Throws:
      StepwiseRegression.NoVariablesEnteredException
      StepwiseRegression.CyclingIsOccurringException
    • getANOVA

      Gets an analysis of variance table and related statistics.
      Returns:
      an ANOVA table and related statistics
      Throws:
      StepwiseRegression.NoVariablesEnteredException
      StepwiseRegression.CyclingIsOccurringException
    • setLevels

      public void setLevels(int[] levels)
      Sets the levels of priority for variables entering and leaving the regression.

      Each variable is assigned a positive value which indicates its level of entry into the model. A variable can enter the model only after all variables with smaller nonzero levels of entry have entered. Similarly, a variable can only leave the model after all variables with higher levels of entry have left. Variables with the same level of entry compete for entry (deletion) at each step. Argument levels[i]=0 means the i-th variable never enters the model. Argument levels[i]=-1 means the i-th variable is the dependent variable. The last element in levels must correspond to the dependent variable, except when the variance-covariance or sum of squares and crossproducts matrix is supplied.

      Parameters:
      levels - an int array containing the levels of entry into the model for each variable

      Default: 1, 1, ..., 1, -1 where -1 corresponds to the dependent variable.
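
      The ordering rules above can be expressed as two small predicates. This is only a sketch of the stated rules: LevelsSketch is a hypothetical class, the inModel flags are supplied by the caller, and forced variables (setForce) are not modeled.

```java
/** Hypothetical sketch of the level-of-entry rules; not part of com.imsl.stat. */
final class LevelsSketch {

    /** True if variable i may enter: positive level, not yet in, all lower positive levels in. */
    static boolean mayEnter(int[] levels, boolean[] inModel, int i) {
        if (levels[i] <= 0 || inModel[i]) return false;   // 0 = never enters, -1 = dependent
        for (int j = 0; j < levels.length; j++) {
            if (levels[j] > 0 && levels[j] < levels[i] && !inModel[j]) return false;
        }
        return true;
    }

    /** True if variable i may leave: it is in the model and no higher level is still in. */
    static boolean mayLeave(int[] levels, boolean[] inModel, int i) {
        if (levels[i] <= 0 || !inModel[i]) return false;
        for (int j = 0; j < levels.length; j++) {
            if (levels[j] > levels[i] && inModel[j]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] levels = {1, 1, 2, 0, -1};
        boolean[] in = {true, false, false, false, false};
        for (int i = 0; i < levels.length; i++) {
            System.out.println(i + ": mayEnter=" + mayEnter(levels, in, i));
        }
    }
}
```

      In the example, the level-2 variable stays ineligible until both level-1 variables are in the model, and the variable with level 0 is never eligible.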

    • setForce

      public void setForce(int force)
      Forces independent variables into the model based on their level assigned from setLevels(int[]).
      Parameters:
      force - an int specifying the upper bound on the variables forced into the model

      Variables with levels 1, 2, ..., force are forced into the model as independent variables.

    • setMethod

      public void setMethod(int method)
      Specifies the selection method: forward, backward, or stepwise regression.
      Parameters:
      method - an int value between -1 and 1 specifying the stepwise selection method

      Fields FORWARD_REGRESSION, BACKWARD_REGRESSION, and STEPWISE_REGRESSION should be used. Default: STEPWISE_REGRESSION.

    • setMeans

      public void setMeans(double[] means)
      Sets the means of the variables.

      This is required when the covariance array is input and the intercept (getIntercept()) is requested. Otherwise, it is not used.

      Parameters:
      means - a double array of length nVars+1, where nVars is the number of independent variables. means[0] through means[nVars-1] are the means of the independent variables and means[nVars] is the mean of the dependent variable.
    • getCoefficientVIF

      Returns the variance inflation factors for the final model in this invocation.

      The elements are in the same order as the independent variables in x (or, if the covariance matrix is specified, the elements are in the same order as the variables in cov). Each element corresponding to a variable not in the model contains statistics for a model which includes the variables of the final model and the variable corresponding to the element in question.

      The squared multiple correlation coefficient of the i-th regressor on all the other regressors can be obtained from the i-th element of the returned array by the following formula:

      $$R_i^2 = 1.0-\frac{1.0}{VIF_i}$$
      Returns:
      a double array containing the variance inflation factors for the final model in this invocation
      Throws:
      StepwiseRegression.NoVariablesEnteredException
      StepwiseRegression.CyclingIsOccurringException
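
      The conversion from variance inflation factors to squared multiple correlation coefficients given above is a one-line computation. In this sketch, VifSketch is a hypothetical helper and the input array stands in for the values returned by getCoefficientVIF.

```java
import java.util.Arrays;

/** Hypothetical helper; applies 1 - 1/VIF element by element. */
final class VifSketch {

    static double[] rSquaredFromVif(double[] vif) {
        double[] r2 = new double[vif.length];
        for (int i = 0; i < vif.length; i++) {
            r2[i] = 1.0 - 1.0 / vif[i];
        }
        return r2;
    }

    public static void main(String[] args) {
        // Made-up VIF values for illustration only.
        double[] vif = {1.0, 4.0, 10.0};
        System.out.println(Arrays.toString(rSquaredFromVif(vif)));
    }
}
```

      A VIF of 1 corresponds to a regressor that is uncorrelated with the others, and larger values indicate stronger collinearity.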
    • getSwept

      Returns an array containing information indicating whether or not a particular variable is in the model.
      Returns:
      a double array with information to indicate the independent variables in the model

      The last element corresponds to the dependent variable. A +1 in the i-th position indicates that the variable is in the selected model. A -1 indicates that the variable is not in the selected model.

      Throws:
      StepwiseRegression.NoVariablesEnteredException
      StepwiseRegression.CyclingIsOccurringException
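
      A caller will often want the indices of the selected variables rather than the +1/-1 flags. The sketch below shows one way to decode an array with the layout described above. SweptSketch is a hypothetical helper, and the sample array is made up, including the value used in the final (dependent variable) position.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for decoding the +1/-1 flags; not part of com.imsl.stat. */
final class SweptSketch {

    /** Indices of independent variables flagged +1; the last element is skipped. */
    static List<Integer> variablesInModel(double[] swept) {
        List<Integer> in = new ArrayList<>();
        for (int i = 0; i < swept.length - 1; i++) {
            if (swept[i] > 0.0) in.add(i);
        }
        return in;
    }

    public static void main(String[] args) {
        // Made-up flags: variables 0 and 2 selected, last slot is the dependent variable.
        double[] swept = {1.0, -1.0, 1.0, -1.0};
        System.out.println(variablesInModel(swept));
    }
}
```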
    • getHistory

      Returns the stepwise regression history for the independent variables.
      Returns:
      a double array containing the recent history of the independent variables. The last element corresponds to the dependent variable.

      history[i]        Status of i-th Variable
      0.0               This variable has never been added to the model.
      0.5               This variable was added to the model during initialization.
      k \(\gt\) 0.0     This variable was added to the model during the k-th step.
      k \(\lt\) 0.0     This variable was deleted from the model during the k-th step.

      Throws:
      StepwiseRegression.NoVariablesEnteredException
      StepwiseRegression.CyclingIsOccurringException
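
      The history codes can be turned into readable text with a small decoder such as the one below. HistorySketch is a hypothetical helper. Reporting the step number as the absolute value of a negative entry is an interpretation of the table, not something the table states outright.

```java
/** Hypothetical decoder for history values; not part of com.imsl.stat. */
final class HistorySketch {

    static String describe(double h) {
        if (h == 0.0) return "never added";
        if (h == 0.5) return "added during initialization";
        // Assumption: the magnitude of the entry is the step number.
        long step = Math.round(Math.abs(h));
        return (h > 0.0 ? "added" : "deleted") + " at step " + step;
    }

    public static void main(String[] args) {
        // Made-up history values for illustration only.
        double[] history = {0.5, 3.0, -2.0, 0.0};
        for (int i = 0; i < history.length; i++) {
            System.out.println("variable " + i + ": " + describe(history[i]));
        }
    }
}
```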
    • getIntercept

      Returns the intercept.

      The intercept is computed as follows:

      $$ \beta_0 = \bar{y} - \sum_{i=1}^{n} \beta_i \bar{x}_i $$

      where \(\bar{y}\) is the mean of the dependent variable y, \(\beta_i\) are the coefficients, and \(\bar{x}_i\) are the mean values for each independent variable \(x_i\) in the final model. If the covariance matrix is used for input, use method setMeans(double[]) to specify the means of the variables. If x and y are used for input, the means are computed internally and do not need to be specified.

      Returns:
      a double containing the intercept
      Throws:
      StepwiseRegression.NoVariablesEnteredException
      StepwiseRegression.CyclingIsOccurringException
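
      The intercept formula above can be checked by hand with a few lines of code. In this sketch, InterceptSketch is a hypothetical helper. The means array follows the layout described for setMeans (independent variables first, mean of the dependent variable last), and coefficients of variables that are not in the final model are assumed to be passed as zero.

```java
/** Hypothetical helper that evaluates the intercept formula; not part of com.imsl.stat. */
final class InterceptSketch {

    /** beta0 = yBar - sum over i of beta[i] * xBar[i], with yBar stored last in means. */
    static double intercept(double[] beta, double[] means) {
        if (means.length != beta.length + 1) {
            throw new IllegalArgumentException("means must have length beta.length + 1");
        }
        double b0 = means[beta.length];
        for (int i = 0; i < beta.length; i++) {
            b0 -= beta[i] * means[i];
        }
        return b0;
    }

    public static void main(String[] args) {
        // x = {1, 2, 3}, y = {1, 3, 2}: slope 0.5, both means equal to 2.
        System.out.println(intercept(new double[] {0.5}, new double[] {2.0, 2.0}));
    }
}
```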
    • getCovariancesSwept

      Returns the results after cov has been swept for the columns corresponding to the variables in the model.
      Returns:
      a double matrix containing the results after cov has been swept on the columns corresponding to the variables in the model

      The estimated variance-covariance matrix of the estimated regression coefficients in the final model can be obtained by extracting the rows and columns corresponding to the independent variables in the final model and multiplying the elements of this matrix by the error mean square.

      Throws:
      StepwiseRegression.NoVariablesEnteredException
      StepwiseRegression.CyclingIsOccurringException
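
      The extraction described above can be sketched as follows. CoefCovSketch is a hypothetical helper. It assumes that the rows and columns of the swept matrix are in the same order as the flags returned by getSwept, and that the caller supplies the error mean square (for example, taken from the ANOVA table).

```java
import java.util.Arrays;

/** Hypothetical helper; not part of com.imsl.stat. */
final class CoefCovSketch {

    /** Rows and columns of covSwept for variables flagged +1, scaled by the error mean square. */
    static double[][] coefficientCovariance(double[][] covSwept, double[] swept,
                                            double errorMeanSquare) {
        int nVars = swept.length - 1;          // last flag belongs to the dependent variable
        int m = 0;
        for (int i = 0; i < nVars; i++) if (swept[i] > 0.0) m++;
        int[] idx = new int[m];
        for (int i = 0, k = 0; i < nVars; i++) if (swept[i] > 0.0) idx[k++] = i;

        double[][] v = new double[m][m];
        for (int a = 0; a < m; a++) {
            for (int b = 0; b < m; b++) {
                v[a][b] = covSwept[idx[a]][idx[b]] * errorMeanSquare;
            }
        }
        return v;
    }

    public static void main(String[] args) {
        // Made-up swept matrix with only variable 0 in the model.
        double[][] covSwept = {{0.5, 0.0, 0.5}, {0.0, 4.0, 1.0}, {-0.5, 1.0, 1.5}};
        double[] swept = {1.0, -1.0, -1.0};
        System.out.println(Arrays.deepToString(coefficientCovariance(covSwept, swept, 1.5)));
    }
}
```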