Example 1: SelectionRegression

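This example uses a data set from Draper and Smith (1981, pp. 629-630) containing 13 observations on four candidate independent variables. Class SelectionRegression is invoked to find the best regressions for each subset size (one to four variables) using the R^2 criterion, which is reported as a percentage. Coefficient statistics are then printed for the best regression of each size.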
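To make the criterion concrete, the following is a minimal sketch of how R^2 (as a percentage) can be computed by hand for a one-variable regression. The class RSquaredSketch and its method are hypothetical helpers written for illustration; they are not part of the library. The arrays are the fourth column of x and the response y from this example.

```java
// Hypothetical helper class for illustration; not part of the JMSL library.
class RSquaredSketch {

    // Fourth column of x and the response y from this example.
    static final double[] X4 = {
        60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12
    };
    static final double[] Y = {
        78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
        72.5, 93.1, 115.9, 83.8, 113.3, 109.4
    };

    // R-squared, in percent, of the least-squares line y = b0 + b1 * x.
    static double rSquaredPercent(double[] x, double[] y) {
        int n = x.length;
        double xBar = 0.0, yBar = 0.0;
        for (int i = 0; i < n; i++) {
            xBar += x[i];
            yBar += y[i];
        }
        xBar /= n;
        yBar /= n;

        double sxx = 0.0, sxy = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - xBar;
            double dy = y[i] - yBar;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        // With one predictor, the regression sum of squares is sxy^2 / sxx
        // and the total sum of squares is syy.
        return 100.0 * (sxy * sxy) / (sxx * syy);
    }

    public static void main(String[] args) {
        System.out.printf("R-squared for variable 4: %.3f%n",
                rSquaredPercent(X4, Y));
    }
}
```

The printed value should agree, up to rounding, with the criterion listed for variable 4 in the Output section.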


import java.text.*;
import com.imsl.stat.*;
import com.imsl.math.*;

public class SelectionRegressionEx1 {

    public static void main(String[] args) throws Exception {
        // Independent variables: 13 observations (rows) by
        // 4 candidate variables (columns).
        double x[][] = {
            {7., 26., 6., 60.},
            {1., 29., 15., 52.},
            {11., 56., 8., 20.},
            {11., 31., 8., 47.},
            {7., 52., 6., 33.},
            {11., 55., 9., 22.},
            {3., 71., 17., 6.},
            {1., 31., 22., 44.},
            {2., 54., 18., 22.},
            {21., 47., 4., 26.},
            {1., 40., 23., 34.},
            {11., 66., 9., 12.},
            {10., 68., 8., 12.}
        };

        double y[] = {
            78.5, 74.3, 104.3, 87.6,
            95.9, 109.2, 102.7, 72.5,
            93.1, 115.9, 83.8, 113.3, 109.4
        };

        String criterionOption;
        MessageFormat critMsg
                = new MessageFormat("Regressions with {0} variable(s) ({1})");
        MessageFormat critLabel
                = new MessageFormat("   Criterion               Variables");
        MessageFormat coefMsg
                = new MessageFormat("Best Regressions with {0}"
                        + " variable(s) ({1})");
        MessageFormat coefLabel = new MessageFormat("Variable   Coefficient"
                + "   Standard Error  t-statistic   p-value");

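        // Four candidate variables; no criterion is set explicitly,
        // and the results below are reported as R-squared.
        SelectionRegression sr = new SelectionRegression(4);
        sr.compute(x, y);
        SelectionRegression.Statistics stats
                = sr.getStatistics();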

        criterionOption = "R-squared";

        for (int i = 1; i <= 4; i++) {
            double[] tmpCrit = stats.getCriterionValues(i);
            int[][] indvar = stats.getIndependentVariables(i);

            Object p[] = {Integer.valueOf(i), criterionOption};
            System.out.println(critMsg.format(p));
            Object p1[] = {null};
            System.out.println(critLabel.format(p1));

            for (int j = 0; j < tmpCrit.length; j++) {
                System.out.print("     " + tmpCrit[j] + "        ");
                for (int k = 0; k < indvar[j].length; k++) {
                    System.out.print(indvar[j][k] + "   ");
                }
                System.out.println("");
            }
            System.out.println("");
        }

        for (int i = 0; i < 4; i++) {
            System.out.println("");
            Object p[] = {Integer.valueOf(i + 1), criterionOption};
            System.out.println(coefMsg.format(p));
            Object p2[] = {null};
            System.out.println(coefLabel.format(p2));

            // One row per variable in the regression; the columns
            // follow the order given in coefLabel.
            double[][] tmpCoef = stats.getCoefficientStatistics(i);
            PrintMatrix pm = new PrintMatrix();
            pm.setColumnSpacing(10);
            PrintMatrixFormat tst = new PrintMatrixFormat();
            tst.setNoColumnLabels();
            tst.setNoRowLabels();
            pm.print(tst, tmpCoef);
            System.out.println();
            System.out.println();
        }
    }
}

Output

Regressions with 1 variable(s) (R-squared)
   Criterion               Variables
     67.45419641316094        4   
     66.62682576332938        2   
     53.39480238350332        1   
     28.58727312298116        3   

Regressions with 2 variable(s) (R-squared)
   Criterion               Variables
     97.86783745356314        1   2   
     97.24710477169312        1   4   
     93.52896406158074        3   4   
     68.00604079500502        2   4   
     54.81667488448575        1   3   

Regressions with 3 variable(s) (R-squared)
   Criterion               Variables
     98.23354512004263        1   2   4   
     98.22846792190859        1   2   3   
     98.12810925873434        1   3   4   
     97.28199593862728        2   3   4   

Regressions with 4 variable(s) (R-squared)
   Criterion               Variables
     98.23756204076797        1   2   3   4   


Best Regressions with 1 variable(s) (R-squared)
Variable   Coefficient   Standard Error  t-statistic   p-value
                                                                                   
4          -0.738          0.155          -4.775          0.001          




Best Regressions with 2 variable(s) (R-squared)
Variable   Coefficient   Standard Error  t-statistic   p-value
                                                                              
1          1.468          0.121          12.105          0          
2          0.662          0.046          14.442          0          




Best Regressions with 3 variable(s) (R-squared)
Variable   Coefficient   Standard Error  t-statistic   p-value
                                                                                   
1           1.452          0.117          12.41           0              
2           0.416          0.186           2.242          0.052          
4          -0.237          0.173          -1.365          0.205          




Best Regressions with 4 variable(s) (R-squared)
Variable   Coefficient   Standard Error  t-statistic   p-value
                                                                                   
1           1.551          0.745           2.083          0.071          
2           0.51           0.724           0.705          0.501          
3           0.102          0.755           0.135          0.896          
4          -0.144          0.709          -0.203          0.844          
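
The subset rankings above can be cross-checked without the library. The following is a minimal brute-force sketch, assuming ordinary least squares with an intercept: it fits every nonempty subset of the four variables through the centered normal equations and keeps the subset with the largest R^2 for each size. The class AllSubsetsSketch and all of its methods are hypothetical names introduced here. Exhaustive enumeration is used only because it is easy to follow with four variables; it is not presented as the search strategy SelectionRegression itself uses.

```java
// Hypothetical illustration class; it does not use or reproduce JMSL internals.
class AllSubsetsSketch {

    static final double[][] X = {
        {7, 26, 6, 60}, {1, 29, 15, 52}, {11, 56, 8, 20}, {11, 31, 8, 47},
        {7, 52, 6, 33}, {11, 55, 9, 22}, {3, 71, 17, 6}, {1, 31, 22, 44},
        {2, 54, 18, 22}, {21, 47, 4, 26}, {1, 40, 23, 34}, {11, 66, 9, 12},
        {10, 68, 8, 12}
    };
    static final double[] Y = {
        78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
        72.5, 93.1, 115.9, 83.8, 113.3, 109.4
    };

    // R-squared (percent) of the least-squares fit of y on an intercept
    // plus the columns of x listed in cols (0-based indices).
    static double rSquaredPercent(double[][] x, double[] y, int[] cols) {
        int n = y.length, p = cols.length;
        double yBar = 0.0;
        double[] xBar = new double[p];
        for (int i = 0; i < n; i++) {
            yBar += y[i] / n;
            for (int j = 0; j < p; j++) xBar[j] += x[i][cols[j]] / n;
        }
        // Augmented centered normal equations [S | s]; centering absorbs
        // the intercept. syy is the total sum of squares.
        double[][] a = new double[p][p + 1];
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dy = y[i] - yBar;
            syy += dy * dy;
            for (int r = 0; r < p; r++) {
                double dr = x[i][cols[r]] - xBar[r];
                for (int c = 0; c < p; c++) {
                    a[r][c] += dr * (x[i][cols[c]] - xBar[c]);
                }
                a[r][p] += dr * dy;
            }
        }
        double[] s = new double[p];
        for (int r = 0; r < p; r++) s[r] = a[r][p];
        double[] beta = solve(a);
        double ssr = 0.0; // regression sum of squares = beta . s
        for (int r = 0; r < p; r++) ssr += beta[r] * s[r];
        return 100.0 * ssr / syy;
    }

    // Gaussian elimination with partial pivoting on an augmented matrix.
    static double[] solve(double[][] a) {
        int p = a.length;
        for (int k = 0; k < p; k++) {
            int piv = k;
            for (int r = k + 1; r < p; r++) {
                if (Math.abs(a[r][k]) > Math.abs(a[piv][k])) piv = r;
            }
            double[] t = a[k]; a[k] = a[piv]; a[piv] = t;
            for (int r = k + 1; r < p; r++) {
                double f = a[r][k] / a[k][k];
                for (int c = k; c <= p; c++) a[r][c] -= f * a[k][c];
            }
        }
        double[] b = new double[p];
        for (int k = p - 1; k >= 0; k--) {
            double sum = a[k][p];
            for (int c = k + 1; c < p; c++) sum -= a[k][c] * b[c];
            b[k] = sum / a[k][k];
        }
        return b;
    }

    // Converts a bitmask over the 4 variables into 0-based column indices.
    static int[] columns(int mask) {
        int[] cols = new int[Integer.bitCount(mask)];
        for (int j = 0, k = 0; j < 4; j++) {
            if ((mask & (1 << j)) != 0) cols[k++] = j;
        }
        return cols;
    }

    // Bitmask of the subset with the largest R-squared among those of a size.
    static int bestMask(int size) {
        int best = 0;
        double bestR2 = -1.0;
        for (int mask = 1; mask < 16; mask++) {
            if (Integer.bitCount(mask) != size) continue;
            double r2 = rSquaredPercent(X, Y, columns(mask));
            if (r2 > bestR2) { bestR2 = r2; best = mask; }
        }
        return best;
    }

    public static void main(String[] args) {
        for (int size = 1; size <= 4; size++) {
            int[] cols = columns(bestMask(size));
            StringBuilder vars = new StringBuilder();
            for (int c : cols) vars.append(c + 1).append(' '); // 1-based labels
            System.out.printf("size %d: variables %s R-squared %.4f%n",
                    size, vars, rSquaredPercent(X, Y, cols));
        }
    }
}
```

Variables are labeled from 1 when printed so that the result can be compared directly with the first row of each "Regressions with n variable(s)" table above.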


