Computes statistics related to a regression fit given the coefficient estimates and the R matrix.
Required Arguments
IRBEF — Index vector of length |IEF| + 1. (Input, if IEF is positive.) For i = 1, 2, …, |IEF|, elements IRBEF(i), IRBEF(i) + 1, …, IRBEF(i + 1) - 1 of B correspond to the i-th effect.
B — Vector of length NCOEF containing a least-squares solution for the regression coefficients. (Input) Here, if IEF > 0, then NCOEF = IRBEF(IEF + 1) - 1; if IEF ≤ 0, then NCOEF = INTCEP - IEF. If INTCEP = 1, then B(1) must be the estimated intercept.
R — NCOEF by NCOEF upper triangular matrix containing the R matrix. (Input) The R matrix can come from a regression fit based on a QR decomposition of the matrix of regressors or based on a Cholesky factorization $R^T R$ of the matrix of sums of squares and crossproducts of the regressors. Elements to the right of a diagonal element of R that is zero must also be zero. A zero row indicates a model that is not of full rank. For an R matrix that comes from a regression fit with linear equality restrictions on the parameters, each row of R corresponding to a restriction must have a negative diagonal element. The remaining rows of R must have positive diagonal elements. Only the upper triangle of R is referenced.
DFE — Degrees of freedom for error. (Input)
SSE — Sum of squares for error. (Input)
AOV — Vector of length 15 containing statistics relating to the analysis of variance. (Output)
I    AOV(I)
1    Degrees of freedom for regression
2    Degrees of freedom for error
3    Total degrees of freedom
4    Sum of squares for regression
5    Sum of squares for error
6    Total sum of squares
7    Regression mean square
8    Error mean square
9    F-statistic
10   p-value
11   $R^2$ (in percent)
12   Adjusted $R^2$ (in percent)
13   Estimated standard deviation of the model error
14   Mean of the response (dependent) variable
15   Coefficient of variation (in percent)
If INTCEP = 1, the regression and total are corrected for the mean. If INTCEP = 0, the regression and total are not corrected for the mean, and AOV(14) and AOV(15) are set to NaN (not a number).
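The AOV entries are tied together by standard identities. The Python sketch below computes them from the regression and error sums of squares and degrees of freedom using textbook definitions, which are assumed (not confirmed by this page) to match RSTAT: AOV(3) = AOV(1) + AOV(2), AOV(6) = AOV(4) + AOV(5), mean squares are sums of squares divided by degrees of freedom, adjusted $R^2$ = 100[1 - AOV(8)/(AOV(6)/AOV(3))], and the coefficient of variation is 100 AOV(13)/AOV(14). The function name `aov_summary` is hypothetical, and the p-value AOV(10) is left as NaN because it requires the F distribution function.

```python
import math

def aov_summary(ssr, sse, dfr, dfe, ybar=None):
    """Return the 15 AOV entries as a Python list (index 0 holds AOV(1)).

    ybar is the mean of the response; pass None for a model without an
    intercept (INTCEP = 0), in which case AOV(14) and AOV(15) are NaN.
    """
    nan = float("nan")
    dft = dfr + dfe                                   # AOV(3)
    sst = ssr + sse                                   # AOV(6)
    msr = ssr / dfr                                   # AOV(7)
    mse = sse / dfe                                   # AOV(8), the estimate s^2
    f_stat = msr / mse                                # AOV(9)
    p_value = nan                                     # AOV(10): needs the F distribution
    r2 = 100.0 * ssr / sst                            # AOV(11)
    adj_r2 = 100.0 * (1.0 - mse / (sst / dft))        # AOV(12)
    s = math.sqrt(mse)                                # AOV(13)
    mean = nan if ybar is None else ybar              # AOV(14)
    cv = nan if ybar is None else 100.0 * s / ybar    # AOV(15)
    return [dfr, dfe, dft, ssr, sse, sst, msr, mse, f_stat, p_value,
            r2, adj_r2, s, mean, cv]
```

For the straight-line fit of y = (2, 3, 5, 6) on x = (1, 2, 3, 4), SSR = 9.8 and SSE = 0.2 on 1 and 2 degrees of freedom, so `aov_summary(9.8, 0.2, 1, 2, ybar=4.0)` gives F = 98, $R^2$ = 98 percent, and adjusted $R^2$ = 97 percent.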
SQSS — ∣IEF∣ by 4 matrix containing in columns 1 through 4 the sequential degrees of freedom, sum of squares, F-statistic, and p‑value. (Output) Each row corresponds to an effect. If IEF = 0, SQSS is not referenced and can be a vector of length one.
COEF — NCOEF by 5 matrix containing statistics relating to the regression coefficients. (Output) Each row corresponds to a coefficient in the model. Row INTCEP + I corresponds to the coefficient for the I-th independent variable. If INTCEP = 1, the first row corresponds to the intercept. The statistics in the columns are
Col.  Description
1     Coefficient estimate.
2     Estimated standard error of the coefficient estimate.
3     t-statistic for the test that the coefficient is zero.
4     p-value for the two-sided t test.
5     Variance inflation factors. The square of the multiple correlation coefficient for the I-th regressor after all others can be obtained from COEF(I, 5) by the formula 1.0 - 1.0/COEF(I, 5). If INTCEP = 0, or if INTCEP = 1 and I = 1, the “multiple correlation coefficient” is not adjusted for the mean.
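Columns 2, 3, and 5 are related by simple arithmetic: the standard error is the square root of the corresponding diagonal element of COVB, the t-statistic is the estimate divided by its standard error, and 1 - 1/VIF recovers the squared multiple correlation described in column 5. The sketch below uses a hypothetical helper `coef_columns`; it returns NaN for t when the standard error is zero (an assumption of this sketch), and it omits the p-values in column 4 because they require the t distribution function.

```python
import math

def coef_columns(b, covb_diag, vif):
    """Return (estimate, std. error, t, R^2 on the other regressors) per coefficient."""
    rows = []
    for est, var, factor in zip(b, covb_diag, vif):
        se = math.sqrt(var)                        # column 2 from the COVB diagonal
        t = est / se if se > 0 else float("nan")   # column 3
        r2_others = 1.0 - 1.0 / factor             # recovered from column 5
        rows.append((est, se, t, r2_others))
    return rows
```

For example, a VIF of 6 corresponds to a squared multiple correlation of 1 - 1/6 = 5/6, and a VIF of 1 to a regressor uncorrelated with the others.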
COVB — NCOEF by NCOEF matrix that is the estimated variance-covariance matrix of the estimated regression coefficients when R is nonsingular and is from an unrestricted regression fit. (Output) See Comments for an explanation of COVB when R is singular or R is from a restricted regression fit. If R is not needed, COVB and R can share the same storage locations.
Optional Arguments
IEF — Effect option. (Input) Default: IEF = 0. The absolute value of IEF is the number of effects (sources of variation) in the model, excluding the error. The sign of IEF specifies the following options:
IEF   Meaning
< 0   Each effect corresponds to a single regressor (coefficient) in the model.
> 0   Each effect corresponds to one or more regressors. The association between the effects and the regressors is given by elements of IRBEF.
0     There are no effects in the model. INTCEP must equal 1.
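The Python sketch below shows how NCOEF and the assignment of coefficients to effects follow from INTCEP, IEF, and IRBEF, as described for arguments B and IRBEF. The function names are hypothetical. The Python list `irbef` holds IRBEF(1), …, IRBEF(|IEF| + 1), and the returned indices stay 1-based to match B(1), B(2), …. For IEF < 0 it assumes, following the COEF description, that effect i is coefficient INTCEP + i.

```python
def ncoef(intcep, ief, irbef=None):
    """Number of coefficients in B."""
    if ief > 0:
        return irbef[ief] - 1          # IRBEF(IEF + 1) - 1
    return intcep - ief                # IEF <= 0: INTCEP + |IEF|

def effect_columns(intcep, ief, irbef=None):
    """1-based indices of B that belong to each effect (empty when IEF = 0)."""
    if ief > 0:
        # effect i uses B(IRBEF(i)), ..., B(IRBEF(i + 1) - 1)
        return [list(range(irbef[i], irbef[i + 1])) for i in range(ief)]
    # IEF < 0: one coefficient per effect, following the intercept (if any)
    return [[intcep + i] for i in range(1, -ief + 1)]
```

As a hypothetical layout, an intercept, one covariate, a two-column group effect, and a two-column interaction give IRBEF = (2, 3, 5, 7), IEF = 3, and NCOEF = 6; `effect_columns(1, 3, [2, 3, 5, 7])` returns `[[2], [3, 4], [5, 6]]`.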
LDR — Leading dimension of R exactly as specified in the dimension statement in the calling program. (Input) Default: LDR = size (R,1).
PRINT — Printing option. (Input) Default: PRINT = ‘N’. PRINT is a character string indicating what is to be printed. The PRINT string is composed of one-character print codes that control printing. These print codes are as follows:
PRINT(I:I)   Printing that occurs
‘A’          All
‘N’          None
‘1’          AOV
‘2’          SQSS
‘3’          COEF
‘4’          COVB
The concatenated print codes ‘A’, ‘N’, ‘1’, …, ‘4’ that comprise the PRINT string give the combination of statistics to be printed. Here are a few examples.
PRINT   Printing that occurs
‘A’     All
‘N’     None
‘13’    AOV and COEF
‘124’   AOV, SQSS, and COVB
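One way to read a PRINT string is sketched below in Python, assuming that ‘A’ overrides any other codes and that characters other than ‘A’ and ‘1’ through ‘4’ (such as ‘N’) request nothing. `printed_items` is a hypothetical helper for illustration, not part of the library.

```python
# Print codes from the table above, in output order.
CODES = {"1": "AOV", "2": "SQSS", "3": "COEF", "4": "COVB"}

def printed_items(print_opt="N"):
    """Decode a PRINT string into the list of statistics it requests."""
    if "A" in print_opt:
        return list(CODES.values())
    return [CODES[c] for c in sorted(set(print_opt)) if c in CODES]
```

With these assumptions, `printed_items("124")` returns `["AOV", "SQSS", "COVB"]`, matching the last row of the examples table.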
LDSQSS — Leading dimension of SQSS exactly as specified in the dimension statement in the calling program. (Input) Default: LDSQSS = size (SQSS,1).
LDCOEF — Leading dimension of COEF exactly as specified in the dimension statement in the calling program. (Input) Default: LDCOEF = size (COEF,1).
LDCOVB — Leading dimension of COVB exactly as specified in the dimension statement in the calling program. (Input) Default: LDCOVB = size (COVB,1).
Description
Routine RSTAT computes summary statistics from a fitted general linear model. The model is $y = X\beta + \varepsilon$, where $y$ is the $n \times 1$ vector of responses, $X$ is the $n \times p$ matrix of regressors, $\beta$ is the $p \times 1$ vector of regression coefficients, and $\varepsilon$ is the $n \times 1$ vector of errors whose elements are independently distributed with mean 0 and variance $\sigma^2$. Routine RGIVN or routine RGLM can be used to fit the model. RSTAT then uses the results of this fit to compute summary statistics, including the analysis of variance, sequential sums of squares, t tests, and the estimated variance-covariance matrix of the estimated regression coefficients.
Some generalizations of the general linear model are allowed. If the i-th element of $\varepsilon$ has variance $\sigma^2/w_i$ and the weights $w_i$ are used in the fit of the model, RSTAT produces summary statistics from the weighted least-squares fit. More generally, if the variance-covariance matrix of $\varepsilon$ is $\sigma^2 V$, RSTAT can be used to produce summary statistics from the generalized least-squares fit. (Routine RGIVN can be used to perform a generalized least-squares fit by regressing $y^*$ on $X^*$, where $y^* = (T^{-1})^T y$, $X^* = (T^{-1})^T X$, and $T$ satisfies $T^T T = V$. Routines for computing $y^*$ and $X^*$ can be found in the IMSL MATH/LIBRARY.)
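The generalized least-squares transformation can be sketched in Python as follows. `cholesky_upper` and `whiten` are hypothetical helpers standing in for the IMSL MATH/LIBRARY routines mentioned above: the first factors $V$ as $T^T T$ with $T$ upper triangular, and the second forms $(T^{-1})^T v$ by forward substitution on $T^T z = v$ instead of inverting $T$. Applying `whiten` to $y$ and to each column of $X$ gives $y^*$ and $X^*$.

```python
import math

def cholesky_upper(v):
    """Upper-triangular T with T^T T = V, for symmetric positive definite V."""
    n = len(v)
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        t[i][i] = math.sqrt(v[i][i] - sum(t[k][i] ** 2 for k in range(i)))
        for j in range(i + 1, n):
            t[i][j] = (v[i][j] - sum(t[k][i] * t[k][j] for k in range(i))) / t[i][i]
    return t

def whiten(t, v):
    """Return (T^{-1})^T v by solving T^T z = v with forward substitution."""
    n = len(v)
    z = [0.0] * n
    for i in range(n):
        # row i of T^T is column i of T, nonzero only for rows k <= i
        z[i] = (v[i] - sum(t[k][i] * z[k] for k in range(i))) / t[i][i]
    return z
```

Because $(T^{-1})^T V T^{-1} = I$, the transformed errors have variance-covariance matrix $\sigma^2 I$. A convenient check is that whitening each column of $V$ reproduces $T$, since $(T^{-1})^T V = T$.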
If the general linear model has the restriction Hβ = g on the regression parameters, and this restriction is used in the fit of the model by routine RLEQU, RSTAT produces summary statistics from this restricted least-squares fit.
The sequential sum of squares for the i-th regression parameter is given by
$$(R\hat{\beta})_i^2$$
The regression sum of squares is given by the sum of the sequential sums of squares. If an intercept is in the model, the regression sum of squares is adjusted for the mean, i.e., $(R\hat{\beta})_1^2$ is not included in the sum.
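A minimal Python sketch, assuming R and $\hat{\beta}$ come from an unrestricted fit and that the sequential sum of squares for parameter i is the squared i-th element of $R\hat{\beta}$. The helper `sequential_ss` is hypothetical. The data are a straight-line fit of y = (2, 3, 5, 6) on x = (1, 2, 3, 4): $X^T X$ = [[4, 10], [10, 30]] has Cholesky factor R = [[2, 5], [0, √5]], and $\hat{\beta}$ = (0.5, 1.4).

```python
def sequential_ss(r, b, intcep=1):
    """Sequential sums of squares (R b)_i^2 and the regression sum of squares."""
    p = len(b)
    rb = [sum(r[i][j] * b[j] for j in range(i, p)) for i in range(p)]  # R is upper triangular
    seq = [v * v for v in rb]
    # With an intercept, the first term is the correction for the mean and is excluded.
    ssr = sum(seq[intcep:])
    return seq, ssr
```

For the data above, the sequential sums of squares are 64 and 9.8. The first term equals $n\bar{y}^2 = 4 \cdot 4^2$, the mean correction excluded when INTCEP = 1, so the regression sum of squares is 9.8.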
The estimate of $\sigma^2$ is $s^2$ (stored in AOV(8)), which is computed as SSE/DFE.
If R is nonsingular, the estimated variance-covariance matrix of $\hat{\beta}$ (stored in COVB) is computed by $s^2 R^{-1}(R^{-1})^T$.
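In the nonsingular case, computing $R^{-1}$ by back substitution is enough to form COVB. The Python sketch below uses hypothetical helpers `upper_inverse` and `covb`. The data are a straight-line fit of y = (2, 3, 5, 6) on x = (1, 2, 3, 4), for which R = [[2, 5], [0, √5]], SSE = 0.2, and DFE = 2, so $s^2 = 0.1$.

```python
def upper_inverse(r):
    """Inverse of a nonsingular upper-triangular matrix by back substitution."""
    p = len(r)
    inv = [[0.0] * p for _ in range(p)]
    for j in range(p):                    # column j solves R x = e_j
        for i in range(j, -1, -1):        # entries below row j stay zero
            rhs = 1.0 if i == j else 0.0
            rhs -= sum(r[i][k] * inv[k][j] for k in range(i + 1, j + 1))
            inv[i][j] = rhs / r[i][i]
    return inv

def covb(r, sse, dfe):
    """s^2 R^{-1} (R^{-1})^T with s^2 = SSE/DFE."""
    s2 = sse / dfe
    ri = upper_inverse(r)
    p = len(r)
    return [[s2 * sum(ri[i][k] * ri[j][k] for k in range(p)) for j in range(p)]
            for i in range(p)]
```

For the example, the result is [[0.15, -0.05], [-0.05, 0.02]], which equals $s^2 (X^T X)^{-1}$ because $R^T R = X^T X$.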
If R is singular, corresponding to rank(X) < p, a generalized inverse is used. For a matrix G to be a $g_i$ inverse ($i$ = 1, 2, 3, or 4) of a matrix A, G must satisfy conditions $j$ for $j \le i$ of the Moore-Penrose inverse but generally fails conditions $k$ for $k > i$. The four conditions for G to be a Moore-Penrose inverse of A are as follows:
1. AGA = A
2. GAG = G
3. AG is symmetric
4. GA is symmetric
In the case where R is singular, the method for obtaining COVB follows the discussion of Maindonald (1984, pages 101-103). Let Z be the diagonal matrix with diagonal elements defined by
$$z_{ii} = \begin{cases} 1 & \text{if } r_{ii} \neq 0 \\ 0 & \text{if } r_{ii} = 0 \end{cases}$$
Let G be the solution to $RG = Z$ obtained by setting row i of G to zero for each i with $r_{ii} = 0$. COVB is set to $s^2 G G^T$. (G is a $g_3$ inverse of R. For any $g_3$ inverse of R, represented by $R^{g_3}$, the result $R^{g_3}(R^{g_3})^T$ is a symmetric $g_2$ inverse of $R^T R = X^T X$. See Sallas and Lionti (1988).)
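The construction of G can be sketched in Python by back-substituting column by column on $RG = Z$ and skipping the rows with a zero diagonal, so that those rows of G stay zero. `g3_inverse`, `matmul`, and `covb_singular` are hypothetical helpers, and the optional tolerance used to decide whether $r_{ii}$ is zero (default: exactly zero) is an assumption of this sketch.

```python
def g3_inverse(r, tol=0.0):
    """Solve R G = Z by back substitution, keeping rows of G zero where r_ii == 0.

    Z is diagonal with z_ii = 1 if r_ii != 0 and z_ii = 0 otherwise.
    """
    p = len(r)
    nonzero = [abs(r[i][i]) > tol for i in range(p)]
    g = [[0.0] * p for _ in range(p)]
    for j in range(p):                      # one column of G at a time
        for i in range(p - 1, -1, -1):      # bottom row first
            if not nonzero[i]:
                continue                    # row i of G stays zero
            rhs = 1.0 if i == j else 0.0    # z_ij
            rhs -= sum(r[i][k] * g[k][j] for k in range(i + 1, p))
            g[i][j] = rhs / r[i][i]
    return g

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def covb_singular(r, sse, dfe):
    """COVB = s^2 G G^T for a singular, unrestricted R."""
    g = g3_inverse(r)
    gt = [list(col) for col in zip(*g)]
    s2 = sse / dfe
    return [[s2 * v for v in row] for row in matmul(g, gt)]
```

For R = [[2, 1, 3], [0, 0, 0], [0, 0, 4]], whose zero second row mimics a rank-deficient X, this gives G = [[0.5, 0, -0.375], [0, 0, 0], [0, 0, 0.25]]. Then $RG$ = diag(1, 0, 1) is symmetric, $RGR = R$, and $GRG = G$, so conditions 1 through 3 hold. Condition 4 fails because $GR$ = [[1, 0.5, 0], [0, 0, 0], [0, 0, 1]] is not symmetric.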
Note that COVB can only be used to get variances and covariances of estimable functions of the regression coefficients, i.e., nonestimable functions (linear combinations of the regression coefficients not in the space spanned by the nonzero rows of R) must not be used. See, for example, Maindonald (1984, pages 166‑168) for a discussion of estimable functions.
The estimated standard errors of the estimated regression coefficients (stored in column 2 of COEF) are computed as square roots of the corresponding diagonal entries in COVB.
For the case where an intercept is in the model, put $\bar{R}$ equal to the matrix R with the first row and column deleted. Generally, the variance inflation factor (VIF) for the i-th regression coefficient is computed as the product of the i-th diagonal element of $R^T R$ and the i-th diagonal element of its computed inverse. If an intercept is in the model, the VIF for those coefficients not corresponding to the intercept uses the diagonal elements of $\bar{R}^T \bar{R}$ and of its computed inverse instead (see Maindonald 1984, page 40).
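A Python sketch of the VIF calculation for a nonsingular R, using $(R^T R)_{ii} = \sum_k r_{ki}^2$ and $((R^T R)^{-1})_{ii} = \sum_k (R^{-1})_{ik}^2$ so that $R^T R$ never has to be formed. `vif` and `upper_inverse` are hypothetical helpers. When INTCEP = 1, the intercept's own VIF is taken from the full R, following the COEF note that this case is not adjusted for the mean; treat that as an assumption about how RSTAT handles the intercept row.

```python
def upper_inverse(r):
    """Inverse of a nonsingular upper-triangular matrix by back substitution."""
    p = len(r)
    inv = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for i in range(j, -1, -1):
            rhs = 1.0 if i == j else 0.0
            rhs -= sum(r[i][k] * inv[k][j] for k in range(i + 1, j + 1))
            inv[i][j] = rhs / r[i][i]
    return inv

def vif_factors(m):
    """diag(M^T M)_i * diag((M^T M)^{-1})_i for upper-triangular M."""
    p = len(m)
    mi = upper_inverse(m)
    gram_diag = [sum(m[k][i] ** 2 for k in range(p)) for i in range(p)]
    inv_diag = [sum(mi[i][k] ** 2 for k in range(p)) for i in range(p)]
    return [a * b for a, b in zip(gram_diag, inv_diag)]

def vif(r, intcep=1):
    """Variance inflation factors; slopes use R-bar when an intercept is present."""
    out = vif_factors(r)
    if intcep == 1:
        rbar = [row[1:] for row in r[1:]]   # drop the first row and column
        out[1:] = vif_factors(rbar)
    return out
```

With $\bar{R}$ = [[1, 1], [0, 1]], both slopes get VIF = 2, and 1 - 1/2 = 0.5 is the squared correlation between the two centered regressors.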
The preceding discussion can be modified to include the restricted least-squares problem. The modification is based on the work of Stirling (1981). Let $D = \operatorname{diag}(d_1, d_2, \ldots, d_p)$ be a diagonal matrix with elements $d_i = 0$ if the i-th row of R corresponds to a restriction and $d_i = 1$ otherwise. In the unrestricted case, D is simply the $p \times p$ identity matrix. The formula for COVB is $s^2 G D G^T$. The formula for the sequential sum of squares for the i-th regression parameter, for each i with $r_{ii} > 0$, is given by
Sequential sums of squares for each i with $r_{ii} \le 0$ are set to zero.
For the restricted least-squares problem, the sequential and regression sums of squares correspond to those from a fitted reduced model obtained by first substituting the restriction $H\beta = g$ into the model. In general, the reduced model is not unique, so the sequential sums of squares must be interpreted in the context of the particular reduced model indicated by the R matrix. If $g = 0$, any of the reduced models that could be computed from the restrictions produces the same regression sum of squares. However, if $g \neq 0$, different reduced models resulting from the same restricted model can have different regressands and hence different total and regression sums of squares.
Comments
When R is nonsingular and comes from an unrestricted regression fit, COVB is the estimated variance-covariance matrix of the estimated regression coefficients, and COVB = (SSE/DFE) $(R^T R)^{-1}$. Otherwise, variances and covariances of estimable functions of the regression coefficients can be obtained using COVB, and COVB = (SSE/DFE) $G D G^T$. Here, D is the diagonal matrix with diagonal elements equal to 0 if the corresponding rows of R are restrictions and equal to 1 otherwise, and G is a particular generalized inverse of R. See the Description section.
Examples
Example 1
This example uses a data set discussed by Draper and Smith (1981, pages 629-630). This data set is put into the matrix X by routine GDATA (see Chapter 19, “Utilities”). There are four independent variables and one dependent variable. Routine RGIVN is invoked to fit the regression model, and RSTAT is invoked to compute summary statistics.
* * * Variance-Covariance Matrix for the Coefficient Estimates * * *
1 2 3 4 5
1 4909.95 -50.51 -50.60 -51.66 -49.60
2 0.55 0.51 0.55 0.51
3 0.52 0.53 0.51
4 0.57 0.52
5 0.50
Example 2
A one-way analysis of covariance model is fitted to the turkey data discussed by Draper and Smith (1981, pages 243‑249). The response variable is turkey weight y (in pounds). Three groups of turkeys corresponding to the three states where they were reared are used. The age of a turkey (in weeks) is the covariate. The explanatory variables are age, group, and interaction. The model is
$$y_{ij} = \mu + \beta x_{ij} + \alpha_i + \beta_i x_{ij} + \varepsilon_{ij}, \qquad i = 1, 2, 3;\; j = 1, 2, \ldots, n_i$$
where $\alpha_3 = 0$ and $\beta_3 = 0$. Routine RGLM is used to fit the model with the option IDUMMY = 2. Then RSTAT is used to compute summary statistics. The fitted model gives three separate lines with slopes 0.506, 0.470, and 0.445. The F test for interaction (the last effect) suggests omitting the interaction from the model and using a model with identical slopes for each group.
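With $\alpha_3 = 0$ and $\beta_3 = 0$, the model splits into one line per group, and the reported slopes determine the slope coefficients. Assuming the slopes are listed in group order 1, 2, 3:

```latex
\begin{aligned}
E(y_{ij}) &= (\mu + \alpha_i) + (\beta + \beta_i)\,x_{ij}, \qquad i = 1, 2,\\
E(y_{3j}) &= \mu + \beta\,x_{3j},\\
\hat{\beta} &= 0.445, \qquad \hat{\beta}_1 = 0.506 - 0.445 = 0.061, \qquad \hat{\beta}_2 = 0.470 - 0.445 = 0.025.
\end{aligned}
```

The interaction effect therefore tests $\beta_1 = \beta_2 = 0$, that is, equal slopes for all three groups.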
* * * Variance-Covariance Matrix for the Coefficient Estimates * * *
1 2 3 4 5
1 1.5965 -0.0631 -1.5965 -1.5965 0.0631
2 0.0025 0.0631 0.0631 -0.0025
3 2.3425 1.5965 -0.0913
4 16.8801 -0.0631
5 0.0036
6
1 0.0631
2 -0.0025
3 -0.0631
4 -0.6179
5 0.0025
6 0.0227
Example 3
A two-way analysis-of-variance model is fitted to balanced data discussed by Snedecor and Cochran (1967, Table 12.5.1, page 347). The responses are the weight gains (in grams) of rats fed diets varying in two components—level of protein and source of protein. The model is
$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}, \qquad i = 1, 2;\; j = 1, 2, 3;\; k = 1, 2, \ldots, 10$$
where
Routine RGLM is used to fit the model with the IDUMMY = 0 option. Then, RSTAT is used to compute summary statistics.