hypothesis_test
Performs tests for a multivariate general linear hypothesis $H\beta U = G$ given the hypothesis sums of squares and crossproducts matrix $S_H$.
Synopsis
#include <imsls.h>
float imsls_f_hypothesis_test (Imsls_f_regression *regression_info, float dfh, float *scph, ..., 0)
The type double function is imsls_d_hypothesis_test.
Required Arguments
Imsls_f_regression *regression_info (Input)
Pointer to a structure of type Imsls_f_regression containing information about the regression fit. See function imsls_f_regression.
float dfh (Input)
Degrees of freedom for the sums of squares and crossproducts matrix.
float *scph (Input)
Array of size nu by nu containing $S_H$, the sums of squares and crossproducts attributable to the hypothesis.
Return Value
The p-value corresponding to Wilks’ lambda test.
Synopsis with Optional Arguments
#include <imsls.h>
float imsls_f_hypothesis_test (Imsls_f_regression *regression_info, float dfh, float *scph,
IMSLS_U, int nu, float u[],
IMSLS_WILK_LAMBDA, float *value, float *p_value,
IMSLS_ROY_MAX_ROOT, float *value, float *p_value,
IMSLS_HOTELLING_TRACE, float *value, float *p_value,
IMSLS_PILLAI_TRACE, float *value, float *p_value,
0)
Optional Arguments
IMSLS_U, int nu, float u[] (Input)
Argument nu is the number of linear combinations of the dependent variables to be considered. The value nu must be greater than 0 and less than or equal to n_dependent. Argument u contains the n_dependent by nu matrix $U$ for the test $H_p \beta U = G_p$.
Default: nu = n_dependent and u is the identity matrix
IMSLS_WILK_LAMBDA, float *value, float *p_value (Output)
Wilks’ lambda and p-value.
IMSLS_ROY_MAX_ROOT, float *value, float *p_value (Output)
Roy’s maximum root criterion and p-value.
IMSLS_HOTELLING_TRACE, float *value, float *p_value (Output)
Hotelling’s trace and p-value.
IMSLS_PILLAI_TRACE, float *value, float *p_value (Output)
Pillai’s trace and p-value.
Description
Function imsls_f_hypothesis_test computes test statistics and p-values for the general linear hypothesis $H\beta U = G$ for the multivariate general linear model.
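The hypothesis sum of squares and crossproducts matrix input in scph is
$$S_H = (H\hat{\beta}U - G)^T (C^T D C)^{-} (H\hat{\beta}U - G)$$
where $C$ is a solution to $R^T C = H$, $(C^T D C)^{-}$ denotes the generalized inverse of $C^T D C$, and $D$ is a diagonal matrix with diagonal elements
$$d_{ii} = \begin{cases} 1 & \text{if } r_{ii} \neq 0 \\ 0 & \text{otherwise} \end{cases}$$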
For a detailed discussion, see Linear Dependence and the R Matrix in the Usage Notes.
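The error sum of squares and crossproducts matrix for the model $Y = X\beta + \varepsilon$ is
$$(Y - X\hat{\beta})^T (Y - X\hat{\beta})$$
which is input in regression_info. The error sum of squares and crossproducts matrix for the hypothesis $H\beta U = G$ computed by imsls_f_hypothesis_test is
$$S_E = U^T (Y - X\hat{\beta})^T (Y - X\hat{\beta})\, U$$
Let $p$ equal the order of the matrices $S_E$ and $S_H$, i.e.,
$$p = \begin{cases} \texttt{nu} & \text{if IMSLS\_U is specified} \\ \texttt{n\_dependent} & \text{otherwise} \end{cases}$$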
Let $q$ (stored in dfh) be the degrees of freedom for the hypothesis. Let $\nu$ (input in regression_info) be the degrees of freedom for error. Function imsls_f_hypothesis_test computes three test statistics based on the eigenvalues $\lambda_i$ ($i = 1, 2, \ldots, p$) of the generalized eigenvalue problem $S_H x = \lambda S_E x$. These test statistics are as follows:
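Wilks’ lambda
$$\Lambda = \frac{\det(S_E)}{\det(S_H + S_E)} = \prod_{i=1}^{p} \frac{1}{1 + \lambda_i}$$
$\Lambda$ is output as value. The associated p-value is based on an approximation discussed by Rao (1973, p. 556). The statistic
$$F = \frac{ms - pq/2 + 1}{pq} \cdot \frac{1 - \Lambda^{1/s}}{\Lambda^{1/s}}$$
has an approximate F distribution with $pq$ and $ms - pq/2 + 1$ numerator and denominator degrees of freedom, respectively, where
$$s = \begin{cases} \sqrt{\dfrac{p^2 q^2 - 4}{p^2 + q^2 - 5}} & \text{if } p^2 + q^2 - 5 \neq 0 \\ 1 & \text{otherwise} \end{cases}$$
and
$$m = \nu - \frac{p - q + 1}{2}$$
The F test is exact if $\min(p, q) \le 2$ (Kshirsagar, 1972, Theorem 4, pp. 299–300).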
Roy’s maximum root
$$c = \max_{i} \lambda_i$$
where $c$ is output as value. The p-value is based on the approximation
$$F = \frac{\nu + q - s}{s}\, c$$
where $s = \max(p, q)$. This statistic has an approximate F distribution with $s$ and $\nu + q - s$ numerator and denominator degrees of freedom, respectively. The F test is exact if $s = 1$, in which case the p-value is also exact. In general, the value output in p_value is a lower bound on the actual p-value.
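The Roy's maximum root approximation can be sketched in a few lines, assuming the eigenvalues are already known and that the statistic has the form $F = \frac{\nu + q - s}{s}\,c$ with $s = \max(p, q)$. The helper names below are hypothetical and are not library routines.

```c
#include <assert.h>

/* Hypothetical helper: largest of the p generalized eigenvalues. */
double roy_max_root(const double *lambda, int p)
{
    assert(p > 0);
    double c = lambda[0];
    for (int i = 1; i < p; i++)
        if (lambda[i] > c)
            c = lambda[i];
    return c;
}

/* Hypothetical helper: F approximation for Roy's maximum root.
   q is the hypothesis degrees of freedom, nu the error degrees of freedom. */
double roy_f(double c, int p, double q, double nu,
             double *df1, double *df2)
{
    double s = (p > q) ? (double)p : q;

    *df1 = s;
    *df2 = nu + q - s;
    assert(*df2 > 0.0);
    return (*df2 / *df1) * c;
}
```

Because the resulting p-value is only a lower bound in general, a small value from this statistic alone is weaker evidence than the same value from an exact test.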
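Hotelling’s trace
$$U = \mathrm{tr}\left(S_H S_E^{-1}\right) = \sum_{i=1}^{p} \lambda_i$$
$U$ is output as value. The p-value is based on the approximation of McKeon (1974) that supersedes the approximation of Hughes and Saw (1972). McKeon’s approximation is also discussed by Seber (1984, p. 39). For
$$b = 4 + \frac{pq + 2}{a - 1}, \qquad a = \frac{(\nu + q - p - 1)(\nu - 1)}{(\nu - p - 3)(\nu - p)}$$
the p-value is based on the result that
$$F = \frac{b(\nu - p - 1)}{(b - 2)\,pq}\, U$$
has an approximate F distribution with $pq$ and $b$ degrees of freedom. The test is exact if $\min(p, q) = 1$. For $\nu \le p + 1$, the approximation is not valid, and p_value is set to NaN.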
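A sketch of McKeon's approximation, assuming the usual form $a = \frac{(\nu + q - p - 1)(\nu - 1)}{(\nu - p - 3)(\nu - p)}$, $b = 4 + \frac{pq + 2}{a - 1}$, and $F = \frac{b(\nu - p - 1)}{(b - 2)pq}\,U$. The function names are hypothetical. Treating $\nu - p - 3 = 0$ as the limit $b = 4$ is an assumption made here for illustration, and the library may handle that case differently.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: Hotelling's trace as the sum of the eigenvalues. */
double hotelling_trace(const double *lambda, int p)
{
    double sum = 0.0;
    for (int i = 0; i < p; i++)
        sum += lambda[i];
    return sum;
}

/* Hypothetical helper: McKeon's F approximation for Hotelling's trace.
   Returns NaN (and sets *df2 to NaN) when nu <= p + 1, where the
   approximation is not valid. */
double hotelling_mckeon_f(double u, int p, double q, double nu,
                          double *df1, double *df2)
{
    assert(p > 0 && q > 0.0);

    double pq = p * q;
    *df1 = pq;
    if (nu <= p + 1.0) {
        *df2 = NAN;
        return NAN;
    }

    double denom = (nu - p - 3.0) * (nu - p);
    double b;
    if (denom == 0.0) {
        /* Assumption: a -> infinity, so b takes its limiting value 4. */
        b = 4.0;
    } else {
        double a = (nu + q - p - 1.0) * (nu - 1.0) / denom;
        b = 4.0 + (pq + 2.0) / (a - 1.0);
    }

    *df2 = b;
    return b * (nu - p - 1.0) / ((b - 2.0) * pq) * u;
}
```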
These three test statistics are valid when $S_E$ is positive definite. A necessary condition for $S_E$ to be positive definite is $\nu \ge p$. If $S_E$ is not positive definite, a warning error message is issued, and both value and p_value are set to NaN.
Because the requirement $\nu \ge p$ can be a serious drawback, imsls_f_hypothesis_test computes a fourth test statistic based on the eigenvalues $\theta_i$ ($i = 1, 2, \ldots, p$) of the generalized eigenvalue problem $S_H w = \theta (S_H + S_E) w$. This test statistic requires a less restrictive assumption: $S_H + S_E$ is positive definite. A necessary condition for $S_H + S_E$ to be positive definite is $\nu + q \ge p$. If $S_E$ is positive definite, imsls_f_hypothesis_test avoids the computation of the generalized eigenvalue problem from scratch. In this case, the eigenvalues $\theta_i$ are obtained from $\lambda_i$ by
$$\theta_i = \frac{\lambda_i}{1 + \lambda_i}$$
The fourth test statistic is as follows:
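Pillai’s trace
$$V = \mathrm{tr}\left[S_H (S_H + S_E)^{-1}\right] = \sum_{i=1}^{p} \theta_i$$
$V$ is output as value. The p-value is based on an approximation discussed by Pillai (1985). The statistic
$$F = \frac{2n + s + 1}{2m + s + 1} \cdot \frac{V}{s - V}$$
has an approximate F distribution with $s(2m + s + 1)$ and $s(2n + s + 1)$ numerator and denominator degrees of freedom, respectively, where
$$s = \min(p, q)$$
$$m = \tfrac{1}{2}\left(|p - q| - 1\right)$$
$$n = \tfrac{1}{2}\left(\nu - p - 1\right)$$
The F test is exact if $\min(p, q) = 1$.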
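The following sketch computes Pillai's trace from the eigenvalues $\lambda_i$ through $\theta_i = \lambda_i / (1 + \lambda_i)$, then forms the F statistic $F = \frac{2n + s + 1}{2m + s + 1} \cdot \frac{V}{s - V}$. It assumes $S_E$ is positive definite, so that the $\lambda_i$ exist. The function names are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: Pillai's trace V = sum of theta_i, where
   theta_i = lambda_i / (1 + lambda_i). */
double pillai_trace_from_lambda(const double *lambda, int p)
{
    double v = 0.0;
    for (int i = 0; i < p; i++)
        v += lambda[i] / (1.0 + lambda[i]);
    return v;
}

/* Hypothetical helper: F approximation for Pillai's trace.
   q is the hypothesis degrees of freedom, nu the error degrees of freedom. */
double pillai_f(double v, int p, double q, double nu,
                double *df1, double *df2)
{
    double s = (p < q) ? (double)p : q;
    double m = 0.5 * (fabs(p - q) - 1.0);
    double n = 0.5 * (nu - p - 1.0);

    assert(v >= 0.0 && v < s);

    *df1 = s * (2.0 * m + s + 1.0);
    *df2 = s * (2.0 * n + s + 1.0);
    return (*df2 / *df1) * v / (s - v);
}
```

When only $S_H + S_E$ is positive definite, the $\theta_i$ would have to come directly from the second eigenvalue problem rather than from this conversion.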
Examples
Example 1
The data for this example are from Maindonald (1984, pp. 203–204). A multivariate regression model containing two dependent variables and three independent variables is fit using function imsls_f_regression, and the results are stored in the structure regression_info. The sum of squares and crossproducts matrix, scph, is then computed by a call to imsls_f_hypothesis_scph for the test that the third independent variable is in the model (determined by the specification of h). Finally, function imsls_f_hypothesis_test is called to compute the p-value for the test statistic (Wilks’ lambda).
 
#include <imsls.h>
#include <stdio.h>

int main()
{
    Imsls_f_regression *info;
    float *coefficients, *scph;
    float dfh, p_value;

    /* Independent variables: 9 observations by 3 variables */
    float x[] = {
         7.0,  5.0, 6.0,
         2.0, -1.0, 6.0,
         7.0,  3.0, 5.0,
        -3.0,  1.0, 4.0,
         2.0, -1.0, 0.0,
         2.0,  1.0, 7.0,
        -3.0, -1.0, 3.0,
         2.0,  1.0, 1.0,
         2.0,  1.0, 4.0
    };

    /* Dependent variables: 9 observations by 2 variables */
    float y[] = {
         7.0,  1.0,
        -5.0,  4.0,
         6.0, 10.0,
         5.0,  5.0,
         5.0, -2.0,
        -2.0,  4.0,
         0.0, -6.0,
         8.0,  2.0,
         3.0,  0.0
    };

    int n_observations = 9;
    int n_independent = 3;
    int n_dependent = 2;
    int nh = 1;
    /* One hypothesis row selecting the third independent variable */
    float h[] = {0, 0, 0, 1};

    /* Fit the multivariate regression and keep the fit information */
    coefficients = imsls_f_regression(n_observations, n_independent,
        x, y,
        IMSLS_N_DEPENDENT, n_dependent,
        IMSLS_REGRESSION_INFO, &info,
        0);

    /* Hypothesis sums of squares and crossproducts, and its degrees of freedom */
    scph = imsls_f_hypothesis_scph(info, nh, h, &dfh,
        0);

    /* The return value is the p-value for Wilks' lambda */
    p_value = imsls_f_hypothesis_test(info, dfh, scph,
        0);
    printf("P-value = %10.6f\n", p_value);
}
Output
 
P-value = 0.000010
Example 2
This example is the same as the first example, but more statistics are computed. Also, the U matrix, u, is explicitly specified as the identity matrix, which is also the default for U.
 
#include <imsls.h>
#include <stdio.h>

int main()
{
    Imsls_f_regression *info;
    float *coefficients, *scph;
    float dfh, p_value;

    /* Independent variables: 9 observations by 3 variables */
    float x[] = {
         7.0,  5.0, 6.0,
         2.0, -1.0, 6.0,
         7.0,  3.0, 5.0,
        -3.0,  1.0, 4.0,
         2.0, -1.0, 0.0,
         2.0,  1.0, 7.0,
        -3.0, -1.0, 3.0,
         2.0,  1.0, 1.0,
         2.0,  1.0, 4.0
    };

    /* Dependent variables: 9 observations by 2 variables */
    float y[] = {
         7.0,  1.0,
        -5.0,  4.0,
         6.0, 10.0,
         5.0,  5.0,
         5.0, -2.0,
        -2.0,  4.0,
         0.0, -6.0,
         8.0,  2.0,
         3.0,  0.0
    };

    int n_observations = 9;
    int n_independent = 3;
    int n_dependent = 2;
    int nh = 1;
    /* One hypothesis row selecting the third independent variable */
    float h[] = {0, 0, 0, 1};
    int nu = 2;
    /* U is the 2 by 2 identity matrix, the same as the default */
    float u[4] = {1, 0, 0, 1};
    float v1, v2, v3, v4, p1, p2, p3, p4;

    /* Fit the multivariate regression and keep the fit information */
    coefficients = imsls_f_regression(n_observations, n_independent,
        x, y,
        IMSLS_N_DEPENDENT, n_dependent,
        IMSLS_REGRESSION_INFO, &info,
        0);

    /* Hypothesis sums of squares and crossproducts, and its degrees of freedom */
    scph = imsls_f_hypothesis_scph(info, nh, h, &dfh,
        0);

    /* Request all four test statistics and their p-values */
    p_value = imsls_f_hypothesis_test(info, dfh, scph,
        IMSLS_U, nu, u,
        IMSLS_WILK_LAMBDA, &v1, &p1,
        IMSLS_ROY_MAX_ROOT, &v2, &p2,
        IMSLS_HOTELLING_TRACE, &v3, &p3,
        IMSLS_PILLAI_TRACE, &v4, &p4,
        0);

    printf("Wilk value = %10.6f p-value = %10.6f\n", v1, p1);
    printf("Roy value = %10.6f p-value = %10.6f\n", v2, p2);
    printf("Hotelling value = %10.6f p-value = %10.6f\n", v3, p3);
    printf("Pillai value = %10.6f p-value = %10.6f\n", v4, p4);
}
Output
 
Wilk value = 0.003149 p-value = 0.000010
Roy value = 316.600861 p-value = 0.000010
Hotelling value = 316.600861 p-value = 0.000010
Pillai value = 0.996851 p-value = 0.000010
Warning Errors
IMSLS_SINGULAR_1
“u”*“scpe”*“u” is singular. Only Pillai’s trace can be computed. Other statistics are set to NaN.
Fatal Errors
IMSLS_NO_STAT_1
“scpe” + “scph” is singular. No tests can be computed.
IMSLS_NO_STAT_2
No statistics can be computed. Iterations for eigenvalues for the generalized eigenvalue problem “scph”*x = (lambda)*(“scph”+“scpe”)*x failed to converge.
IMSLS_NO_STAT_3
No statistics can be computed. Iterations for eigenvalues for the generalized eigenvalue problem “scph” *x = (lambda)*(“scph”+“u”*“scpe”*“u”)*x failed to converge.
IMSLS_SINGULAR_2
“u”*“scpe”*“u” + “scph” is singular. No tests can be computed.
IMSLS_SINGULAR_TRI_MATRIX
The input triangular matrix is singular. The index of the first zero diagonal element is equal to #.