Chapter 2: Regression

hypothesis_test

Performs tests for a multivariate general linear hypothesis HβU = G given the hypothesis sums of squares and crossproducts matrix S_H.

Synopsis

#include <imsls.h>

float imsls_f_hypothesis_test (Imsls_f_regression *regression_info, float dfh, float *scph, ..., 0)

The type double function is imsls_d_hypothesis_test.

Required Arguments

Imsls_f_regression *regression_info (Input)
Pointer to a structure of type Imsls_f_regression containing information about the regression fit. See function imsls_f_regression.

float dfh (Input)
Degrees of freedom for the sums of squares and crossproducts matrix.

float *scph (Input)
Array of size nu by nu containing S_H, the sums of squares and crossproducts attributable to the hypothesis.

Return Value

The p-value corresponding to Wilks' lambda test.

Synopsis with Optional Arguments

#include <imsls.h>

float imsls_f_hypothesis_test (Imsls_f_regression *regression_info, float dfh, float *scph,
IMSLS_U, int nu, float u[],
IMSLS_WILK_LAMBDA, float *value, float *p_value,
IMSLS_ROY_MAX_ROOT, float *value, float *p_value,
IMSLS_HOTELLING_TRACE, float *value, float *p_value,
IMSLS_PILLAI_TRACE, float *value, float *p_value,
0)

Optional Arguments

IMSLS_U, int nu, float u[] (Input)
Argument nu is the number of linear combinations of the dependent variables to be considered. The value nu must be greater than 0 and less than or equal to n_dependent. Argument u contains the n_dependent by nu U matrix for the test H_pβU = G_p.
Default: nu = n_dependent and u is the identity matrix
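As an illustration of how u might be filled, the following sketch builds two U matrices in plain C: the identity (the documented default) and a single contrast comparing the first two dependent variables, for which nu = 1. The helper names fill_identity_u and fill_difference_u are hypothetical and are not IMSL routines, and row-major storage of the n_dependent by nu array is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper (not an IMSL routine): fill u with the
 * n_dependent by n_dependent identity matrix, i.e. the documented
 * default for IMSLS_U. Row-major storage is assumed. */
void fill_identity_u(int n_dependent, float *u)
{
    for (int i = 0; i < n_dependent; i++)
        for (int j = 0; j < n_dependent; j++)
            u[(size_t)i * n_dependent + j] = (i == j) ? 1.0f : 0.0f;
}

/* Hypothetical helper: fill u (n_dependent by 1, so nu = 1) with the
 * contrast "first dependent variable minus second". Requires
 * n_dependent >= 2 to be meaningful. */
void fill_difference_u(int n_dependent, float *u)
{
    for (int i = 0; i < n_dependent; i++)
        u[i] = 0.0f;
    u[0] = 1.0f;
    if (n_dependent > 1)
        u[1] = -1.0f;
}
```

Either array would then be passed through IMSLS_U, nu, u, with nu = n_dependent for the identity and nu = 1 for the contrast.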

IMSLS_WILK_LAMBDA, float *value, float *p_value (Output)
Wilks' lambda and p-value.

IMSLS_ROY_MAX_ROOT, float *value, float *p_value (Output)
Roy's maximum root criterion and p-value.

IMSLS_HOTELLING_TRACE, float *value, float *p_value (Output)
Hotelling's trace and p-value.

IMSLS_PILLAI_TRACE, float *value, float *p_value (Output)
Pillai's trace and p-value.

Description

Function imsls_f_hypothesis_test computes test statistics and p-values for the general linear hypothesis HβU = G for the multivariate general linear model.

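The hypothesis sum of squares and crossproducts matrix input in scph is

S_H = (C^T D B̂ U − G)^T (C^T D C)^− (C^T D B̂ U − G)

where B̂ is the matrix of estimated regression coefficients, (·)^− denotes a generalized inverse, C is a solution to R^T C = H, and D is a diagonal matrix with diagonal elements

d_ii = 1 if r_ii > 0, and d_ii = 0 otherwise.
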
For a detailed discussion, see “Linear Dependence and the R Matrix.”

The error sum of squares and crossproducts matrix for the model Y = Xβ + ε is

(Y − X B̂)^T (Y − X B̂)

which is input in regression_info; here B̂ is the matrix of estimated coefficients. The error sum of squares and crossproducts matrix for the hypothesis HβU = G computed by imsls_f_hypothesis_test is

S_E = U^T (Y − X B̂)^T (Y − X B̂) U

Let p equal the order of the matrices S_E and S_H, i.e.,

p = nu if IMSLS_U is specified, and p = n_dependent otherwise.

Let q (stored in dfh) be the degrees of freedom for the hypothesis. Let ν (input in regression_info) be the degrees of freedom for error. Function imsls_f_hypothesis_test computes three test statistics based on the eigenvalues λ_i (i = 1, 2, …, p) of the generalized eigenvalue problem S_H x = λ S_E x. These test statistics are as follows:

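Wilks' lambda

Λ = det(S_E) / det(S_H + S_E) = ∏_{i=1}^{p} 1 / (1 + λ_i)

The associated p-value is based on an approximation discussed by Rao (1973, p. 556). The statistic

F = ((ms − pq/2 + 1) / (pq)) · ((1 − Λ^{1/s}) / Λ^{1/s})

has an approximate F distribution with pq and ms − pq/2 + 1 numerator and denominator degrees of freedom, respectively, where

s = sqrt((p^2 q^2 − 4) / (p^2 + q^2 − 5)) if p^2 + q^2 − 5 ≠ 0, and s = 1 otherwise

and

m = ν − (p − q + 1)/2

The F test is exact if min (p, q) ≤ 2 (Kshirsagar, 1972, Theorem 4, pp. 299–300).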

Roy's maximum root

c = max λ_i over all i

where c is output as value. The p-value is based on the approximation

F = ((ν + q − s) / s) · c

where s = max (p, q). This statistic has an approximate F distribution with s and ν + q − s numerator and denominator degrees of freedom, respectively. The F test is exact if s = 1, in which case the p-value is also exact. In general, the value output in p_value is a lower bound on the actual p-value.
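A short sketch of Roy's criterion in plain C follows. It assumes the bound statistic has the form F = (ν + q − s)c/s with s = max(p, q), which is consistent with the degrees of freedom stated above. The names roy_max_root and roy_f are hypothetical, not IMSL routines.

```c
/* Roy's maximum root: the largest of the p eigenvalues lambda_i. */
double roy_max_root(const double *lambda, int p)
{
    double c = lambda[0];
    for (int i = 1; i < p; i++)
        if (lambda[i] > c)
            c = lambda[i];
    return c;
}

/* F statistic associated with Roy's maximum root (assumed form).
 * q is the hypothesis df, nu the error df. The numerator and
 * denominator degrees of freedom are written to df1 and df2. */
double roy_f(double c, int p, int q, double nu,
             double *df1, double *df2)
{
    double s = (p > q) ? (double)p : (double)q;

    *df1 = s;
    *df2 = nu + q - s;
    return (*df2 / *df1) * c;
}
```

Because the resulting p-value is only a lower bound in general, a small value from this statistic is weaker evidence than the same value from the other three criteria.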

Hotelling's trace

U = tr(S_H S_E^{-1}) = ∑_{i=1}^{p} λ_i

U is output as value. The p-value is based on the approximation of McKeon (1974), which supersedes the approximation of Hughes and Saw (1972). McKeon's approximation is also discussed by Seber (1984, p. 39). For

b = 4 + (pq + 2) / (a − 1)

a = (ν + q − p − 1)(ν − 1) / ((ν − p − 3)(ν − p))

the p-value is based on the result that

F = b (ν − p − 1) U / ((b − 2) pq)

has an approximate F distribution with pq and b degrees of freedom. The test is exact if min (p, q) = 1. For ν ≤ p + 1, the approximation is not valid, and p_value is set to NaN.
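The following plain-C sketch evaluates Hotelling's trace and McKeon's F approximation from given eigenvalues. It assumes the usual statement of McKeon's result, b = 4 + (pq + 2)/(a − 1) with a = (ν + q − p − 1)(ν − 1)/((ν − p − 3)(ν − p)), and rearranges it so that the case ν = p + 3 (where a is infinite and b = 4) does not divide by zero. The names hotelling_trace and mckeon_f are hypothetical, not IMSL routines.

```c
#include <math.h>

/* Hotelling's trace: the sum of the p eigenvalues lambda_i. */
double hotelling_trace(const double *lambda, int p)
{
    double u = 0.0;
    for (int i = 0; i < p; i++)
        u += lambda[i];
    return u;
}

/* McKeon's F approximation for Hotelling's trace u (assumed form).
 * Returns NaN when nu <= p + 1, where the approximation is not valid.
 * The degrees of freedom pq and b are written to df1 and df2. */
double mckeon_f(double u, int p, int q, double nu,
                double *df1, double *df2)
{
    if (nu <= p + 1.0) {
        *df1 = NAN;
        *df2 = NAN;
        return NAN;
    }
    double pq  = (double)p * q;
    double num = (nu + q - p - 1.0) * (nu - 1.0);
    double den = (nu - p - 3.0) * (nu - p);
    /* b = 4 + (pq + 2)/(a - 1) with a = num/den, written as
     * 4 + (pq + 2) * den / (num - den) so that den = 0 is harmless. */
    double b = 4.0 + (pq + 2.0) * den / (num - den);

    *df1 = pq;
    *df2 = b;
    return b * (nu - p - 1.0) * u / ((b - 2.0) * pq);
}
```

With p = 2, q = 1, and ν = 5 this gives b = 4 and F = 2U, which agrees with the exact F statistic on 2 and 4 degrees of freedom expected when min (p, q) = 1.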

These three test statistics are valid when S_E is positive definite. A necessary condition for S_E to be positive definite is ν ≥ p. If S_E is not positive definite, a warning error message is issued, and both value and p_value are set to NaN.

Because the requirement ν ≥ p can be a serious drawback, imsls_f_hypothesis_test computes a fourth test statistic based on the eigenvalues θ_i (i = 1, 2, …, p) of the generalized eigenvalue problem S_H w = θ(S_H + S_E) w. This test statistic requires a less restrictive assumption: S_H + S_E is positive definite. A necessary condition for S_H + S_E to be positive definite is ν + q ≥ p. If S_E is positive definite, imsls_f_hypothesis_test avoids the computation of the generalized eigenvalue problem from scratch. In this case, the eigenvalues θ_i are obtained from λ_i by

θ_i = λ_i / (1 + λ_i)

The fourth test statistic is as follows:

Pillai's trace

V = tr(S_H (S_H + S_E)^{-1}) = ∑_{i=1}^{p} θ_i

V is output as value. The p-value is based on an approximation discussed by Pillai (1985). The statistic

F = ((2n + s + 1) / (2m + s + 1)) · (V / (s − V))

has an approximate F distribution with s(2m + s + 1) and s(2n + s + 1) numerator and denominator degrees of freedom, respectively, where

s = min (p, q)

m = ½(|p − q| −1)

n = ½(ν − p − 1)

The F test is exact if min (p, q) = 1.
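As a sketch of the Pillai computation in plain C, the code below converts eigenvalues λ_i to θ_i = λ_i/(1 + λ_i), sums them to obtain V, and evaluates the F statistic (2n + s + 1)/(2m + s + 1) · V/(s − V) with the s, m, and n defined above. It covers only the case where the λ_i are available (S_E positive definite); the names pillai_trace and pillai_f are hypothetical, not IMSL routines.

```c
#include <math.h>

/* Pillai's trace from the eigenvalues lambda_i of S_H x = lambda S_E x,
 * using theta_i = lambda_i / (1 + lambda_i). */
double pillai_trace(const double *lambda, int p)
{
    double v = 0.0;
    for (int i = 0; i < p; i++)
        v += lambda[i] / (1.0 + lambda[i]);
    return v;
}

/* F approximation for Pillai's trace v.
 * q is the hypothesis df, nu the error df. The numerator and
 * denominator degrees of freedom are written to df1 and df2. */
double pillai_f(double v, int p, int q, double nu,
                double *df1, double *df2)
{
    double s = (p < q) ? (double)p : (double)q;
    double m = 0.5 * (fabs((double)(p - q)) - 1.0);
    double n = 0.5 * (nu - p - 1.0);

    *df1 = s * (2.0 * m + s + 1.0);
    *df2 = s * (2.0 * n + s + 1.0);
    return (*df2 / *df1) * v / (s - v);
}
```

With the eigenvalue 316.600861 from Example 2 (p = 2, q = 1, and an assumed ν = 5), pillai_trace returns about 0.996851, matching the value printed there.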

Examples

Example 1

The data for this example are from Maindonald (1984, pp. 203–204). A multivariate regression model containing two dependent variables and three independent variables is fit using function imsls_f_regression, and the results are stored in a structure of type Imsls_f_regression. The sum of squares and crossproducts matrix, scph, is then computed with a call to imsls_f_hypothesis_scph for the test that the third independent variable is in the model (determined by specification of h). Finally, function imsls_f_hypothesis_test is called to compute the p-value for the test statistic (Wilks' lambda).

#include <stdio.h>
#include <imsls.h>

int main()
{
    Imsls_f_regression *info;
    float *coefficients, *scph;
    float dfh, p_value;

    /* Independent variables: 9 observations by 3 variables */
    float x[] = {  7.0,  5.0, 6.0,
                   2.0, -1.0, 6.0,
                   7.0,  3.0, 5.0,
                  -3.0,  1.0, 4.0,
                   2.0, -1.0, 0.0,
                   2.0,  1.0, 7.0,
                  -3.0, -1.0, 3.0,
                   2.0,  1.0, 1.0,
                   2.0,  1.0, 4.0 };

    /* Dependent variables: 9 observations by 2 variables */
    float y[] = {  7.0,  1.0,
                  -5.0,  4.0,
                   6.0, 10.0,
                   5.0,  5.0,
                   5.0, -2.0,
                  -2.0,  4.0,
                   0.0, -6.0,
                   8.0,  2.0,
                   3.0,  0.0 };

    int n_observations = 9;
    int n_independent = 3;
    int n_dependent = 2;
    int nh = 1;

    /* One-row H matrix selecting the third independent variable */
    float h[] = { 0, 0, 0, 1 };

    /* Fit the model and keep the regression information */
    coefficients = imsls_f_regression(n_observations, n_independent,
                                      x, y,
                                      IMSLS_N_DEPENDENT, n_dependent,
                                      IMSLS_REGRESSION_INFO, &info,
                                      0);

    /* Hypothesis sums of squares and crossproducts, and its df */
    scph = imsls_f_hypothesis_scph(info, nh, h, &dfh, 0);

    /* p-value for Wilks' lambda */
    p_value = imsls_f_hypothesis_test(info, dfh, scph, 0);

    printf("P-value = %10.6f\n", p_value);
}

Output

P-value =   0.000010

Example 2

This example is the same as the first example, but more statistics are computed. Also, the U matrix, u, is explicitly specified as the identity matrix (which is also the default for U).

#include <stdio.h>
#include <imsls.h>

int main()
{
    Imsls_f_regression *info;
    float *coefficients, *scph;
    float dfh, p_value;

    /* Independent variables: 9 observations by 3 variables */
    float x[] = {  7.0,  5.0, 6.0,
                   2.0, -1.0, 6.0,
                   7.0,  3.0, 5.0,
                  -3.0,  1.0, 4.0,
                   2.0, -1.0, 0.0,
                   2.0,  1.0, 7.0,
                  -3.0, -1.0, 3.0,
                   2.0,  1.0, 1.0,
                   2.0,  1.0, 4.0 };

    /* Dependent variables: 9 observations by 2 variables */
    float y[] = {  7.0,  1.0,
                  -5.0,  4.0,
                   6.0, 10.0,
                   5.0,  5.0,
                   5.0, -2.0,
                  -2.0,  4.0,
                   0.0, -6.0,
                   8.0,  2.0,
                   3.0,  0.0 };

    int n_observations = 9;
    int n_independent = 3;
    int n_dependent = 2;
    int nh = 1;

    /* One-row H matrix selecting the third independent variable */
    float h[] = { 0, 0, 0, 1 };

    /* U is the 2 by 2 identity, the same as the default */
    int nu = 2;
    float u[4] = { 1, 0, 0, 1 };

    /* Statistic values (v1..v4) and their p-values (p1..p4) */
    float v1, v2, v3, v4, p1, p2, p3, p4;

    /* Fit the model and keep the regression information */
    coefficients = imsls_f_regression(n_observations, n_independent,
                                      x, y,
                                      IMSLS_N_DEPENDENT, n_dependent,
                                      IMSLS_REGRESSION_INFO, &info,
                                      0);

    /* Hypothesis sums of squares and crossproducts, and its df */
    scph = imsls_f_hypothesis_scph(info, nh, h, &dfh, 0);

    /* Request all four test statistics */
    p_value = imsls_f_hypothesis_test(info, dfh, scph,
                                      IMSLS_U, nu, u,
                                      IMSLS_WILK_LAMBDA, &v1, &p1,
                                      IMSLS_ROY_MAX_ROOT, &v2, &p2,
                                      IMSLS_HOTELLING_TRACE, &v3, &p3,
                                      IMSLS_PILLAI_TRACE, &v4, &p4,
                                      0);

    printf("Wilk value = %10.6f p-value = %10.6f\n", v1, p1);
    printf("Roy value = %10.6f p-value = %10.6f\n", v2, p2);
    printf("Hotelling value = %10.6f p-value = %10.6f\n", v3, p3);
    printf("Pillai value = %10.6f p-value = %10.6f\n", v4, p4);
}

Output

Wilk value =   0.003149 p-value =   0.000010
Roy value = 316.600861 p-value =   0.000010
Hotelling value = 316.600861 p-value =   0.000010
Pillai value =   0.996851 p-value =   0.000010
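
The four values in this output can be cross-checked by hand. The worked arithmetic below assumes that, because the hypothesis has one degree of freedom (q = 1), only one eigenvalue λ_1 is nonzero, and that the error degrees of freedom are ν = 9 − 4 = 5 (a full-rank fit with an intercept). Neither assumption is stated explicitly in the example.

```latex
\begin{aligned}
\lambda_1 &= 316.600861, \qquad \lambda_2 = 0 \\
c &= U = \lambda_1 = 316.600861
   && \text{(Roy's maximum root and Hotelling's trace)} \\
\Lambda &= \frac{1}{1+\lambda_1} = \frac{1}{317.600861} \approx 0.003149
   && \text{(Wilks' lambda)} \\
V &= \frac{\lambda_1}{1+\lambda_1} = 1 - \Lambda \approx 0.996851
   && \text{(Pillai's trace)} \\
F &= 2\lambda_1 \approx 633.20 \ \text{on } (2,\,4) \text{ degrees of freedom} \\
P(F_{2,4} > F) &= \left(1 + \tfrac{F}{2}\right)^{-2} = \Lambda^{2} \approx 9.9 \times 10^{-6}
\end{aligned}
```

Under these assumptions all four criteria reduce to the same F statistic, which is why the four printed p-values agree at 0.000010.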

Warning Errors

IMSLS_SINGULAR_1 “u”*“scpe”*“u” is singular. Only Pillai's trace can be computed. Other statistics are set to NaN.

Fatal Errors

IMSLS_NO_STAT_1 “scpe” + “scph” is singular. No tests can be computed.

IMSLS_NO_STAT_2 No statistics can be computed. Iterations for eigenvalues for the generalized eigenvalue problem “scph”*x = (lambda)*(“scph”+“scpe”)*x failed to converge.

IMSLS_NO_STAT_3 No statistics can be computed. Iterations for eigenvalues for the generalized eigenvalue problem “scph”*x = (lambda)*(“scph”+“u”*“scpe”*“u”)*x failed to converge.

IMSLS_SINGULAR_2 “u”*“scpe”*“u” + “scph” is singular. No tests can be computed.

IMSLS_SINGULAR_TRI_MATRIX The input triangular matrix is singular. The index of the first zero diagonal element is equal to #.

Visual Numerics, Inc.