hypothesisTest¶
Performs tests for a multivariate general linear hypothesis \(H\beta U=G\) given the hypothesis sums of squares and crossproducts matrix \(S_H\).
Synopsis¶
hypothesisTest (regressionInfo, dfh, scph)
Required Argument¶
- structure
regressionInfo
(Input) - A structure containing information about the regression fit. See function regression.
- float
dfh
(Input) - Degrees of freedom for the sums of squares and crossproducts matrix.
- float
scph[[]]
(Input) - Array of size
nu
bynu
containing \(S_H\), the sums of squares and crossproducts attributable to the hypothesis.
Return Value¶
The p-value corresponding to Wilks’ lambda test.
Optional Arguments¶
u
, float[]
(Input)Argument
u
contains the nDependent bynu
U matrix for the test \(H_p\beta U=G_p\); wherenu
is the number of linear combinations of the dependent variables to be considered. The valuenu
must be greater than 0 and less than or equal to nDependent.Default:
nu
= nDependent andu
is the identity matrixwilkLambda
,value
,pValue
(Output)- Wilk’s lamda and p-value.
royMaxRoot
,value
,pValue
(Output)- Roy’s maximum root criterion and p-value.
hotellingTrace
,value
,pValue
(Output)- Hotelling’s trace and p-value.
pillaiTrace
,value
,pValue
(Output)- Pillai’s trace and p-value.
Description¶
Function hypothesisTest
computes test statistics and p-values for the
general linear hypothesis \(H\beta U=G\) for the multivariate general
linear model.
The hypothesis sum of squares and crossproducts matrix input in scph
is
where C is a solution to \(R^TC=H,(C^TDC)^-\) denotes the generalized inverse of \(C^TDC\), and D is a diagonal matrix with diagonal elements
For a detailed discussion, see Linear Dependence and the R Matrix in the Usage Notes.
The error sum of squares and crossproducts matrix for the model \(Y=X\beta+\varepsilon\) is
which is input in regressionInfo
. The error sum of squares and
crossproducts matrix for the hypothesis \(H\beta U=G\) computed by
hypothesisTest
is
Let p equal the order of the matrices \(S_E\) and \(S_H\), i.e.,
Let q (stored in dfh
) be the degrees of freedom for the hypothesis. Let
v (input in regressionInfo
) be the degrees of freedom for error.
Function hypothesisTest
computed three test statistics based on
eigenvalues \(\lambda_i\) (\(i=1,2,\ldots,p\)) of the generalized
eigenvalue problem \(S_Hx=\lambda S_Ex\). These test statistics are as
follows:
Wilk’s lambda
The associated p-value is based on an approximation discussed by Rao (1973, p. 556). The statistic
has an approximate F distribution with pq and \(ms-pq/2+1\) numerator and denominator degrees of freedom, respectively, where
and
The F test is exact if \(\min(p,q)\leq 2\) (Kshirsagar, 1972, Theorem 4, p. 299−300).
Roy’s maximum root
where c is output as value
. The p-value is based on the
approximation
where \(s=\max(p,q)\) has an approximate F distribution with s and
\(ν + q − s\) numerator and denominator degrees of freedom, respectively.
The F test is exact if \(s=1\); the p-value is also exact. In
general, the value output in pValue
is lower bound on the actual
p-value.
Hotelling’s trace
U is output as value
. The p-value is based on the approximation of
McKeon (1974) that supersedes the approximation of Hughes and Saw (1972).
McKeon’s approximation is also discussed by Seber (1984, p. 39). For
the p-value is based on the result that
has an approximate F distribution with pq and b degrees of freedom. The
test is exact if \(\min(p,q)=1\). For \(v\leq p+1\), the
approximation is not valid, and pValue
is set to NaN.
These three test statistics are valid when \(S_E\) is positive definite.
A necessary condition for \(S_E\) to be positive definite is
\(\nu\geq p\). If \(S_E\) is not positive definite, a warning error
message is issued, and both value
and pValue
are set to NaN.
Because the requirement ν ≥ p can be a serious drawback, hypothesisTest
computes a fourth test statistic based on eigenvalues \(\theta_i\)
(\(i= 1,2,\ldots,p\)) of the generalized eigenvalue problem
\(S_Hw=\theta(S_H+ S_E)w\). This test statistic requires a less
restrictive assumption— \(S_H+S_E\) is positive definite. A necessary
condition for \(S_H +S_E\) to be positive definite is \(\nu+q\geq
p\). If \(S_E\) is positive definite, hypothesisTest
avoids the
computation of the generalized eigenvalue problem from scratch. In this case,
the eigenvalues \(\theta_i\) are obtained from \(\lambda_i\) by
The fourth test statistic is as follows:
Pillai’s trace
V is output as value
. The p-value is based on an approximation
discussed by Pillai (1985). The statistic
has an approximate F distribution with \(s(2m+s+1)\) and \(s(2n+s+1)\) numerator and denominator degrees of freedom, respectively, where
The F test is exact if \(\min(p,q)=1\).
Examples¶
Example 1¶
The data for this example are from Maindonald (1984, p. 203−204). A
multivariate regression model containing two dependent variables and three
independent variables is fit using function
regression and the results stored in the structure
regressionInfo
. The sum of squares and crossproducts matrix, scph
,
is then computed with a call to hypothesisScph for
the test that the third independent variable is in the model (determined by
specification of h
). Finally, function hypothesisTest
is called to
compute the p-value for the test statistic (Wilk’s lambda).
from __future__ import print_function
from numpy import *
from pyimsl.stat.hypothesisScph import hypothesisScph
from pyimsl.stat.regression import regression
from pyimsl.stat.hypothesisTest import hypothesisTest
x = array([
[7.0, 5.0, 6.0],
[2.0, -1.0, 6.0],
[7.0, 3.0, 5.0],
[-3.0, 1.0, 4.0],
[2.0, -1.0, 0.0],
[2.0, 1.0, 7.0],
[-3.0, -1.0, 3.0],
[2.0, 1.0, 1.0],
[2.0, 1.0, 4.0]])
y = array([
[7.0, 1.0],
[-5.0, 4.0],
[6.0, 10.0],
[5.0, 5.0],
[5.0, -2.0],
[-2.0, 4.0],
[0.0, -6.0],
[8.0, 2.0],
[3.0, 0.0]])
n_observations = 9
n_independent = 3
n_dependent = 2
nh = 1
h = array([[0, 0, 0, 1]])
info = []
dfh = []
coefficients = regression(x, y,
nDependent=n_dependent,
regressionInfo=info)
scph = hypothesisScph(info[0], h, dfh)
p_value = hypothesisTest(info[0], dfh[0], scph)
print("P-value = %10.6f" % p_value)
Output¶
P-value = 0.000010
Example 2¶
This example is the same as the first example, but more statistics are
computed. Also, the U matrix, u
, is explicitly specified as the
identity matrix (which is the same default configuration of U).
from __future__ import print_function
from numpy import *
from pyimsl.stat.hypothesisScph import hypothesisScph
from pyimsl.stat.hypothesisTest import hypothesisTest
from pyimsl.stat.regression import regression
x = array([
[7.0, 5.0, 6.0],
[2.0, -1.0, 6.0],
[7.0, 3.0, 5.0],
[-3.0, 1.0, 4.0],
[2.0, -1.0, 0.0],
[2.0, 1.0, 7.0],
[-3.0, -1.0, 3.0],
[2.0, 1.0, 1.0],
[2.0, 1.0, 4.0]])
y = array([
[7.0, 1.0],
[-5.0, 4.0],
[6.0, 10.0],
[5.0, 5.0],
[5.0, -2.0],
[-2.0, 4.0],
[0.0, -6.0],
[8.0, 2.0],
[3.0, 0.0]])
n_observations = 9
n_independent = 3
n_dependent = 2
nh = 1
h = array([[0, 0, 0, 1]])
nu = 2
u = [[1, 0], [0, 1]]
info = []
dfh = []
coefficients = regression(x, y,
nDependent=n_dependent,
regressionInfo=info)
scph = hypothesisScph(info[0], h, dfh)
wilk_lambda = {}
roy_max_root = {}
hotelling_trace = {}
pillai_trace = {}
p_value = hypothesisTest(info[0], dfh[0], scph,
u=u, wilkLambda=wilk_lambda,
royMaxRoot=roy_max_root,
hotellingTrace=hotelling_trace,
pillaiTrace=pillai_trace)
v1 = wilk_lambda["value"]
p1 = wilk_lambda["pValue"]
v2 = roy_max_root["value"]
p2 = roy_max_root["pValue"]
v3 = hotelling_trace["value"]
p3 = hotelling_trace["pValue"]
v4 = pillai_trace["value"]
p4 = pillai_trace["pValue"]
print("Wilk value = %10.6f p-value = %10.6f" % (v1, p1))
print("Roy value = %10.6f p-value = %10.6f" % (v2, p2))
print("Hotelling value = %10.6f p-value = %10.6f" % (v3, p3))
print("Pillai value = %10.6f p-value = %10.6f" % (v4, p4))
Output¶
Wilk value = 0.003149 p-value = 0.000010
Roy value = 316.600000 p-value = 0.000010
Hotelling value = 316.600000 p-value = 0.000010
Pillai value = 0.996851 p-value = 0.000010
Warning Errors¶
IMSLS_SINGULAR_1 |
“u”“scpe”“u” is singular. Only Pillai’s trace can be computed. Other statistics are set to NaN. |
Fatal Errors¶
IMSLS_NO_STAT_1 |
“scpe” + “scph” is singular. No tests can be computed. |
IMSLS_NO_STAT_2 |
No statistics can be computed. Iterations for eigenvalues for the generalized eigenvalue problem “scph”x = (lambda)(“scph”+“scpe”)x failed to converge. |
IMSLS_NO_STAT_3 |
No statistics can be computed. Iterations for eigenvalues for the generalized eigenvalue problem “scph” x = (lambda)(“scph”+“u”“scpe”“u”)x failed to converge. |
IMSLS_SINGULAR_2 |
“u”“scpe”“u” + “scph” is singular. No tests can be computed. |
IMSLS_SINGULAR_TRI_MATRIX |
The input triangular matrix is singular. The index of the first zero diagonal element is equal to #. |