multivarNormalityTest¶
Computes Mardia’s multivariate measures of skewness and kurtosis and tests for multivariate normality.
Synopsis¶
multivarNormalityTest (x)
Required Arguments¶
- float
x[[]]
(Input) - Array of size
nObservations
bynVariables
containing the data.
Return Value¶
An array of dimension 13 containing output statistics
i |
stat[i] |
---|---|
0 | Estimated skewness. |
1 | Expected skewness assuming a multivariate normal distribution. |
2 | Asymptotic chi-squared statistic assuming a multivariate normal distribution. |
3 | Probability of a greater chi-squared. |
4 | Mardia and Foster’s standard normal score for skewness. |
5 | Estimated kurtosis. |
6 | Expected kurtosis assuming a multivariate normal distribution. |
7 | Asymptotic standard error of the estimated kurtosis. |
8 | Standard normal score obtained from stat[5] through
stat[7] . |
9 | p-value corresponding to stat[8] . |
10 | Mardia and Foster’s standard normal score for kurtosis. |
11 | Mardia’s \(S_W\) statistic based upon stat[4] and
stat[10] . |
12 | p-value for stat[11] . |
Optional Arguments¶
frequencies
, float[]
(Input)- Array of size
nObservations
containing the frequencies. Frequencies must be integer valued. Default assumes all frequencies equal one. weights
, float[]
(Input)- Array of size
nObservations
containing the weights. Weights must be greater than non-negative. Default assumes all weights equal one. sumFreq
(Output)- The sum of the frequencies of all observations used in the computations.
sumWeights
(Output)- The sum of the weights times the frequencies for all observations used in the computations.
nRowsMissing
(Output)- Number of rows of data in x
[]
containing any missing values (NaN). means
(Output)- An array of length
nVariables
containing the sample means. r
(Output)- An
nVariables
bynVariables
upper triangular matrix containing the Cholesky \(R^TR\) factorization of the covariance matrix.
Description¶
Function multivarNormalityTest
computes Mardia’s (1970) measures
\(b_{1,p}\) and \(b_{2,p}\) of multivariate skewness and kurtosis,
respectfully, for p = nVariables
. These measures are then used in
computing tests for multivariate normality. Three test statistics, one based
upon \(b_{1,p}\) alone, one based upon \(b_{2,p}\) alone, and an
omnibus test statistic formed by combining normal scores obtained from
\(b_{1,p}\) and \(b_{2,p}\) are computed. On the order of
\(np^3\), operations are required in computing \(b_{1,p}\) when the
method of Isogai (1983) is used, where n = nObservations
. On the order
of \(np^2\), operations are required in computing \(b_{2,p}\).
Let
where
\(f_i\) is the frequency of the i-th observation, and \(w_i\) is the weight for this observation. (Weights \(w_i\) are defined such that \(x_i\) is distributed according to a multivariate normal, \(N(\mu, \Sigma/w_i)\) distribution, where Σ is the covariance matrix.) Mardia’s multivariate skewness statistic is defined as:
while Mardia’s kurtosis is given as:
Both measures are invariant under the affine (matrix) transformation AX +
D, and reduce to the univariate measures when p = nVariables
= 1.
Using formulas given in Mardia and Foster (1983), the approximate expected
value, asymptotic standard error, and asymptotic p‑value for
\(b_{2,p}\), and the approximate expected value, an asymptotic
chi-squared statistic, and p‑value for the \(b_{1,p}\) statistic are
computed. These statistics are all computed under the null hypothesis of a
multivariate normal distribution. In addition, standard normal scores
\(W_1(b_{1,p})\) and \(W_2(b_{2,p})\) (different from but similar to
the asymptotic normal and chi-squared statistics above) are computed. These
scores are combined into an asymptotic chi-squared statistic with two degrees
of freedom:
This chi-squared statistic may be used to test for multivariate normality. A p‑value for the chi-squared statistic is also computed.
Example¶
In this example, 150 observations from a 5 dimensional standard normal distribution are generated via routine randomNormal (Chapter 12, Random Number Generation). The skewness and kurtosis statistics are then computed for these observations.
from __future__ import print_function
from numpy import *
from pyimsl.stat.multivarNormalityTest import multivarNormalityTest
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.randomNormal import randomNormal
from pyimsl.stat.writeMatrix import writeMatrix
nobs = 150
ncol = 5
nvar = 5
izero = 0
randomSeedSet(123457)
x = randomNormal(nobs * nvar)
x = x.reshape(nobs, nvar)
ni = []
swt = []
r = []
nrmiss = []
xmean = []
stats = multivarNormalityTest(x, sumFreq=ni, sumWeights=swt,
nRowsMissing=nrmiss, r=r, means=xmean)
print("Sum of frequencies = %d" % (ni[0]))
print("Sum of the weights =%8.3f" % (swt[0]))
print("Number rows missing = %3i" % (nrmiss[0]))
writeMatrix("stat", stats, rowNumberZero=True, column=True)
writeMatrix("means", xmean, writeFormat="%9.6f")
writeMatrix("R", r, writeFormat="%8.3f")
Output¶
Sum of frequencies = 150
Sum of the weights = 150.000
Number rows missing = 0
stat
0 0.73
1 1.36
2 18.62
3 0.99
4 -2.37
5 32.67
6 34.54
7 1.27
8 -1.48
9 0.14
10 1.62
11 8.24
12 0.02
means
1 2 3 4 5
0.026232 0.092377 0.065362 0.098194 0.056391
R
1 2 3 4 5
1 1.033 -0.084 -0.065 0.108 0.067
2 0.000 1.049 -0.097 -0.042 -0.021
3 0.000 0.000 1.063 0.006 -0.145
4 0.000 0.000 0.000 0.942 -0.084
5 0.000 0.000 0.000 0.000 0.949