multivarNormalityTest

Computes Mardia’s multivariate measures of skewness and kurtosis and tests for multivariate normality.

Synopsis

multivarNormalityTest (x)

Required Arguments

float x[[]] (Input)
Array of size nObservations by nVariables containing the data.

Return Value

An array of dimension 13 containing output statistics

i stat[i]
0 Estimated skewness.
1 Expected skewness assuming a multivariate normal distribution.
2 Asymptotic chi-squared statistic assuming a multivariate normal distribution.
3 Probability of a greater chi-squared.
4 Mardia and Foster’s standard normal score for skewness.
5 Estimated kurtosis.
6 Expected kurtosis assuming a multivariate normal distribution.
7 Asymptotic standard error of the estimated kurtosis.
8 Standard normal score obtained from stat[5] through stat[7].
9 p-value corresponding to stat[8].
10 Mardia and Foster’s standard normal score for kurtosis.
11 Mardia’s \(S_W\) statistic based upon stat[4] and stat[10].
12 p-value for stat[11].

Optional Arguments

frequencies, float[] (Input)
Array of size nObservations containing the frequencies. Frequencies must be integer valued. Default assumes all frequencies equal one.
weights, float[] (Input)
Array of size nObservations containing the weights. Weights must be greater than non-negative. Default assumes all weights equal one.
sumFreq (Output)
The sum of the frequencies of all observations used in the computations.
sumWeights (Output)
The sum of the weights times the frequencies for all observations used in the computations.
nRowsMissing (Output)
Number of rows of data in x[] containing any missing values (NaN).
means (Output)
An array of length nVariables containing the sample means.
r (Output)
An nVariables by nVariables upper triangular matrix containing the Cholesky \(R^TR\) factorization of the covariance matrix.

Description

Function multivarNormalityTest computes Mardia’s (1970) measures \(b_{1,p}\) and \(b_{2,p}\) of multivariate skewness and kurtosis, respectfully, for p = nVariables. These measures are then used in computing tests for multivariate normality. Three test statistics, one based upon \(b_{1,p}\) alone, one based upon \(b_{2,p}\) alone, and an omnibus test statistic formed by combining normal scores obtained from \(b_{1,p}\) and \(b_{2,p}\) are computed. On the order of \(np^3\), operations are required in computing \(b_{1,p}\) when the method of Isogai (1983) is used, where n = nObservations. On the order of \(np^2\), operations are required in computing \(b_{2,p}\).

Let

\[d_{ij} = \sqrt{w_i w_j} \left(x_i - \overline{x}\right)^T S^{-1} \left(x_j - \overline{x}\right)\]

where

\[\begin{split}\begin{array}{l} S = \frac {\displaystyle\sum_{i=1}^{n} w_i f_i \left(x_i - \overline{x}\right) \left(x_i - \overline{x}\right)^T} {\displaystyle\sum_{i=1}^{n} f_i} \\ \overline{x} = \frac{1}{\displaystyle\sum_{i=1}^{n} w_i f_i} \displaystyle\sum_{i=1}^{n} w_i f_i x_i \end{array}\end{split}\]

\(f_i\) is the frequency of the i-th observation, and \(w_i\) is the weight for this observation. (Weights \(w_i\) are defined such that \(x_i\) is distributed according to a multivariate normal, \(N(\mu, \Sigma/w_i)\) distribution, where Σ is the covariance matrix.) Mardia’s multivariate skewness statistic is defined as:

\[b_{1,p} = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} f_i f_j d_{ij}^3\]

while Mardia’s kurtosis is given as:

\[b_{2,p} = \frac{1}{n} \sum_{i=1}^{n} f_i d_{ii}^2\]

Both measures are invariant under the affine (matrix) transformation AX + D, and reduce to the univariate measures when p = nVariables = 1. Using formulas given in Mardia and Foster (1983), the approximate expected value, asymptotic standard error, and asymptotic p‑value for \(b_{2,p}\), and the approximate expected value, an asymptotic chi-squared statistic, and p‑value for the \(b_{1,p}\) statistic are computed. These statistics are all computed under the null hypothesis of a multivariate normal distribution. In addition, standard normal scores \(W_1(b_{1,p})\) and \(W_2(b_{2,p})\) (different from but similar to the asymptotic normal and chi-squared statistics above) are computed. These scores are combined into an asymptotic chi-squared statistic with two degrees of freedom:

\[S_W = W_1^2\left(b_{1,p}\right) + W_2^2 \left(b_{2,p}\right)\]

This chi-squared statistic may be used to test for multivariate normality. A p‑value for the chi-squared statistic is also computed.

Example

In this example, 150 observations from a 5 dimensional standard normal distribution are generated via routine randomNormal (Chapter 12, Random Number Generation). The skewness and kurtosis statistics are then computed for these observations.

from __future__ import print_function
from numpy import *
from pyimsl.stat.multivarNormalityTest import multivarNormalityTest
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.randomNormal import randomNormal
from pyimsl.stat.writeMatrix import writeMatrix

nobs = 150
ncol = 5
nvar = 5
izero = 0
randomSeedSet(123457)
x = randomNormal(nobs * nvar)
x = x.reshape(nobs, nvar)
ni = []
swt = []
r = []
nrmiss = []
xmean = []

stats = multivarNormalityTest(x, sumFreq=ni, sumWeights=swt,
                              nRowsMissing=nrmiss, r=r, means=xmean)

print("Sum of frequencies  = %d" % (ni[0]))
print("Sum of the weights  =%8.3f" % (swt[0]))
print("Number rows missing = %3i" % (nrmiss[0]))
writeMatrix("stat", stats, rowNumberZero=True, column=True)
writeMatrix("means", xmean, writeFormat="%9.6f")
writeMatrix("R", r, writeFormat="%8.3f")

Output

Sum of frequencies  = 150
Sum of the weights  = 150.000
Number rows missing =   0
 
     stat
 0         0.73
 1         1.36
 2        18.62
 3         0.99
 4        -2.37
 5        32.67
 6        34.54
 7         1.27
 8        -1.48
 9         0.14
10         1.62
11         8.24
12         0.02
 
                        means
        1          2          3          4          5
 0.026232   0.092377   0.065362   0.098194   0.056391
 
                         R
          1         2         3         4         5
1     1.033    -0.084    -0.065     0.108     0.067
2     0.000     1.049    -0.097    -0.042    -0.021
3     0.000     0.000     1.063     0.006    -0.145
4     0.000     0.000     0.000     0.942    -0.084
5     0.000     0.000     0.000     0.000     0.949