multivarNormalityTest¶

Computes Mardia’s multivariate measures of skewness and kurtosis and tests for multivariate normality.

Synopsis¶

multivarNormalityTest (x)

Required Arguments¶

float x[[]] (Input): Array of size nObservations by nVariables containing the data.

Return Value¶

An array of dimension 13 containing output statistics

`i`	`stat[i]`
0	Estimated skewness.
1	Expected skewness assuming a multivariate normal distribution.
2	Asymptotic chi-squared statistic assuming a multivariate normal distribution.
3	Probability of a greater chi-squared.
4	Mardia and Foster’s standard normal score for skewness.
5	Estimated kurtosis.
6	Expected kurtosis assuming a multivariate normal distribution.
7	Asymptotic standard error of the estimated kurtosis.
8	Standard normal score obtained from `stat[5]` through `stat[7]`.
9	p-value corresponding to `stat[8]`.
10	Mardia and Foster’s standard normal score for kurtosis.
11	Mardia’s $S_W$ statistic based upon `stat[4]` and `stat[10]`.
12	p-value for `stat[11]`.

Optional Arguments¶

frequencies, float[] (Input): Array of size nObservations containing the frequencies. Frequencies must be integer valued. Default assumes all frequencies equal one.
weights, float[] (Input): Array of size nObservations containing the weights. Weights must be greater than non-negative. Default assumes all weights equal one.
sumFreq (Output): The sum of the frequencies of all observations used in the computations.
sumWeights (Output): The sum of the weights times the frequencies for all observations used in the computations.
nRowsMissing (Output): Number of rows of data in x[] containing any missing values (NaN).
means (Output): An array of length nVariables containing the sample means.
r (Output): An nVariables by nVariables upper triangular matrix containing the Cholesky $R^TR$ factorization of the covariance matrix.

Description¶

Function multivarNormalityTest computes Mardia’s (1970) measures $b_{1,p}$ and $b_{2,p}$ of multivariate skewness and kurtosis, respectfully, for p = nVariables. These measures are then used in computing tests for multivariate normality. Three test statistics, one based upon $b_{1,p}$ alone, one based upon $b_{2,p}$ alone, and an omnibus test statistic formed by combining normal scores obtained from $b_{1,p}$ and $b_{2,p}$ are computed. On the order of $np^3$ , operations are required in computing $b_{1,p}$ when the method of Isogai (1983) is used, where n = nObservations. On the order of $np^2$ , operations are required in computing $b_{2,p}$ .

Let

$d_{ij} = \sqrt{w_i w_j} \left(x_i - \overline{x}\right)^T S^{-1} \left(x_j - \overline{x}\right)$

where

$\begin{split}\begin{array}{l} S = \frac {\displaystyle\sum_{i=1}^{n} w_i f_i \left(x_i - \overline{x}\right) \left(x_i - \overline{x}\right)^T} {\displaystyle\sum_{i=1}^{n} f_i} \\ \overline{x} = \frac{1}{\displaystyle\sum_{i=1}^{n} w_i f_i} \displaystyle\sum_{i=1}^{n} w_i f_i x_i \end{array}\end{split}$

$f_i$ is the frequency of the i-th observation, and $w_i$ is the weight for this observation. (Weights $w_i$ are defined such that $x_i$ is distributed according to a multivariate normal, $N(\mu, \Sigma/w_i)$ distribution, where Σ is the covariance matrix.) Mardia’s multivariate skewness statistic is defined as:

$b_{1,p} = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} f_i f_j d_{ij}^3$

while Mardia’s kurtosis is given as:

$b_{2,p} = \frac{1}{n} \sum_{i=1}^{n} f_i d_{ii}^2$

Both measures are invariant under the affine (matrix) transformation AX + D, and reduce to the univariate measures when p = nVariables = 1. Using formulas given in Mardia and Foster (1983), the approximate expected value, asymptotic standard error, and asymptotic p‑value for $b_{2,p}$ , and the approximate expected value, an asymptotic chi-squared statistic, and p‑value for the $b_{1,p}$ statistic are computed. These statistics are all computed under the null hypothesis of a multivariate normal distribution. In addition, standard normal scores $W_1(b_{1,p})$ and $W_2(b_{2,p})$ (different from but similar to the asymptotic normal and chi-squared statistics above) are computed. These scores are combined into an asymptotic chi-squared statistic with two degrees of freedom:

$S_W = W_1^2\left(b_{1,p}\right) + W_2^2 \left(b_{2,p}\right)$

This chi-squared statistic may be used to test for multivariate normality. A p‑value for the chi-squared statistic is also computed.

Example¶

In this example, 150 observations from a 5 dimensional standard normal distribution are generated via routine randomNormal (Chapter 12, Random Number Generation). The skewness and kurtosis statistics are then computed for these observations.

from __future__ import print_function
from numpy import *
from pyimsl.stat.multivarNormalityTest import multivarNormalityTest
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.randomNormal import randomNormal
from pyimsl.stat.writeMatrix import writeMatrix

nobs = 150
ncol = 5
nvar = 5
izero = 0
randomSeedSet(123457)
x = randomNormal(nobs * nvar)
x = x.reshape(nobs, nvar)
ni = []
swt = []
r = []
nrmiss = []
xmean = []

stats = multivarNormalityTest(x, sumFreq=ni, sumWeights=swt,
                              nRowsMissing=nrmiss, r=r, means=xmean)

print("Sum of frequencies  = %d" % (ni[0]))
print("Sum of the weights  =%8.3f" % (swt[0]))
print("Number rows missing = %3i" % (nrmiss[0]))
writeMatrix("stat", stats, rowNumberZero=True, column=True)
writeMatrix("means", xmean, writeFormat="%9.6f")
writeMatrix("R", r, writeFormat="%8.3f")

Output¶

Sum of frequencies  = 150
Sum of the weights  = 150.000
Number rows missing =   0
 
     stat
       0.73
       1.36
      18.62
       0.99
      -2.37
      32.67
      34.54
       1.27
      -1.48
       0.14
       1.62
       8.24
       0.02
 
                        means
        1          2          3          4          5
026232   0.092377   0.065362   0.098194   0.056391
 
                         R
          1         2         3         4         5
   1.033    -0.084    -0.065     0.108     0.067
   0.000     1.049    -0.097    -0.042    -0.021
   0.000     0.000     1.063     0.006    -0.145
   0.000     0.000     0.000     0.942    -0.084
   0.000     0.000     0.000     0.000     0.949