multivariateNormalCdf

../../_images/OpenMp_27.png

Evaluates the cumulative distribution function for the multivariate normal distribution.

Synopsis

multivariateNormalCdf (h, mean, sigma)

Required Arguments

float h[] (Input)
Array of length k containing the upper bounds for calculating the cumulative distribution function, \(F \left( X_1<h_1,X_2<h_2,\cdots,X_k<h_k \right)\).
float mean[] (Input)
Array of length k containing the mean of the multivariate normal distribution, i.e., \(E \left( x_1,x_2,\ldots x_k \right)=\mu^T=\left( \mu_1,\mu_2,\ldots\mu _k \right)\).
float sigma[[]] (Input)
Array of length k by k containing the positive definite symmetric matrix of correlations or of variances and covariances for the multivariate normal distribution, i.e., \(\mathit{Var} \left( x_1,x_2,\ldots x_k \right)=\Sigma\).

Return Value

The value of the cumulative distribution function for a multivariate normal random variable, \(F \left( X_1<h_1,X_2<h_2,\cdots,X_k<h_k \right)\).

Optional Arguments

t_print, (Input)

Print intermediate computations.

Default: No printing.

errAbs, float (Input)

The absolute accuracy requested for the calculated cumulative probability.

Default: errAbs = \(1.0e^{-3}\).

errRel, float (Input)

The relative accuracy desired.

Default: errRel = \(1.0e^{-5}\).

tolerance, float (Input)
The minimum value for the smallest eigenvalue of sigma. If the smallest eigenvalue is less than tolerance, then the terminal error IMSLS_SIGMA_SINGULAR is issued. Default: tolerance= ɛ, where ɛ is the machine precision.
maxEvals, int (Input)

The maximum number of function evaluations allowed. If this limit is exceeded, the IMSLS_MAX_EVALS_EXCEEDED warning message is issued and the optimization terminates.

Default: maxEvals = 1000×k.

randomSeed, int (Input)

The value of the random seed used for generating quadrature points. By using different seeds on different calls to this function, slightly different answers will be obtained in the digits beyond the estimated error. If randomSeed = 0, then the seed is set to the value of the system clock which will automatically generate slightly different answers for each call.

Default: randomSeed = 7919.

errEst (Output)
The estimated error.

Description

Function multivariateNormalCdf evaluates the cumulative distribution function F of a multivariate normal distribution with \(E(X_1,X_2,\cdots, X_k)=\mu\) and \(Var(X_1,X_2,\cdots,X_k)=\Sigma\). The input arrays mean and sigma are used as the values for μ and ∑, respectively. The formula for the CDF calculation is given by the multiple integral described in Johnson and Kotz (1972):

\[F\left(h_1,h_2, \ldots, h_k\right) = \int\limits_{-\infty}^{h_1} \int\limits_{-\infty}^{h_2} \cdots \int\limits_{-\infty}^{h_k} (2\pi)^{-k/2} |{\sum}|^{-1/2} \exp\left(-\frac{1}{2}(x - \mu)^T {\sum}^{-1}(x - \mu)\right) dx_1dx_2 \cdots dx_k\]

∑ must be positive definite, i.e. |∑| >0. If k = 2 or k > 2 the functions normalCdf and bivariateNormalCdf are used for this calculation.

In the special case of equal correlations, the above integral is transformed into a univariate integral using the transformation developed by Dunnett and Sobel(1955). This produces very accurate and fast calculations even for a large number of variates.

If k > 2 and the correlations are not equal, the Cholesky decomposition transformation described by Genz (1992) is used (with permission from the author). This transforms the problem into a definite integral involving k-1 variables which is solved numerically using randomized Korobov rules if k100, see Cranley and Patterson (1976) and Keast (1973); otherwise, the integral is solved using quasi-random Richtmeyer points described in Davis and Rabinowitz (1984).

Examples

Example 1

This example evaluates the cumulative distribution function for a trivariate normal random variable. There are three calculations. The first calculation is of \(F(1.96,1.96,1.96)\) for a trivariate normal variable with \(\mu=\{0,0,0\}\), and

\[\begin{split}\Sigma = \begin{bmatrix} 1 & 0.9 & 0.9 \\ 0.9 & 1 & 0.9 \\ 0.9 & 0.9 & 1 \\ \end{bmatrix}\end{split}\]

In this case, multivariateNormalCdf calculates \(F(1.96,1.96,1.96)=0.958179\).

The second calculation involves a trivariate variable with the same correlation matrix as the first calculation but with a mean of \(\mu=\{0,1,-1\}\). This is the same distribution as the first example shifted by the mean. The calculation of \(F(1.96,2.96,0.96)\) verifies that this probability is equal to the same value as reported for the first case.

The last calculation is the same calculation reported in Genz (1992) for a trivariate normal random variable with \(\mu=\{0,0,0\}\) and

\[\begin{split}\Sigma = \begin{bmatrix} 1 & 3/5 & 1/3 \\ 3/5 & 1 & 11/15 \\ 1/3 & 11/15 & 1 \\ \end{bmatrix}\end{split}\]

In this example the calculation of \(F(1,4,2)=0.827985\).

from __future__ import print_function
from numpy import *
from pyimsl.stat.multivariateNormalCdf import multivariateNormalCdf
from pyimsl.stat.writeMatrix import writeMatrix

bounds1 = [1.96, 1.96, 1.96]
bounds2 = [1.96, 2.96, 0.96]
bounds3 = [1.0, 4.0, 2.0]
mean1 = [0.0, 0.0, 0.0]
mean2 = [0.0, 1.0, -1.0]
stdev1 = [[1.0, 0.9, 0.9],
          [0.9, 1.0, 0.9],
          [0.9, 0.9, 1.0]]
stdev2 = [[1.0, 0.6, 1.0 / 3.0],
          [0.6, 1.0, 11.0 / 15.0],
          [1.0 / 3.0, 11.0 / 15.0, 1.0]]

fmt = "%5.3W"
writeMatrix("Mean Vector", mean1,
            noRowLabels=True, noColLabels=True, writeFormat=fmt)
writeMatrix("Correlation Matrix", stdev1,
            noRowLabels=True, noColLabels=True, writeFormat=fmt)
f = multivariateNormalCdf(bounds1, mean1, stdev1)
print("\nF(X1<%f, X2<%f, X3<%f) = %f\n" %
      (bounds1[0], bounds1[1], bounds1[2], f))

writeMatrix("Mean Vector\n", mean2,
            noRowLabels=True, noColLabels=True, writeFormat=fmt)
writeMatrix("Correlation Matrix", stdev1,
            noRowLabels=True, noColLabels=True, writeFormat=fmt)
f = multivariateNormalCdf(bounds2, mean2, stdev1)
print("\nF(X1<%f, X2<%f, X3<%f) = %f" %
      (bounds2[0], bounds2[1], bounds2[2], f))

writeMatrix("Mean Vector", mean1,
            noRowLabels=True, noColLabels=True, writeFormat=fmt)
writeMatrix("Correlation Matrix", stdev2,
            noRowLabels=True, noColLabels=True, writeFormat=fmt)
f = multivariateNormalCdf(bounds3, mean1, stdev2)
print("\nF(X1<%f, X2<%f, X3<%f) = %f" %
      (bounds3[0], bounds3[1], bounds3[2], f))

Output

F(X1<1.960000, X2<1.960000, X3<1.960000) = 0.958179


F(X1<1.960000, X2<2.960000, X3<0.960000) = 0.958179

F(X1<1.000000, X2<4.000000, X3<2.000000) = 0.827985
 
    Mean Vector
    0      0      0
 
Correlation Matrix
  1.0    0.9    0.9
  0.9    1.0    0.9
  0.9    0.9    1.0
 
   Mean Vector

    0      1     -1
 
Correlation Matrix
  1.0    0.9    0.9
  0.9    1.0    0.9
  0.9    0.9    1.0
 
    Mean Vector
    0      0      0
 
Correlation Matrix
 1.00   0.60   0.33
 0.60   1.00   0.73
 0.33   0.73   1.00

Example 2

This example illustrates the calculation of the cdf for a multivariate normal distribution with a mean of \(\mu=\{1,0,-1,0,1,-1\}\), and a correlation matrix of

\[\begin{split}\sum = \begin{bmatrix} 1 & 0.1 & 0.2 & 0.3 & 0.4 & 0 \\ 0.1 & 1 & 0.6 & 0.1 & 0.2 & 0.5 \\ 0.2 & 0.6 & 1 & 0 & 0.1 & 0.2 \\ 0.3 & 0.1 & 0 & 1 & 0 & 0.5 \\ 0.4 & 0.2 & 0.1 & 0 & 1 & 0.3 \\ 0 & 0.5 & 0.2 & 0.5 & 0.3 & 1 \\ \end{bmatrix}\end{split}\]

The optional argument t_print is used to illustrate the type of intermediate output available from this function. This function sorts the variables by the limits for the cdf calculation specified in x. This improves the accuracy of the calculations, see Genz (1992). In this case, \(F(X_1<1,X_2<2.5,X_3<2,X_4<0.5,X_5<0,X_6<0.8)=0.087237\) with an estimated error of 8.7e-05.

By increasing the correlation of \(X_2\) and \(X_3\) from 0.1 to 0.7, the correlation matrix becomes singular. This function checks for this condition and issues an error when sigma is not symmetric or positive definite.

from __future__ import print_function
from numpy import *
from pyimsl.stat.multivariateNormalCdf import multivariateNormalCdf
from pyimsl.stat.writeMatrix import writeMatrix

bounds = [1.0, 2.5, 2.0, 0.5, 0.0, 0.8]
mean = [1.0, 0.0, -1.0, 0.0, 1.0, -1.0]
s1 = [[1.0, 0.1, 0.2, 0.3, 0.4, 0.0],
      [0.1, 1.0, 0.6, 0.1, 0.2, 0.5],
      [0.2, 0.6, 1.0, 0.0, 0.1, 0.2],
      [0.3, 0.1, 0.0, 1.0, 0.0, 0.5],
      [0.4, 0.2, 0.1, 0.0, 1.0, 0.3],
      [0.0, 0.5, 0.2, 0.5, 0.3, 1.0]]
# The following matrix is not positive definite
s2 = [[1.0, 0.1, 0.2, 0.3, 0.4, 0.0],
      [0.1, 1.0, 0.6, 0.7, 0.2, 0.5],
      [0.2, 0.6, 1.0, 0.0, 0.1, 0.2],
      [0.3, 0.7, 0.0, 1.0, 0.0, 0.5],
      [0.4, 0.2, 0.1, 0.0, 1.0, 0.3],
      [0.0, 0.5, 0.2, 0.5, 0.3, 1.0]]
k = 6
err = []
f = multivariateNormalCdf(bounds, mean, s1, t_print=True, errEst=err)
print("F(X1<%2.1f, X2<%2.1f, X3<%2.1f, " %
      (bounds[0], bounds[1], bounds[2]), end=' ')
print("X4<%2.1f, X5<%2.1f, X6<%2.1f) = %f" %
      (bounds[3], bounds[4], bounds[5], f))
print("Estimated Error = ", err[0])
# example of error message when sigma is not positive definite
# f = multivariateNormalCdf (bounds, mean, s2, errEst=err)

Output

F(X1<1.0, X2<2.5, X3<2.0,  X4<0.5, X5<0.0, X6<0.8) = 0.087277
Estimated Error =  4.881451395023059e-06
 
               Original H Limits
   1.0     2.5     2.0     0.5     0.0     0.8
 
             Original Sigma Matrix
   1.0     0.1     0.2     0.3     0.4     0.0
   0.1     1.0     0.6     0.1     0.2     0.5
   0.2     0.6     1.0     0.0     0.1     0.2
   0.3     0.1     0.0     1.0     0.0     0.5
   0.4     0.2     0.1     0.0     1.0     0.3
   0.0     0.5     0.2     0.5     0.3     1.0
 
              Sorted Sigma Matrix
   1.0     0.4     0.0     0.3     0.2     0.1
   0.4     1.0     0.3     0.0     0.1     0.2
   0.0     0.3     1.0     0.5     0.1     0.0
   0.3     0.0     0.5     1.0     0.5     0.2
   0.2     0.1     0.1     0.5     1.0     0.6
   0.1     0.2     0.0     0.2     0.6     1.0

Eigenvalues of Sigma
eigenvalue[0] = 2.215651
eigenvalue[1] = 1.256233
eigenvalue[2] = 1.165661
eigenvalue[3] = 0.843083
eigenvalue[4] = 0.324266
eigenvalue[5] = 0.195106

Condition Number of Sigma = 9.105217
 
 Cholesky Decomposition of Sorted Sigma Matrix
 1.000   0.400   0.000   0.300   0.200   0.100
 0.400   0.917   0.327  -0.131   0.022   0.175
 0.000   0.327   0.945   0.575   0.098  -0.060
 0.300  -0.131   0.575   0.750   0.515   0.303
 0.200   0.022   0.098   0.515   0.827   0.515
 0.100   0.175  -0.060   0.303   0.515   0.774

Prob. = 0.0872772 Error = 4.88145e-06

Warning Errors

IMSLS_MAX_EVALS_EXCEEDED The maximum number of iterations for the CDF calculation has exceeded maxEvals. Required accuracy may not have been achieved.

Fatal Errors

IMSLS_SIGMA_SINGULAR sigma” is not positive definite. Its smallest eigenvalue is “e[#]=#, which is less than “tolerance” = #.