randomMvarFromData¶

Generates pseudorandom numbers from a multivariate distribution determined from a given sample.

Synopsis¶

randomMvarFromData (nRandom, x, nn)

Required Arguments¶

int nRandom (Input): Number of random multivariate vectors to generate.
float x[[]] (Input): Array of size nsamp × ndim containing the given sample, one observation per row.
int nn (Input): Number of nearest neighbors of the randomly selected point in x that are used to form the output point in the result.

Return Value¶

nRandom × ndim matrix containing the random multivariate vectors in its rows.

Description¶

Given a sample of n (= nsamp) observations of a k-variate (k = ndim) random variable, randomMvarFromData generates a pseudorandom sample with approximately the same moments as the given sample. The sample obtained is essentially the same as if sampling from a Gaussian kernel estimate of the sample density. (See Thompson 1989.) Function randomMvarFromData uses methods described by Taylor and Thompson (1986).

Assume that the (vector-valued) observations $x_i$ are in the rows of x. An observation, $x_j$, is chosen randomly; its nearest m (= nn) neighbors,

$x_{j_1}, x_{j_2}, \ldots, x_{j_m}$

are determined; and the mean

$\overline{x}_j$

of those nearest neighbors is calculated. Next, a random sample $u_1,u_2,\ldots,u_m$ is generated from a uniform distribution with lower bound

$\frac{1}{m} - \sqrt{\frac{3(m-1)}{m^2}}$

and upper bound

$\frac{1}{m} + \sqrt{\frac{3(m-1)}{m^2}}$
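
As a quick check on these bounds: a uniform variate on an interval with center c and half-width h has mean c and variance $h^2/3$. With $c = 1/m$ and $h = \sqrt{3(m-1)/m^2}$, each $u_l$ therefore satisfies

$E[u_l] = \frac{1}{m}, \qquad \operatorname{Var}(u_l) = \frac{1}{3} \cdot \frac{3(m-1)}{m^2} = \frac{m-1}{m^2}$

For m = 1 the interval collapses to the single point 1.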

The random variate delivered is

$\sum_{l=1}^{m} u_l \left(x_{j_l} - \overline{x}_j\right) + \overline{x}_j$

The process is then repeated until nRandom such simulated variates are generated and stored in the rows of the result.
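
The steps above can be sketched in plain NumPy. This is an illustration only, not the IMSL implementation. The function name random_mvar_from_data_sketch and its rng parameter are hypothetical. Two further choices are assumptions of this sketch: neighbors are found by Euclidean distance, and the chosen observation counts as one of its own m nearest neighbors.

```python
import numpy as np


def random_mvar_from_data_sketch(n_random, x, nn, rng=None):
    """Illustrative nearest-neighbor resampling (hypothetical helper).

    Assumptions of this sketch, not confirmed by the documentation:
    Euclidean distance, and the chosen row is among its own neighbors.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    nsamp, ndim = x.shape
    if not 1 <= nn <= nsamp:
        raise ValueError("nn must be between 1 and the number of rows of x")

    # Bounds 1/m -/+ sqrt(3(m-1)/m^2) of the uniform distribution.
    half_width = np.sqrt(3.0 * (nn - 1)) / nn
    low, high = 1.0 / nn - half_width, 1.0 / nn + half_width

    result = np.empty((n_random, ndim))
    for r in range(n_random):
        j = rng.integers(nsamp)                   # choose an observation x_j
        dist = np.linalg.norm(x - x[j], axis=1)   # distance from x_j to every row
        nearest = np.argsort(dist, kind="stable")[:nn]  # the m nearest rows
        center = x[nearest].mean(axis=0)          # mean of those neighbors
        u = rng.uniform(low, high, size=nn)       # weights u_1, ..., u_m
        # sum_l u_l (x_{j_l} - mean) + mean
        result[r] = u @ (x[nearest] - center) + center
    return result


# Usage with a small made-up sample (4 observations of 2 variables).
sample = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 8.0]])
variates = random_mvar_from_data_sketch(6, sample, 3, rng=np.random.default_rng(0))
```

With nn = 1 the uniform interval collapses to the point 1, so this sketch simply returns randomly chosen rows of x. Larger values of nn blend each chosen point with more of its neighborhood.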

Example¶

In this example, randomMvarFromData is used to generate 5 pseudorandom vectors of length 4 using the initial and final systolic pressure and the initial and final diastolic pressure from Data Set A in Afifi and Azen (1979) as the fixed sample from the population to be modeled. (Values of these four variables are in the seventh, tenth, twenty-first, and twenty-fourth columns of data set number nine in function dataSets, Chapter 15, Utilities.)

from numpy import empty
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.randomMvarFromData import randomMvarFromData
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.writeMatrix import writeMatrix

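n_random = 5  # number of random vectors to generate
nn = 5        # number of nearest neighbors used per generated vector
x = empty((113, 4), dtype='double')  # sample: 113 observations of 4 variables
nrrow = []    # receives the number of observations in the data set
nrcol = []    # receives the number of variables in the data set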

randomSeedSet(123457)

rdata = dataSets(9, nObservations=nrrow, nVariables=nrcol)
# Copy columns 7, 10, 21, and 24 (zero-based indices 6, 9, 20, 23) into x.
for j, col in enumerate((6, 9, 20, 23)):
    for i in range(nrrow[0]):
        x[i, j] = rdata[i, col]

r = randomMvarFromData(n_random, x, nn)

writeMatrix("Random variates", r)

Output¶

 
                   Random variates
             1            2            3            4
      162.8         90.5        153.7        104.9
      153.4         78.3        176.7         85.2
       93.7         48.2        153.5         71.4
      101.8         54.2        113.1         56.3
       91.7         58.8         48.4         28.1