randomMvarFromData

Generates pseudorandom numbers from a multivariate distribution determined from a given sample.

Synopsis

randomMvarFromData (nRandom, x, nn)

Required Arguments

int nRandom (Input)
Number of random multivariate vectors to generate.
float x[[]] (Input)
Array of size nsamp × ndim containing the given sample, one observation per row.
int nn (Input)
Number of nearest neighbors of the randomly selected point in x that are used to form the output point in the result.

Return Value

An nRandom × ndim matrix containing the random multivariate vectors in its rows.

Description

Given a sample of n (= nsamp) observations of a k-variate (k = ndim) random variable, randomMvarFromData generates a pseudorandom sample with approximately the same moments as the given sample. The sample obtained is essentially the same as one drawn from a Gaussian kernel estimate of the sample density. (See Thompson 1989.) Function randomMvarFromData uses methods described by Taylor and Thompson (1986).

Assume that the (vector-valued) observations \(x_i\) are in the rows of x. An observation, \(x_j\), is chosen randomly; its nearest m (= nn) neighbors,

\[x_{j_1}, x_{j_2}, \ldots, x_{j_m}\]

are determined; and the mean

\[\overline{x}_j\]

of those nearest neighbors is calculated. Next, a random sample \(u_1,u_2,\ldots,u_m\) is generated from a uniform distribution with lower bound

\[\frac{1}{m} - \sqrt{\frac{3(m-1)}{m^2}}\]

and upper bound

\[\frac{1}{m} + \sqrt{\frac{3(m-1)}{m^2}}\]

The random variate delivered is

\[\sum_{l=1}^{m} u_l \left(x_{j_l} - \overline{x}_j\right) + \overline{x}_j\]
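As a short check on the bounds above (a worked step added here for illustration, using only the standard mean and variance of a uniform distribution), each \(u_l\) is uniform on an interval centered at \(1/m\) with half-width \(\sqrt{3(m-1)}/m\), so

\[E[u_l] = \frac{1}{m}, \qquad \operatorname{Var}(u_l) = \frac{1}{3} \cdot \frac{3(m-1)}{m^2} = \frac{m-1}{m^2}\]

Conditional on the chosen neighbors, the expected value of the delivered variate is therefore

\[\frac{1}{m} \sum_{l=1}^{m} \left(x_{j_l} - \overline{x}_j\right) + \overline{x}_j = \overline{x}_j\]

because the deviations from the neighbor mean sum to zero. Each variate is thus centered at the mean of its neighborhood, with spread governed by the scatter of the neighbors around that mean.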

The process is then repeated until nRandom such simulated variates are generated and stored in the rows of the result.
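The steps above can be sketched in plain NumPy. This is only an illustration of the described procedure, not the library's implementation, and it will not reproduce the library's random numbers. The function name mvar_from_data_sketch and its seed parameter are hypothetical. The sketch also assumes Euclidean distance and that the chosen point counts as one of its own m neighbors, neither of which is stated above.

```python
import numpy as np


def mvar_from_data_sketch(n_random, x, nn, seed=None):
    """Illustrative nearest-neighbor resampling (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    nsamp, ndim = x.shape
    if not 1 <= nn <= nsamp:
        raise ValueError("nn must be between 1 and nsamp")
    rng = np.random.default_rng(seed)

    # Uniform bounds 1/m -/+ sqrt(3(m-1)/m^2), with m = nn.
    half_width = np.sqrt(3.0 * (nn - 1)) / nn
    low, high = 1.0 / nn - half_width, 1.0 / nn + half_width

    out = np.empty((n_random, ndim))
    for i in range(n_random):
        # Choose an observation x_j at random.
        j = rng.integers(nsamp)
        # Assumption: Euclidean distance; x_j itself is among the neighbors.
        dist2 = np.sum((x - x[j]) ** 2, axis=1)
        nbrs = x[np.argsort(dist2)[:nn]]
        center = nbrs.mean(axis=0)
        # Random weights u_1..u_m, then sum_l u_l (x_{j_l} - mean) + mean.
        u = rng.uniform(low, high, size=nn)
        out[i] = center + u @ (nbrs - center)
    return out
```

With nn = 1 both bounds equal 1 and the neighbor mean is the chosen point itself, so the sketch reduces to resampling rows of x with replacement. Larger values of nn smooth the output over progressively wider neighborhoods.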

Example

In this example, randomMvarFromData is used to generate 5 pseudorandom vectors of length 4 using the initial and final systolic pressure and the initial and final diastolic pressure from Data Set A in Afifi and Azen (1979) as the fixed sample from the population to be modeled. (Values of these four variables are in the seventh, tenth, twenty-first, and twenty-fourth columns of data set number nine in function dataSets, Chapter 15, Utilities.)

from numpy import empty
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.randomMvarFromData import randomMvarFromData
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.writeMatrix import writeMatrix

n_random = 5   # number of random vectors to generate
k = 4          # number of variables in each vector
nsamp = 113    # number of rows in the given sample
nn = 5         # number of nearest neighbors used per variate
x = empty((nsamp, k), dtype='double')
nrrow = []     # receives the number of observations from dataSets
nrcol = []     # receives the number of variables from dataSets

randomSeedSet(123457)

rdata = dataSets(9, nObservations=nrrow, nVariables=nrcol)

# Copy columns 7, 10, 21, and 24 (zero-based 6, 9, 20, 23) into x.
for dst, src in enumerate((6, 9, 20, 23)):
    for i in range(0, nrrow[0]):
        x[i, dst] = rdata[i, src]

r = randomMvarFromData(n_random, x, nn)

writeMatrix("Random variates", r)

Output

 
                   Random variates
             1            2            3            4
1        162.8         90.5        153.7        104.9
2        153.4         78.3        176.7         85.2
3         93.7         48.2        153.5         71.4
4        101.8         54.2        113.1         56.3
5         91.7         58.8         48.4         28.1