randomnessTest

Performs a test for randomness.

Synopsis

randomnessTest (x, nRun)

Required Arguments

float x[] (Input)
Array of size nObservations containing the data.
int nRun (Input)
Length of longest run for which tabulation is desired. For optional arguments pairs, dsquare, and dcube, nRun stands for the number of equiprobable cells into which the statistics are to be tabulated.

Return Value

The probability of a larger chi‑squared statistic for testing the null hypothesis of a uniform distribution.

Optional Arguments

ido, ido, intermediateResults (Input/Output)
Process data in blocks.
int ido (Input)

Processing option. The argument ido must be 1, 2, or 3. With this option, it is not a requirement that all observations be memory resident, thus enabling one to handle large data sets. Blocks of rows of the data can be processed sequentially in separate invocations of randomnessTest. Output argument values are returned only when ido = 3. (See Example 5.)

ido Action
1 This is the first invocation with this data; additional calls will be made. The first set of nObservations observations is input in x.
2 This is an intermediate invocation. The next set of nObservations observations is input in x.
3 This is the final invocation of this function. No further invocations of randomnessTest with ido greater than 1 should be made without first invoking randomnessTest with ido = 1. The last set of nObservations observations is input in x.

Default: ido is not used. All the data is input at once.

float intermediateResults[] (Input/Output)

User-supplied array containing results from invocations of the function. The length of intermediateResults is:

Test                      Length
Runs test (runs)          nRun
Pairs test (pairs)        nRun by nRun
\(d^2\) test (dsquare)    nRun
Triplets test (dcube)     nRun by nRun by nRun

When processing blocks of data, x can have a different number of observations, nObservations, in each invocation.

runs, runsCount, covariances (Output)
Indicates the runs test is to be performed. An array of length nRun containing the counts of the number of runs up of each length is returned in runsCount. An nRun by nRun matrix containing the variances and covariances of the counts is returned in covariances. runs is the default test; however, to return the counts and covariances, the runs argument must be used.

or

pairs, int pairsLag (Input), float pairsCount (Output)
Indicates the pairs test is to be performed. The lag to be used in computing the pairs statistic is stored in pairsLag. Pairs (x[i], x[i + pairsLag]) for i = 0, …, N - pairsLag - 1 are tabulated, where N is the total sample size. An nRun by nRun matrix containing the count of the number of pairs in each cell is returned in pairsCount.

or

dsquare, float dsquareCount (Output)
Indicates the \(d^2\) test is to be performed. dsquareCount is an array of length nRun containing the tabulations for the \(d^2\) test.

or

dcube, float dcubeCount (Output)
Indicates the triplets test is to be performed. dcubeCount is an array of size nRun by nRun by nRun containing the tabulations for the triplets test.

runsExpect (Output)
An array of length nRun containing the expected number of runs of each length. This option is valid only for the runs test.
expect (Output)
Expected number of counts for each cell. This argument is valid only if one of pairs, dsquare, or dcube is used. It is not valid for the runs test.
chiSquared (Output)
Chi‑squared statistic for testing the null hypothesis of a uniform distribution.
degreesOfFreedom (Output)
Degrees of freedom for chi‑squared.

Description

Runs Up Test

Function randomnessTest performs one of four different tests for randomness. Optional argument runs computes statistics for the runs up test. Runs tests are used to test for cyclical trend in sequences of random numbers. If the runs down test is desired, each observation should first be multiplied by -1 to change its sign, and runs called with the modified vector of observations.

runs first tallies the number of runs up (increasing sequences) of each desired length. For \(i=1,\ldots,r-1\), where r = nRun, runsCount[i-1] contains the number of runs of length i. runsCount[nRun-1] contains the number of runs of length nRun or greater. As an example of how runs are counted, the sequence (1, 2, 3, 1) contains one run up of length 3 and one run up of length 1.
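The tallying rule above can be sketched in plain Python. This is an illustration only, not the PyIMSL implementation: the helper name count_runs_up is hypothetical, the list is 0-based (so counts[i - 1] holds the runs of length i), and treating a tie as the end of a run is an assumption the text does not address.

```python
def count_runs_up(x, n_run):
    """Tally maximal increasing runs; the last cell pools lengths >= n_run."""
    counts = [0] * n_run
    if len(x) == 0:
        return counts
    length = 1  # length of the run currently being extended
    for prev, cur in zip(x, x[1:]):
        if cur > prev:
            length += 1  # the run continues
        else:
            # The run ended (assumption: a tie also ends the run).
            counts[min(length, n_run) - 1] += 1
            length = 1
    counts[min(length, n_run) - 1] += 1  # close the final run
    return counts


# The sequence from the text: one run of length 3 and one run of length 1.
print(count_runs_up([1, 2, 3, 1], 3))  # [1, 0, 1]
```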

After tallying the number of runs up of each length, runs computes the expected values and the covariances of the counts according to methods given by Knuth (1981, pages 65-67). Let R denote a vector of length nRun containing the number of runs of each length, so that the i‑th element of R, \(r_i\), contains the count of the runs of length i. Let \(\Sigma_R\) denote the covariance matrix of R under the null hypothesis of randomness, and let \(\mu_R\) denote the vector of expected values for R under this null hypothesis. Then an approximate chi-squared statistic with nRun degrees of freedom is given as

\[\chi^2 = \left(R - \mu_R\right)^T \Sigma_R^{-1} \left(R - \mu_R\right)\]

In general, the larger the value of each element of \(\mu_R\), the better the chi-squared approximation.
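As a rough illustration of the quadratic form, the sketch below evaluates \((R-\mu_R)^T \Sigma_R^{-1} (R-\mu_R)\) by solving a linear system instead of forming the inverse. The helper names solve and runs_chi_squared are hypothetical, the inputs are toy values rather than output from randomnessTest, and plain Gaussian elimination is used only to keep the example dependency-free.

```python
def solve(a, b):
    """Solve a * y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (m[r][n] - tail) / m[r][r]
    return y


def runs_chi_squared(counts, expected, cov):
    """Quadratic form (R - mu)^T Sigma^{-1} (R - mu)."""
    diff = [o - e for o, e in zip(counts, expected)]
    y = solve(cov, diff)  # y = Sigma^{-1} (R - mu)
    return sum(d * v for d, v in zip(diff, y))


# Toy 2 x 2 case (made-up numbers, chosen so the result is easy to check).
chi2 = runs_chi_squared([3, 5], [1, 1], [[2, 0], [0, 4]])
print(chi2)  # 6.0
```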

Pairs Test

pairs computes the pairs test (or Good’s serial test) on a hypothesized sequence of uniform (0,1) pseudo-random numbers. The test proceeds as follows. Subsequent pairs (x[i], x[i + pairsLag]) are tallied into a k × k matrix, where k = nRun. In this tally, element \((j,m)\) of the matrix is incremented, where

\[\begin{split}\begin{array}{l} j = \lfloor k\mathtt{x}[i-1]\rfloor + 1 \\ m = \lfloor k\mathtt{x}[i+l-1]\rfloor + 1 \\ \end{array}\end{split}\]

where l = pairsLag and ⌊ ⌋ denotes the greatest integer function: \(\lfloor Y\rfloor\) is the greatest integer less than or equal to the real number Y. If \(l=1\), then \(i=1,3,5, \ldots,n-1\). If \(l>1\), then \(i=1,2,3,\ldots,n-l\), where n is the total number of pseudo-random numbers input on the current invocation of pairs (i.e., n = nObservations).

Given the tally matrix in pairsCount, chi-squared is computed as

\[\chi^2 = \sum_{i,j=0}^{k-1} \frac{\left(o_{ij} - e\right)^2}{e}\]

where \(e=\Sigma o_{ij}/k^2\), and \(o_{ij}\) is the observed count in cell \((i,j)\) (\(o_{ij}\) = pairsCount[i][j]).
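The tally and the statistic can be sketched in plain Python as follows. This is a hedged illustration, not PyIMSL code: tally_pairs and pairs_chi_squared are hypothetical names, indices are 0-based (so the "+ 1" in the cell formulas is dropped), and clamping a value of exactly 1.0 into the last cell is an assumption.

```python
import math


def tally_pairs(x, k, lag):
    """Tally pairs (x[i], x[i + lag]) into a k-by-k table of counts."""
    n = len(x)
    counts = [[0] * k for _ in range(k)]
    if lag == 1:
        starts = range(0, n - 1, 2)  # non-overlapping pairs when lag is 1
    else:
        starts = range(0, n - lag)   # every i with i + lag still in range
    for i in starts:
        j = min(int(math.floor(k * x[i])), k - 1)
        m = min(int(math.floor(k * x[i + lag])), k - 1)
        counts[j][m] += 1
    return counts


def pairs_chi_squared(counts):
    """Chi-squared with equal expected count e = (sum of counts) / k^2."""
    k = len(counts)
    total = sum(sum(row) for row in counts)
    e = total / float(k * k)
    chi2 = sum((o - e) ** 2 / e for row in counts for o in row)
    return chi2, e


counts = tally_pairs([0.1, 0.6, 0.7, 0.2], 2, 1)
print(counts)                     # [[0, 1], [1, 0]]
print(pairs_chi_squared(counts))  # (2.0, 0.5)
```

With 10000 observations and a lag of 5, this rule tallies 9995 pairs, so a 10 by 10 table has an expected count of 99.95 per cell.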

Because pair statistics for the trailing observations are not tallied on any call, the user should call pairs with nObservations as large as possible. For pairsLag < 20 and nObservations = 2000, little power is lost.

\(d ^2\) Test

dsquare computes the \(d^2\) test for succeeding quadruples of hypothesized pseudo-random uniform (0, 1) deviates. The \(d^2\) test is performed as follows. Let \(X_1\), \(X_2\), \(X_3\), and \(X_4\) denote four pseudo-random uniform deviates, and consider

\[D^2 = (X_3 -X_1)^2 + (X_4 - X_2)^2\]

The probability distribution of \(D^2\) is given as

\[\Pr\left(D^2 \leq d^2\right) = d^2 \pi - \frac{8d^3}{3} + \frac{d^4}{2}\]

when \(D^2\leq 1\), where π denotes the value of pi. If \(D^2>1\), this probability is given as

\[\begin{split}\begin{array}{l} \Pr\left(D^2 \leq d^2\right) = \tfrac{1}{3} + (\pi -2)d^2 + 4 \sqrt{d^2 - 1} \\ + 8 \frac{\left(d^2 - 1\right)^{\frac{3}{2}}}{3} - \frac{d^4}{2} - 4d^2 \arctan \left( \frac{\sqrt{1 - \frac{1}{d^2}}}{\frac{1}{d}} \right) \end{array}\end{split}\]

See Gruenberger and Mark (1951) for a derivation of this distribution.

For each succeeding set of 4 pseudo-random uniform numbers input in x, \(d^2\) and the cumulative probability of \(d^2\) (\(\Pr(D^2\leq d^2)\)) are computed. The resulting probability is tallied into one of k = nRun equally spaced intervals.
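A minimal Python sketch of the two-branch distribution function and the tally is given below. The names d2_cdf and tally_d2 are hypothetical, the arctan argument is simplified to \(\sqrt{d^2-1}\) (which equals the ratio in the formula above), and the clamping at the ends of the range is an assumption made to keep the cell index valid.

```python
import math


def d2_cdf(d2):
    """Pr(D^2 <= d2) for the squared distance between two uniform points."""
    if d2 <= 0.0:
        return 0.0
    if d2 >= 2.0:
        return 1.0  # D^2 cannot exceed 2 on the unit square
    if d2 <= 1.0:
        d = math.sqrt(d2)
        return math.pi * d2 - 8.0 * d ** 3 / 3.0 + d2 * d2 / 2.0
    root = math.sqrt(d2 - 1.0)
    return (1.0 / 3.0 + (math.pi - 2.0) * d2 + 4.0 * root
            + 8.0 * root ** 3 / 3.0 - d2 * d2 / 2.0
            - 4.0 * d2 * math.atan(root))


def tally_d2(x, k):
    """Tally the probability of each successive quadruple into k cells."""
    counts = [0] * k
    for q in range(0, len(x) - 3, 4):  # trailing values are ignored
        x1, x2, x3, x4 = x[q:q + 4]
        p = d2_cdf((x3 - x1) ** 2 + (x4 - x2) ** 2)
        counts[min(int(p * k), k - 1)] += 1
    return counts


print(tally_d2([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0], 4))  # [1, 0, 0, 1]
```

The two branches agree at \(d^2=1\), where both give \(\pi - 8/3 + 1/2\), and the second branch reaches 1 at \(d^2=2\).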

Let n denote the number of sets of four random numbers input (n = the total number of observations/4). Then, under the null hypothesis that the numbers input are random uniform (0, 1) numbers, the expected value for each element in dsquareCount is \(e=n/k\). An approximate chi-squared statistic is computed as

\[\chi^2 = \sum_{i=0}^{k-1} \frac{\left(o_i - e\right)^2}{e}\]

where \(o_i\) = dsquareCount[i] is the observed count. Thus, \(\chi^2\) has \(k-1\) degrees of freedom, and the null hypothesis of pseudo-random uniform (0, 1) deviates is rejected if \(\chi^2\) is too large. As n increases, the chi-squared approximation becomes better. A useful rule of thumb is that \(e>5\) yields a good chi-squared approximation.
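The statistic can be checked by hand against the tabulated counts shown in the output of Example 3. The sketch below is plain Python; the helper name equal_cell_chi_squared is hypothetical, and the p-value is omitted because it needs a chi-squared distribution function that the standard library does not provide.

```python
def equal_cell_chi_squared(counts):
    """Chi-squared for k equally probable cells, with e = n / k."""
    k = len(counts)
    n = sum(counts)
    e = n / float(k)
    chi2 = sum((o - e) ** 2 / e for o in counts)
    return chi2, e, k - 1  # statistic, expected count, degrees of freedom


# dsquare_counts from the output of Example 3 (500 quadruples, 6 cells).
chi2, e, df = equal_cell_chi_squared([87, 84, 78, 76, 92, 83])
print(round(chi2, 3), round(e, 2), df)  # 2.056 83.33 5
```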

Triplets Test

dcube computes the triplets test on a sequence of hypothesized pseudo-random uniform(0, 1) deviates. The triplets test is computed as follows:

Each set of three successive deviates, \(X_1\), \(X_2\), and \(X_3\), is tallied into one of \(m^3\) equal sized cubes, where m = nRun. Let \(i=\lfloor mX_1\rfloor+1\), \(j=\lfloor mX_2\rfloor+1\), and \(k=\lfloor mX_3\rfloor+1\). For the triplet (\(X_1\), \(X_2\), \(X_3\)), dcubeCount[i-1][j-1][k-1] is incremented.

Under the null hypothesis of pseudo-random uniform(0, 1) deviates, the \(m^3\) cells are equally probable and each has expected value \(e=n/m^3\), where n is the number of triplets tallied. An approximate chi-squared statistic is computed as

\[\chi^2 = \sum_{i,j,k=0}^{m-1} \frac{\left(o_{ijk} - e\right)^2}{e}\]

where \(o_{ijk}\) = dcubeCount[i][j][k].

The computed chi-squared has \(m^3-1\) degrees of freedom, and the null hypothesis of pseudo-random uniform (0, 1) deviates is rejected if \(\chi^2\) is too large.
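The triplets tally and its statistic can be sketched in plain Python as shown below. This is an illustration under stated assumptions rather than the PyIMSL implementation: tally_triplets and triplets_chi_squared are hypothetical names, indices are 0-based, successive triplets are taken as non-overlapping, and a value of exactly 1.0 is clamped into the last cell.

```python
import math


def tally_triplets(x, m):
    """Tally successive non-overlapping triplets into an m x m x m table."""
    counts = [[[0] * m for _ in range(m)] for _ in range(m)]
    n_triplets = 0
    for t in range(0, len(x) - 2, 3):  # trailing values are ignored
        i, j, k = (min(int(math.floor(m * v)), m - 1) for v in x[t:t + 3])
        counts[i][j][k] += 1
        n_triplets += 1
    return counts, n_triplets


def triplets_chi_squared(counts, n_triplets):
    """Chi-squared with e = n / m^3 and m^3 - 1 degrees of freedom."""
    m = len(counts)
    e = n_triplets / float(m ** 3)
    chi2 = sum((o - e) ** 2 / e
               for plane in counts for row in plane for o in row)
    return chi2, e, m ** 3 - 1


# Toy input: two complete triplets plus one trailing value.
counts, n = tally_triplets([0.1, 0.5, 0.9, 0.9, 0.9, 0.9, 0.2], 3)
print(n, counts[0][1][2], counts[2][2][2])  # 2 1 1
```

With 2001 deviates and m = 3, this rule yields 667 triplets and an expected count of 667/27 per cell, which is about 24.70.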

Examples

Example 1

This example illustrates the use of the runs test on \(10^4\) pseudo-random uniform deviates. Since the probability of a larger chi-squared statistic is 0.1872, there is no strong evidence to support rejection of the null hypothesis of randomness.

from __future__ import print_function
from numpy import *
from pyimsl.stat.randomnessTest import randomnessTest
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.randomUniform import randomUniform
from pyimsl.stat.writeMatrix import writeMatrix

nran = 10000
n_run = 6
randomSeedSet(123457)
fmt = "%8.1f"
chisq = []
df = []
runs_expect = []
runs = {}
x = randomUniform(nran)

pvalue = randomnessTest(x, n_run,
                        chiSquared=chisq, df=df,
                        runsExpect=runs_expect, runs=runs)

runs_counts = runs["runsCount"]
covariances = runs["covariances"]
writeMatrix("runs_counts", runs_counts, writeFormat=fmt)
writeMatrix("runs_expect", runs_expect, writeFormat=fmt)
writeMatrix("covariances", covariances, writeFormat=fmt)
print("chisq  = ", chisq[0])
print("df     = ", df[0])
print("pvalue = ", pvalue)

Output

chisq  =  8.765216121616097
df     =  6.0
pvalue =  0.1872190489024498
 
                        runs_counts
       1         2         3         4         5         6
  1709.0    2046.0     953.0     260.0      55.0       4.0
 
                        runs_expect
       1         2         3         4         5         6
  1667.3    2083.4     916.6     263.8      57.5      11.9
 
                         covariances
          1         2         3         4         5         6
1    1278.2    -194.6    -148.9     -71.6     -22.9      -6.7
2    -194.6    1410.1    -490.6    -197.2     -55.2     -14.4
3    -148.9    -490.6     601.4    -117.4     -31.2      -7.8
4     -71.6    -197.2    -117.4     222.1     -10.8      -2.6
5     -22.9     -55.2     -31.2     -10.8      54.8      -0.6
6      -6.7     -14.4      -7.8      -2.6      -0.6      11.7

Example 2

This example illustrates the calculation of the pairs statistics when a random sample of size \(10^4\) is used and pairsLag is 5. The results are not significant. PyIMSL function randomUniform (Chapter 12, Random Number Generation) is used in obtaining the pseudo-random deviates.

from __future__ import print_function
from numpy import *
from pyimsl.stat.randomnessTest import randomnessTest
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.randomUniform import randomUniform
from pyimsl.stat.writeMatrix import writeMatrix

nran = 10000
n_run = 10
randomSeedSet(123467)
pairs = {"pairsLag": 5}
expect = []
chisq = []
df = []
x = randomUniform(nran)

pvalue = randomnessTest(x, n_run,
                        chiSquared=chisq, df=df,
                        expect=expect, pairs=pairs)

writeMatrix("pairs_counts", pairs["pairsCount"], writeFormat="%4i")
print("expect = ", expect[0])
print("chisq  = ", chisq[0])
print("df     = ", df[0])
print("pvalue = ", pvalue)

Output

expect =  99.95
chisq  =  104.85992996498248
df     =  99.0
pvalue =  0.32431195635310744
 
                         pairs_counts
       1     2     3     4     5     6     7     8     9    10
 1   112    82    95   118   103   103   113    84    90    74
 2   104   106   109   108   101    98   102    92   109    88
 3    88   111    86   106   112    79   103   105   106   101
 4    91   110   108    92    88   108   113    93   105   114
 5   104   105   103   104   101    94    96    87    93   104
 6    98   104   103   104    79    89    92   104    92   100
 7   103    91    97   101   116    83   118   118   106    99
 8   105   105   111    91    93    82   100   104   110    89
 9    92   102    82   101    94   128   102   110   125    98
10    79    99   103    98   104   101    93    93    98   105

Example 3

In this example, 2000 observations generated via PyIMSL function randomUniform (Chapter 12, Random Number Generation) are input to dsquare in one call. In the example, the null hypothesis of a uniform distribution is not rejected.

from __future__ import print_function
from numpy import *
from pyimsl.stat.randomnessTest import randomnessTest
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.randomUniform import randomUniform
from pyimsl.stat.writeMatrix import writeMatrix

nran = 2000
n_run = 6
randomSeedSet(123457)
chisq = []
df = []
expect = []
dsquare_counts = []
x = randomUniform(nran)

pvalue = randomnessTest(x, n_run,
                        chiSquared=chisq, df=df,
                        expect=expect, dsquare=dsquare_counts)

writeMatrix("dsquare_counts", dsquare_counts, writeFormat="%4i")
print("expect = ", expect[0])
print("chisq  = ", chisq[0])
print("df     = ", df[0])
print("pvalue = ", pvalue)

Output

expect =  83.33333333333333
chisq  =  2.056
df     =  5.0
pvalue =  0.8413433814411666
 
          dsquare_counts
   1     2     3     4     5     6
  87    84    78    76    92    83

Example 4

In this example, 2001 deviates generated by PyIMSL function randomUniform (Chapter 12, Random Number Generation) are input to dcube, and tabulated in 27 equally sized cubes. In the example, the null hypothesis is not rejected.

from __future__ import print_function
from numpy import *
from pyimsl.stat.randomnessTest import randomnessTest
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.randomUniform import randomUniform
from pyimsl.stat.writeMatrix import writeMatrix

nran = 2001
n_run = 3
randomSeedSet(123457)
chisq = []
df = []
expect = []
dcube_counts = []
x = randomUniform(nran)

pvalue = randomnessTest(x, n_run,
                        chiSquared=chisq, df=df,
                        expect=expect, dcube=dcube_counts)

writeMatrix("dcube_counts", dcube_counts[0], writeFormat="%4i")
writeMatrix("dcube_counts", dcube_counts[1], writeFormat="%4i")
writeMatrix("dcube_counts", dcube_counts[2], writeFormat="%4i")
print("expect = ", expect[0])
print("chisq  = ", chisq[0])
print("df     = ", df[0])
print("pvalue = ", pvalue)

Output

expect =  24.703703703703702
chisq  =  21.763118440779607
df     =  26.0
pvalue =  0.7015850883536483
 
   dcube_counts
      1     2     3
1    26    27    24
2    20    17    32
3    30    18    21
 
   dcube_counts
      1     2     3
1    20    16    26
2    22    22    27
3    30    24    26
 
   dcube_counts
      1     2     3
1    28    30    22
2    23    24    22
3    33    30    27

Example 5

This example is based on Example 1 and illustrates the use of the ido optional argument. In this example, randomnessTest is called 10 times, with 1000 pseudo-random uniform deviates each time. Since the probability of a larger chi-squared statistic is 0.1872, there is no strong evidence to support rejection of the null hypothesis of randomness.

from __future__ import print_function
from numpy import *
from pyimsl.stat.randomnessTest import randomnessTest
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.randomUniform import randomUniform
from pyimsl.stat.writeMatrix import writeMatrix

randomSeedSet(123457)
nRan = 1000
nRun = 6
chisq = []
df = []
runsExpect = []
runs = {}
intermResults = zeros(nRun, dtype=double)
ido = {"ido": 1, "intermediateResults": intermResults}

for i in range(10):
    x = randomUniform(nRan)
    if i == 9:
        ido["ido"] = 3
        pvalue = randomnessTest(x, nRun,
                                ido=ido,
                                chiSquared=chisq, df=df,
                                runsExpect=runsExpect, runs=runs)
    else:
        if i == 0:
            ido["ido"] = 1
        else:
            ido["ido"] = 2
        pvalue = randomnessTest(x, nRun, ido=ido)

fmt = "%8.1f"
runsCounts = runs["runsCount"]
covariances = runs["covariances"]
writeMatrix("runsCounts", runsCounts, writeFormat=fmt)
writeMatrix("runsExpect", runsExpect, writeFormat=fmt)
writeMatrix("covariances", covariances, writeFormat=fmt)
print("chisq  = ", chisq[0])
print("df     = ", df[0])
print("pvalue = ", pvalue)

Output

chisq  =  8.765216121616097
df     =  6.0
pvalue =  0.1872190489024498
 
                        runsCounts
       1         2         3         4         5         6
  1709.0    2046.0     953.0     260.0      55.0       4.0
 
                        runsExpect
       1         2         3         4         5         6
  1667.3    2083.4     916.6     263.8      57.5      11.9
 
                         covariances
          1         2         3         4         5         6
1    1278.2    -194.6    -148.9     -71.6     -22.9      -6.7
2    -194.6    1410.1    -490.6    -197.2     -55.2     -14.4
3    -148.9    -490.6     601.4    -117.4     -31.2      -7.8
4     -71.6    -197.2    -117.4     222.1     -10.8      -2.6
5     -22.9     -55.2     -31.2     -10.8      54.8      -0.6
6      -6.7     -14.4      -7.8      -2.6      -0.6      11.7