randomnessTest¶
Performs a test for randomness.
Synopsis¶
randomnessTest (x, nRun)
Required Arguments¶
float x[] (Input)
    Array of size nObservations containing the data.
int nRun (Input)
    Length of the longest run for which tabulation is desired. For optional arguments pairs, dsquare, and dcube, nRun stands for the number of equiprobable intervals into which the statistics are tabulated along each dimension (so pairs uses nRun × nRun cells and dcube uses nRun × nRun × nRun cells).
Return Value¶
The probability of a larger chi‑squared statistic for testing the null hypothesis of a uniform distribution.
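The return value is the upper-tail probability of the chi-squared statistic with the reported degrees of freedom. A minimal cross-check, assuming an integer number of degrees of freedom, uses the standard recurrence for the regularized incomplete gamma function. This is a sketch, not the PyIMSL implementation, and the helper name `chi2_sf` is hypothetical.

```python
import math

def chi2_sf(x, df):
    """Return Pr(X^2 > x) for a chi-squared variable with integer df >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) with y = x / 2,
    starting from Q(1, y) = exp(-y) (even df) or Q(1/2, y) = erfc(sqrt(y)) (odd df).
    """
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        # Add one recurrence term, computed in log space to avoid overflow.
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

# Statistics printed in Examples 1 and 3 of this page
print(chi2_sf(8.765216121616097, 6))  # runs test
print(chi2_sf(2.056, 5))              # d^2 test
```

The two printed values agree with the pvalue lines of Examples 1 and 3 (about 0.1872 and 0.8413).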
Optional Arguments¶
ido, int ido, float intermediateResults[] (Input/Output)
    Process data in blocks.

    int ido (Input)
        Processing option. The argument ido must be 1, 2, or 3. With this option, not all observations need to be memory resident, which makes it possible to handle large data sets. Blocks of data can be processed sequentially in separate invocations of randomnessTest. Output argument values are returned only when ido = 3. (See Example 5.)

        ido   Action
        1     This is the first invocation with this data; additional calls will be made. The first set of nObservations observations is input in x.
        2     This is an intermediate invocation. The next set of nObservations observations is input in x.
        3     This is the final invocation of this function. No further invocations of randomnessTest with ido greater than 1 should be made without first invoking randomnessTest with ido = 1. The last set of nObservations observations is input in x.

        Default: ido is not used. All the data is input at once.

    float intermediateResults[] (Input/Output)
        User-supplied array containing results from invocations of the function. The length of intermediateResults is:

        Test                        Length
        Runs test (runs)            nRun
        Pairs test (pairs)          nRun by nRun
        \(d^2\) test (dsquare)      nRun
        Triplets test (dcube)       nRun by nRun by nRun

        In processing blocks of data, x can have a different number of observations, nObservations, in separate invocations.
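The ido action table amounts to a simple schedule: 1 for the first block, 2 for every intermediate block, 3 for the last. The sketch below builds that schedule for a data set split into fixed-size blocks. `ido_blocks` is a hypothetical helper, and refusing a single block is an assumption (with one block, the default of passing all data at once is simpler).

```python
def ido_blocks(x, block_size):
    """Split x into consecutive blocks and pair each block with an ido code.

    The first block gets ido = 1, intermediate blocks get ido = 2, and the
    last block gets ido = 3. The last block may be shorter than block_size.
    """
    blocks = [x[s:s + block_size] for s in range(0, len(x), block_size)]
    if len(blocks) < 2:
        # Assumption: a single block is better handled without ido.
        raise ValueError("need at least two blocks; pass all data at once instead")
    codes = [1] + [2] * (len(blocks) - 2) + [3]
    return list(zip(codes, blocks))

# Hypothetical use with PyIMSL (not executed here), mirroring Example 5;
# output arguments such as chiSquared are passed only on the ido = 3 call:
#   opts = {"ido": 1, "intermediateResults": zeros(nRun)}
#   for code, block in ido_blocks(x, 1000):
#       opts["ido"] = code
#       pvalue = randomnessTest(block, nRun, ido=opts)
print([code for code, _ in ido_blocks(list(range(10)), 4)])  # [1, 2, 3]
```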
runs, runsCount, covariances (Output)
    Indicates the runs test is to be performed. An array of length nRun containing the counts of the number of runs up of each length is returned in runsCount. An nRun by nRun matrix containing the variances and covariances of the counts is returned in covariances. runs is the default test; however, to return the counts and covariances, the runs argument must be used.
or
pairs, int pairsLag (Input), float pairsCount (Output)
    Indicates the pairs test is to be performed. The lag to be used in computing the pairs statistic is stored in pairsLag. Pairs (x[i], x[i+pairsLag]) for i = 0, …, N - pairsLag - 1 are tabulated, where N is the total sample size. An nRun by nRun matrix containing the count of the number of pairs in each cell is returned in pairsCount.
or
dsquare, float dsquareCount (Output)
    Indicates the \(d^2\) test is to be performed. dsquareCount is an array of length nRun containing the tabulations for the \(d^2\) test.
or
dcube, float dcubeCount (Output)
    Indicates the triplets test is to be performed. dcubeCount is an nRun by nRun by nRun array containing the tabulations for the triplets test.
runsExpect (Output)
    An array of length nRun containing the expected number of runs of each length. This option is valid only for the runs test.
expect (Output)
    Expected number of counts for each cell. This argument is valid only if one of pairs, dsquare, or dcube is used. It is not valid for the runs test.
chiSquared (Output)
    Chi-squared statistic for testing the null hypothesis of a uniform distribution.
degreesOfFreedom (Output)
    Degrees of freedom for chi-squared.
Description¶
Runs Up Test¶
Function randomnessTest performs one of four different tests for randomness. Optional argument runs computes statistics for the runs up test. Runs tests are used to test for cyclical trend in sequences of random numbers. If the runs down test is desired, each observation should first be multiplied by -1 to change its sign, and randomnessTest should then be called with runs on the modified vector of observations.
runs first tallies the number of runs up (increasing sequences) of each desired length. For \(i=1,\ldots,r-1\), where r = nRun, runsCount[i] contains the number of runs of length i, and runsCount[nRun] contains the number of runs of length nRun or greater. (These indices count from 1; in the zero-based Python array the last cell is runsCount[nRun-1].) As an example of how runs are counted, the sequence (1, 2, 3, 1) contains one run up of length 3 and one run up of length 1.
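The tallying rule can be sketched in plain Python. `count_runs_up` is a hypothetical helper with 0-based cells, and it assumes a run ends whenever the next value is not strictly larger; how PyIMSL treats ties is not stated here. The second call shows the runs down test obtained by negating the data.

```python
def count_runs_up(x, n_run):
    """Count maximal increasing runs in x.

    counts[i - 1] holds the number of runs of length i for i < n_run, and
    counts[n_run - 1] holds the number of runs of length n_run or greater.
    Assumption: a run ends whenever the next value is not strictly larger.
    """
    counts = [0] * n_run
    if not x:
        return counts
    length = 1
    for prev, cur in zip(x, x[1:]):
        if cur > prev:
            length += 1
        else:
            counts[min(length, n_run) - 1] += 1
            length = 1
    counts[min(length, n_run) - 1] += 1  # close the final run
    return counts

print(count_runs_up([1, 2, 3, 1], 6))                  # [1, 0, 1, 0, 0, 0]
print(count_runs_up([-v for v in [3, 2, 1, 2]], 6))   # runs down of (3, 2, 1, 2)
```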
After tallying the number of runs up of each length, runs computes the expected values and the covariances of the counts according to methods given by Knuth (1981, pages 65-67). Let R denote a vector of length nRun containing the number of runs of each length, so that the i-th element of R, \(r_i\), contains the count of the runs of length i. Let \(\Sigma_R\) denote the covariance matrix of R under the null hypothesis of randomness, and let \(\mu_R\) denote the vector of expected values for R under this null hypothesis. Then an approximate chi-squared statistic with nRun degrees of freedom is given as

\[X^2 = (R - \mu_R)^T \Sigma_R^{-1} (R - \mu_R)\]
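The runs statistic is the quadratic form \((R - \mu_R)^T \Sigma_R^{-1} (R - \mu_R)\). The sketch below evaluates it by solving \(\Sigma_R y = R - \mu_R\) with Gaussian elimination rather than forming the inverse. That is a common numerical choice, not necessarily what PyIMSL does, and `quadratic_form_chi2` is a hypothetical name.

```python
def quadratic_form_chi2(counts, mean, cov):
    """Return (R - mu)^T Sigma^{-1} (R - mu) without forming the inverse.

    Solves Sigma y = (R - mu) by Gaussian elimination with partial pivoting
    and returns the dot product (R - mu) . y.
    """
    d = [float(r) - float(m) for r, m in zip(counts, mean)]
    n = len(d)
    # Augmented matrix [Sigma | d]; rows are copied so cov is not modified.
    a = [[float(v) for v in row] + [d[i]] for i, row in enumerate(cov)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution gives y = Sigma^{-1} d.
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = a[r][n] - sum(a[r][c] * y[c] for c in range(r + 1, n))
        y[r] = s / a[r][r]
    return sum(di * yi for di, yi in zip(d, y))

# Toy 2 x 2 case: Sigma^{-1} = (1/3) [[2, -1], [-1, 2]], so the result is 2/3.
print(quadratic_form_chi2([2, 2], [1, 1], [[2, 1], [1, 2]]))
```

Assuming the statistic is exactly this quadratic form, passing the full-precision runsCount, runsExpect, and covariances arrays returned by randomnessTest (not the rounded printout) should reproduce chiSquared.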
In general, the larger the value of each element of \(\mu_R\), the better the chi-squared approximation.
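To gauge how large the elements of \(\mu_R\) will be for a given sample size and nRun, a large-n approximation helps. It states that the expected number of runs up of length at least r is roughly \(n\,r/(r+1)!\). This approximation is an assumption here, not necessarily what PyIMSL computes; it ignores finite-sample end effects but matches the runsExpect values printed in Example 1 to within about 1. `approx_runs_expect` is a hypothetical helper.

```python
import math

def approx_runs_expect(n, n_run):
    """Leading-order expected counts of runs up among n random numbers.

    Uses n * r / (r + 1)! for the expected number of runs of length >= r, so
    runs of length exactly r have n * (r / (r + 1)! - (r + 1) / (r + 2)!).
    The last cell collects lengths >= n_run. End corrections are ignored.
    """
    at_least = [n * r / math.factorial(r + 1) for r in range(1, n_run + 1)]
    exact = [at_least[i] - at_least[i + 1] for i in range(n_run - 1)]
    return exact + [at_least[-1]]

for r, e in enumerate(approx_runs_expect(10000, 6), start=1):
    print(r, round(e, 1))
```

For n = 10000 and nRun = 6, the last cell expects only about 11.9 runs. Raising nRun to 7 would shrink it to under 2, which weakens the chi-squared approximation.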
Pairs Test¶
pairs computes the pairs test (Good's serial test) on a hypothesized sequence of uniform (0,1) pseudo-random numbers. The test proceeds as follows. Successive pairs (x[i], x[i + pairsLag]) are tallied into a k × k matrix, where k = nRun. In this tally, element \((j,m)\) of the matrix is incremented, where

\[j = \lfloor k x_i \rfloor + 1, \qquad m = \lfloor k x_{i+l} \rfloor + 1\]

where l = pairsLag, and the notation ⌊ ⌋ represents the greatest integer function: \(\lfloor Y\rfloor\) is the greatest integer less than or equal to the real number Y. If \(l=1\), then \(i=1,3,5,\ldots,n-1\). If \(l>1\), then \(i=1,2,3,\ldots,n-l\), where n is the total number of pseudo-random numbers input on the current invocation of pairs (i.e., n = nObservations).
Given the tally matrix in pairsCount, chi-squared is computed as

\[X^2 = \sum_{i=1}^{k} \sum_{j=1}^{k} \frac{(o_{ij} - e)^2}{e}\]

where \(e=\sum o_{ij}/k^2\), and \(o_{ij}\) is the observed count in cell \((i,j)\) (\(o_{ij}\) = pairsCount[i][j]).
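The pairs tally and its statistic can be sketched in plain Python. `pairs_test` is a hypothetical helper that translates the 1-based index rules above to 0-based Python indices, so each cell index is \(\lfloor k x \rfloor\) rather than \(\lfloor k x \rfloor + 1\). The clamp to k - 1 only guards against an input of exactly 1.0.

```python
import math

def pairs_test(x, k, lag):
    """Tally (x[i], x[i + lag]) into a k-by-k table; return (counts, X^2, df)."""
    n = len(x)
    # lag 1 uses non-overlapping pairs i = 0, 2, 4, ...;
    # larger lags use every i = 0, 1, ..., n - lag - 1.
    starts = range(0, n - 1, 2) if lag == 1 else range(0, n - lag)
    counts = [[0] * k for _ in range(k)]
    for i in starts:
        j = min(math.floor(k * x[i]), k - 1)        # 0-based floor(k x_i)
        m = min(math.floor(k * x[i + lag]), k - 1)  # 0-based floor(k x_{i+l})
        counts[j][m] += 1
    total = sum(sum(row) for row in counts)
    e = total / k**2                                # equal expected count per cell
    chi2 = sum((o - e) ** 2 / e for row in counts for o in row)
    return counts, chi2, k * k - 1

print(pairs_test([0.1, 0.6, 0.1, 0.6], 2, 1))  # ([[0, 2], [0, 0]], 6.0, 3)
```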
Because pair statistics for the trailing observations are not tallied on any call, the user should call pairs with nObservations as large as possible. For pairsLag < 20 and nObservations = 2000, little power is lost.
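The loss is easy to quantify from the index rules: with \(l>1\), the last l observations of each call start no pair. `pairs_tallied` is a hypothetical helper that counts the pairs one call contributes.

```python
def pairs_tallied(n, lag):
    """Pairs tallied from a single call with n observations."""
    if lag == 1:
        return n // 2           # i = 1, 3, 5, ...: non-overlapping pairs
    return max(n - lag, 0)      # i = 1, 2, ..., n - lag

n = 2000
for lag in (5, 19):
    used = pairs_tallied(n, lag)
    print(f"lag {lag}: {used} pairs; {lag / n:.2%} of the observations start no pair")
```

For lag 19 and 2000 observations, under 1% of starting positions are lost. With 10000 observations and lag 5 (Example 2), 9995 pairs are tallied, which is why the expected count per cell there is 9995/100 = 99.95.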
\(d ^2\) Test¶
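dsquare computes the \(d^2\) test for succeeding quadruples of hypothesized pseudo-random uniform (0, 1) deviates. The \(d^2\) test is performed as follows. Let \(X_1\), \(X_2\), \(X_3\), and \(X_4\) denote four pseudo-random uniform deviates, and consider

\[D^2 = (X_3 - X_1)^2 + (X_4 - X_2)^2\]

the squared distance between the points \((X_1, X_2)\) and \((X_3, X_4)\) in the unit square, so \(D^2 \leq 2\). The probability distribution of \(D^2\) is given as

\[\Pr(D^2 \leq d^2) = \pi d^2 - \frac{8}{3} d^3 + \frac{1}{2} d^4\]

when \(d^2 \leq 1\), where π denotes the value of pi. If \(1 < d^2 \leq 2\), this probability is given as

\[\Pr(D^2 \leq d^2) = \frac{1}{3} - (\pi + 2) d^2 - \frac{1}{2} d^4 + \frac{4}{3}(2d^2 + 1)\sqrt{d^2 - 1} + 4 d^2 \sin^{-1}\!\left(\frac{1}{d}\right)\]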
See Gruenberger and Mark (1951) for a derivation of this distribution.
For each succeeding set of four pseudo-random uniform numbers input in x, \(d^2\) and the cumulative probability of \(d^2\) (\(\Pr(D^2\leq d^2)\)) are computed. The resulting probability is tallied into one of k = nRun equally spaced intervals.
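The tallying step can be sketched in plain Python. The names `d2_cdf` and `d2_counts` are hypothetical. The closed form for \(1 < d^2 \leq 2\) is the standard distribution of the squared distance between two uniform points in the unit square, written here as an assumption rather than copied from PyIMSL. The code also assumes that observations left over after the last complete quadruple are ignored.

```python
import math

def d2_cdf(d2):
    """Pr(D^2 <= d2) for the squared distance between two uniform points in the unit square."""
    if d2 <= 0.0:
        return 0.0
    if d2 >= 2.0:
        return 1.0
    d = math.sqrt(d2)
    if d2 <= 1.0:
        return math.pi * d2 - 8.0 * d**3 / 3.0 + d2**2 / 2.0
    return (1.0 / 3.0 - (math.pi + 2.0) * d2 - d2**2 / 2.0
            + 4.0 / 3.0 * (2.0 * d2 + 1.0) * math.sqrt(d2 - 1.0)
            + 4.0 * d2 * math.asin(1.0 / d))

def d2_counts(x, k):
    """Tally Pr(D^2 <= d^2) for each successive quadruple of x into k equal intervals."""
    counts = [0] * k
    for q in range(len(x) // 4):  # assumption: a partial final quadruple is ignored
        x1, x2, x3, x4 = x[4 * q: 4 * q + 4]
        p = d2_cdf((x3 - x1) ** 2 + (x4 - x2) ** 2)
        counts[min(int(p * k), k - 1)] += 1  # p == 1.0 goes in the last interval
    return counts

print(d2_cdf(1.0))  # pi - 13/6, about 0.9749
```

Under the null hypothesis each probability is uniform on (0, 1), which is why the k intervals are equally likely.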
Let n denote the number of sets of four random numbers input (n = the total number of observations/4). Then, under the null hypothesis that the numbers input are random uniform (0, 1) numbers, the expected value for each element in dsquareCount is \(e=n/k\). An approximate chi-squared statistic is computed as

\[X^2 = \sum_{i=1}^{k} \frac{(o_i - e)^2}{e}\]

where \(o_i\) = dsquareCount[i] is the observed count. \(X^2\) has \(k-1\) degrees of freedom, and the null hypothesis of pseudo-random uniform (0, 1) deviates is rejected if \(X^2\) is too large. As n increases, the chi-squared approximation becomes better. A useful rule of thumb is that \(e>5\) yields a good chi-squared approximation.
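Because every cell has the same expected count, the statistic needs only the tabulated counts. A short check with a hypothetical helper `equiprobable_chi2`, fed the dsquare_counts printed in Example 3 (500 quadruples, e = 500/6):

```python
def equiprobable_chi2(counts):
    """X^2 = sum((o - e)^2 / e) with e = total / number of cells; returns (X^2, df)."""
    k = len(counts)
    e = sum(counts) / k
    return sum((o - e) ** 2 / e for o in counts), k - 1

chi2, df = equiprobable_chi2([87, 84, 78, 76, 92, 83])
print(chi2, df)  # about 2.056 and 5, matching Example 3
```

The triplets statistic has the same form over \(m^3\) cells. Applying this helper to the 27 dcube_counts printed in Example 4, flattened into one list, gives that example's chisq of about 21.763.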
Triplets Test¶
dcube computes the triplets test on a sequence of hypothesized pseudo-random uniform (0, 1) deviates. The triplets test is computed as follows.

Each successive (non-overlapping) set of three deviates, \(X_1\), \(X_2\), and \(X_3\), is tallied into one of \(m^3\) equal-sized cubes, where m = nRun. Let \(i=\lfloor mX_1\rfloor+1\), \(j=\lfloor mX_2\rfloor+1\), and \(k=\lfloor mX_3\rfloor+1\). For the triplet (\(X_1\), \(X_2\), \(X_3\)), dcubeCount[i][j][k] is incremented.
Under the null hypothesis of pseudo-random uniform (0, 1) deviates, the \(m^3\) cells are equally probable and each has expected value \(e=n/m^3\), where n is the number of triplets tallied. An approximate chi-squared statistic is computed as

\[X^2 = \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{k=1}^{m} \frac{(o_{ijk} - e)^2}{e}\]

where \(o_{ijk}\) = dcubeCount[i][j][k].
The computed chi-squared has \(m^3-1\) degrees of freedom, and the null hypothesis of pseudo-random uniform (0, 1) deviates is rejected if \(X^2\) is too large.
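A plain-Python sketch of the triplets tally and its statistic. It assumes non-overlapping triplets, which is consistent with Example 4, where 2001 deviates give 667 triplets and expect = 667/27 ≈ 24.70. It also assumes that leftover observations are ignored. `triplets_test` is a hypothetical name, and its indices are 0-based, so each is one less than the text's i, j, k.

```python
import math

def triplets_test(x, m):
    """Tally non-overlapping triplets into an m x m x m table; return (counts, X^2, df)."""
    counts = [[[0] * m for _ in range(m)] for _ in range(m)]
    n = len(x) // 3  # assumption: fewer than 3 leftover observations are ignored
    for t in range(n):
        # 0-based cell indices floor(m * X), clamped in case a value equals 1.0
        i, j, k = (min(math.floor(m * v), m - 1) for v in x[3 * t: 3 * t + 3])
        counts[i][j][k] += 1
    e = n / m**3
    chi2 = sum((o - e) ** 2 / e for plane in counts for row in plane for o in row)
    return counts, chi2, m**3 - 1

_, chi2, df = triplets_test([0.1, 0.1, 0.1, 0.9, 0.9, 0.9], 2)
print(chi2, df)  # 6.0 7
```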
Examples¶
Example 1¶
This example illustrates the use of the runs test on \(10^4\) pseudo-random uniform deviates. Since the probability of a larger chi-squared statistic is 0.1872, there is no strong evidence to support rejection of the null hypothesis of randomness.
from __future__ import print_function
from numpy import *
from pyimsl.stat.randomnessTest import randomnessTest
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.randomUniform import randomUniform
from pyimsl.stat.writeMatrix import writeMatrix
nran = 10000
n_run = 6
randomSeedSet(123457)
fmt = "%8.1f"
chisq = []
df = []
runs_expect = []
runs = {}
x = randomUniform(nran)
pvalue = randomnessTest(x, n_run,
chiSquared=chisq, df=df,
runsExpect=runs_expect, runs=runs)
runs_counts = runs["runsCount"]
covariances = runs["covariances"]
writeMatrix("runs_counts", runs_counts, writeFormat=fmt)
writeMatrix("runs_expect", runs_expect, writeFormat=fmt)
writeMatrix("covariances", covariances, writeFormat=fmt)
print("chisq = ", chisq[0])
print("df = ", df[0])
print("pvalue = ", pvalue)
Output¶
chisq = 8.765216121616097
df = 6.0
pvalue = 0.1872190489024498
runs_counts
1 2 3 4 5 6
1709.0 2046.0 953.0 260.0 55.0 4.0
runs_expect
1 2 3 4 5 6
1667.3 2083.4 916.6 263.8 57.5 11.9
covariances
1 2 3 4 5 6
1 1278.2 -194.6 -148.9 -71.6 -22.9 -6.7
2 -194.6 1410.1 -490.6 -197.2 -55.2 -14.4
3 -148.9 -490.6 601.4 -117.4 -31.2 -7.8
4 -71.6 -197.2 -117.4 222.1 -10.8 -2.6
5 -22.9 -55.2 -31.2 -10.8 54.8 -0.6
6 -6.7 -14.4 -7.8 -2.6 -0.6 11.7
Example 2¶
This example illustrates the calculation of the pairs statistics when a random sample of size \(10^4\) is used and pairsLag is 5. The results are not significant. PyIMSL function randomUniform (Chapter 12, Random Number Generation) is used to obtain the pseudo-random deviates.
from __future__ import print_function
from numpy import *
from pyimsl.stat.randomnessTest import randomnessTest
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.randomUniform import randomUniform
from pyimsl.stat.writeMatrix import writeMatrix
nran = 10000
n_run = 10
randomSeedSet(123467)
pairs = {"pairsLag": 5}
expect = []
chisq = []
df = []
x = randomUniform(nran)
pvalue = randomnessTest(x, n_run,
chiSquared=chisq, df=df,
expect=expect, pairs=pairs)
writeMatrix("pairs_counts", pairs["pairsCount"], writeFormat="%4i")
print("expect = ", expect[0])
print("chisq = ", chisq[0])
print("df = ", df[0])
print("pvalue = ", pvalue)
Output¶
expect = 99.95
chisq = 104.85992996498248
df = 99.0
pvalue = 0.32431195635310744
pairs_counts
1 2 3 4 5 6 7 8 9 10
1 112 82 95 118 103 103 113 84 90 74
2 104 106 109 108 101 98 102 92 109 88
3 88 111 86 106 112 79 103 105 106 101
4 91 110 108 92 88 108 113 93 105 114
5 104 105 103 104 101 94 96 87 93 104
6 98 104 103 104 79 89 92 104 92 100
7 103 91 97 101 116 83 118 118 106 99
8 105 105 111 91 93 82 100 104 110 89
9 92 102 82 101 94 128 102 110 125 98
10 79 99 103 98 104 101 93 93 98 105
Example 3¶
In this example, 2000 observations generated via PyIMSL function randomUniform (Chapter 12, Random Number Generation) are input to dsquare in one call. The null hypothesis of a uniform distribution is not rejected.
from __future__ import print_function
from numpy import *
from pyimsl.stat.randomnessTest import randomnessTest
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.randomUniform import randomUniform
from pyimsl.stat.writeMatrix import writeMatrix
nran = 2000
n_run = 6
randomSeedSet(123457)
chisq = []
df = []
expect = []
dsquare_counts = []
x = randomUniform(nran)
pvalue = randomnessTest(x, n_run,
chiSquared=chisq, df=df,
expect=expect, dsquare=dsquare_counts)
writeMatrix("dsquare_counts", dsquare_counts, writeFormat="%4i")
print("expect = ", expect[0])
print("chisq = ", chisq[0])
print("df = ", df[0])
print("pvalue = ", pvalue)
Output¶
expect = 83.33333333333333
chisq = 2.056
df = 5.0
pvalue = 0.8413433814411666
dsquare_counts
1 2 3 4 5 6
87 84 78 76 92 83
Example 4¶
In this example, 2001 deviates generated by PyIMSL function randomUniform (Chapter 12, Random Number Generation) are input to dcube and tabulated in 27 equally sized cubes. The null hypothesis is not rejected.
from __future__ import print_function
from numpy import *
from pyimsl.stat.randomnessTest import randomnessTest
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.randomUniform import randomUniform
from pyimsl.stat.writeMatrix import writeMatrix
nran = 2001
n_run = 3
randomSeedSet(123457)
chisq = []
df = []
expect = []
dcube_counts = []
x = randomUniform(nran)
pvalue = randomnessTest(x, n_run,
chiSquared=chisq, df=df,
expect=expect, dcube=dcube_counts)
writeMatrix("dcube_counts", dcube_counts[0], writeFormat="%4i")
writeMatrix("dcube_counts", dcube_counts[1], writeFormat="%4i")
writeMatrix("dcube_counts", dcube_counts[2], writeFormat="%4i")
print("expect = ", expect[0])
print("chisq = ", chisq[0])
print("df = ", df[0])
print("pvalue = ", pvalue)
Output¶
expect = 24.703703703703702
chisq = 21.763118440779607
df = 26.0
pvalue = 0.7015850883536483
dcube_counts
1 2 3
1 26 27 24
2 20 17 32
3 30 18 21
dcube_counts
1 2 3
1 20 16 26
2 22 22 27
3 30 24 26
dcube_counts
1 2 3
1 28 30 22
2 23 24 22
3 33 30 27
Example 5¶
This example, based on Example 1, illustrates the use of the ido optional argument. Here randomnessTest is called 10 times, with 1000 pseudo-random uniform deviates each time, and the results are identical to those of Example 1. Since the probability of a larger chi-squared statistic is 0.1872, there is no strong evidence to support rejection of the null hypothesis of randomness.
from __future__ import print_function
from numpy import *
from pyimsl.stat.randomnessTest import randomnessTest
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.randomUniform import randomUniform
from pyimsl.stat.writeMatrix import writeMatrix
randomSeedSet(123457)
nRan = 1000
nRun = 6
chisq = []
df = []
runsExpect = []
runs = {}
intermResults = zeros(nRun, dtype=double)
ido = {"ido": 1, "intermediateResults": intermResults}
for i in range(10):
    x = randomUniform(nRan)
    if i == 9:
        # Final block: request the outputs
        ido["ido"] = 3
        pvalue = randomnessTest(x, nRun,
                                ido=ido,
                                chiSquared=chisq, df=df,
                                runsExpect=runsExpect, runs=runs)
    else:
        # First block uses ido = 1, intermediate blocks use ido = 2
        if i == 0:
            ido["ido"] = 1
        else:
            ido["ido"] = 2
        pvalue = randomnessTest(x, nRun, ido=ido)
fmt = "%8.1f"
runsCounts = runs["runsCount"]
covariances = runs["covariances"]
writeMatrix("runsCounts", runsCounts, writeFormat=fmt)
writeMatrix("runsExpect", runsExpect, writeFormat=fmt)
writeMatrix("covariances", covariances, writeFormat=fmt)
print("chisq = ", chisq[0])
print("df = ", df[0])
print("pvalue = ", pvalue)
Output¶
chisq = 8.765216121616097
df = 6.0
pvalue = 0.1872190489024498
runsCounts
1 2 3 4 5 6
1709.0 2046.0 953.0 260.0 55.0 4.0
runsExpect
1 2 3 4 5 6
1667.3 2083.4 916.6 263.8 57.5 11.9
covariances
1 2 3 4 5 6
1 1278.2 -194.6 -148.9 -71.6 -22.9 -6.7
2 -194.6 1410.1 -490.6 -197.2 -55.2 -14.4
3 -148.9 -490.6 601.4 -117.4 -31.2 -7.8
4 -71.6 -197.2 -117.4 222.1 -10.8 -2.6
5 -22.9 -55.2 -31.2 -10.8 54.8 -0.6
6 -6.7 -14.4 -7.8 -2.6 -0.6 11.7