randomSample¶
Generates a simple pseudorandom sample from a finite population.
Synopsis¶
randomSample (population, nsamp)
Required Arguments¶
- float population[[]] (Input)
  nrow by nvar matrix containing the population to be sampled. If either of the optional arguments firstCall or additionalCall is specified, then population contains a different part of the population on each invocation; otherwise, population contains the entire population.
- int nsamp (Input)
  The sample size desired.
Return Value¶
An nsamp by nvar matrix containing the sample.
Optional Arguments¶
- firstCall, index, npop (Output)
  This is the first invocation with this data; additional calls to randomSample may be made to add to the population. Additional calls should be made using the optional argument additionalCall. Argument index is an array of length nsamp containing the indices of the sample in the population. Argument npop returns the number of items in the population. If the population is input a few items at a time, the first call to randomSample should use firstCall, and subsequent calls should use additionalCall. See Example 2.
- additionalCall, int index, int npop, float samp (Input/Output)
  This is an additional invocation of randomSample, and updating for the subpopulation in population is performed. Argument index is an array of length nsamp containing the indices of the sample in the population, as returned using optional argument firstCall. Argument npop, also obtained using optional argument firstCall, holds the number of items in the population seen so far. It is not necessary to know the number of items in the population in advance. npop is used to accumulate the population size and should not be changed between calls to randomSample. Argument samp is the array of size nsamp by nvar containing the sample. samp is the result of calling randomSample with optional argument firstCall. See Example 2.
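The way index, npop, and samp are carried from one call to the next can be illustrated without the library. The following is a minimal pure-Python sketch, not the PyIMSL implementation: the function update_sample, its state dictionary, and the made-up population rows are all hypothetical, and the replacement rule is the one given in the Description section. Passing state=None plays the role of firstCall; passing the returned state back plays the role of additionalCall.

```python
import random


def update_sample(chunk, nsamp, state=None, rng=random):
    """Fold one chunk of population rows into a running sample.

    Hypothetical helper for illustration only (not part of PyIMSL).
    state=None marks the first call; on later calls pass back the
    dictionary returned by the previous call, unchanged.
    """
    if state is None:
        state = {"index": [], "npop": 0, "samp": []}
    for row in chunk:
        state["npop"] += 1
        k = state["npop"]  # 1-based position of this row in the population
        if len(state["samp"]) < nsamp:
            # The first nsamp rows always enter the sample.
            state["samp"].append(row)
            state["index"].append(k)
        elif rng.random() < nsamp / k:
            # With probability nsamp/k, overwrite a uniformly chosen slot.
            slot = rng.randrange(nsamp)
            state["samp"][slot] = row
            state["index"][slot] = k
    return state


rng = random.Random(123457)
population = [[1749 + i, i] for i in range(176)]  # made-up rows, not dataSets(2)

# First call with one row, then one additional call per remaining row.
state = update_sample(population[:1], 5, rng=rng)
for row in population[1:]:
    state = update_sample([row], 5, state, rng=rng)

print("population size:", state["npop"])
print("indices:", state["index"])
```

Because the running count lives in the state rather than in an argument, the caller never has to know the population size in advance, which is why npop must not be modified between calls.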
Description¶
Function randomSample generates a pseudorandom sample from a given population, without replacement, using an algorithm due to McLeod and Bellhouse (1983).
The first nsamp items in the population are included in the sample. Then, for each successive item from the population, a random item in the sample is replaced by that item from the population with probability equal to the sample size divided by the number of population items that have been encountered at that time.
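A short induction shows why this rule gives every population item the same chance of being selected. The derivation below is a sketch that assumes the item to be replaced is chosen uniformly among the nsamp sample positions, which the description suggests but does not state explicitly. Here n stands for nsamp and k for the number of population items encountered so far.

```latex
% n = nsamp, k = number of population items encountered so far (k >= n).
% Claim: after k items, each of them is in the sample with probability n/k.
\begin{align*}
\text{Base case } (k = n):\quad
  & P(\text{item in sample}) = 1 = \frac{n}{n}. \\
\text{New item } k+1 \text{ enters}:\quad
  & P = \frac{n}{k+1}. \\
\text{Earlier item survives step } k+1:\quad
  & 1 - \frac{n}{k+1}\cdot\frac{1}{n} = \frac{k}{k+1}. \\
\text{Earlier item in sample after step } k+1:\quad
  & \frac{n}{k}\cdot\frac{k}{k+1} = \frac{n}{k+1}.
\end{align*}
```

For the examples that follow, with a sample size of 5 and 176 rows, each row therefore has inclusion probability 5/176, or about 0.028, under this assumption.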
Examples¶
Example 1¶
In this example, randomSample is used to generate a sample of size 5 from a population stored in the matrix population.
from numpy import *
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.randomSample import randomSample
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.writeMatrix import writeMatrix
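# Dimensions of the population matrix and the desired sample size
nrow = 176
nvar = 2
nsamp = 5
# Load built-in data set 2 (nrow by nvar) as the population
population = dataSets(2)
# Seed the generator so the sample is reproducible
randomSeedSet(123457)
# Draw the sample from the entire population in a single call
ir = randomSample(population, nsamp)
writeMatrix("The Sample", ir,
            noRowLabels=True, noColLabels=True, writeFormat="%5i")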
Output¶
The Sample
1764 36
1828 62
1923 5
1773 34
1769 106
Example 2¶
Function randomSample is now used to generate a sample of size 5 from the same population as in the example above, except that the data are input to randomSample one observation at a time. This is how randomSample may be used to sample from a large data file. Notice that the number of records need not be known in advance.
from __future__ import print_function
from numpy import *
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.randomSample import randomSample
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.writeMatrix import writeMatrix
nrow = 176
nvar = 2
nsamp = 5
population = dataSets(2)
randomSeedSet(123457)
# First call: pass the first observation and collect index and npop
firstCall = {}
sample = randomSample([population[0]], nsamp, firstCall=firstCall)
index = firstCall['index']
npop = firstCall['npop']
# State carried between calls; do not modify npop in between
ac = {'index': index, 'npop': npop, 'samp': sample}
# Additional calls: feed the remaining observations one at a time
for i in range(1, nrow):
    randomSample([population[i]], nsamp, additionalCall=ac)
print("The population size is ", ac['npop'].value)
writeMatrix("Indices of random sample", ac['index'], column=True)
writeMatrix("The sample", ac['samp'],
            noRowLabels=True, noColLabels=True)
Output¶
The population size is 176
Indices of random sample
1 16
2 80
3 175
4 25
5 21
The sample
1764 36
1828 62
1923 6
1773 35
1769 106