randomSample

Generates a simple pseudorandom sample from a finite population.

Synopsis

randomSample (population, nsamp)

Required Arguments

float population[[]] (Input)
nrow by nvar matrix containing the population to be sampled. If either of the optional arguments firstCall or additionalCall is specified, population contains a different part of the population on each invocation; otherwise, population contains the entire population.
int nsamp (Input)
The sample size desired.

Return Value

nsamp by nvar matrix containing the sample.

Optional Arguments

firstCall, int index, int npop (Output)
This is the first invocation with this data; additional calls to randomSample may be made to add to the population. Argument index is an array of length nsamp containing the indices of the sample in the population. Argument npop returns the number of items in the population. If the population is input a few items at a time, the first call to randomSample should use firstCall, and subsequent calls should use additionalCall. See Example 2.
additionalCall, int index, int npop, float samp (Input/Output)
This is an additional invocation of randomSample; the sample is updated using the subpopulation in population. Argument index is an array of length nsamp containing the indices of the sample in the population, as returned using optional argument firstCall. Argument npop, also obtained using optional argument firstCall, holds the number of items in the population encountered so far. It is not necessary to know the number of items in the population in advance. npop is used to accumulate the population size and should not be changed between calls to randomSample. Argument samp is the nsamp by nvar array containing the sample; initially it is the result of calling randomSample with optional argument firstCall. See Example 2.

Description

Function randomSample generates a pseudorandom sample from a given population, without replacement, using an algorithm due to McLeod and Bellhouse (1983).

The first nsamp items in the population are included in the sample. Then, for each successive item from the population, a random item in the sample is replaced by that item from the population with probability equal to the sample size divided by the number of population items that have been encountered at that time.
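The replacement rule described above can be sketched in plain Python. This is a minimal illustration of the stated procedure, not the randomSample implementation: the function name reservoir_sample and its rng parameter are hypothetical, it uses Python's own random module, and it therefore will not reproduce the samples shown in the examples below. Indices are 1-based, which matches the indices printed in Example 2.

```python
import random


def reservoir_sample(items, nsamp, rng=None):
    """Return (sample, index, npop) after a single pass over items.

    Hypothetical helper for illustration only; not part of PyIMSL.
    """
    rng = rng or random.Random()
    sample = []  # current sample, at most nsamp items
    index = []   # 1-based population position of each sampled item
    npop = 0     # number of population items encountered so far
    for item in items:
        npop += 1
        if npop <= nsamp:
            # The first nsamp items are included in the sample.
            sample.append(item)
            index.append(npop)
        elif rng.random() < nsamp / npop:
            # With probability nsamp / npop, the new item replaces
            # a uniformly chosen item already in the sample.
            slot = rng.randrange(nsamp)
            sample[slot] = item
            index[slot] = npop
    return sample, index, npop


# Usage: sample 5 items from a stream of 176 values.
sample, index, npop = reservoir_sample(range(1749, 1925), 5,
                                       random.Random(123457))
```

Because the sketch only needs the item currently being read plus the running state (sample, index, npop), the population size does not have to be known before the pass begins.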

Examples

Example 1

In this example, randomSample is used to generate a sample of size 5 from a population stored in the matrix population.

from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.randomSample import randomSample
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.writeMatrix import writeMatrix

nsamp = 5  # desired sample size

# Load the population (nrow = 176 observations by nvar = 2 variables).
population = dataSets(2)

# Seed the generator so that the sample is reproducible.
randomSeedSet(123457)

# Draw the sample without replacement and print it.
sample = randomSample(population, nsamp)
writeMatrix("The Sample", sample,
            noRowLabels=True, noColLabels=True, writeFormat="%5i")

Output

 
 The Sample
 1764     36
 1828     62
 1923      5
 1773     34
 1769    106

Example 2

Function randomSample is now used to generate a sample of size 5 from the same population as in Example 1, except that the data are passed to randomSample one observation at a time. This is how randomSample can be used to sample from a large data file. Note that the number of records need not be known in advance.

from __future__ import print_function
from numpy import *
from pyimsl.stat.dataSets import dataSets
from pyimsl.stat.randomSample import randomSample
from pyimsl.stat.randomSeedSet import randomSeedSet
from pyimsl.stat.writeMatrix import writeMatrix

nrow = 176
nvar = 2
nsamp = 5

population = dataSets(2)

randomSeedSet(123457)

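# First call: pass the first observation and collect the state
# (index and npop) that the additional calls will update.
firstCall = {}
sample = randomSample([population[0]], nsamp, firstCall=firstCall)
index = firstCall['index']
npop = firstCall['npop']
ac = {'index': index, 'npop': npop, 'samp': sample}

# Additional calls: pass the remaining observations one at a time.
for i in range(1, nrow):
    randomSample([population[i]], nsamp, additionalCall=ac)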

print("The population size is ", ac['npop'].value)
writeMatrix("Indices of random sample", ac['index'], column=True)
writeMatrix("The sample", ac['samp'],
            noRowLabels=True, noColLabels=True)

Output

The population size is  176
 
Indices of random sample
     1           16
     2           80
     3          175
     4           25
     5           21
 
       The sample
       1764           36
       1828           62
       1923            6
       1773           35
       1769          106