ranks
Computes the ranks, normal scores, or exponential scores for a vector of observations.
Synopsis
ranks (x)
Required Arguments
float x[] (Input)
Array of length nObservations containing the observations to be ranked.
Return Value
A vector of length nObservations containing the rank (or, optionally, a transformation of the rank) of each observation.
Optional Arguments
averageTie, or highest, or lowest, or randomSplit
Exactly one of these optional arguments may be used to change the method used to assign a score to tied observations.
Keyword | Result |
---|---|
averageTie | average of the scores of the tied observations (default) |
highest | highest score in the group of ties |
lowest | lowest score in the group of ties |
randomSplit | tied observations are randomly split using a random number generator |
fuzz, float (Input)
Value used to determine when two items are tied. If abs(x[i] - x[j]) is less than or equal to fuzz, then x[i] and x[j] are said to be tied. The default value for fuzz is 0.0.
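Because the tie test for fuzz is pairwise, it can be sketched directly. The stdlib-only snippet below uses a hypothetical helper, `tied_pairs` (not part of PyIMSL), that lists the 1-based positions (i, j) with abs(x[i] - x[j]) <= fuzz for the Hinkley data used in the examples.

```python
from itertools import combinations

# Hinkley (1977) data used in the examples
X = [0.77, 1.74, 0.81, 1.20, 1.95, 1.20, 0.47, 1.43,
     3.37, 2.20, 3.00, 3.09, 1.51, 2.10, 0.52, 1.62,
     1.31, 0.32, 0.59, 0.81, 2.81, 1.87, 1.18, 1.35,
     4.75, 2.48, 0.96, 1.89, 0.90, 2.05]


def tied_pairs(x, fuzz=0.0):
    """Hypothetical helper: 1-based pairs (i, j), i < j, with abs(x[i] - x[j]) <= fuzz."""
    return [(i + 1, j + 1)
            for i, j in combinations(range(len(x)), 2)
            if abs(x[i] - x[j]) <= fuzz]


print(tied_pairs(X))                    # [(3, 20), (4, 6)] with the default fuzz of 0.0
print(tied_pairs([1.0, 1.5, 2.0], 0.5))  # [(1, 2), (2, 3)]
```

The second call shows that the pairwise rule is not transitive: 1.0 and 2.0 are each tied with 1.5 but not with each other. How ranks groups such chains is not stated here, so any grouping rule built on top of this test is an assumption.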
ranks, or blomScores, or tukeyScores, or vanDerWaerdenScores, or expectedNormalScores, or savageScores
Exactly one of these optional arguments may be used to specify the type of values returned.
Keyword | Result |
---|---|
ranks | ranks (default) |
blomScores | Blom version of normal scores |
tukeyScores | Tukey version of normal scores |
vanDerWaerdenScores | Van der Waerden version of normal scores |
expectedNormalScores | expected value of normal order statistics (for tied observations, the average of the expected normal scores) |
savageScores | Savage scores (the expected value of exponential order statistics) |
Description
Ties
In data without ties, the output values are the ordinary ranks (or a transformation of the ranks) of the data in x. If x[i] has the smallest value among the values in x and there is no other element in x with this value, then ranks[i] = 1. If both x[i] and x[j] have the same smallest value, then the output value depends upon the option used to break ties.
Keyword | Result |
---|---|
averageTie | ranks[i] = ranks[j] = 1.5 |
highest | ranks[i] = ranks[j] = 2.0 |
lowest | ranks[i] = ranks[j] = 1.0 |
randomSplit | ranks[i] = 1.0 and ranks[j] = 2.0 or, randomly, ranks[i] = 2.0 and ranks[j] = 1.0 |
When the ties are resolved randomly, the function randomUniform
is used
to generate random numbers. Different results may occur from different
executions of the program unless the “seed” of the random number generator
is set explicitly by use of the function
randomSeedSet.
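The tie-breaking options can be sketched in plain Python. In this hypothetical `tie_ranks` helper (not the PyIMSL implementation), a tie group is assumed to be a run of sorted values in which each value is within fuzz of its predecessor, and the random split uses Python's `random` module rather than randomUniform.

```python
import random

# Hinkley (1977) data used in the examples
X = [0.77, 1.74, 0.81, 1.20, 1.95, 1.20, 0.47, 1.43,
     3.37, 2.20, 3.00, 3.09, 1.51, 2.10, 0.52, 1.62,
     1.31, 0.32, 0.59, 0.81, 2.81, 1.87, 1.18, 1.35,
     4.75, 2.48, 0.96, 1.89, 0.90, 2.05]


def tie_ranks(x, method="average", fuzz=0.0, rng=None):
    """Hypothetical sketch: 1-based ranks of x with ties resolved by `method`.

    Assumption: a tie group is a run of sorted values in which each value is
    within fuzz of the previous one.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    result = [0.0] * len(x)
    start = 0
    while start < len(order):
        end = start + 1
        # Grow the group while the next sorted value is within fuzz of its predecessor.
        while end < len(order) and x[order[end]] - x[order[end - 1]] <= fuzz:
            end += 1
        group = order[start:end]
        low, high = start + 1, end  # smallest and largest rank spanned by the group
        if method == "average":
            scores = [(low + high) / 2.0] * len(group)
        elif method == "highest":
            scores = [float(high)] * len(group)
        elif method == "lowest":
            scores = [float(low)] * len(group)
        elif method == "randomSplit":
            scores = [float(r) for r in range(low, high + 1)]
            (rng or random).shuffle(scores)  # random assignment of the tied ranks
        else:
            raise ValueError(f"unknown method: {method}")
        for idx, score in zip(group, scores):
            result[idx] = score
        start = end
    return result


print(tie_ranks(X)[:6])  # [5.0, 18.0, 6.5, 11.5, 21.0, 11.5]
```

With the default average method, `tie_ranks(X)` reproduces the Example 1 ranks. Passing a seeded `random.Random` instance plays the role that randomSeedSet plays for the library function: it makes the random split repeatable.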
The Scores
Normal and other functions of the ranks can optionally be returned. Normal scores can be defined as the expected values, or approximations to the expected values, of order statistics from a normal distribution. The simplest approximations are obtained by evaluating the inverse cumulative normal distribution function, normalInverseCdf, at the ranks scaled into the open interval (0, 1). In the Blom version (see Blom 1958), the scaling transformation for the rank \(r_i\) (\(1\leq r_i\leq n\), where n is the sample size, nObservations) is \((r_i-3/8)/(n+1/4)\). The Blom normal score corresponding to the observation with rank \(r_i\) is
\[\Phi^{-1}\left(\frac{r_i-3/8}{n+1/4}\right)\]
where Φ(⋅) is the normal cumulative distribution function.
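As a quick numerical check of the Blom formula, the sketch below uses `statistics.NormalDist().inv_cdf` from the Python standard library in place of normalInverseCdf; `blom_score` is a hypothetical helper name.

```python
from statistics import NormalDist


def blom_score(rank, n):
    """Blom normal score: inverse normal CDF of (rank - 3/8) / (n + 1/4)."""
    return NormalDist().inv_cdf((rank - 3.0 / 8.0) / (n + 0.25))


n = 30
print(round(blom_score(5, n), 3))   # -1.024 (observation 1 of the examples has rank 5)
print(round(blom_score(30, n), 3))  # 2.04 (observation 25 is the largest)
```

Both values agree with the untied observations 1 and 25 in the Blom row of Example 2.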
Adjustments for ties are made after the normal score transformation; that is, if x[i] equals x[j] (within fuzz) and their value is the k‑th smallest in the data set, the Blom normal scores are determined for ranks of k and k + 1. Then these normal scores are averaged or selected in the manner specified. (Whether the transformations are made first or ties are resolved first makes no difference except when averageTie is specified.)
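The order of operations matters only for averageTie because the inverse normal CDF is not linear. This hypothetical sketch (standard library only, not the library's code) takes the tie between the fourth and sixth observations of the examples, which occupy ranks k = 11 and k + 1 = 12 of n = 30, and compares averaging the two Blom scores with transforming the averaged rank 11.5.

```python
from statistics import NormalDist


def blom_score(rank, n):
    """Blom normal score for a (possibly fractional) rank out of n."""
    return NormalDist().inv_cdf((rank - 3.0 / 8.0) / (n + 0.25))


n, k = 30, 11  # observations 4 and 6 of the examples tie at ranks 11 and 12

low = blom_score(k, n)       # score for rank k
high = blom_score(k + 1, n)  # score for rank k + 1

transform_then_average = (low + high) / 2.0  # order described above for averageTie
average_then_transform = blom_score(k + 0.5, n)

print(round(high, 3))  # -0.294, the highest-tie value in the Blom row of Example 2
print(transform_then_average < average_then_transform)  # True
```

For highest and lowest the result is the same either way, since the inverse CDF is increasing and selecting before or after the transformation picks the same rank. Below the median the inverse CDF is concave, so averaging the scores gives a slightly smaller value than scoring the averaged rank.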
In the Tukey version (see Tukey 1962), the scaling transformation for the rank \(r_i\) is \((r_i-1/3)/(n+1/3)\). The Tukey normal score corresponding to the observation with rank \(r_i\) is
\[\Phi^{-1}\left(\frac{r_i-1/3}{n+1/3}\right)\]
Ties are handled in the same way as for the Blom normal scores.
In the Van der Waerden version (see Lehmann 1975, p. 97), the scaling transformation for the rank \(r_i\) is \(r_i/(n+1)\). The Van der Waerden normal score corresponding to the observation with rank \(r_i\) is
\[\Phi^{-1}\left(\frac{r_i}{n+1}\right)\]
Ties are handled in the same way as for the Blom normal scores.
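The three scalings share one pattern: \((r_i-a)/(n+1-2a)\) with a = 3/8 for Blom, a = 1/3 for Tukey, and a = 0 for Van der Waerden. The sketch below uses that parametrization (an observation made here for illustration, not a PyIMSL parameter) with `statistics.NormalDist().inv_cdf`, scoring rank 27 of n = 30, the rank of the untied eleventh observation in the examples. `SCALINGS` and `normal_score` are hypothetical names.

```python
from statistics import NormalDist

# Offset a in (r - a) / (n + 1 - 2a) for each version of the normal scores.
SCALINGS = {"blom": 3.0 / 8.0, "tukey": 1.0 / 3.0, "vanDerWaerden": 0.0}


def normal_score(rank, n, version):
    """Inverse normal CDF of the rank scaled into (0, 1) for the given version."""
    a = SCALINGS[version]
    return NormalDist().inv_cdf((rank - a) / (n + 1.0 - 2.0 * a))


for version in SCALINGS:
    print(version, round(normal_score(27, 30, version), 3))
```

The rounded results should agree with the 1.176, 1.171, and 1.131 shown for observation 11 in Example 2.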
When option expectedNormalScores is used, the output values are the expected values of the normal order statistics from a sample of size nObservations. If the value in x[i] is the k-th smallest, then the value output in ranks[i] is \(E(z_k)\), where \(E(\cdot)\) is the expectation operator and \(z_k\) is the k-th order statistic in a sample of size nObservations from a standard normal distribution. Ties are handled in the same way as for the Blom normal scores.
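The expected normal order statistics generally have no simple closed form, but \(E(z_k)\) can be approximated by integrating x against the density of the k-th order statistic. The sketch below does this with Simpson's rule on [-10, 10] using only the standard library; it is an illustration, not the method PyIMSL uses, and `expected_normal_order_stat` is a hypothetical name.

```python
import math
from statistics import NormalDist


def expected_normal_order_stat(k, n, lo=-10.0, hi=10.0, steps=4000):
    """Approximate E(z_k), the k-th smallest of n standard normals, by Simpson's rule.

    Density of z_k: n * C(n-1, k-1) * Phi(x)**(k-1) * (1 - Phi(x))**(n-k) * phi(x).
    `steps` must be even for Simpson's rule.
    """
    nd = NormalDist()
    coef = n * math.comb(n - 1, k - 1)
    h = (hi - lo) / steps

    def integrand(x):
        p = nd.cdf(x)
        return x * coef * p ** (k - 1) * (1.0 - p) ** (n - k) * nd.pdf(x)

    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3.0


print(round(expected_normal_order_stat(30, 30), 3))  # 2.043
tie = (expected_normal_order_stat(11, 30) + expected_normal_order_stat(12, 30)) / 2
print(round(tie, 3))
```

Averaging the results for k = 11 and k = 12 mirrors how the tied fourth and sixth observations are scored in Example 2, where the reported value is -0.338.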
Savage scores are the expected values of the exponential order statistics from a sample of size nObservations. These values are called Savage scores because of their use in a test discussed by Savage (1956) (see Lehmann 1975). If the value in x[i] is the k-th smallest, then the value output in ranks[i] is \(E(y_k)\), where \(y_k\) is the k-th order statistic in a sample of size nObservations from a standard exponential distribution. The expected value of the k-th order statistic from an exponential sample of size n (nObservations) is
\[E(y_k)=\sum_{j=1}^{k}\frac{1}{n+1-j}\]
Ties are handled in the same way as for the Blom normal scores.
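The Savage score formula is a finite sum, so it is easy to evaluate directly. This hypothetical `savage_score` helper (plain Python, not part of PyIMSL) evaluates \(E(y_k)\) for n = 30 and averages ranks 6 and 7 to handle the tie between the third and twentieth observations.

```python
def savage_score(k, n):
    """E(y_k): expected k-th smallest of n standard exponential observations."""
    return sum(1.0 / (n + 1 - j) for j in range(1, k + 1))


n = 30
print(round(savage_score(1, n), 3))                             # 0.033
print(round(savage_score(n, n), 3))                             # 3.995
print(round((savage_score(6, n) + savage_score(7, n)) / 2, 3))  # 0.24 (tie averaged)
```

These agree with the values 0.033, 3.995, and 0.240 shown for observations 18, 25, and 3 (and 20) in the Savage output of Example 2.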
Examples
Example 1
The data for this example, from Hinkley (1977), contains 30 observations. Note that the fourth and sixth observations are tied, and that the third and twentieth observations are tied.
from numpy import array
from pyimsl.math.ranks import ranks
from pyimsl.math.writeMatrix import writeMatrix
x = array([0.77, 1.74, 0.81, 1.20, 1.95, 1.20, 0.47, 1.43,
3.37, 2.20, 3.00, 3.09, 1.51, 2.10, 0.52, 1.62,
1.31, 0.32, 0.59, 0.81, 2.81, 1.87, 1.18, 1.35,
4.75, 2.48, 0.96, 1.89, 0.90, 2.05])
rankres = ranks(x)
writeMatrix('Ranks', rankres)
Output
Ranks
1 2 3 4 5 6
5.0 18.0 6.5 11.5 21.0 11.5
7 8 9 10 11 12
2.0 15.0 29.0 24.0 27.0 28.0
13 14 15 16 17 18
16.0 23.0 3.0 17.0 13.0 1.0
19 20 21 22 23 24
4.0 6.5 26.0 19.0 10.0 14.0
25 26 27 28 29 30
30.0 25.0 9.0 20.0 8.0 22.0
Example 2
This example uses all of the score options with the same data set, which contains some ties. Ties are handled in several different ways in this example.
from numpy import array
from pyimsl.math.randomSeedSet import randomSeedSet
from pyimsl.math.ranks import ranks
from pyimsl.math.writeMatrix import writeMatrix
x = array([0.77, 1.74, 0.81, 1.20, 1.95, 1.20, 0.47, 1.43,
3.37, 2.20, 3.00, 3.09, 1.51, 2.10, 0.52, 1.62,
1.31, 0.32, 0.59, 0.81, 2.81, 1.87, 1.18, 1.35,
4.75, 2.48, 0.96, 1.89, 0.90, 2.05])
row_labels = ["Blom", "Tukey", "Van der Waerden", "Expected Value"]
score = []
# Blom scores using largest ranks for ties
r = ranks(x, highest=True, blomScores=True)
score.append(r)
# Tukey normal scores using smallest ranks for ties
r = ranks(x, lowest=True, tukeyScores=True)
score.append(r)
# Van der Waerden scores using randomly resolved ties
randomSeedSet(123457)
r = ranks(x, randomSplit=True, vanDerWaerdenScores=True)
score.append(r)
# Expected value of normal order statistics using averaging to
# break ties
r = ranks(x, expectedNormalScores=True)
score.append(r)
writeMatrix("Normal Order Statistics", score, rowLabels=row_labels)
# Savage scores using averaging to break ties
r = ranks(x, savageScores=True)
writeMatrix("Expected values of exponential order "
"statistics", r)
Output
Normal Order Statistics
1 2 3 4
Blom -1.024 0.209 -0.776 -0.294
Tukey -1.020 0.208 -0.890 -0.381
Van der Waerden -0.989 0.204 -0.753 -0.287
Expected Value -1.026 0.209 -0.836 -0.338
5 6 7 8
Blom 0.473 -0.294 -1.610 -0.041
Tukey 0.471 -0.381 -1.599 -0.041
Van der Waerden 0.460 -0.372 -1.518 -0.040
Expected Value 0.473 -0.338 -1.616 -0.041
9 10 11 12
Blom 1.610 0.776 1.176 1.361
Tukey 1.599 0.773 1.171 1.354
Van der Waerden 1.518 0.753 1.131 1.300
Expected Value 1.616 0.777 1.179 1.365
13 14 15 16
Blom 0.041 0.668 -1.361 0.125
Tukey 0.041 0.666 -1.354 0.124
Van der Waerden 0.040 0.649 -1.300 0.122
Expected Value 0.041 0.669 -1.365 0.125
17 18 19 20
Blom -0.209 -2.040 -1.176 -0.776
Tukey -0.208 -2.015 -1.171 -0.890
Van der Waerden -0.204 -1.849 -1.131 -0.865
Expected Value -0.209 -2.043 -1.179 -0.836
21 22 23 24
Blom 1.024 0.294 -0.473 -0.125
Tukey 1.020 0.293 -0.471 -0.124
Van der Waerden 0.989 0.287 -0.460 -0.122
Expected Value 1.026 0.294 -0.473 -0.125
25 26 27 28
Blom 2.040 0.893 -0.568 0.382
Tukey 2.015 0.890 -0.566 0.381
Van der Waerden 1.849 0.865 -0.552 0.372
Expected Value 2.043 0.894 -0.568 0.382
29 30
Blom -0.668 0.568
Tukey -0.666 0.566
Van der Waerden -0.649 0.552
Expected Value -0.669 0.568
Expected values of exponential order statistics
1 2 3 4 5 6
0.179 0.892 0.240 0.474 1.166 0.474
7 8 9 10 11 12
0.068 0.677 2.995 1.545 2.162 2.495
13 14 15 16 17 18
0.743 1.402 0.104 0.815 0.555 0.033
19 20 21 22 23 24
0.141 0.240 1.912 0.975 0.397 0.614
25 26 27 28 29 30
3.995 1.712 0.350 1.066 0.304 1.277