simpleStatistics¶
Computes basic univariate statistics.
Synopsis¶
simpleStatistics (x)
Required Arguments¶
float x[[]] (Input)
    Array of size nObservations × nVariables containing the data matrix.
Return Value¶
A matrix containing some simple statistics for each of the columns in x. If neither median nor medianAndScale is specified as an optional argument, the size of the matrix is 14 by nVariables. The columns of this matrix correspond to the columns of x, and the rows contain the following statistics:
Row | Statistic |
---|---|
0 | the mean |
1 | the variance |
2 | the standard deviation |
3 | the coefficient of skewness |
4 | the coefficient of excess (kurtosis) |
5 | the minimum value |
6 | the maximum value |
7 | the range |
8 | the coefficient of variation (when defined); zero is returned if it is not defined |
9 | the number of observations (the counts) |
10 | a lower confidence limit for the mean (assuming normality); the default is a 95 percent confidence interval |
11 | an upper confidence limit for the mean (assuming normality) |
12 | a lower confidence limit for the variance (assuming normality); the default is a 95 percent confidence interval |
13 | an upper confidence limit for the variance (assuming normality) |
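Because the rows are identified only by position, named indices can make calling code easier to read. The following is a minimal sketch and not part of the library: the `StatRow` enumeration and the `labelled` helper are hypothetical names introduced here, and the result is assumed to be indexable as `stats[row][column]` (as a nested list or NumPy array would be).

```python
from enum import IntEnum


class StatRow(IntEnum):
    """Hypothetical names for rows 0-13 of the returned matrix."""
    MEAN = 0
    VARIANCE = 1
    STD_DEV = 2
    SKEWNESS = 3
    KURTOSIS = 4
    MINIMUM = 5
    MAXIMUM = 6
    RANGE = 7
    COEF_VARIATION = 8
    COUNT = 9
    LOWER_MEAN = 10
    UPPER_MEAN = 11
    LOWER_VARIANCE = 12
    UPPER_VARIANCE = 13


def labelled(stats, column):
    """Return {row name: value} for one column of a 14-row result."""
    return {row.name: stats[row][column] for row in StatRow}
```

With such a helper, `stats[StatRow.MEAN][0]` reads the mean of the first variable without relying on a bare row number.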
Optional Arguments¶
confidenceMeans, float (Input)
    The confidence level for a two-sided interval estimate of the means (assuming normality), in percent. Argument confidenceMeans must be between 0.0 and 100.0 and is often 90.0, 95.0, or 99.0. For a one-sided confidence interval with confidence level c, set confidenceMeans = 100.0 − 2(100 − c). If confidenceMeans is not specified, a 95 percent confidence interval is computed.
confidenceVariances, float (Input)
    The confidence level for a two-sided interval estimate of the variances (assuming normality), in percent. The confidence intervals are symmetric in probability (rather than in length). For a one-sided confidence interval with confidence level c, set confidenceVariances = 100.0 − 2(100 − c). If confidenceVariances is not specified, a 95 percent confidence interval is computed.
median, or medianAndScale
    At most one of these optional arguments can be specified to indicate the additional simple robust statistics to be computed. If median is specified, the medians are computed and stored in one additional row (row 14) of the returned matrix of simple statistics. If medianAndScale is specified, the medians, the medians of the absolute deviations from the medians, and a simple robust estimate of scale are computed and stored in three additional rows (rows 14, 15, and 16) of the returned matrix of simple statistics.
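The one-sided rule given for confidenceMeans and confidenceVariances is simple arithmetic on the percentage. A minimal sketch, in which `two_sided_level` is a hypothetical helper name and not part of the library; the range check is an assumption derived from the stated requirement that the argument lie between 0.0 and 100.0:

```python
def two_sided_level(c):
    """Two-sided confidence level (percent) whose limits each serve as a
    one-sided bound at confidence level c (percent)."""
    level = 100.0 - 2.0 * (100.0 - c)
    # Assumption: reject values that fall outside the documented 0-100 range.
    if not 0.0 <= level <= 100.0:
        raise ValueError("c must lie between 50.0 and 100.0 percent")
    return level


print(two_sided_level(95.0))  # 90.0
```

For example, a one-sided 95 percent bound corresponds to passing 90.0; either limit of the resulting two-sided interval then serves as the one-sided bound.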
Description¶
For the data in each column of x, simpleStatistics computes the sample mean, variance, minimum, maximum, and other basic statistics. It also computes confidence intervals for the mean and variance (under the hypothesis that the sample is from a normal population).
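For reference, the standard normal-theory intervals consistent with this description can be written as follows. This is a sketch under stated assumptions rather than a transcription of the library's internals: \(\bar{x}\) is the sample mean, \(s^2\) the sample variance with divisor \(n-1\), \(\gamma\) the confidence level expressed as a fraction (0.95 by default), and \(t_{p,\nu}\) and \(\chi^2_{p,\nu}\) denote the \(p\)-quantiles of the Student t and chi-squared distributions with \(\nu\) degrees of freedom.

```latex
% Two-sided interval for the mean
\[
\bar{x} \;\pm\; t_{(1+\gamma)/2,\,n-1}\,\frac{s}{\sqrt{n}}
\]
% Two-sided interval for the variance (equal probability in each tail)
\[
\left[\,\frac{(n-1)\,s^2}{\chi^2_{(1+\gamma)/2,\,n-1}},\;
        \frac{(n-1)\,s^2}{\chi^2_{(1-\gamma)/2,\,n-1}}\,\right]
\]
```

As a check against the first column of the example output: with \(n = 13\), \(\bar{x} = 7.462\), \(s = 5.882\), and \(t_{0.975,12} \approx 2.179\), the half-width is about 3.55, giving limits of roughly 3.91 and 11.02.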
The definitions of some of the statistics are given below in terms of a single variable x of which the i-th datum is \(x_i\).
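Mean¶
\[\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\]
where \(n\) is the number of observations.
Variance¶
\[s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2\]
Skewness¶
\[\frac{\sum_{i=1}^{n} (x_i - \bar{x})^3 / n}{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2 / n\right]^{3/2}}\]
Excess or Kurtosis¶
\[\frac{\sum_{i=1}^{n} (x_i - \bar{x})^4 / n}{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2 / n\right]^{2}} - 3\]
Minimum¶
\[x_{\min} = \min_i (x_i)\]
Maximum¶
\[x_{\max} = \max_i (x_i)\]
Range¶
\[x_{\max} - x_{\min}\]
Coefficient of Variation¶
\[\frac{s}{\bar{x}} \quad \text{for } \bar{x} \ne 0\]
Median¶
\[\operatorname{median}\{x_i\} = \begin{cases} \text{middle } x_i \text{ after sorting} & \text{if } n \text{ is odd} \\ \text{average of the middle two } x_i \text{ after sorting} & \text{if } n \text{ is even} \end{cases}\]
Median Absolute Deviation¶
\[\mathrm{MAD} = \operatorname{median}\{\,|x_i - \operatorname{median}\{x_j\}|\,\}\]
Simple Robust Estimate of Scale¶
\[\mathrm{MAD} / \Phi^{-1}(3/4)\]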
where \(\Phi^{-1}(3/4)\approx 0.6745\) is the inverse of the standard normal distribution function evaluated at 3∕4. This standardizes MAD in order to make the scale estimate consistent at the normal distribution for estimating the standard deviation (Huber 1981, pp. 107-108).
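The statistics in this section can be checked numerically with the Python standard library alone. The sketch below is not the library's implementation; `basic_statistics` is a hypothetical name, and it assumes the usual moment-based forms: variance with divisor n − 1, skewness and kurtosis from central moments with divisor n, and the scale estimate taken as MAD divided by \(\Phi^{-1}(3/4)\).

```python
import math
import statistics


def basic_statistics(values):
    """Moment-based summary of one variable (needs n >= 2, non-constant data)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments with divisor n, used for skewness and kurtosis.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    variance = m2 * n / (n - 1)  # sample variance, divisor n - 1
    std_dev = math.sqrt(variance)
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    return {
        "mean": mean,
        "variance": variance,
        "std_dev": std_dev,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
        "range": max(values) - min(values),
        # Zero stands in for an undefined coefficient of variation.
        "cv": std_dev / mean if mean != 0 else 0.0,
        "median": median,
        "mad": mad,
        # Divide by the standard normal quantile at 3/4 (about 0.6745).
        "robust_scale": mad / statistics.NormalDist().inv_cdf(0.75),
    }


# First column of the Draper and Smith data used in the example.
column = [7., 1., 11., 11., 7., 11., 3., 1., 2., 21., 1., 11., 10.]
result = basic_statistics(column)
```

For this column the sketch agrees, to three decimals, with the first column of the example output (mean 7.462, variance 34.603, skewness 0.688, kurtosis 0.075, C.V. 0.788).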
Example¶
This example uses data from Draper and Smith (1981). There are five variables and 13 observations.
from numpy import array
from pyimsl.math.simpleStatistics import simpleStatistics
from pyimsl.math.writeMatrix import writeMatrix

# 13 observations (rows) on five variables (columns)
x = array([[7., 26., 6., 60., 78.5],
           [1., 29., 15., 52., 74.3],
           [11., 56., 8., 20., 104.3],
           [11., 31., 8., 47., 87.6],
           [7., 52., 6., 33., 95.9],
           [11., 55., 9., 22., 109.2],
           [3., 71., 17., 6., 102.7],
           [1., 31., 22., 44., 72.5],
           [2., 54., 18., 22., 93.1],
           [21., 47., 4., 26., 115.9],
           [1., 40., 23., 34., 83.8],
           [11., 66., 9., 12., 113.3],
           [10., 68., 8., 12., 109.4]])

# Labels for rows 0-13 of the returned matrix
row_labels = ["means", "variances", "std. dev", "skewness", "kurtosis",
              "minima", "maxima", "ranges", "C.V.", "counts", "lower mean",
              "upper mean", "lower var", "upper var"]

# Default call: 95 percent confidence intervals, no robust statistics
simple_statistics = simpleStatistics(x)
writeMatrix("* * * Statistics * * *\n", simple_statistics,
            rowLabels=row_labels,
            writeFormat="%7.3f")
Output¶
             * * * Statistics * * *
                  1        2        3        4        5
means         7.462   48.154   11.769   30.000   95.423
variances    34.603  242.141   41.026  280.167  226.314
std. dev      5.882   15.561    6.405   16.738   15.044
skewness      0.688   -0.047    0.611    0.330   -0.195
kurtosis      0.075   -1.323   -1.079   -1.014   -1.342
minima        1.000   26.000    4.000    6.000   72.500
maxima       21.000   71.000   23.000   60.000  115.900
ranges       20.000   45.000   19.000   54.000   43.400
C.V.          0.788    0.323    0.544    0.558    0.158
counts       13.000   13.000   13.000   13.000   13.000
lower mean    3.907   38.750    7.899   19.885   86.332
upper mean   11.016   57.557   15.640   40.115  104.514
lower var    17.793  124.512   21.096  144.065  116.373
upper var    94.289  659.816  111.792  763.434  616.688