simpleStatistics

Computes basic univariate statistics.

Synopsis

simpleStatistics (x)

Required Arguments

float x[[]] (Input)
Array of size nObservations × nVariables containing the data matrix.

Return Value

A matrix containing some simple statistics for each of the columns in x. If median and medianAndScale are not used as optional arguments, the size of the matrix is 14 by nVariables. The columns of this matrix correspond to the columns of x and the rows contain the following statistics:

Row Statistic
0 the mean
1 the variance
2 the standard deviation
3 the coefficient of skewness
4 the coefficient of excess (kurtosis)
5 the minimum value
6 the maximum value
7 the range
8 the coefficient of variation (when defined); if the coefficient of variation is not defined, zero is returned
9 the number of observations (the counts)
10 a lower confidence limit for the mean (assuming normality); the default is a 95 percent confidence interval
11 an upper confidence limit for the mean (assuming normality)
12 a lower confidence limit for the variance (assuming normality); the default is a 95 percent confidence interval
13 an upper confidence limit for the variance (assuming normality)

Optional Arguments

confidenceMeans, float (Input)
The confidence level for a two-sided interval estimate of the means (assuming normality) in percent. Argument confidenceMeans must be between 0.0 and 100.0 and is often 90.0, 95.0, or 99.0. For a one-sided confidence interval with confidence level c, set confidenceMeans = 100.0 - 2(100 - c). If confidenceMeans is not specified, a 95 percent confidence interval is computed.
confidenceVariances, float (Input)
The confidence level for a two-sided interval estimate of the variances (assuming normality) in percent. The confidence intervals are symmetric in probability (rather than in length). For a one-sided confidence interval with confidence level c, set confidenceVariances = 100.0 - 2(100 - c). If confidenceVariances is not specified, a 95 percent confidence interval is computed.

median, or medianAndScale
At most one of these optional arguments can be specified to indicate which additional simple robust statistics are computed. If median is specified, the medians are computed and stored in one additional row (row 14) of the returned matrix of simple statistics. If medianAndScale is specified, the medians, the medians of the absolute deviations from the medians, and a simple robust estimate of scale are computed and stored in three additional rows (rows 14, 15, and 16) of the returned matrix.
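The one-sided conversion given for confidenceMeans and confidenceVariances is plain arithmetic. The following is a minimal sketch of it; the helper name two_sided_level is hypothetical and is not part of the library.

```python
def two_sided_level(c):
    """Hypothetical helper: two-sided confidence level (in percent) whose
    limits coincide with a one-sided limit at confidence level c (in percent)."""
    return 100.0 - 2.0 * (100.0 - c)


# A one-sided 95 percent limit corresponds to a two-sided 90 percent interval.
print(two_sided_level(95.0))   # 90.0
# A one-sided 97.5 percent limit corresponds to the default 95 percent interval.
print(two_sided_level(97.5))   # 95.0
```

The result would then be supplied as confidenceMeans or confidenceVariances. A one-sided level c at or below 50 yields a non-positive value, which lies outside the permitted range.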

Description

For the data in each column of x, simpleStatistics computes the sample mean, variance, minimum, maximum, and other basic statistics. It also computes confidence intervals for the mean and variance (under the hypothesis that the sample is from a normal population).

The definitions of some of the statistics are given below in terms of a single variable x of which the i-th datum is \(x_i\).

Mean

\[\overline{x} = \frac{\Sigma x_i}{n}\]

Variance

\[s^2 = \frac{\Sigma \left(x_i-\overline{x}\right)^2}{n-1}\]

Skewness

\[\frac {\Sigma\left(x_i-\overline{x}\right)^3 / n} {\left[\Sigma\left(x_i-\overline{x}\right)^2/n\right]^{3/2}}\]

Excess or Kurtosis

\[\frac {\Sigma\left(x_i-\overline{x}\right)^4 / n} {\left[\Sigma\left(x_i-\overline{x}\right)^2/n\right]^2} - 3\]

Minimum

\[x_{\min} = \min\left(x_i\right)\]

Maximum

\[x_{\max} = \max\left(x_i\right)\]

Range

\[x_{\max} - x_{\min}\]

Coefficient of Variation

\[s / \overline{x} \text{ for } \overline{x} \neq 0\]
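The definitions above can be checked by hand. The sketch below applies them, in plain Python rather than through the library, to the first column of the data matrix used in the Example section; the variable and function names are illustrative only.

```python
import math

# First column of the data matrix from the Example section.
x = [7., 1., 11., 11., 7., 11., 3., 1., 2., 21., 1., 11., 10.]
n = len(x)

mean = sum(x) / n
dev = [xi - mean for xi in x]


def central_moment(k):
    """k-th central moment with divisor n, as used for skewness and kurtosis."""
    return sum(d ** k for d in dev) / n


variance = sum(d ** 2 for d in dev) / (n - 1)          # divisor n - 1
std_dev = math.sqrt(variance)
skewness = central_moment(3) / central_moment(2) ** 1.5
kurtosis = central_moment(4) / central_moment(2) ** 2 - 3.0
data_range = max(x) - min(x)
# Zero is reported when the coefficient of variation is undefined.
coef_var = std_dev / mean if mean != 0 else 0.0

print(f"{mean:.3f} {variance:.3f} {std_dev:.3f}")      # 7.462 34.603 5.882
print(f"{skewness:.3f} {kurtosis:.3f}")                # 0.688 0.075
print(f"{data_range:.3f} {coef_var:.3f}")              # 20.000 0.788
```

These values agree with the first column of the output shown in the Example section. Note that the variance uses the divisor n − 1, whereas the moments in the skewness and kurtosis ratios use the divisor n.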

Median

\[\text{median } \left\{x_i\right\} = \begin{cases} \text{middle } x_i \text{ after sorting} & \text{if } n \text{ is odd} \\ \text{average of middle two } x_i \text{'s} & \text{if } n \text{ is even} \end{cases}\]

Median Absolute Deviation

\[\mathrm{MAD} = \mathrm{median} \left\{ |x_i - \mathrm{median} \left\{x_j\right\}| \right\}\]

Simple Robust Estimate of Scale

\[\mathrm{MAD} / \Phi^{-1}(3/4)\]

where \(\Phi^{-1}(3/4)\approx 0.6745\) is the inverse of the standard normal distribution function evaluated at 3∕4. This standardizes MAD in order to make the scale estimate consistent at the normal distribution for estimating the standard deviation (Huber 1981, pp. 107-108).
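As an illustration of the three robust statistics, the following sketch evaluates them with the Python standard library instead of the medianAndScale option. It uses the first column of the data matrix from the Example section, and the variable names are illustrative only.

```python
from statistics import NormalDist, median

# First column of the data matrix from the Example section.
x = [7., 1., 11., 11., 7., 11., 3., 1., 2., 21., 1., 11., 10.]

med = median(x)                               # middle value, since n = 13 is odd
mad = median([abs(xi - med) for xi in x])     # median absolute deviation
phi_inv = NormalDist().inv_cdf(0.75)          # approximately 0.6745
robust_scale = mad / phi_inv

print(med, mad)                    # 7.0 4.0
print(f"{robust_scale:.3f}")       # 5.930
```

For this column the robust scale estimate (about 5.930) is close to the sample standard deviation (5.882), which is what the standardization by Φ⁻¹(3/4) is intended to achieve for roughly normal data.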

Example

This example uses data from Draper and Smith (1981). There are five variables and 13 observations.

from numpy import array
from pyimsl.math.simpleStatistics import simpleStatistics
from pyimsl.math.writeMatrix import writeMatrix

x = array([[7., 26., 6., 60., 78.5],
           [1., 29., 15., 52., 74.3],
           [11., 56., 8., 20., 104.3],
           [11., 31., 8., 47., 87.6],
           [7., 52., 6., 33., 95.9],
           [11., 55., 9., 22., 109.2],
           [3., 71., 17., 6., 102.7],
           [1., 31., 22., 44., 72.5],
           [2., 54., 18., 22., 93.1],
           [21., 47., 4., 26., 115.9],
           [1., 40., 23., 34., 83.8],
           [11., 66., 9., 12., 113.3],
           [10., 68., 8., 12., 109.4]])
row_labels = ["means", "variances", "std. dev", "skewness", "kurtosis",
              "minima", "maxima", "ranges", "C.V.", "counts", "lower mean",
              "upper mean", "lower var", "upper var"]

simple_statistics = simpleStatistics(x)

writeMatrix("* * * Statistics * * *\n", simple_statistics,
            rowLabels=row_labels,
            writeFormat="%7.3f")

Output

 
                * * * Statistics * * *

                  1        2        3        4        5
means         7.462   48.154   11.769   30.000   95.423
variances    34.603  242.141   41.026  280.167  226.314
std. dev      5.882   15.561    6.405   16.738   15.044
skewness      0.688   -0.047    0.611    0.330   -0.195
kurtosis      0.075   -1.323   -1.079   -1.014   -1.342
minima        1.000   26.000    4.000    6.000   72.500
maxima       21.000   71.000   23.000   60.000  115.900
ranges       20.000   45.000   19.000   54.000   43.400
C.V.          0.788    0.323    0.544    0.558    0.158
counts       13.000   13.000   13.000   13.000   13.000
lower mean    3.907   38.750    7.899   19.885   86.332
upper mean   11.016   57.557   15.640   40.115  104.514
lower var    17.793  124.512   21.096  144.065  116.373
upper var    94.289  659.816  111.792  763.434  616.688
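The confidence limits in the first column of the output can be reproduced by hand. The sketch below assumes the usual normal-theory intervals (a Student's t interval for the mean and an equal-tailed chi-squared interval for the variance); this page does not state the formulas, so that is an assumption. The critical values for 12 degrees of freedom are hard-coded from standard tables rather than computed.

```python
import math

# First column of the data matrix above.
x = [7., 1., 11., 11., 7., 11., 3., 1., 2., 21., 1., 11., 10.]
n = len(x)

mean = sum(x) / n
sum_sq = sum((xi - mean) ** 2 for xi in x)    # (n - 1) * s^2
variance = sum_sq / (n - 1)

# Tabulated critical values for n - 1 = 12 degrees of freedom and a
# 95 percent two-sided interval (assumed values, taken from standard tables).
T_975 = 2.1788       # Student's t, upper 2.5 percent point
CHI2_025 = 4.4038    # chi-squared, lower 2.5 percent point
CHI2_975 = 23.3367   # chi-squared, upper 2.5 percent point

half_width = T_975 * math.sqrt(variance / n)
lower_mean, upper_mean = mean - half_width, mean + half_width
lower_var, upper_var = sum_sq / CHI2_975, sum_sq / CHI2_025

print(f"{lower_mean:.3f} {upper_mean:.3f}")   # 3.907 11.016
print(f"{lower_var:.3f} {upper_var:.3f}")     # 17.793 94.289
```

The agreement with the "lower mean", "upper mean", "lower var", and "upper var" rows for variable 1 is consistent with the assumption above. The variance interval is equal-tailed, so it is not symmetric about the variance estimate.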