Usage Notes
Frequency Tabulations
The routines for frequency tabulations accept raw data in the form of vectors or matrices and produce counts. Two of these routines assume that the data are continuous and tally the observations into groups based on grouping information that the user supplies. Another routine assumes that the data are discrete and counts the number of observations with each value. Other analyses of discrete data or count data can be performed using IMSL routines in Chapter 5, “Categorical and Discrete Data Analysis.”
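The two kinds of tabulation described above can be illustrated with a minimal Python sketch. It is not the IMSL interface: the function names `tally_continuous` and `count_discrete` are hypothetical, and the convention that each interval is closed on the right is an assumption made for the example.

```python
from bisect import bisect_left
from collections import Counter


def tally_continuous(data, cutpoints):
    """Tally continuous observations into groups defined by sorted cutpoints.

    Assumed convention (not taken from the IMSL documentation):
    group 0 is (-inf, c[0]], group k is (c[k-1], c[k]],
    and the last group is (c[-1], +inf).
    """
    counts = [0] * (len(cutpoints) + 1)
    for x in data:
        # bisect_left returns the number of cutpoints strictly below x,
        # which is the index of the right-closed interval containing x.
        counts[bisect_left(cutpoints, x)] += 1
    return counts


def count_discrete(data):
    """Count how many observations take each distinct value."""
    return dict(sorted(Counter(data).items()))
```

For example, `tally_continuous(values, [1.0, 2.0, 3.0])` returns four counts, one for each of the intervals that the three cutpoints define.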
Univariate Summary Statistics
The routine UVSTA computes the sample mean, variance, minimum, maximum, and other basic statistics for each variable in a data set. It also computes confidence intervals for the mean and variance if the sample is assumed to be from a normal distribution.
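As an illustration of the kind of summary described here, the following Python sketch computes a few basic statistics for one variable and, optionally, a confidence interval for the mean under the normality assumption. It is a sketch only, not the UVSTA interface: the function name `summarize` and the parameter `t_crit` are hypothetical, and the caller is assumed to supply the Student t critical value for n - 1 degrees of freedom. The interval for the variance, which needs chi-squared quantiles, is omitted.

```python
import math
import statistics


def summarize(x, t_crit=None):
    """Basic summary statistics for one variable.

    t_crit is an assumed, caller-supplied Student t critical value
    (for example the 0.975 quantile with len(x) - 1 degrees of freedom).
    """
    n = len(x)
    mean = statistics.fmean(x)
    var = statistics.variance(x)  # sample variance, divisor n - 1
    out = {"n": n, "mean": mean, "variance": var,
           "min": min(x), "max": max(x)}
    if t_crit is not None:
        # Interval for the mean, valid if the sample is normal.
        half_width = t_crit * math.sqrt(var / n)
        out["mean_ci"] = (mean - half_width, mean + half_width)
    return out
```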
Ranks and Order Statistics
The routines for ranks and order statistics accept data from a single sample stored in a vector. Ranks, order statistics, and sample quantiles form the basis for many nonparametric and robust statistical techniques (see Conover 1980 and Hoaglin et al. 1983). Letter values, computed by the routine LETTR, are a special set of order statistics particularly useful in exploratory data analysis (see Hoaglin 1983).
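To make ranks and letter values concrete, here is a minimal Python sketch. It assumes average ranks for ties and the usual exploratory-data-analysis depth rule, in which the median has depth (n + 1)/2 and each further letter value has depth (floor(previous depth) + 1)/2. The function names are hypothetical, and the definitions that LETTR and the IMSL ranking routines actually use may differ in detail.

```python
import math


def average_ranks(x):
    """Ranks of x (1-based); tied observations share the average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def letter_values(x, levels=3):
    """Return (depth, lower, upper) for the median, hinges, eighths, ..."""
    s = sorted(x)
    n = len(s)

    def at_depth(d):
        # Half-integer depths average the two neighbouring order statistics.
        return (s[math.floor(d) - 1] + s[math.ceil(d) - 1]) / 2

    result = []
    depth = (n + 1) / 2
    for _ in range(levels):
        result.append((depth, at_depth(depth), at_depth(n + 1 - depth)))
        if depth == 1:  # the extremes have been reached
            break
        depth = (math.floor(depth) + 1) / 2
    return result
```

For the data 1, 2, ..., 9 this gives the median 5 at depth 5, the hinges 3 and 7 at depth 3, and the eighths 2 and 8 at depth 2.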
Parametric Estimates and Tests
The routines described in this section compute statistics for simple inferences about the parameters in normal, binomial, and Poisson distributions. General discussions of estimation techniques for these distributions can be found in Johnson and Kotz (1969, 1970a, 1970b). The routine UVSTA, for univariate summary statistics, also computes statistics for simple inferences about the parameters in a single normal distribution.
Grouped Data
The routine GRPES computes several basic statistics, such as arithmetic means, geometric means, harmonic means, and moments about the arithmetic mean for grouped data. The second, third, and fourth moments are computed both with and without Sheppard’s corrections.
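The quantities listed above can be sketched in Python as follows. The sketch assumes equal-width classes represented by their midpoints, moments about the mean computed with divisor n, and the standard Sheppard's corrections (the second moment is reduced by h^2/12, the third is unchanged, and the fourth becomes m4 - h^2 m2 / 2 + 7 h^4 / 240, where h is the class width). The function name `grouped_stats` is hypothetical, and GRPES itself may use different conventions.

```python
import math


def grouped_stats(midpoints, freqs, width):
    """Means and central moments for grouped data with equal class width."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    # Geometric and harmonic means require positive midpoints.
    geometric = math.exp(
        sum(f * math.log(m) for m, f in zip(midpoints, freqs)) / n)
    harmonic = n / sum(f / m for m, f in zip(midpoints, freqs))

    def central(k):
        # k-th moment about the arithmetic mean, divisor n (an assumption).
        return sum(f * (m - mean) ** k for m, f in zip(midpoints, freqs)) / n

    m2, m3, m4 = central(2), central(3), central(4)
    corrected = {
        2: m2 - width ** 2 / 12,
        3: m3,  # Sheppard's correction leaves the third moment unchanged
        4: m4 - width ** 2 * m2 / 2 + 7 * width ** 4 / 240,
    }
    return {"mean": mean, "geometric": geometric, "harmonic": harmonic,
            "moments": {2: m2, 3: m3, 4: m4}, "sheppard": corrected}
```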
Continuous Data in a Table
The routine CSTAT accepts data sets with both classification variables and response variables. The classification variables define cells in a table. Within each cell, means and sums of squares are computed for the response variables. Further analysis of the response variables, in particular, assessment of the effects of the classification variables, may be performed using the routines described in Chapter 4 on analysis of variance. An alternative for two-way tables is median polish, which is more resistant to outliers but also more exploratory; that is, no test is performed to confirm statistically that row or column effects are present. The routine MEDPL in this section performs median polish. (See Tukey, 1977; Velleman and Hoaglin, 1981; and Emerson and Hoaglin, 1983.) For count data (frequencies), the routines described in Chapter 5, “Categorical and Discrete Data Analysis,” are appropriate for determining the amount of association among the rows and columns.
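Median polish for a two-way table can be sketched in Python as below. This follows the usual textbook procedure of alternately sweeping row medians and column medians out of the residuals; it is not the MEDPL interface. The function name `median_polish` and the fixed iteration count are assumptions made for the example, and a production routine would typically stop on a convergence criterion instead.

```python
from statistics import median


def median_polish(table, iterations=10):
    """Decompose table[i][j] as overall + row[i] + col[j] + residual[i][j]."""
    rows, cols = len(table), len(table[0])
    resid = [[float(v) for v in r] for r in table]
    overall = 0.0
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols
    for _ in range(iterations):
        # Sweep the median of each row into the row effects.
        for i in range(rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        # Move the median of the column effects into the overall term.
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep the median of each column into the column effects.
        for j in range(cols):
            m = median(resid[i][j] for i in range(rows))
            col_eff[j] += m
            for i in range(rows):
                resid[i][j] -= m
        # Move the median of the row effects into the overall term.
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid
```

Because medians rather than means are swept out, a single aberrant cell tends to show up as a large residual instead of distorting the row and column effects.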