UVSTA
Computes basic univariate statistics.
Required Arguments
X — |NROW| by NVAR + m matrix containing the data, where m is 0, 1, or 2 depending on whether any columns of X correspond to weights and/or frequencies. (Input)
STAT — 15 by NVAR matrix containing in each row statistics on all of the variables. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3.)
The columns of STAT correspond to the columns of X, except for the columns of X containing weights or frequencies. (The columns beyond the weights or frequencies column are shifted to the left.)
I | STAT(I, *)
1 | contains means.
2 | contains variances.
3 | contains standard deviations.
4 | contains coefficients of skewness.
5 | contains coefficients of excess (kurtosis).
6 | contains minima.
7 | contains maxima.
8 | contains ranges.
9 | contains coefficients of variation, when they are defined. If the coefficient of variation is not defined for a given variable, STAT(9, *) contains a zero in the corresponding position.
10 | contains numbers (counts) of nonmissing observations.
11 | is used only when CONPRM is positive; in this case, it contains the lower confidence limit for the mean (assuming normality).
12 | is used only when CONPRM is positive; in this case, it contains the upper confidence limit for the mean (assuming normality).
13 | is used only when CONPRV is positive; in this case, it contains the lower confidence limit for the variance (assuming normality).
14 | is used only when CONPRV is positive; in this case, it contains the upper confidence limit for the variance (assuming normality).
15 | is used only when weighting is used (IWT is positive); in this case, it contains the sums of the weights.
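Because the frequency and weight columns are skipped, column j of STAT does not necessarily describe column j of X. The hypothetical Python helper below (`stat_columns` is our own name, not part of IMSL) sketches that mapping with 1-based column numbers, assuming the IFRQ and IWT convention that 0 means the option is unused.

```python
def stat_columns(ncol_x, ifrq=0, iwt=0):
    """Map each 1-based STAT column to the 1-based X column it summarizes.

    Hypothetical helper: columns of X named by a positive IFRQ or IWT hold
    frequencies or weights, so they are skipped and the data columns to
    their right shift one position to the left in STAT.
    """
    special = {c for c in (ifrq, iwt) if c > 0}
    data_cols = [c for c in range(1, ncol_x + 1) if c not in special]
    return {j: c for j, c in enumerate(data_cols, start=1)}


# Layout of Example 2: X holds (f, x, y) and IFRQ = 1.
print(stat_columns(3, ifrq=1))  # {1: 2, 2: 3}
```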
Optional Arguments
IDO — Processing option. (Input)
Default: IDO = 0.
IDO | Action
0 | This is the only invocation of UVSTA for this data set, and all the data are input at once.
1 | This is the first invocation, and additional calls to UVSTA will be made. Initialization and updating for the data in X are performed. The means are output correctly, but the other quantities output in STAT are intermediate quantities.
2 | This is an intermediate invocation of UVSTA, and updating for the data in X is performed.
3 | This is the final invocation of this routine. If NROW is not zero, updating is performed. The wrap-up computations for STAT are then performed.
NROW — The absolute value of NROW is the number of rows of data currently input in X. (Input)
Default: NROW = size (X,1).
NROW may be positive, zero, or negative. A negative NROW means that the -NROW rows of data in X are to be deleted from some aspects of the analysis; this should be done only if IDO is 2 or 3 and the wrap-up computations for STAT have not yet been performed. When a negative value is input for NROW, it is assumed that each of the -NROW rows of X was input (with positive NROW) in a previous invocation of UVSTA. Use negative values of NROW with care, understanding that some quantities in STAT cannot be updated properly in this case. In particular, the minima, maxima, and ranges are not updated when rows are deleted. It is also possible that a variable that is constant in the remaining data will not be recognized as such.
NVAR — Number of variables (not including the weight or frequency variable, if used). (Input)
Default: NVAR = size (X,2).
LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).
IFRQ — Frequency option. (Input)
IFRQ = 0 means that all frequencies are 1.0. For positive IFRQ, column number IFRQ of X contains the frequencies.
Default: IFRQ = 0.
IWT — Weighting option. (Input)
IWT = 0 means that all weights are 1.0. For positive IWT, column IWT of X contains the weights.
Default: IWT = 0.
MOPT — Missing value option. (Input)
NaN (not a number, the value returned by routine AMACH(6)) is interpreted as the missing value code, and any value in X equal to NaN is excluded from the computations.
Default: MOPT = 0.
MOPT | Action
0 | The exclusion is listwise. (The entire row of X is excluded if any of the values of the row is equal to the missing value code.)
1 | The exclusion is elementwise. (Statistics for variables with nonmissing values are updated.)
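The difference between the two MOPT settings can be sketched in a few lines of Python. This is an illustration only: Python's `float("nan")` stands in for the AMACH(6) code, frequencies are left out, and `usable_rows` is a hypothetical name.

```python
import math

NAN = float("nan")  # stands in for the AMACH(6) missing-value code


def usable_rows(rows, var, mopt):
    """Rows that contribute to the statistics of variable `var` (0-based).

    mopt = 0: listwise, drop a row if any of its values is NaN.
    mopt = 1: elementwise, drop a row only if the value of `var` is NaN.
    """
    if mopt == 0:
        return [r for r in rows if not any(math.isnan(v) for v in r)]
    return [r for r in rows if not math.isnan(r[var])]


# (x, y) pairs from Example 2, frequencies omitted.
data = [(3.0, 5.0), (9.0, 2.0), (1.0, NAN)]
print(len(usable_rows(data, 0, mopt=0)))  # 2: the row with missing y is dropped for x too
print(len(usable_rows(data, 0, mopt=1)))  # 3: every x value is kept
```

With MOPT = 1, as in Example 2, variable x keeps all three rows while y uses only two, which is why the two variables report different counts there.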
CONPRM — Confidence level for two-sided interval estimate of the means (assuming normality), in percent. (Input)
If CONPRM ≤ 0, no confidence interval for the mean is computed; otherwise, a CONPRM percent confidence interval is computed, in which case CONPRM must be between 0.0 and 100.0. CONPRM is often 90.0, 95.0, or 99.0. For a one-sided confidence interval with confidence level ONECL, set CONPRM = 100.0 - 2.0 * (100.0 - ONECL).
Default: CONPRM = 95.0.
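The one-sided conversion places all of the excluded probability in a single tail. For example, a one-sided 95% bound is the lower (STAT(11, *)) or upper (STAT(12, *)) limit of a two-sided 90% interval. A small Python sketch of the arithmetic (`conprm_for_one_sided` is a hypothetical name):

```python
def conprm_for_one_sided(onecl):
    """Two-sided level (percent) whose lower or upper limit alone is a
    one-sided bound with confidence level ONECL (percent)."""
    return 100.0 - 2.0 * (100.0 - onecl)


print(conprm_for_one_sided(95.0))  # 90.0
print(conprm_for_one_sided(97.5))  # 95.0
```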
CONPRV — Confidence level for two-sided interval estimate of the variances (assuming normality), in percent. (Input)
The confidence intervals are symmetric in probability (rather than in length). See also the description of CONPRM.
Default: CONPRV = 95.0.
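A sketch of the normal-theory limits, assuming the usual Student's t interval for the mean and the equal-tail ("symmetric in probability") chi-square interval for the variance. These standard forms reproduce the limits printed in Example 2, but this is our reconstruction, not IMSL code. The Python standard library has no t or chi-square quantile function, so the quantiles are passed in; here they are taken from standard tables for 5 degrees of freedom (in SciPy, `scipy.stats.t.ppf` and `scipy.stats.chi2.ppf` would supply them).

```python
import math


def mean_limits(mean, sd, n, t_q):
    """Two-sided limits for the mean: mean -/+ t_q * sd / sqrt(n).

    t_q: upper (1 + CONPRM/100)/2 quantile of Student's t with n - 1 df.
    """
    half_width = t_q * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def variance_limits(var, n, chi2_hi, chi2_lo):
    """Equal-tail limits for the variance: (n-1)s^2/chi2_hi, (n-1)s^2/chi2_lo.

    chi2_hi, chi2_lo: the (1 + CONPRV/100)/2 and (1 - CONPRV/100)/2
    quantiles of the chi-square distribution with n - 1 df.
    """
    ss = (n - 1) * var
    return ss / chi2_hi, ss / chi2_lo


# Variable 1 of Example 2: mean 3, variance 9.6, n = 6 (5 df), 95% levels.
clm = mean_limits(3.0, math.sqrt(9.6), 6, t_q=2.570582)
clv = variance_limits(9.6, 6, chi2_hi=12.832502, chi2_lo=0.831212)
print(clm, clv)
```

Rounded to four decimals, these give -0.2516, 6.2516, 3.7405, and 57.7470, which match the last row block of the Example 2 output for variable 1.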
IPRINT — Printing option. (Input)
Default: IPRINT = 0.
IPRINT | Action
0 | No printing is performed.
1 | Statistics in STAT are printed if IDO = 0 or 3.
2 | Intermediate means, sums of squares about the mean, minima, maxima, and counts are printed when IDO = 1 or 2, and all statistics in STAT are printed when IDO = 0 or 3.
LDSTAT — Leading dimension of STAT exactly as specified in the dimension statement in the calling program. (Input)
Default: LDSTAT = size (STAT,1).
NRMISS — Number of rows of data encountered in calls to UVSTA that contain any missing values. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3.)
Rows with a frequency of zero are not counted.
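The counting rule for NRMISS can be sketched as follows. This is a hypothetical Python helper, not IMSL code. Each row is counted once regardless of its frequency, and treating a NaN frequency as a missing value in the row is an assumption of this sketch.

```python
import math

NAN = float("nan")  # stands in for the AMACH(6) missing-value code


def count_missing_rows(rows):
    """Count rows containing any NaN, ignoring rows with frequency zero.

    Each row is (frequency, value_1, value_2, ...). A row counts once,
    whatever its frequency. A NaN frequency counts as a missing value
    (assumption of this sketch).
    """
    count = 0
    for freq, *values in rows:
        if freq == 0:
            continue  # zero-frequency rows are skipped entirely
        if math.isnan(freq) or any(math.isnan(v) for v in values):
            count += 1
    return count


# Example 2 rows plus a hypothetical zero-frequency row with a missing x.
rows = [(2.0, 3.0, 5.0), (1.0, 9.0, 2.0), (3.0, 1.0, NAN), (0.0, NAN, 4.0)]
print(count_missing_rows(rows))  # 1
```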
FORTRAN 90 Interface
Generic: CALL UVSTA (X, STAT [, …])
Specific: The specific interface names are S_UVSTA and D_UVSTA.
FORTRAN 77 Interface
Single: CALL UVSTA (IDO, NROW, NVAR, X, LDX, IFRQ, IWT, MOPT, CONPRM, CONPRV, IPRINT, STAT, LDSTAT, NRMISS)
Double: The double precision name is DUVSTA.
Description
For the data in each column of X, except the columns containing frequencies or weights, UVSTA computes the sample mean, variance, minimum, maximum, and other basic statistics. It also computes confidence intervals for the mean and variance if the sample is assumed to be from a normal population.
Missing values, that is, values equal to NaN (not a number, the value returned by routine AMACH(6)), are excluded from the computations. If MOPT = 0, the exclusion is listwise; that is, the entire observation is excluded and no computations are performed even for the variables with valid values. If MOPT = 1, the exclusion is elementwise, and statistics are updated for each variable that has a nonmissing value. If frequencies or weights are specified, any observation whose frequency or weight is missing is excluded from the computations.
Frequencies are interpreted as multiple occurrences of the other values in the observations. That is, a row of X with a frequency of 2 has the same effect as two rows with frequencies of 1. The total of the frequencies is used in computing all of the statistics based on moments (mean, variance, skewness, and kurtosis). Weights, by contrast, are not viewed as replication factors. The sum of the weights is used only in computing the mean (the weighted mean is then, of course, used in computing the central moments).
Both weights and frequencies can be zero, but neither can be negative. In general, a zero frequency means that the row is to be eliminated from the analysis; no further processing, counting of missing values, or error checking is done on the row. Although frequencies are not required to be integers, the logic of their treatment implicitly assumes that they are. Weights, on the other hand, may take noninteger values. A weight of zero results in the row being counted, and both the statistics and the number of missing values are updated. A missing frequency, or a missing weight when the frequency is nonzero, causes the row to be deleted from the analysis; even in that case, however, it is an error for whichever of the two is nonmissing to be negative.
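The contrast between frequencies and weights can be checked with a short Python sketch. It assumes that the count is n = Σ f_i, that the mean is Σ f_i w_i x_i / Σ f_i w_i, and that the variance is Σ f_i w_i (x_i - mean)² / (n - 1). This is our reading of the rules above, not IMSL code, and `mean_and_variance` is a hypothetical name.

```python
def mean_and_variance(rows):
    """rows: list of (x, f, w). Returns (n, mean, variance).

    Frequencies add to the count n; weights enter only the weighted sums.
    Assumes the variance form sum(f*w*(x - mean)^2) / (n - 1).
    """
    n = sum(f for _, f, _ in rows)
    sum_fw = sum(f * w for _, f, w in rows)
    mean = sum(f * w * x for x, f, w in rows) / sum_fw
    var = sum(f * w * (x - mean) ** 2 for x, f, w in rows) / (n - 1)
    return n, mean, var


# A frequency of 2 acts like two rows with frequency 1 ...
print(mean_and_variance([(3.0, 2, 1.0), (9.0, 1, 1.0)]))                 # (3, 5.0, 12.0)
print(mean_and_variance([(3.0, 1, 1.0), (3.0, 1, 1.0), (9.0, 1, 1.0)]))  # (3, 5.0, 12.0)
# ... but a weight of 2 does not add to n, so the variance differs.
print(mean_and_variance([(3.0, 1, 2.0), (9.0, 1, 1.0)]))                 # (2, 5.0, 24.0)
```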
The definitions of some of the statistics are given below in terms of a single variable x. The i-th datum is $x_i$, with corresponding frequency $f_i$ and weight $w_i$. If frequencies or weights are not specified, $f_i$ or $w_i$, respectively, are identically one. The summation in each case is over the set of valid observations, based on the setting of MOPT and the presence of missing values in the data.
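Number of nonmissing observations, STAT(10, ∗)
$$n = \sum f_i$$
Mean, STAT(1, ∗)
$$\bar{x}_w = \frac{\sum f_i w_i x_i}{\sum f_i w_i}$$
Variance, STAT(2, ∗)
$$s_w^2 = \frac{\sum f_i w_i (x_i - \bar{x}_w)^2}{n - 1}$$
Skewness, STAT(4, ∗)
$$\frac{\sum f_i w_i (x_i - \bar{x}_w)^3 / n}{\left[\sum f_i w_i (x_i - \bar{x}_w)^2 / n\right]^{3/2}}$$
Excess or Kurtosis, STAT(5, ∗)
$$\frac{\sum f_i w_i (x_i - \bar{x}_w)^4 / n}{\left[\sum f_i w_i (x_i - \bar{x}_w)^2 / n\right]^{2}} - 3$$
Minimum, STAT(6, ∗)
$$x_{\min} = \min_i x_i$$
Maximum, STAT(7, ∗)
$$x_{\max} = \max_i x_i$$
Range, STAT(8, ∗)
$$x_{\max} - x_{\min}$$
Coefficient of Variation, STAT(9, ∗)
$$\frac{s_w}{\bar{x}_w}$$
When the coefficient of variation is not defined, STAT(9, ∗) is set to zero.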
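The statistics listed above can be sketched in Python for one variable without missing values. The sketch assumes n = Σ f_i, the weighted mean Σ f_i w_i x_i / Σ f_i w_i, a variance with divisor n - 1, and skewness and kurtosis built from central moments with divisor n. These assumptions reproduce the Example 2 output. `univariate_stats` is a hypothetical name, and treating a zero mean as the only undefined case for the coefficient of variation is our assumption.

```python
import math


def univariate_stats(xs, freqs=None, wts=None):
    """Sketch of STAT rows 1-10 for one variable without missing values.

    Assumes n > 1 and a nonconstant variable, so the variance is positive.
    Rows with frequency 0 are dropped.
    """
    freqs = freqs if freqs is not None else [1.0] * len(xs)
    wts = wts if wts is not None else [1.0] * len(xs)
    rows = [(x, f, w) for x, f, w in zip(xs, freqs, wts) if f != 0]

    n = sum(f for _, f, _ in rows)
    mean = sum(f * w * x for x, f, w in rows) / sum(f * w for _, f, w in rows)

    def central(k):
        # Sum of f_i * w_i * (x_i - mean)^k over the valid rows.
        return sum(f * w * (x - mean) ** k for x, f, w in rows)

    m2 = central(2)
    var = m2 / (n - 1)
    sd = math.sqrt(var)
    lo = min(x for x, _, _ in rows)
    hi = max(x for x, _, _ in rows)
    return {
        "mean": mean,                                        # STAT(1,*)
        "variance": var,                                     # STAT(2,*)
        "std_dev": sd,                                       # STAT(3,*)
        "skewness": (central(3) / n) / (m2 / n) ** 1.5,      # STAT(4,*)
        "kurtosis": (central(4) / n) / (m2 / n) ** 2 - 3.0,  # STAT(5,*)
        "minimum": lo,                                       # STAT(6,*)
        "maximum": hi,                                       # STAT(7,*)
        "range": hi - lo,                                    # STAT(8,*)
        # STAT(9,*): zero when undefined (mean == 0 assumed to be that case).
        "coef_var": sd / mean if mean != 0 else 0.0,
        "count": n,                                          # STAT(10,*)
    }


# Variable 1 of Example 2: x = 3, 9, 1 with frequencies 2, 1, 3.
stats = univariate_stats([3.0, 9.0, 1.0], freqs=[2.0, 1.0, 3.0])
print({k: round(v, 4) for k, v in stats.items()})
```

Rounded to four decimals, the result matches the first variable in the final Example 2 output: mean 3.0, variance 9.6, skewness 1.4142, kurtosis 0.5, coefficient of variation 1.0328, and count 6.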
The arguments IDO and NROW allow data to be input a few rows at a time and even to be deleted after having been included in the analysis. The minima, maxima, and ranges are not updated when observations are deleted.
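One way to support both adding and deleting rows is a weighted incremental update of the mean and of the sum of squares about the mean (in the style of West's algorithm), where a deletion is applied as a negative frequency. The Python class below is a sketch under that assumption; we do not claim that UVSTA uses this algorithm internally, and `RunningStats` is a hypothetical name. It also shows why the extremes go stale, because a deleted minimum or maximum cannot be recovered from running totals.

```python
import math


class RunningStats:
    """Sketch of add/delete updating for one variable, in the spirit of
    IDO = 1, 2, 3 with positive or negative NROW.

    Weighted incremental update of the mean and of the sum of squares
    about the mean. Assumes the running sum of frequency * weight stays
    positive after every call.
    """

    def __init__(self):
        self.count = 0.0   # sum of frequencies
        self.sum_fw = 0.0  # sum of frequency * weight
        self.mean = 0.0
        self.ss = 0.0      # sum of squares about the mean
        self.min = math.inf
        self.max = -math.inf

    def update(self, x, freq=1.0, weight=1.0, delete=False):
        sign = -1.0 if delete else 1.0
        c = sign * freq * weight
        self.count += sign * freq
        self.sum_fw += c
        delta = x - self.mean
        self.mean += c * delta / self.sum_fw
        self.ss += c * delta * (x - self.mean)
        if not delete:  # extremes cannot be recovered after a deletion
            self.min = min(self.min, x)
            self.max = max(self.max, x)


# Variable 1 of Example 2, including the false observation.
rs = RunningStats()
rs.update(3.0, freq=2.0)
rs.update(9.0, freq=1.0)
rs.update(6.0, freq=3.0)               # false observation
rs.update(6.0, freq=3.0, delete=True)  # NROW = -1
print(rs.mean, rs.ss, rs.count)        # 5.0 24.0 3.0
```

The printed values agree with the fourth block of intermediate output in Example 2 (mean 5.0000, sum of squares 24.0000, count 3). Adding a value of 100.0 and then deleting it would restore the mean but leave `max` at 100.0, mirroring the caution about minima, maxima, and ranges.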
Comments
Workspace may be explicitly provided, if desired, by use of U2STA/DU2STA. The reference is
CALL U2STA (IDO, NROW, NVAR, X, LDX, IFRQ, IWT, MOPT, CONPRM, CONPRV, IPRINT, STAT, LDSTAT, NRMISS, WK)
The additional argument is
WK — Real work vector of length specified above. WK should not be changed between calls to U2STA.
Examples
Example 1
This example uses data from Draper and Smith (1981). There are 5 variables and 13 observations.
USE UVSTA_INT
USE GDATA_INT
IMPLICIT NONE
INTEGER LDSTAT, LDX, NVAR
PARAMETER (LDSTAT=15, LDX=13, NVAR=5)
!
INTEGER IPRINT, NR, NROW, NV
REAL CONPRM, CONPRV, STAT(LDSTAT,NVAR), X(LDX,NVAR)
! Get data for example.
CALL GDATA (5, X, NR, NV)
! All data are input at once.
NROW = NR
! No unequal frequencies or weights
! are used.
! Get 95% confidence limits.
! Delete any row containing a missing
! value.
! Print results.
IPRINT = 1
CALL UVSTA (X, STAT, NROW=NROW, IPRINT=IPRINT)
END
Output
Univariate Statistics from UVSTA
Variable Mean Variance Std. Dev. Skewness Kurtosis
1 7.4615 34.6026 5.8824 0.68768 0.07472
2 48.1538 242.1410 15.5609 -0.04726 -1.32257
3 11.7692 41.0256 6.4051 0.61064 -1.07916
4 30.0000 280.1667 16.7382 0.32960 -1.01406
5 95.4231 226.3136 15.0437 -0.19486 -1.34244
Variable Minimum Maximum Range Coef. Var. Count
1 1.0000 21.0000 20.0000 0.7884 13.0000
2 26.0000 71.0000 45.0000 0.3231 13.0000
3 4.0000 23.0000 19.0000 0.5442 13.0000
4 6.0000 60.0000 54.0000 0.5579 13.0000
5 72.5000 115.9000 43.4000 0.1577 13.0000
Variable Lower CLM Upper CLM Lower CLV Upper CLV
1 3.9068 11.0162 17.7930 94.2894
2 38.7505 57.5572 124.5113 659.8163
3 7.8987 15.6398 21.0958 111.7918
4 19.8852 40.1148 144.0645 763.4335
5 86.3322 104.5139 116.3726 616.6877
Example 2
In this example, we use some simple data to illustrate the use of frequencies, missing values, and the parameters IDO and NROW. In the data below, “NaN” represents a missing value.
f | x | y
2 | 3.0 | 5.0
1 | 9.0 | 2.0
3 | 1.0 | NaN
In this example, the data are input one observation at a time. We also input one false observation and then delete it in a subsequent call to UVSTA.
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER LDSTAT, NVAR
PARAMETER (LDSTAT=15, NVAR=2)
!
INTEGER IDO, IFRQ, IPRINT, MOPT, NRMISS, NROW
REAL STAT(LDSTAT,NVAR), X1(1,NVAR+1)
! All data are input one observation
! at a time in the vector X1.
NROW = 1
! Frequencies are in the first
! position. No weights are used.
IFRQ = 1
! Get 95% confidence limits.
! Elementwise deletion of missing
! values.
MOPT = 1
! Print results, intermediate as well.
IPRINT = 2
! Bring in the first observation.
IDO = 1
X1(1,1) = 2.0
X1(1,2) = 3.0
X1(1,3) = 5.0
CALL UVSTA (X1, STAT, IDO=IDO, NVAR=NVAR, IFRQ=IFRQ, MOPT=MOPT, &
IPRINT=IPRINT, NRMISS=NRMISS)
! Bring in the second observation.
IDO = 2
X1(1,1) = 1.0
X1(1,2) = 9.0
X1(1,3) = 2.0
CALL UVSTA (X1, STAT, IDO=IDO, NVAR=NVAR, IFRQ=IFRQ, MOPT=MOPT, &
IPRINT=IPRINT, NRMISS=NRMISS)
! Bring in a false observation.
X1(1,1) = 3.0
X1(1,2) = 6.0
X1(1,3) = 3.0
CALL UVSTA (X1, STAT, IDO=IDO, NVAR=NVAR, IFRQ=IFRQ, MOPT=MOPT, &
IPRINT=IPRINT, NRMISS=NRMISS)
! Delete the false observation.
! This may make the minima, maxima,
! and range incorrect.
NROW = -1
X1(1,1) = 3.0
X1(1,2) = 6.0
X1(1,3) = 3.0
CALL UVSTA (X1, STAT, IDO=IDO, NROW=NROW, NVAR=NVAR, IFRQ=IFRQ, &
MOPT=MOPT, IPRINT=IPRINT, NRMISS=NRMISS)
NROW = 1
! Bring in the final observation.
IDO = 3
X1(1,1) = 3.0
X1(1,2) = 1.0
X1(1,3) = AMACH(6)
CALL UVSTA (X1, STAT, IDO=IDO, NROW=NROW, NVAR=NVAR, IFRQ=IFRQ, &
MOPT=MOPT, IPRINT=IPRINT, NRMISS=NRMISS)
END
Output
Intermediate Statistics from UVSTA
Variable Mean Sum Sqs. Minimum Maximum Count
1 3.0000 0.0000 3.0000 3.0000 2.0000
2 5.0000 0.0000 5.0000 5.0000 2.0000
Intermediate Statistics from UVSTA
Variable Mean Sum Sqs. Minimum Maximum Count
1 5.0000 24.0000 3.0000 9.0000 3.0000
2 4.0000 6.0000 2.0000 5.0000 3.0000
Intermediate Statistics from UVSTA
Variable Mean Sum Sqs. Minimum Maximum Count
1 5.5000 25.5000 3.0000 9.0000 6.0000
2 3.5000 7.5000 2.0000 5.0000 6.0000
Intermediate Statistics from UVSTA
Variable Mean Sum Sqs. Minimum Maximum Count
1 5.0000 24.0000 3.0000 9.0000 3.0000
2 4.0000 6.0000 2.0000 5.0000 3.0000
Univariate Statistics from UVSTA
Variable Mean Variance Std. Dev. Skewness Kurtosis
1 3.0000 9.6000 3.0984 1.4142 0.5000
2 4.0000 3.0000 1.7321 -0.7071 -1.5000
Variable Minimum Maximum Range Coef. Var. Count
1 1.0000 9.0000 8.0000 1.0328 6.0000
2 2.0000 5.0000 3.0000 0.4330 3.0000
Variable Lower CLM Upper CLM Lower CLV Upper CLV
1 -0.2516 6.2516 3.7405 57.7470
2 -0.3027 8.3027 0.8133 118.4935