CHIGF
Performs a chi‑squared goodness‑of‑fit test.
Required Arguments
CDF — User‑supplied FUNCTION to compute the cumulative distribution function (CDF) at a given value. The form is CDF(Y), where
Y – Value at which the CDF is to be evaluated. (Input)
CDF – Value of the CDF at Y. (Output)
CDF must be declared EXTERNAL in the calling program.
NELM — The absolute value of NELM is the number of data elements currently input in X. (Input)
NELM may be positive, zero, or negative. A negative NELM means that the ‑NELM data elements in X are deleted from the analysis.
X — Vector of length ∣NELM∣ containing the data elements for this call. (Input)
If the data element is missing (NaN, not a number), then the observation is ignored.
NCAT — The absolute value of NCAT is the number of cells into which the observations are to be tallied. (Input)
If NCAT is negative, then CHIGF chooses the cutpoints in CUTP so that the cells are equiprobable in continuous distributions. NCAT should not be negative in discrete distributions. For a discrete distribution the user must supply the cutpoints, because CHIGF cannot detect this situation and issues no error message if NCAT is negative.
RNGE — Vector of length 2 containing the lower and upper endpoints of the range of the distribution, respectively. (Input)
If the lower and upper endpoints are equal, a range on the whole real line is used. If the lower and upper endpoints are different, points outside of the range are ignored so that distributions conditional on the range can be used. In this case, the point RNGE(1) is excluded from the first interval, but the point RNGE(2) is included in the last interval.
NDFEST — Number of parameters estimated in computing the CDF. (Input)
CUTP — Vector of length ∣NCAT∣ ‑ 1 containing the cutpoints defining the cells. (Input, if NCAT is positive, output, otherwise)
∣NCAT∣ ‑ 1 cutpoints define the cells to be used. If NCAT is positive, then the cutpoints are input by the user. The intervals defined by the cutpoints are such that the lower endpoint is not included while the upper endpoint is included in the interval.
P — p‑value for the chi‑squared statistic in CHISQ(∣NCAT∣ + 1). (Output)
This chi‑squared statistic has DF degrees of freedom.
Optional Arguments
IDO — Processing option. (Input)
Default: IDO = 0.
IDO | Action
0 | This is the only call to CHIGF, and all of the data are input on this call.
1 | This is the first call to CHIGF, and additional calls to CHIGF will be made. Initialization and updating for the data in X are performed.
2 | This is an intermediate call to CHIGF. Updating for the data in X is performed.
3 | This is the final call to CHIGF. Updating for the data in X and wrap‑up computations are performed.
Calls to CHIGF with IDO = 2 or 3 may be intermixed. It is permissible for a call with IDO = 2 to follow a call with IDO = 3.
FRQ — Vector containing the frequencies. (Input)
If the first element of FRQ is ‑1.0, then all frequencies are taken to be 1 and FRQ is of length 1. Otherwise, FRQ is of length ∣NELM∣, and the elements in FRQ contain the frequency of the corresponding observation in X. If a frequency is missing (NaN, not a number) and FRQ(1) is not ‑1.0, the corresponding observation is ignored.
Default: FRQ(1) = -1.0.
COUNTS — Vector of length ∣NCAT∣ containing the counts in each of the cells. (Output, if IDO = 0 or 1; input/output, if IDO > 1)
EXPECT — Vector of length ∣NCAT∣ containing the expected count in each cell. (Output, if IDO = 0 or 3; not referenced otherwise)
CHISQ — Vector of length ∣NCAT∣ + 1 containing the contributions to chi‑squared. (Output, if IDO = 0 or 3, not referenced otherwise)
Elements 1 through ∣NCAT∣ contain the contributions to chi‑squared for the corresponding cell. Element ∣NCAT∣ + 1 contains the total chi‑squared statistic.
DF — Degrees of freedom in chi‑squared. (Output)
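The relationship among COUNTS, EXPECT, CHISQ, and DF can be checked by hand. The sketch below assumes the usual Pearson form, (count - expected)**2 / expected, for each cell's contribution, and ∣NCAT∣ - 1 - NDFEST for the degrees of freedom. Neither formula is stated explicitly in this section, although both agree with the printed example output. The module and routine names are hypothetical and are not part of the library.

```fortran
MODULE CHISQ_SKETCH
   IMPLICIT NONE
CONTAINS
   ! Fill CHISQ(1:M) with the per-cell contributions and CHISQ(M+1) with
   ! their total, where M = SIZE(COUNTS). Assumes every EXPECT(J) > 0
   ! and that CHISQ has at least M+1 elements.
   SUBROUTINE CONTRIB (COUNTS, EXPECT, CHISQ)
      REAL, INTENT(IN)  :: COUNTS(:), EXPECT(:)
      REAL, INTENT(OUT) :: CHISQ(:)
      INTEGER :: M
      M = SIZE(COUNTS)
      CHISQ(1:M) = (COUNTS - EXPECT)**2 / EXPECT
      CHISQ(M+1) = SUM(CHISQ(1:M))
   END SUBROUTINE CONTRIB

   ! Degrees of freedom under the assumed rule:
   ! number of cells - 1 - number of estimated parameters.
   REAL FUNCTION DEGFR (M, NDFEST)
      INTEGER, INTENT(IN) :: M, NDFEST
      DEGFR = REAL(M - 1 - NDFEST)
   END FUNCTION DEGFR
END MODULE CHISQ_SKETCH
```

For instance, a cell that holds 121 observations when 100 are expected contributes (121 - 100)**2 / 100 = 4.41, and ten cells with no estimated parameters give 9 degrees of freedom.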
FORTRAN 90 Interface
Generic: CALL CHIGF (CDF, NELM, X, NCAT, RNGE, NDFEST, CUTP, P [, …])
Specific: The specific interface names are S_CHIGF and D_CHIGF.
FORTRAN 77 Interface
Single: CALL CHIGF (IDO, CDF, NELM, X, FRQ, NCAT, RNGE, NDFEST, CUTP, COUNTS, EXPECT, CHISQ, P, DF)
Double: The double precision name is DCHIGF.
Description
Routine CHIGF performs a chi‑squared goodness‑of‑fit test that a random sample of observations is distributed according to a specified theoretical cumulative distribution. The theoretical distribution, which may be continuous, discrete, or a mixture of discrete and continuous distributions, is specified via a user‑defined FUNCTION. Because the user is allowed to specify a range for the observations, a test that is conditional upon the specified range is performed.
∣NCAT∣ gives the number of intervals into which the observations are to be divided. These intervals can be specified via the vector CUTP, which contains the cutpoints (or endpoints) for the intervals. Alternatively, if NCAT is negative, CHIGF computes equiprobable intervals. Regardless of the method used to obtain them, each interval excludes its lower endpoint and includes its upper endpoint. The user should supply the cutpoints when the cumulative distribution function has discrete elements, since CHIGF cannot determine them in this case. Regardless of how the cutpoints are determined, the lower endpoint of the first interval is RNGE(1) when RNGE(1) ≠ RNGE(2) and minus machine infinity otherwise. The upper endpoint of the last interval is defined similarly, using RNGE(2) and plus machine infinity.
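The endpoint convention (lower endpoint excluded, upper endpoint included) can be illustrated with a small lookup function. This is only a sketch of the convention described above, not the library's implementation. The names CELL_SKETCH and CELLOF are hypothetical, and the range check against RNGE is omitted.

```fortran
MODULE CELL_SKETCH
   IMPLICIT NONE
CONTAINS
   ! Return the cell number (1 .. SIZE(CUTP)+1) that holds X, where
   ! cell J is the interval (CUTP(J-1), CUTP(J)]: lower endpoint
   ! excluded, upper endpoint included. CUTP must be increasing.
   INTEGER FUNCTION CELLOF (X, CUTP)
      REAL, INTENT(IN) :: X, CUTP(:)
      INTEGER :: J
      ! Default to the last cell, which lies above every cutpoint.
      CELLOF = SIZE(CUTP) + 1
      DO J = 1, SIZE(CUTP)
         IF (X <= CUTP(J)) THEN
            CELLOF = J
            EXIT
         END IF
      END DO
   END FUNCTION CELLOF
END MODULE CELL_SKETCH
```

With the cutpoints 0.5, 1.5, 2.5, 3.5, 4.5, the value 2.0 falls in cell 3, the value 0.5 falls in cell 1 because the upper endpoint belongs to the interval, and the value 5.0 falls in cell 6.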
Routine CHIGF tallies the observations in X as follows. If the cutpoints are determined by CHIGF, then the cumulative probability at x_i, F(x_i), is computed via function CDF. The tally for x_i is made in interval number ⌊mF(x_i) + 1⌋, where m = ∣NCAT∣ and ⌊ ⌋ is the function that takes the greatest integer that is no larger than the argument of the function. If the cutpoints are specified by the user, the tally is made in the interval to which x_i belongs using the endpoints specified by the user. Thus, if the computer time required to calculate the cumulative distribution function is large, user‑specified cutpoints may be preferred in order to reduce the total computing time.
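The equiprobable tally rule can be written out directly. The sketch below uses a standard normal CDF built from the Fortran 2008 ERFC intrinsic as a stand-in for the user's CDF. The names TALLY_SKETCH, PHI, and EQCELL are hypothetical, and the guard that keeps F = 1 in the last cell is an assumption of this sketch rather than documented behavior.

```fortran
MODULE TALLY_SKETCH
   IMPLICIT NONE
CONTAINS
   ! Standard normal CDF from the Fortran 2008 ERFC intrinsic.
   REAL FUNCTION PHI (Y)
      REAL, INTENT(IN) :: Y
      PHI = 0.5 * ERFC(-Y / SQRT(2.0))
   END FUNCTION PHI

   ! Cell number floor(M*F + 1) for a cumulative probability F and
   ! M equiprobable cells. The MIN guard (an assumption here) keeps
   ! F = 1 in cell M instead of the nonexistent cell M+1.
   INTEGER FUNCTION EQCELL (F, M)
      REAL, INTENT(IN) :: F
      INTEGER, INTENT(IN) :: M
      EQCELL = MIN(FLOOR(REAL(M) * F + 1.0), M)
   END FUNCTION EQCELL
END MODULE TALLY_SKETCH
```

For example, with m = 10 an observation at 0.1 has F of about 0.54, so EQCELL(PHI(0.1), 10) is ⌊6.4⌋ = 6. Only one CDF evaluation per observation is needed, which is the cost the paragraph above refers to.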
As a rule of thumb, the chi‑squared approximation may be suspect if the expected count in any cell is less than 1. A warning message to this effect is issued in this case, and a separate warning is issued when an expected value is less than 5.
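To see which cells are likely to trigger these warnings before calling CHIGF, the expected counts can be estimated from the CDF values at the cutpoints. The sketch below assumes that the expected count of a cell is the total number of observations times the probability that the CDF assigns to that cell, with the range taken as the whole real line. This section does not state the formula, so treat it as an assumption. BINCDF is a hypothetical, self-contained stand-in for a library binomial CDF and requires P < 1.

```fortran
MODULE EXPECT_SKETCH
   IMPLICIT NONE
CONTAINS
   ! Binomial(N, P) CDF at integer K, summed term by term (needs P < 1).
   REAL FUNCTION BINCDF (K, N, P)
      INTEGER, INTENT(IN) :: K, N
      REAL, INTENT(IN) :: P
      INTEGER :: I
      REAL :: TERM
      BINCDF = 0.0
      IF (K < 0) RETURN
      TERM = (1.0 - P)**N                 ! probability of 0 successes
      DO I = 0, MIN(K, N)
         BINCDF = BINCDF + TERM
         ! Recurrence from the probability of I to that of I+1 successes.
         TERM = TERM * REAL(N - I) / REAL(I + 1) * P / (1.0 - P)
      END DO
   END FUNCTION BINCDF

   ! Expected counts for SIZE(CUMP)+1 cells over the whole real line,
   ! given the CDF values CUMP at the cutpoints and TOTAL observations.
   SUBROUTINE EXPCNT (CUMP, TOTAL, EXPECT)
      REAL, INTENT(IN)  :: CUMP(:), TOTAL
      REAL, INTENT(OUT) :: EXPECT(:)
      INTEGER :: J, M
      REAL :: PREV
      M = SIZE(CUMP) + 1
      PREV = 0.0
      DO J = 1, M - 1
         EXPECT(J) = TOTAL * (CUMP(J) - PREV)
         PREV = CUMP(J)
      END DO
      EXPECT(M) = TOTAL * (1.0 - PREV)
   END SUBROUTINE EXPCNT
END MODULE EXPECT_SKETCH
```

For a binomial distribution with sample size 5 and p = 0.3, cutpoints at 0.5, 1.5, 2.5, 3.5, 4.5, and 1000 observations, the last cell expects 1000 × 0.3**5 = 2.43 observations. That value is below 5 but not below 1, so only the second warning would be expected.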
Programming Notes
The user must supply a function CDF with calling sequence CDF(Y), which returns the value of the cumulative distribution function at any point Y in the range of the distribution. The supplied function must be declared in an EXTERNAL statement in the calling program. Many of the IMSL cumulative distribution functions in Chapter 17, “Probability Distribution Functions and Inverses” can be used for CDF, either directly, if the calling sequence is correct, or indirectly, if, for example, the sample means and standard deviations are to be used in computing the theoretical CDF.
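The indirect case, where sample estimates enter the theoretical CDF, needs a way to pass the estimates to a function that takes only the single argument Y. One possible arrangement, sketched below, keeps the estimates in module variables. The names NORMFIT_SKETCH, SETFIT, and FITCDF are hypothetical. The normal CDF is built here from the Fortran 2008 ERFC intrinsic so that the sketch is self-contained, and a library normal CDF routine could be substituted.

```fortran
MODULE NORMFIT_SKETCH
   IMPLICIT NONE
   ! Estimated parameters shared with the one-argument CDF below.
   REAL, SAVE :: XMEAN = 0.0, XSTD = 1.0
CONTAINS
   ! Store the sample mean and standard deviation of X.
   ! Requires SIZE(X) >= 2 and data that are not all equal.
   SUBROUTINE SETFIT (X)
      REAL, INTENT(IN) :: X(:)
      INTEGER :: N
      N = SIZE(X)
      XMEAN = SUM(X) / REAL(N)
      XSTD = SQRT(SUM((X - XMEAN)**2) / REAL(N - 1))
   END SUBROUTINE SETFIT

   ! Normal CDF at Y using the stored mean and standard deviation.
   REAL FUNCTION FITCDF (Y)
      REAL, INTENT(IN) :: Y
      FITCDF = 0.5 * ERFC(-(Y - XMEAN) / (XSTD * SQRT(2.0)))
   END FUNCTION FITCDF
END MODULE NORMFIT_SKETCH
```

After a call to SETFIT with the data, FITCDF has the CDF(Y) calling sequence described above. Because two parameters are estimated from the sample, NDFEST would be set to 2 in this arrangement.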
Comments
Informational errors
Type | Code | Description
4 | 4 | There are more observations deleted from a cell than added.
4 | 5 | All observations are missing.
3 | 6 | An expected value is less than 1.
3 | 7 | An expected value is less than 5.
4 | 8 | The function CDF is not a cumulative distribution function.
4 | 9 | The probability of the range of the distribution is not positive.
4 | 10 | An error has occurred when inverting the cumulative distribution function. This function must be continuous and defined over the whole real line. If all else fails, you must specify the cutpoints (i.e., NCAT must be positive).
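Error code 10 refers to inverting the CDF, which is what choosing equiprobable cutpoints requires: cutpoint j of m cells is the point where the CDF reaches j/m. The sketch below does this by bisection for a standard normal CDF. The inversion method that CHIGF actually uses is not documented in this section, so bisection is a swapped-in technique chosen for simplicity. All names are hypothetical.

```fortran
MODULE CUTP_SKETCH
   IMPLICIT NONE
CONTAINS
   ! Standard normal CDF from the Fortran 2008 ERFC intrinsic.
   REAL FUNCTION PHI (Y)
      REAL, INTENT(IN) :: Y
      PHI = 0.5 * ERFC(-Y / SQRT(2.0))
   END FUNCTION PHI

   ! Bisection for the point where PHI reaches Q, with 0 < Q < 1.
   ! Relies on PHI being continuous and increasing on [-10, 10].
   REAL FUNCTION PHIINV (Q)
      REAL, INTENT(IN) :: Q
      REAL :: LO, HI, MID
      INTEGER :: IT
      LO = -10.0
      HI = 10.0
      DO IT = 1, 60
         MID = 0.5 * (LO + HI)
         IF (PHI(MID) < Q) THEN
            LO = MID
         ELSE
            HI = MID
         END IF
      END DO
      PHIINV = 0.5 * (LO + HI)
   END FUNCTION PHIINV

   ! Cutpoint J of M equiprobable cells is the J/M quantile.
   ! CUTP must have at least M-1 elements.
   SUBROUTINE EQCUTP (M, CUTP)
      INTEGER, INTENT(IN) :: M
      REAL, INTENT(OUT) :: CUTP(:)
      INTEGER :: J
      DO J = 1, M - 1
         CUTP(J) = PHIINV(REAL(J) / REAL(M))
      END DO
   END SUBROUTINE EQCUTP
END MODULE CUTP_SKETCH
```

A CDF with jumps can skip over the target probability j/m entirely, so no point with exactly that cumulative probability exists. This is consistent with the requirement that the function be continuous when NCAT is negative.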
Examples
Example 1
In this example, a discrete binomial random sample of size 1000 with binomial parameter p = 0.3 and binomial sample size 5 is generated via routine RNBIN (see Chapter 18, “Random Number Generation”). Routine RNSET is first used to set the seed. One call to CHIGF is made. Routine BINDF (see Chapter 17, “Probability Distribution Functions and Inverses”) is used to compute the CDF.
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER ISEED, NCAT, NDFEST, NELM
PARAMETER (ISEED=123457, NCAT=6, NDFEST=0, NELM=1000)
!
INTEGER I, IX(NELM), NOUT
REAL CDF, CHISQ(NCAT+1), COUNTS(NCAT), CUTP(NCAT-1), DF, &
EXPECT(NCAT), P, RNGE(2), X(NELM)
EXTERNAL CDF
!
DATA RNGE/0.0, 0.0/
DATA CUTP/.5, 1.5, 2.5, 3.5, 4.5/
!
CALL RNSET (ISEED)
! Generate the data
CALL RNBIN (5, 0.3, IX)
DO 10 I=1, NELM
X(I) = IX(I)
10 CONTINUE
!
CALL CHIGF (CDF, NELM, X, NCAT, RNGE, NDFEST, CUTP, P, &
COUNTS=COUNTS, EXPECT=EXPECT, CHISQ=CHISQ, DF=DF)
! Print results
CALL WRRRN ('Counts', COUNTS, 1, NCAT, 1)
CALL WRRRN ('Expect', EXPECT, 1, NCAT, 1)
CALL WRRRN ('Contributions to Chi-squared', CHISQ, 1, NCAT, 1)
CALL UMACH (2, NOUT)
WRITE (NOUT,99999) CHISQ(NCAT+1), P, DF
99999 FORMAT (///'0Chi-squared ', F8.4, /, ' P-value ' &
, F8.4, /, ' Degrees of freedom', F8.4)
END
!
REAL FUNCTION CDF (Y)
REAL Y
!
INTEGER I
REAL BINDF
EXTERNAL BINDF
!
I = Y
CDF = BINDF(I,5,0.3)
RETURN
END
Output
*** WARNING ERROR 7 from CHIGF. An expected value is less than 5.
Counts
1 2 3 4 5 6
170.0 331.0 320.0 148.0 28.0 3.0
Expect
1 2 3 4 5 6
168.1 360.2 308.7 132.3 28.3 2.4
Contributions to Chi-squared
1 2 3 4 5 6
0.022 2.359 0.414 1.863 0.004 0.134
Chi-squared 4.7963
P-value 0.4412
Degrees of freedom 5.0000
Example 2
This example illustrates the use of CHIGF on a randomly generated sample from the normal distribution. One thousand randomly generated observations are tallied into 10 equiprobable intervals. Twelve calls to CHIGF are made. The first call is solely for initialization since IDO = 1 and NELM = 0. The next 10 calls tally the data, 100 observations at a time, with IDO = 2 and NELM = 100. The last call is for wrap up only since IDO = 3 and NELM = 0. All twelve calls could have been replaced with one call to CHIGF with IDO = 0 and NELM = 1000. X would need to be of length 1000 if one call were used. In this example, the null hypothesis is not rejected.
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER ISEED, NCAT, NDFEST
PARAMETER (ISEED=123457, NCAT=-10, NDFEST=0)
!
INTEGER I, IDO, NOUT, NELM
REAL CHISQ(-NCAT+1), COUNTS(-NCAT), CUTP(-NCAT-1), &
DF, EXPECT(-NCAT), P, RNGE(2), X(100)
!
DATA RNGE/0.0, 0.0/
!
CALL RNSET (ISEED)
! Initialization
IDO = 1
NELM = 0
CALL CHIGF (S_ANORDF, NELM, X, NCAT, RNGE, NDFEST, CUTP, P,&
IDO=IDO, COUNTS=COUNTS, EXPECT=EXPECT,&
CHISQ=CHISQ, DF=DF)
! Add the data
IDO = 2
NELM = 100
DO 10 I=1, 10
CALL RNNOR (X)
CALL CHIGF (S_ANORDF, NELM, X, NCAT, RNGE, NDFEST, CUTP, P, &
IDO=IDO, COUNTS=COUNTS, EXPECT=EXPECT, &
CHISQ=CHISQ, DF=DF)
10 CONTINUE
! Wrap up
IDO = 3
NELM = 0
CALL CHIGF (S_ANORDF, NELM, X, NCAT, RNGE, NDFEST, CUTP, &
P, IDO=IDO, COUNTS=COUNTS, EXPECT=EXPECT, &
CHISQ=CHISQ, DF=DF)
! Print results
CALL WRRRN ('Cutpoints', CUTP, 1, -NCAT, 1)
CALL WRRRN ('Counts', COUNTS, 1, -NCAT, 1)
CALL WRRRN ('Expect', EXPECT, 1, -NCAT, 1)
CALL WRRRN ('Contributions to Chi-squared', CHISQ, 1, -NCAT, 1)
CALL UMACH (2, NOUT)
WRITE (NOUT,99999) CHISQ(-NCAT+1), P, DF
99999 FORMAT (///'0Chi-squared ', F8.4, /, ' P-value ' &
, F8.4, /, ' Degrees of freedom', F8.4)
END
Output
Cutpoints
1 2 3 4 5 6 7 8 9
-1.282 -0.842 -0.524 -0.253 0.000 0.253 0.524 0.842 1.282
Counts
1 2 3 4 5 6 7 8 9 10
106.0 109.0 89.0 92.0 83.0 87.0 110.0 104.0 121.0 99.0
Expect
1 2 3 4 5 6 7 8 9 10
100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
Contributions to Chi-squared
1 2 3 4 5 6 7 8 9 10
0.360 0.810 1.210 0.640 2.890 1.690 1.000 0.160 4.410 0.010
Chi-squared 13.1806
P-value 0.1546
Degrees of freedom 9.0000