CHIGF

Performs a chi‑squared goodness‑of‑fit test.

Required Arguments

CDF — User‑supplied FUNCTION to compute the cumulative distribution function (CDF) at a given value. The form is CDF(Y), where

Y – Value at which the CDF is to be evaluated. (Input)
CDF – Value of the CDF at Y. (Output)

CDF must be declared EXTERNAL in the calling program.

NELM — The absolute value of NELM is the number of data elements currently input in X. (Input)
NELM may be positive, zero, or negative. A negative NELM means that the -NELM data elements in X are deleted from the analysis.

X — Vector of length NELM containing the data elements for this call. (Input)
If the data element is missing (NaN, not a number), then the observation is ignored.

NCAT — The absolute value of NCAT is the number of cells into which the observations are to be tallied. (Input)
If NCAT is negative, then CHIGF chooses the cutpoints in CUTP so that the cells are equiprobable in continuous distributions. NCAT should not be negative in discrete distributions. The user must be careful to define cutpoints in discrete distributions since no error message can be generated in this situation if NCAT is negative.

RNGE — Vector of length 2 containing the lower and upper endpoints of the range of the distribution, respectively. (Input)
If the lower and upper endpoints are equal, a range on the whole real line is used. If the lower and upper endpoints are different, points outside of the range are ignored so that distributions conditional on the range can be used. In this case, the point RNGE(1) is excluded from the first interval, but the point RNGE(2) is included in the last interval.

NDFEST — Number of parameters estimated in computing the CDF. (Input)

CUTP — Vector of length NCAT − 1 containing the cutpoints defining the cells. (Input, if NCAT is positive; output, otherwise)
The NCAT − 1 cutpoints define the cells to be used. If NCAT is positive, then the cutpoints are input by the user. Each interval defined by the cutpoints excludes its lower endpoint and includes its upper endpoint.

P — p‑value for the chi‑squared statistic in CHISQ(NCAT + 1). (Output)
This chi‑squared statistic has DF degrees of freedom.

Optional Arguments

IDO — Processing option. (Input)
Default: IDO = 0.

 

IDO   Action
0     This is the only call to CHIGF, and all of the data are input on this call.
1     This is the first call to CHIGF, and additional calls to CHIGF will be made. Initialization and updating for the data in X are performed.
2     This is an intermediate call to CHIGF. Updating for the data in X is performed.
3     This is the final call to CHIGF. Updating for the data in X and wrap‑up computations are performed.

Calls to CHIGF with IDO = 2 or 3 may be intermixed. It is permissible for a call with IDO = 2 to follow a call with IDO = 3.

FRQ — Vector containing the frequencies. (Input)
If the first element of FRQ is -1.0, then all frequencies are taken to be 1 and FRQ is of length 1. Otherwise, FRQ is of length NELM, and the elements in FRQ contain the frequency of the corresponding observation in X. If the frequency is missing (NaN, not a number) (and FRQ(1) is not -1.0), the observation is ignored.
Default: FRQ(1) = -1.0.

COUNTS — Vector of length NCAT containing the counts in each of the cells. (Output, if IDO = 0 or 1; input/output, if IDO > 1)

EXPECT — Vector of length NCAT containing the expected count in each cell. (Output, if IDO = 0 or 3; not referenced otherwise)

CHISQ — Vector of length NCAT + 1 containing the contributions to chi‑squared. (Output, if IDO = 0 or 3, not referenced otherwise)
Elements 1 through NCAT contain the contributions to chi‑squared for the corresponding cell. Element NCAT + 1 contains the total chi‑squared statistic.

DF — Degrees of freedom in chi‑squared. (Output)
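The relationship among COUNTS, EXPECT, CHISQ, P, and DF can be summarized as follows. This is the standard Pearson form, stated here as a reading aid rather than as a description of CHIGF's internals. The symbols are introduced for illustration: O_i is COUNTS(i), E_i is EXPECT(i), and k = |NCAT|. The expression for DF is an inference that agrees with both examples in this section (DF = 5 with k = 6 and DF = 9 with k = 10, each with NDFEST = 0); the source does not state it explicitly.

```latex
\[
  \mathrm{CHISQ}(i) = \frac{(O_i - E_i)^2}{E_i}, \quad i = 1, \dots, k,
  \qquad
  \mathrm{CHISQ}(k+1) = \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
\]
\[
  \mathrm{DF} = k - 1 - \mathrm{NDFEST},
  \qquad
  P = \Pr\!\left( \chi^2_{\mathrm{DF}} > \chi^2 \right)
\]
```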

FORTRAN 90 Interface

Generic: CALL CHIGF (CDF, NELM, X, NCAT, RNGE, NDFEST, CUTP, P [, …])

Specific: The specific interface names are S_CHIGF and D_CHIGF.

FORTRAN 77 Interface

Single: CALL CHIGF (IDO, CDF, NELM, X, FRQ, NCAT, RNGE, NDFEST, CUTP, COUNTS, EXPECT, CHISQ, P, DF)

Double: The double precision name is DCHIGF.

Description

Routine CHIGF performs a chi‑squared goodness‑of‑fit test that a random sample of observations is distributed according to a specified theoretical cumulative distribution. The theoretical distribution, which may be continuous, discrete, or a mixture of discrete and continuous distributions, is specified via a user‑defined FUNCTION. Because the user is allowed to specify a range for the observations, a test that is conditional upon the specified range is performed.
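The phrase "conditional upon the specified range" can be made concrete with the usual definition of a truncated distribution. The formula below is a sketch under that assumption; the source does not spell it out. The symbols are introduced for illustration: a = RNGE(1), b = RNGE(2), c_1 < ... < c_{m-1} are the cutpoints, F is the user-supplied CDF, and n is the number of observations that fall inside the range.

```latex
\[
  p_i = \frac{F(c_i) - F(c_{i-1})}{F(b) - F(a)},
  \qquad
  E_i = n \, p_i,
  \qquad i = 1, \dots, m,
  \qquad c_0 = a, \; c_m = b
\]
```

When the two endpoints in RNGE are equal, the range is the whole real line, so the denominator is 1 and p_i reduces to F(c_i) - F(c_{i-1}). In Example 2 this gives p_i = 0.1 and E_i = 100 for every cell.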

NCAT gives the number of intervals into which the observations are to be divided. These intervals can be specified via the vector CUTP, which contains the cutpoints (or endpoints) for the intervals. Alternatively, if NCAT is negative, CHIGF computes equiprobable intervals. Regardless of the method used to obtain them, the intervals are such that the lower endpoint is not included in the interval while the upper endpoint is always included. The user should determine the cutpoints when the cumulative distribution function has discrete elements since CHIGF cannot determine them in this case. Regardless of how the cutpoints are determined, the lower endpoint of the first interval is specified by RNGE(1) when RNGE(1) ≠ RNGE(2) and is given as minus machine infinity otherwise. The upper endpoint of the last interval is defined similarly.
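The endpoint convention described above (lower endpoint excluded, upper endpoint included) can be sketched as a small lookup routine. This is an illustration only, not CHIGF's code: the names cell_lookup and find_cell are hypothetical, the cutpoints are assumed to be sorted in increasing order, and the range is assumed to be the whole real line.

```fortran
module cell_lookup
  implicit none
contains
  ! Return the index (1..size(cutp)+1) of the cell that contains x.
  ! Cells are half-open, (lower, upper]: a value equal to a cutpoint
  ! belongs to the cell whose upper endpoint is that cutpoint.
  ! Assumption: cutp is sorted in increasing order.
  integer function find_cell(x, cutp) result(cell)
    real, intent(in) :: x
    real, intent(in) :: cutp(:)
    integer :: j
    cell = size(cutp) + 1          ! default: last cell, above every cutpoint
    do j = 1, size(cutp)
      if (x <= cutp(j)) then
        cell = j
        exit
      end if
    end do
  end function find_cell
end module cell_lookup

program demo_cell_lookup
  use cell_lookup
  implicit none
  ! Cutpoints taken from Example 1
  real, parameter :: cutp(5) = [0.5, 1.5, 2.5, 3.5, 4.5]
  print *, find_cell(0.0, cutp)   ! cell 1
  print *, find_cell(1.5, cutp)   ! cell 2: the upper endpoint is included
  print *, find_cell(1.6, cutp)   ! cell 3
  print *, find_cell(5.0, cutp)   ! cell 6
end program demo_cell_lookup
```

With the Example 1 cutpoints, each binomial outcome 0 through 5 lands in its own cell, which is why half-integer cutpoints are a convenient choice for a discrete distribution.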

Routine CHIGF tallies the observations in X as follows. If the cutpoints are determined by CHIGF, then the cumulative probability at $x_i$, $F(x_i)$, is computed via function CDF. The tally for $x_i$ is made in interval number $\lfloor m F(x_i) \rfloor + 1$, where $m$ = |NCAT| is the number of cells and $\lfloor \cdot \rfloor$ is the function that takes the greatest integer that is no larger than its argument. If the cutpoints are specified by the user, the tally is made in the interval to which $x_i$ belongs using the endpoints specified by the user. Thus, if the computer time required to calculate the cumulative distribution function is large, user‑specified cutpoints may be preferred in order to reduce the total computing time.
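The tally rule for equiprobable cells can be sketched as follows. The names equiprob_tally, norm_cdf, and equiprob_cell are hypothetical, and the standard normal CDF is built from the Fortran 2008 ERFC intrinsic as a stand-in for whatever function is passed to CHIGF as CDF. Clamping the result to m when F(x) equals 1 is an assumption of this sketch; the source does not say how that edge case is handled.

```fortran
module equiprob_tally
  implicit none
contains
  ! Standard normal CDF via the ERFC intrinsic (Fortran 2008).
  real function norm_cdf(y)
    real, intent(in) :: y
    norm_cdf = 0.5 * erfc(-y / sqrt(2.0))
  end function norm_cdf

  ! Cell number floor(m*F(x)) + 1 for m equiprobable cells.
  ! The MIN clamp (for F(x) = 1) is an assumption of this sketch.
  integer function equiprob_cell(x, m)
    real, intent(in) :: x
    integer, intent(in) :: m
    equiprob_cell = min(floor(m * norm_cdf(x)) + 1, m)
  end function equiprob_cell
end module equiprob_tally

program demo_equiprob_tally
  use equiprob_tally
  implicit none
  print *, equiprob_cell(-2.0, 10)   ! F is about 0.0228, so cell 1
  print *, equiprob_cell( 0.1, 10)   ! F is about 0.5398, so cell 6
  print *, equiprob_cell( 2.0, 10)   ! F is about 0.9772, so cell 10
end program demo_equiprob_tally
```

Each tally costs one CDF evaluation, which is the computing-time consideration mentioned above.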

As a rule of thumb, the chi‑squared approximation may be suspect if the expected count in any cell is less than 1. CHIGF issues a warning in this case (informational error code 6), and a separate warning when an expected value is less than 5 (code 7).

Programming Notes

The user must supply a function CDF with calling sequence CDF(Y), which returns the value of the cumulative distribution function at any point Y in the range of the distribution. The supplied function must be declared in an EXTERNAL statement in the calling program. Many of the IMSL cumulative distribution functions in Chapter 17, “Probability Distribution Functions and Inverses” can be used for CDF, either directly, if the calling sequence is correct, or indirectly, if, for example, the sample means and standard deviations are to be used in computing the theoretical CDF.
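The indirect case, in which sample estimates enter the theoretical CDF, can be sketched as a one-argument wrapper whose parameters live in a module. Everything named here (fitted_normal, xbar, sdev, the data values) is hypothetical, and the ERFC intrinsic stands in for an IMSL distribution function so that the sketch compiles without the library.

```fortran
module fitted_normal
  implicit none
  real :: xbar = 0.0   ! sample mean, set by the caller before the test
  real :: sdev = 1.0   ! sample standard deviation, set by the caller
contains
  ! Matches the CDF(Y) calling sequence: one argument in, CDF value out.
  real function cdf(y)
    real, intent(in) :: y
    ! Standardize y, then evaluate the standard normal CDF.
    cdf = 0.5 * erfc(-(y - xbar) / (sdev * sqrt(2.0)))
  end function cdf
end module fitted_normal

program demo_fitted_normal
  use fitted_normal
  implicit none
  real :: x(5) = [4.1, 5.3, 4.8, 6.0, 5.2]   ! made-up data for illustration
  integer :: n
  n = size(x)
  xbar = sum(x) / n                           ! estimate the mean
  sdev = sqrt(sum((x - xbar)**2) / (n - 1))   ! estimate the standard deviation
  print *, 'mean =', xbar, ' sd =', sdev
  print *, 'CDF at the mean =', cdf(xbar)     ! 0.5 by symmetry
end program demo_fitted_normal
```

Because two parameters are estimated from the data in this sketch, NDFEST would presumably be set to 2 when such a function is passed to CHIGF.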

Comments

Informational errors

Type   Code   Description
4      4      There are more observations deleted from a cell than added.
4      5      All observations are missing.
3      6      An expected value is less than 1.
3      7      An expected value is less than 5.
4      8      The function CDF is not a cumulative distribution function.
4      9      The probability of the range of the distribution is not positive.
4      10     An error has occurred when inverting the cumulative distribution function. This function must be continuous and defined over the whole real line. If all else fails, you must specify the cutpoints (i.e., NCAT must be positive).

Examples

Example 1

In this example, a discrete binomial random sample of size 1000 with binomial parameter p = 0.3 and binomial sample size 5 is generated via routine RNBIN (see Chapter 18, “Random Number Generation”). Routine RNSET is first used to set the seed. One call to CHIGF is made. Routine BINDF (see Chapter 17, “Probability Distribution Functions and Inverses”) is used to compute the CDF.

 

      USE IMSL_LIBRARIES

      IMPLICIT NONE
      INTEGER    ISEED, NCAT, NDFEST, NELM
      PARAMETER  (ISEED=123457, NCAT=6, NDFEST=0, NELM=1000)
!
      INTEGER    I, IX(NELM), NOUT
      REAL       CDF, CHISQ(NCAT+1), COUNTS(NCAT), CUTP(NCAT-1), DF, &
                 EXPECT(NCAT), P, RNGE(2), X(NELM)
      EXTERNAL   CDF
!                                 Equal endpoints: whole real line
      DATA RNGE/0.0, 0.0/
!                                 User-supplied cutpoints (discrete case)
      DATA CUTP/.5, 1.5, 2.5, 3.5, 4.5/
!
      CALL RNSET (ISEED)
!                                 Generate the data
      CALL RNBIN (5, 0.3, IX)
!                                 Copy the integer deviates into X
      DO 10  I=1, NELM
         X(I) = IX(I)
   10 CONTINUE
!
      CALL CHIGF (CDF, NELM, X, NCAT, RNGE, NDFEST, CUTP, P, &
                  COUNTS=COUNTS, EXPECT=EXPECT, CHISQ=CHISQ, DF=DF)
!                                 Print results
      CALL WRRRN ('Counts', COUNTS, 1, NCAT, 1)
      CALL WRRRN ('Expect', EXPECT, 1, NCAT, 1)
      CALL WRRRN ('Contributions to Chi-squared', CHISQ, 1, NCAT, 1)
      CALL UMACH (2, NOUT)
      WRITE (NOUT,99999) CHISQ(NCAT+1), P, DF
99999 FORMAT (///'0Chi-squared ', F8.4, /, ' P-value ' &
             , F8.4, /, ' Degrees of freedom', F8.4)
      END
!
      REAL FUNCTION CDF (Y)
      REAL       Y
!
      INTEGER    I
      REAL       BINDF
      EXTERNAL   BINDF
!                                 Truncate Y to an integer count
      I   = Y
      CDF = BINDF(I,5,0.3)
      RETURN
      END

Output

 

*** WARNING ERROR 7 from CHIGF. An expected value is less than 5.

                    Counts
      1       2       3       4       5       6
  170.0   331.0   320.0   148.0    28.0     3.0

                    Expect
      1       2       3       4       5       6
  168.1   360.2   308.7   132.3    28.3     2.4

          Contributions to Chi-squared
      1       2       3       4       5       6
  0.022   2.359   0.414   1.863   0.004   0.134

Chi-squared          4.7963
P-value              0.4412
Degrees of freedom   5.0000
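The figures above can be checked by hand without the library. The sketch below takes the counts from the output, computes the expected counts from the binomial probabilities with 5 trials and p = 0.3, and forms the per-cell Pearson terms. It assumes the standard Pearson statistic, and the program name and variable names are hypothetical. Its results should agree with the printed Expect, Contributions, and Chi-squared values to the displayed precision, and the sixth cell shows why the warning about an expected value below 5 appears.

```fortran
program pearson_by_hand
  implicit none
  integer, parameter :: ncat = 6, ntrial = 5, nobs = 1000
  real, parameter :: p = 0.3
  ! Observed counts copied from the Example 1 output
  real :: counts(ncat) = [170.0, 331.0, 320.0, 148.0, 28.0, 3.0]
  real :: expect(ncat), contrib(ncat), binom
  integer :: k

  ! Expected count for k successes:
  !   nobs * C(ntrial,k) * p**k * (1-p)**(ntrial-k)
  binom = 1.0                                        ! C(ntrial, 0)
  do k = 0, ntrial
    expect(k+1) = nobs * binom * p**k * (1.0 - p)**(ntrial - k)
    binom = binom * real(ntrial - k) / real(k + 1)   ! C(ntrial, k+1)
  end do

  contrib = (counts - expect)**2 / expect            ! per-cell Pearson terms

  ! Flag small expected counts, mirroring the two documented warnings
  do k = 1, ncat
    if (expect(k) < 1.0) then
      print '(a,i0,a)', 'cell ', k, ': expected value is less than 1'
    else if (expect(k) < 5.0) then
      print '(a,i0,a)', 'cell ', k, ': expected value is less than 5'
    end if
  end do

  print '(a,6f8.1)', 'Expect      ', expect
  print '(a,6f8.3)', 'Contrib     ', contrib
  print '(a,f8.4)',  'Chi-squared ', sum(contrib)
end program pearson_by_hand
```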

Example 2

This example illustrates the use of CHIGF on a randomly generated sample from the normal distribution. One thousand randomly generated observations are tallied into 10 equiprobable intervals. Twelve calls to CHIGF are made. The first call is solely for initialization since IDO = 1 and NELM = 0. The next 10 calls tally the data, 100 observations at a time, with IDO = 2 and NELM = 100. The last call is for wrap‑up only since IDO = 3 and NELM = 0. All twelve calls could have been replaced with one call to CHIGF with IDO = 0 and NELM = 1000, in which case X would need to be of length 1000. In this example, the null hypothesis is not rejected.

 

      USE IMSL_LIBRARIES

      IMPLICIT NONE
      INTEGER    ISEED, NCAT, NDFEST
      PARAMETER  (ISEED=123457, NCAT=-10, NDFEST=0)
!
      INTEGER    I, IDO, NOUT, NELM
      REAL       CHISQ(-NCAT+1), COUNTS(-NCAT), CUTP(-NCAT-1), &
                 DF, EXPECT(-NCAT), P, RNGE(2), X(100)
!                                 Equal endpoints: whole real line
      DATA RNGE/0.0, 0.0/
!
      CALL RNSET (ISEED)
!                                 Initialization
      IDO  = 1
      NELM = 0
      CALL CHIGF (S_ANORDF, NELM, X, NCAT, RNGE, NDFEST, CUTP, P, &
                  IDO=IDO, COUNTS=COUNTS, EXPECT=EXPECT, &
                  CHISQ=CHISQ, DF=DF)
!                                 Add the data
      IDO  = 2
      NELM = 100
      DO 10  I=1, 10
         CALL RNNOR (X)
         CALL CHIGF (S_ANORDF, NELM, X, NCAT, RNGE, NDFEST, CUTP, P, &
                     IDO=IDO, COUNTS=COUNTS, EXPECT=EXPECT, &
                     CHISQ=CHISQ, DF=DF)
   10 CONTINUE
!                                 Wrap up
      IDO  = 3
      NELM = 0
      CALL CHIGF (S_ANORDF, NELM, X, NCAT, RNGE, NDFEST, CUTP, &
                  P, IDO=IDO, COUNTS=COUNTS, EXPECT=EXPECT, &
                  CHISQ=CHISQ, DF=DF)
!                                 Print results
!                                 CUTP holds -NCAT-1 cutpoints
      CALL WRRRN ('Cutpoints', CUTP, 1, -NCAT-1, 1)
      CALL WRRRN ('Counts', COUNTS, 1, -NCAT, 1)
      CALL WRRRN ('Expect', EXPECT, 1, -NCAT, 1)
      CALL WRRRN ('Contributions to Chi-squared', CHISQ, 1, -NCAT, 1)
      CALL UMACH (2, NOUT)
      WRITE (NOUT,99999) CHISQ(-NCAT+1), P, DF
99999 FORMAT (///'0Chi-squared ', F8.4, /, ' P-value ' &
             , F8.4, /, ' Degrees of freedom', F8.4)
      END

Output

 

                              Cutpoints
      1       2       3       4       5       6       7       8       9
 -1.282  -0.842  -0.524  -0.253   0.000   0.253   0.524   0.842   1.282

                                  Counts
      1       2       3       4       5       6       7       8       9      10
  106.0   109.0    89.0    92.0    83.0    87.0   110.0   104.0   121.0    99.0

                                  Expect
      1       2       3       4       5       6       7       8       9      10
  100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0

                        Contributions to Chi-squared
      1       2       3       4       5       6       7       8       9      10
  0.360   0.810   1.210   0.640   2.890   1.690   1.000   0.160   4.410   0.010

Chi-squared         13.1806
P-value              0.1546
Degrees of freedom   9.0000
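The cutpoints printed above are the points at which the standard normal CDF equals 0.1, 0.2, ..., 0.9. The sketch below recovers them by bisection as one way to picture what "equiprobable" means. It is not CHIGF's algorithm: the source says only that the CDF is inverted (see informational error 10). The bracket [-10, 10], the iteration count, and all names are assumptions of this sketch, and the ERFC intrinsic stands in for the IMSL normal CDF.

```fortran
program equiprob_cutpoints
  implicit none
  integer, parameter :: m = 10        ! number of cells, |NCAT| in Example 2
  real :: cutp(m-1), lo, hi, mid, prob
  integer :: i, iter

  do i = 1, m - 1
    prob = real(i) / real(m)          ! cumulative probability at cutpoint i
    lo = -10.0                        ! bracket assumed wide enough for N(0,1)
    hi =  10.0
    do iter = 1, 60                   ! bisection on F(x) = prob
      mid = 0.5 * (lo + hi)
      if (norm_cdf(mid) < prob) then
        lo = mid
      else
        hi = mid
      end if
    end do
    cutp(i) = 0.5 * (lo + hi)
  end do

  print '(9f8.3)', cutp

contains

  ! Standard normal CDF via the ERFC intrinsic (Fortran 2008).
  real function norm_cdf(y)
    real, intent(in) :: y
    norm_cdf = 0.5 * erfc(-y / sqrt(2.0))
  end function norm_cdf

end program equiprob_cutpoints
```

A search of this kind relies on the CDF being continuous and increasing over the bracket, which is consistent with the advice to supply the cutpoints directly when the distribution has discrete elements.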