Performs a chi-squared goodness-of-fit test.
#include <imsls.h>
float imsls_f_chi_squared_test (float user_proc_cdf(), int n_observations, int n_categories, float x[], ..., 0)
The type double function is imsls_d_chi_squared_test.
float user_proc_cdf (float y) (Input)
User-supplied function that returns the hypothesized, cumulative distribution function at the point y.
int n_observations (Input)
Number of data elements input in x.
int n_categories (Input)
Number of cells into which the observations are to be tallied.
float x[] (Input)
Array with n_observations components containing the vector of data elements for this test.
The p-value for the goodness-of-fit chi-squared statistic.
#include <imsls.h>
float imsls_f_chi_squared_test (float user_proc_cdf(), int n_observations, int n_categories, float x[],
IMSLS_N_PARAMETERS_ESTIMATED, int n_parameters,
IMSLS_IDO, int ido,
IMSLS_CUTPOINTS, float **cutpoints,
IMSLS_CUTPOINTS_USER, float cutpoints[],
IMSLS_CUTPOINTS_EQUAL,
IMSLS_CHI_SQUARED, float *chi_squared,
IMSLS_DEGREES_OF_FREEDOM, float *df,
IMSLS_FREQUENCIES, float frequencies[],
IMSLS_BOUNDS, float lower_bound, float upper_bound,
IMSLS_CELL_COUNTS, float **cell_counts,
IMSLS_CELL_COUNTS_USER, float cell_counts[],
IMSLS_CELL_EXPECTED, float **cell_expected,
IMSLS_CELL_EXPECTED_USER, float cell_expected[],
IMSLS_CELL_CHI_SQUARED, float **cell_chi_squared,
IMSLS_CELL_CHI_SQUARED_USER, float cell_chi_squared[],
IMSLS_FCN_W_DATA, float fcn(), void *data,
0)
IMSLS_N_PARAMETERS_ESTIMATED, int n_parameters (Input)
Number of parameters estimated in computing the cumulative distribution function.
IMSLS_IDO, int ido (Input)
Processing option.
The argument ido must be one of 0, 1, 2, or 3. If ido = 0 (the default), all of the observations are input during one invocation. If ido = 1, 2, or 3, blocks of rows of the data can be processed sequentially in separate invocations of imsls_f_chi_squared_test; with this option, it is not a requirement that all observations be memory resident, thus enabling one to handle large data sets.
ido | Action
0 | This is the only invocation; all the data are input at once. (Default)
1 | This is the first invocation with this data; additional calls will be made. Initialization and updating for the n_observations observations of x will be performed.
2 | This is an intermediate invocation; updating for the n_observations observations of x will be performed.
3 | This is the final invocation of this function. Updating for the data in x and wrap-up computations are performed. Workspace is released. No further invocations of imsls_f_chi_squared_test with ido greater than 1 should be made without first invoking imsls_f_chi_squared_test with ido = 1.
Default: ido = 0
IMSLS_CUTPOINTS, float **cutpoints (Output)
Address of a pointer to an internally allocated array of length n_categories − 1 containing the vector of cutpoints defining the cell intervals. The intervals defined by the cutpoints are such that the lower endpoint is not included and the upper endpoint is included in any interval. If IMSLS_CUTPOINTS_EQUAL is specified, equal probability cutpoints are computed and returned in cutpoints.
IMSLS_CUTPOINTS_USER, float cutpoints[] (Input/Output)
Storage for array cutpoints is provided by the user. See IMSLS_CUTPOINTS.
IMSLS_CUTPOINTS_EQUAL
If IMSLS_CUTPOINTS_USER is specified, then equal probability cutpoints can still be used if, in addition, the IMSLS_CUTPOINTS_EQUAL option is specified. If IMSLS_CUTPOINTS_USER is not specified, equal probability cutpoints are used by default.
IMSLS_CHI_SQUARED, float *chi_squared (Output)
If specified, the chi-squared test statistic is returned in *chi_squared.
IMSLS_DEGREES_OF_FREEDOM, float *df (Output)
If specified, the degrees of freedom for the chi-squared goodness-of-fit test are returned in *df.
IMSLS_FREQUENCIES, float frequencies[] (Input)
Array with n_observations components containing the vector of frequencies for the observations stored in x.
IMSLS_BOUNDS, float lower_bound, float upper_bound (Input)
If IMSLS_BOUNDS is specified, then lower_bound is the lower bound of the range of the distribution and upper_bound is the upper bound of this range. If lower_bound = upper_bound, a range on the whole real line is used (the default). If the lower and upper endpoints are different, points outside the range of these bounds are ignored. Distributions conditional on a range can be specified when IMSLS_BOUNDS is used. By convention, lower_bound is excluded from the first interval, but upper_bound is included in the last interval.
IMSLS_CELL_COUNTS, float **cell_counts (Output)
Address of a pointer to an internally allocated array of length n_categories containing the cell counts. The cell counts are the observed frequencies in each of the n_categories cells.
IMSLS_CELL_COUNTS_USER, float cell_counts[] (Output)
Storage for array cell_counts is provided by the user. See IMSLS_CELL_COUNTS.
IMSLS_CELL_EXPECTED, float **cell_expected (Output)
Address of a pointer to an internally allocated array of length n_categories containing the cell expected values. The expected value of a cell is the expected count in the cell given that the hypothesized distribution is correct.
IMSLS_CELL_EXPECTED_USER, float cell_expected[] (Output)
Storage for array cell_expected is provided by the user. See IMSLS_CELL_EXPECTED.
IMSLS_CELL_CHI_SQUARED, float **cell_chi_squared (Output)
Address of a pointer to an internally allocated array of length n_categories containing the cell contributions to chi-squared.
IMSLS_CELL_CHI_SQUARED_USER, float cell_chi_squared[] (Output)
Storage for array cell_chi_squared is provided by the user. See IMSLS_CELL_CHI_SQUARED.
IMSLS_FCN_W_DATA, float user_proc_cdf (float y), void *data, (Input)
User-supplied function that returns the hypothesized, cumulative distribution function, which also accepts a pointer to data that is supplied by the user. data is a pointer to the data to be passed to the user-supplied function. See the Introduction, Passing Data to User-Supplied Functions at the beginning of this manual for more details.
Function imsls_f_chi_squared_test performs a chi-squared goodness-of-fit test that a random sample of observations is distributed according to a specified theoretical cumulative distribution. The theoretical distribution, which can be continuous, discrete, or a mixture of discrete and continuous distributions, is specified by the user-defined function user_proc_cdf. Because the user is allowed to give a range for the observations, a test that is conditional on the specified range is performed.
Argument n_categories gives the number of intervals into which the observations are to be divided. By default, equiprobable intervals are computed by imsls_f_chi_squared_test, but intervals that are not equiprobable can be specified by supplying cutpoints through optional argument IMSLS_CUTPOINTS_USER.
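Equiprobable cutpoints are the points at which the hypothesized cumulative distribution function reaches 1/m, 2/m, ..., (m − 1)/m, where m = n_categories. The following is a minimal sketch of that idea in plain C, assuming a continuous, nondecreasing CDF and using bisection to invert it. The functions example_cdf, invert_cdf, and equal_probability_cutpoints are hypothetical names chosen for illustration; they are not part of the library, and the library's own inversion method may differ.

```c
#include <assert.h>

/* Hypothetical example CDF: density 2y on [0, 1], so F(y) = y * y. */
float example_cdf(float y)
{
    if (y <= 0.0f) return 0.0f;
    if (y >= 1.0f) return 1.0f;
    return y * y;
}

/* Find y in [lo, hi] with cdf(y) close to p by bisection.
   Assumes cdf is continuous and nondecreasing on [lo, hi]. */
float invert_cdf(float (*cdf)(float), float p, float lo, float hi)
{
    int iter;
    for (iter = 0; iter < 60; iter++) {
        float mid = 0.5f * (lo + hi);
        if (cdf(mid) < p)
            lo = mid;   /* target lies to the right of mid */
        else
            hi = mid;   /* target lies at or to the left of mid */
    }
    return 0.5f * (lo + hi);
}

/* Fill cutpoints[0 .. m-2] so that each of the m cells has probability 1/m. */
void equal_probability_cutpoints(float (*cdf)(float), int m,
                                 float lo, float hi, float cutpoints[])
{
    int i;
    assert(m >= 2);
    for (i = 1; i < m; i++)
        cutpoints[i - 1] = invert_cdf(cdf, (float)i / (float)m, lo, hi);
}
```

For example, with m = 4 and the example CDF above, the three cutpoints are approximately 0.5, 0.707, and 0.866, the square roots of 0.25, 0.5, and 0.75.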
Regardless of the method used to obtain the cutpoints, the intervals are such that the lower endpoint is not included in the interval, while the upper endpoint is always included. If the cumulative distribution function has discrete elements, then user-provided cutpoints should always be used since imsls_f_chi_squared_test cannot determine the discrete elements in discrete distributions.
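The endpoint convention can be illustrated with a short lookup routine. This is a sketch only, assuming ascending cutpoints and 0-based cell numbers; find_cell is a hypothetical helper, not a library function.

```c
#include <assert.h>

/* Return the 0-based cell index of x, given n_categories - 1 ascending
   cutpoints. Cell i covers (cutpoints[i-1], cutpoints[i]]: the lower
   endpoint is excluded and the upper endpoint is included. */
int find_cell(float x, const float cutpoints[], int n_categories)
{
    int i;
    assert(n_categories >= 2);
    for (i = 0; i < n_categories - 1; i++) {
        if (x <= cutpoints[i])
            return i;
    }
    return n_categories - 1;   /* x lies above the last cutpoint */
}
```

With cutpoints 1.5, 2.5, 3.5, the value 1.5 falls in the first cell and 1.6 in the second. For an integer-valued distribution, half-integer cutpoints such as these keep every possible value strictly inside one cell, so the endpoint convention never decides where an observation is tallied.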
By default, the lower and upper endpoints of the first and last intervals are −∞ and +∞, respectively. If IMSLS_BOUNDS is specified, the endpoints are user-defined by the two arguments lower_bound and upper_bound.
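A test that is conditional on a range compares the data with the hypothesized distribution restricted to that range. The sketch below shows the usual textbook form of such a conditional CDF, (F(y) − F(lower_bound)) / (F(upper_bound) − F(lower_bound)), together with the rule for which observations are kept. It is an illustration under those assumptions rather than a description of the library's internals; uniform_cdf, conditional_cdf, and in_range are hypothetical names.

```c
#include <assert.h>

/* Hypothetical hypothesized CDF: uniform on [0, 10]. */
float uniform_cdf(float y)
{
    if (y <= 0.0f)  return 0.0f;
    if (y >= 10.0f) return 1.0f;
    return y / 10.0f;
}

/* CDF of the distribution conditional on lower_bound < X <= upper_bound.
   Assumes lower_bound < upper_bound. */
float conditional_cdf(float (*cdf)(float), float y,
                      float lower_bound, float upper_bound)
{
    float f_lo = cdf(lower_bound);
    float f_hi = cdf(upper_bound);
    assert(f_hi > f_lo);   /* the range must have positive probability */
    if (y <= lower_bound) return 0.0f;
    if (y >= upper_bound) return 1.0f;
    return (cdf(y) - f_lo) / (f_hi - f_lo);
}

/* Return 1 if x is used in the test and 0 if it is ignored. Equal bounds
   mean the whole real line; otherwise lower_bound is excluded and
   upper_bound is included. */
int in_range(float x, float lower_bound, float upper_bound)
{
    if (lower_bound == upper_bound)
        return 1;
    return x > lower_bound && x <= upper_bound;
}
```

For the uniform example with bounds 2 and 6, the conditional CDF at 5 is (0.5 − 0.2) / (0.6 − 0.2) = 0.75.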
A tally of counts is maintained for the observations in x as follows:
• If the cutpoints are specified by the user, the tally is made in the interval to which xi belongs, using the user-specified endpoints.
• If the cutpoints are determined by imsls_f_chi_squared_test, then the cumulative probability at xi, F(xi), is computed by the function user_proc_cdf.
The tally for xi is made in interval number ⌊mF(xi) + 1⌋, where m = n_categories and ⌊·⌋ is the function that takes the greatest integer that is no larger than the argument of the function. Thus, if the computer time required to calculate the cumulative distribution function is large, user-specified cutpoints may be preferred to reduce the total computing time.
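The interval formula above can be written directly in C. In this sketch, interval_number is a hypothetical helper that takes the already computed value F(xi); placing an observation with F(xi) = 1 in the last interval is an assumption made here so that the result stays within 1, ..., m.

```c
#include <assert.h>

/* 1-based interval number floor(m * F(x) + 1) for a CDF value f_x in [0, 1]. */
int interval_number(float f_x, int m)
{
    int k;
    assert(m >= 1);
    assert(f_x >= 0.0f && f_x <= 1.0f);
    k = (int)(m * f_x) + 1;   /* truncation equals floor for nonnegative values */
    if (k > m)
        k = m;                /* assumption: F(x) == 1 goes in the last interval */
    return k;
}
```

With m = 10, a CDF value of 0.15 gives interval 2 and a value of 0.95 gives interval 10. Each tally costs one evaluation of the CDF, which is why user-specified cutpoints can be cheaper when the CDF is expensive to compute.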
If the expected count in any cell is less than 1, then the chi-squared approximation may be suspect. A warning message to this effect is issued in this case, as well as when an expected value is less than 5.
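The two thresholds can be checked before running the test if the cell probabilities are known. The following sketch assumes the expected count in a cell is the number of observations times the cell probability under the hypothesized distribution. The status constants and check_expected_counts are hypothetical and unrelated to the library's own warning codes.

```c
#include <assert.h>

enum {
    EXPECTED_OK = 0,
    EXPECTED_LESS_THAN_5 = 1,
    EXPECTED_LESS_THAN_1 = 2
};

/* Return the most severe condition found over all cells, where the
   expected count of cell i is n_obs * cell_prob[i]. */
int check_expected_counts(const float cell_prob[], int n_categories,
                          float n_obs)
{
    int i, status = EXPECTED_OK;
    assert(n_categories >= 1);
    for (i = 0; i < n_categories; i++) {
        float expected = n_obs * cell_prob[i];
        if (expected < 1.0f)
            return EXPECTED_LESS_THAN_1;    /* approximation may be suspect */
        if (expected < 5.0f)
            status = EXPECTED_LESS_THAN_5;  /* milder condition, keep scanning */
    }
    return status;
}
```

For cell probabilities 0.5, 0.3, and 0.2, a sample of 100 passes both checks, a sample of 20 has an expected count of 4 in the last cell, and a sample of 4 has an expected count below 1.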
Function user_proc_cdf must be supplied with calling sequence user_proc_cdf(y), which returns the value of the cumulative distribution function at any point y in the (optionally) specified range. Many of the cumulative distribution functions in Chapter 11, “Probability Distribution Functions and Inverses”, can be used for user_proc_cdf, either directly if the calling sequence is correct or indirectly if, for example, the sample means and standard deviations are to be used in computing the theoretical cumulative distribution function.
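The indirect case can be handled with a small wrapper that standardizes y before calling a one-argument CDF. The sketch below is self-contained, so it uses a stand-in standard_cdf (uniform on [0, 1]); in a real program that role could be played by a library function such as imsls_f_normal_cdf. The file-scope variables and set_cdf_parameters are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Parameters of the hypothesized distribution. They live at file scope
   because the callback receives only the evaluation point y. */
static float g_location = 0.0f;
static float g_scale    = 1.0f;

/* Stand-in for a CDF of the standardized variable (uniform on [0, 1]). */
float standard_cdf(float z)
{
    if (z <= 0.0f) return 0.0f;
    if (z >= 1.0f) return 1.0f;
    return z;
}

/* Store the location and scale, for example a sample mean and a sample
   standard deviation computed by the caller. */
void set_cdf_parameters(float location, float scale)
{
    assert(scale > 0.0f);
    g_location = location;
    g_scale = scale;
}

/* Callback with the calling sequence user_proc_cdf(y). */
float user_proc_cdf(float y)
{
    return standard_cdf((y - g_location) / g_scale);
}
```

When the location and scale are estimated from the same sample that is being tested, the number of estimated parameters (2 in this sketch) would be passed through IMSLS_N_PARAMETERS_ESTIMATED.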
This example illustrates the use of imsls_f_chi_squared_test on a randomly generated sample from the normal distribution. One thousand randomly generated observations are tallied into 10 equiprobable intervals. The null hypothesis, that the sample is from a normal distribution, is specified by use of imsls_f_normal_cdf (Chapter 11, “Probability Distribution Functions and Inverses”) as the hypothesized distribution function. In this example, the null hypothesis is not rejected.
#include <stdio.h>
#include <imsls.h>
#define SEED 123457
#define N_CATEGORIES 10
#define N_OBSERVATIONS 1000
int main()
{
    float *x, p_value;
    imsls_random_seed_set(SEED);
    /* Generate Normal deviates */
    x = imsls_f_random_normal (N_OBSERVATIONS, 0);
    /* Perform chi squared test */
    p_value = imsls_f_chi_squared_test (imsls_f_normal_cdf,
                                        N_OBSERVATIONS,
                                        N_CATEGORIES, x, 0);
    /* Print results */
    printf ("p-value = %7.4f\n", p_value);
}
p-value = 0.1546
In this example, optional arguments are used for the data in the initial example.
#include <imsls.h>
#define SEED 123457
#define N_CATEGORIES 10
#define N_OBSERVATIONS 1000
int main()
{
    float *cell_counts, *cutpoints, *cell_chi_squared;
    float chi_squared_statistics[3], *x;
    char *stat_row_labels[] = {"chi-squared",
                               "degrees of freedom","p-value"};
    imsls_random_seed_set(SEED);
    /* Generate normal deviates */
    x = imsls_f_random_normal (N_OBSERVATIONS, 0);
    /* Perform chi squared test */
    chi_squared_statistics[2] =
        imsls_f_chi_squared_test (imsls_f_normal_cdf,
            N_OBSERVATIONS, N_CATEGORIES, x,
            IMSLS_CUTPOINTS, &cutpoints,
            IMSLS_CELL_COUNTS, &cell_counts,
            IMSLS_CELL_CHI_SQUARED, &cell_chi_squared,
            IMSLS_CHI_SQUARED, &chi_squared_statistics[0],
            IMSLS_DEGREES_OF_FREEDOM, &chi_squared_statistics[1],
            0);
    /* Print results */
    imsls_f_write_matrix ("\nChi Squared Statistics\n", 3, 1,
        chi_squared_statistics,
        IMSLS_ROW_LABELS, stat_row_labels,
        0);
    imsls_f_write_matrix ("Cut Points", 1, N_CATEGORIES-1,
        cutpoints, 0);
    imsls_f_write_matrix ("Cell Counts", 1, N_CATEGORIES,
        cell_counts, 0);
    imsls_f_write_matrix ("Cell Contributions to Chi-Squared", 1,
        N_CATEGORIES, cell_chi_squared,
        0);
}
Chi Squared Statistics
chi-squared 13.18
degrees of freedom 9.00
p-value 0.15
Cut Points
1 2 3 4 5 6
-1.282 -0.842 -0.524 -0.253 -0.000 0.253
7 8 9
0.524 0.842 1.282
Cell Counts
1 2 3 4 5 6
106 109 89 92 83 87
7 8 9 10
110 104 121 99
Cell Contributions to Chi-Squared
1 2 3 4 5 6
0.36 0.81 1.21 0.64 2.89 1.69
7 8 9 10
1.00 0.16 4.41 0.01
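The cell contributions printed above can be reproduced by hand. With 1,000 observations in 10 equiprobable cells, the expected count in each cell is 100, and each contribution is (count − expected)² / expected. The sketch below uses hypothetical helper names; the degrees-of-freedom formula, n_categories − 1 − n_parameters, is the standard one for this test and is assumed here rather than taken from this page.

```c
#include <assert.h>

/* Fill cell_chi_squared[i] with (counts[i] - expected[i])^2 / expected[i]
   and return the sum, which is the chi-squared statistic. */
float chi_squared_from_cells(const float counts[], const float expected[],
                             int n_categories, float cell_chi_squared[])
{
    float total = 0.0f;
    int i;
    assert(n_categories >= 1);
    for (i = 0; i < n_categories; i++) {
        float d = counts[i] - expected[i];
        assert(expected[i] > 0.0f);
        cell_chi_squared[i] = d * d / expected[i];
        total += cell_chi_squared[i];
    }
    return total;
}

/* Standard degrees of freedom for a goodness-of-fit test (assumed formula). */
int degrees_of_freedom(int n_categories, int n_parameters)
{
    return n_categories - 1 - n_parameters;
}
```

Applied to the cell counts 106, 109, 89, 92, 83, 87, 110, 104, 121, 99, this gives contributions 0.36, 0.81, 1.21, 0.64, 2.89, 1.69, 1.00, 0.16, 4.41, 0.01, which sum to 13.18 with 9 degrees of freedom, in agreement with the output.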
In this example, a discrete Poisson random sample of size 1,000 with parameter θ = 5.0 is generated by function imsls_random_poisson (Chapter 12, “Random Number Generation”). In the call to imsls_f_chi_squared_test, the user-supplied function user_proc_cdf evaluates the hypothesized distribution by calling imsls_f_poisson_cdf (Chapter 11, “Probability Distribution Functions and Inverses”).
#include <imsls.h>
#define SEED 123457
#define N_CATEGORIES 10
#define N_PARAMETERS_ESTIMATED 0
#define N_NUMBERS 1000
#define THETA 5.0
float user_proc_cdf(float);
int main()
{
    int i, *poisson;
    float cell_statistics[3][N_CATEGORIES];
    float chi_squared_statistics[3], x[N_NUMBERS];
    float cutpoints[] = {1.5, 2.5, 3.5, 4.5, 5.5, 6.5,
                         7.5, 8.5, 9.5};
    char *cell_row_labels[] = {"count", "expected count",
                               "cell chi-squared"};
    char *cell_col_labels[] = {"Poisson value", "0", "1", "2",
                               "3", "4", "5", "6", "7",
                               "8", "9"};
    char *stat_row_labels[] = {"chi-squared",
                               "degrees of freedom","p-value"};
    imsls_random_seed_set(SEED);
    /* Generate the data */
    poisson = imsls_random_poisson(N_NUMBERS, THETA, 0);
    /* Copy data to a floating point vector */
    for (i = 0; i < N_NUMBERS; i++)
        x[i] = poisson[i];
    chi_squared_statistics[2] =
        imsls_f_chi_squared_test(user_proc_cdf, N_NUMBERS,
            N_CATEGORIES, x,
            IMSLS_CUTPOINTS_USER, cutpoints,
            IMSLS_CELL_COUNTS_USER, &cell_statistics[0][0],
            IMSLS_CELL_EXPECTED_USER, &cell_statistics[1][0],
            IMSLS_CELL_CHI_SQUARED_USER, &cell_statistics[2][0],
            IMSLS_CHI_SQUARED, &chi_squared_statistics[0],
            IMSLS_DEGREES_OF_FREEDOM, &chi_squared_statistics[1],
            0);
    /* Print results */
    imsls_f_write_matrix("\nChi-squared Statistics\n", 3, 1,
        &chi_squared_statistics[0],
        IMSLS_ROW_LABELS, stat_row_labels,
        0);
    imsls_f_write_matrix("\nCell Statistics\n", 3, N_CATEGORIES,
        &cell_statistics[0][0],
        IMSLS_ROW_LABELS, cell_row_labels,
        IMSLS_COL_LABELS, cell_col_labels,
        IMSLS_WRITE_FORMAT, "%9.1f",
        0);
}
float user_proc_cdf(float k)
{
    float cdf_v;
    cdf_v = imsls_f_poisson_cdf ((int) k, THETA);
    return cdf_v;
}
Chi-squared Statistics
chi-squared 10.48
degrees of freedom 9.00
p-value 0.31
Cell Statistics
Poisson value 0 1 2 3 4
count 41.0 94.0 138.0 158.0 150.0
expected count 40.4 84.2 140.4 175.5 175.5
cell chi-squared 0.0 1.1 0.0 1.7 3.7
Poisson value 5 6 7 8 9
count 159.0 116.0 75.0 37.0 32.0
expected count 146.2 104.4 65.3 36.3 31.8
cell chi-squared 1.1 1.3 1.4 0.0 0.0
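The expected counts in this output can be checked against the Poisson probabilities for θ = 5. With cutpoints 1.5, 2.5, ..., 9.5, the ten cells appear to contain the values {0, 1}, {2}, {3}, ..., {9}, and {10, 11, ...}, so the first and last columns cover more than the single value shown in their labels. The sketch below recomputes the expected counts under that reading. The constant for e⁻⁵ is hard-coded to keep the block free of library dependencies, and poisson_expected_counts is a hypothetical helper.

```c
#include <assert.h>

#define THETA 5.0
#define EXP_MINUS_THETA 0.006737946999085467   /* e^-5 */
#define N_CELLS 10

/* Expected counts for the cells {0,1}, {2}, ..., {9}, {10, 11, ...}
   that result from cutpoints 1.5, 2.5, ..., 9.5. */
void poisson_expected_counts(double n_obs, double expected[N_CELLS])
{
    double pmf = EXP_MINUS_THETA;   /* P(X = 0) */
    double cdf = 0.0;               /* running P(X <= k) */
    int k, cell;
    assert(n_obs > 0.0);
    for (cell = 0; cell < N_CELLS; cell++)
        expected[cell] = 0.0;
    for (k = 0; k <= 9; k++) {
        cell = (k <= 1) ? 0 : k - 1;   /* values 0 and 1 share the first cell */
        expected[cell] += n_obs * pmf;
        cdf += pmf;
        pmf *= THETA / (k + 1);        /* P(X = k + 1) from P(X = k) */
    }
    expected[N_CELLS - 1] = n_obs * (1.0 - cdf);   /* upper tail, X >= 10 */
}
```

For n_obs = 1000 the computed values round to 40.4, 84.2, 140.4, 175.5, 175.5, 146.2, 104.4, 65.3, 36.3, and 31.8, matching the expected count row.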
Continuing with the data from Example 1, the example below invokes imsls_f_chi_squared_test three times, using ido = 1, 2, and 3 to process the observations in blocks. Optional arguments are also used to return the cutpoints, cell counts, and cell contributions.
#include <imsls.h>
#define SEED 123457
#define N_CATEGORIES 10
#define N_OBSERVATIONS 1000
#define N_OBSERVATIONS_BLOCK_1 300
#define N_OBSERVATIONS_BLOCK_2 300
#define N_OBSERVATIONS_BLOCK_3 400
int main()
{
    float *cell_counts, *cutpoints, *cell_chi_squared;
    float chi_squared_statistics[3], *x;
    char *stat_row_labels[] = {"chi-squared",
                               "degrees of freedom","p-value"};
    float lv_x_block_1[N_OBSERVATIONS_BLOCK_1];
    float lv_x_block_2[N_OBSERVATIONS_BLOCK_2];
    float lv_x_block_3[N_OBSERVATIONS_BLOCK_3];
    int i;
    imsls_random_seed_set(SEED);
    /* Generate normal deviates */
    x = imsls_f_random_normal (N_OBSERVATIONS, 0);
    /* Split the sample into three blocks */
    for(i=0; i<N_OBSERVATIONS_BLOCK_1; i++)
        lv_x_block_1[i]=x[i];
    for(i=0; i<N_OBSERVATIONS_BLOCK_2; i++)
        lv_x_block_2[i]=x[N_OBSERVATIONS_BLOCK_1+i];
    for(i=0; i<N_OBSERVATIONS_BLOCK_3; i++)
        lv_x_block_3[i]=x[N_OBSERVATIONS_BLOCK_1+N_OBSERVATIONS_BLOCK_2+i];
    /* Perform chi squared test: first block */
    chi_squared_statistics[2] =
        imsls_f_chi_squared_test
            (imsls_f_normal_cdf,
            N_OBSERVATIONS_BLOCK_1, N_CATEGORIES, lv_x_block_1,
            IMSLS_IDO, 1,
            IMSLS_CUTPOINTS, &cutpoints,
            IMSLS_CHI_SQUARED, &chi_squared_statistics[0],
            IMSLS_DEGREES_OF_FREEDOM, &chi_squared_statistics[1],
            IMSLS_CELL_COUNTS, &cell_counts,
            IMSLS_CELL_CHI_SQUARED, &cell_chi_squared,
            0);
    if (cutpoints) imsls_free (cutpoints);
    if (cell_counts) imsls_free (cell_counts);
    if (cell_chi_squared) imsls_free (cell_chi_squared);
    /* Intermediate block */
    chi_squared_statistics[2] =
        imsls_f_chi_squared_test
            (imsls_f_normal_cdf,
            N_OBSERVATIONS_BLOCK_2, N_CATEGORIES, lv_x_block_2,
            IMSLS_IDO, 2,
            IMSLS_CUTPOINTS, &cutpoints,
            IMSLS_CHI_SQUARED, &chi_squared_statistics[0],
            IMSLS_DEGREES_OF_FREEDOM, &chi_squared_statistics[1],
            IMSLS_CELL_COUNTS, &cell_counts,
            IMSLS_CELL_CHI_SQUARED, &cell_chi_squared,
            0);
    if (cutpoints) imsls_free (cutpoints);
    if (cell_counts) imsls_free (cell_counts);
    if (cell_chi_squared) imsls_free (cell_chi_squared);
    /* Final block */
    chi_squared_statistics[2] =
        imsls_f_chi_squared_test
            (imsls_f_normal_cdf,
            N_OBSERVATIONS_BLOCK_3, N_CATEGORIES, lv_x_block_3,
            IMSLS_IDO, 3,
            IMSLS_CUTPOINTS, &cutpoints,
            IMSLS_CHI_SQUARED, &chi_squared_statistics[0],
            IMSLS_DEGREES_OF_FREEDOM, &chi_squared_statistics[1],
            IMSLS_CELL_COUNTS, &cell_counts,
            IMSLS_CELL_CHI_SQUARED, &cell_chi_squared,
            0);
    /* Print results */
    imsls_f_write_matrix ("\nChi Squared Statistics\n", 3, 1,
        chi_squared_statistics,
        IMSLS_ROW_LABELS, stat_row_labels,
        0);
    imsls_f_write_matrix ("Cut Points", 1, N_CATEGORIES-1,
        cutpoints, 0);
    imsls_f_write_matrix ("Cell Counts", 1, N_CATEGORIES,
        cell_counts, 0);
    imsls_f_write_matrix ("Cell Contributions to Chi-Squared", 1,
        N_CATEGORIES, cell_chi_squared,
        0);
    if (cutpoints) imsls_free (cutpoints);
    if (cell_counts) imsls_free (cell_counts);
    if (cell_chi_squared) imsls_free (cell_chi_squared);
}
Chi Squared Statistics
chi-squared 13.18
degrees of freedom 9.00
p-value 0.15
Cut Points
1 2 3 4 5 6
-1.282 -0.842 -0.524 -0.253 -0.000 0.253
7 8 9
0.524 0.842 1.282
Cell Counts
1 2 3 4 5 6
106 109 89 92 83 87
7 8 9 10
110 104 121 99
Cell Contributions to Chi-Squared
1 2 3 4 5 6
0.36 0.81 1.21 0.64 2.89 1.69
7 8 9 10
1.00 0.16 4.41 0.01
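The results match those obtained when all observations are passed in a single call, because the cell tallies simply accumulate across blocks. The plain C sketch below illustrates that pattern with a running state that is initialized once, updated per block, and wrapped up at the end. It is an illustration of the idea only, not the library's implementation: tally_state, tally_begin, tally_update, tally_finish, and the stand-in unit_uniform_cdf are all hypothetical, and equiprobable cells are assumed.

```c
#include <assert.h>

#define MAX_CELLS 32

/* Running state kept between blocks (hypothetical; not a library type). */
typedef struct {
    int   n_categories;
    float n_total;
    float counts[MAX_CELLS];
} tally_state;

/* Stand-in hypothesized CDF: uniform on [0, 1]. */
float unit_uniform_cdf(float y)
{
    if (y <= 0.0f) return 0.0f;
    if (y >= 1.0f) return 1.0f;
    return y;
}

/* Initialization, corresponding to the start of the first call (ido = 1). */
void tally_begin(tally_state *s, int n_categories)
{
    int i;
    assert(n_categories >= 2 && n_categories <= MAX_CELLS);
    s->n_categories = n_categories;
    s->n_total = 0.0f;
    for (i = 0; i < n_categories; i++)
        s->counts[i] = 0.0f;
}

/* Updating for one block of observations (done on every call). */
void tally_update(tally_state *s, float (*cdf)(float),
                  const float x[], int n)
{
    int i;
    for (i = 0; i < n; i++) {
        int cell = (int)(s->n_categories * cdf(x[i]));   /* 0-based cell */
        if (cell >= s->n_categories)
            cell = s->n_categories - 1;                  /* F(x) == 1 */
        s->counts[cell] += 1.0f;
        s->n_total += 1.0f;
    }
}

/* Wrap-up, corresponding to the final call (ido = 3): chi-squared
   statistic for equiprobable cells. */
float tally_finish(const tally_state *s)
{
    float expected, chi_squared = 0.0f;
    int i;
    assert(s->n_total > 0.0f);
    expected = s->n_total / (float)s->n_categories;
    for (i = 0; i < s->n_categories; i++) {
        float d = s->counts[i] - expected;
        chi_squared += d * d / expected;
    }
    return chi_squared;
}
```

Because only the counts and the running total are retained, the full data set never has to be in memory at once, which is the purpose of the ido option.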
IMSLS_EXPECTED_VAL_LESS_THAN_1 | An expected value is less than 1.
IMSLS_EXPECTED_VAL_LESS_THAN_5 | An expected value is less than 5.
IMSLS_X_VALUE_OUT_OF_RANGE | Row x contains a value which is out of range.
IMSLS_MISSING_DATA_ELEMENT | At least one data element is missing.
IMSLS_ALL_OBSERVATIONS_MISSING | All observations contain missing values.
IMSLS_INCORRECT_CDF_1 | Function user_proc_cdf is not a cumulative distribution function. The value at the lower bound must be nonnegative, and the value at the upper bound must not be greater than 1.
IMSLS_INCORRECT_CDF_2 | Function user_proc_cdf is not a cumulative distribution function. The probability of the range of the distribution is not positive.
IMSLS_INCORRECT_CDF_3 | Function user_proc_cdf is not a cumulative distribution function. Its evaluation at an element in x is inconsistent with either the evaluation at the lower or upper bound.
IMSLS_INCORRECT_CDF_4 | Function user_proc_cdf is not a cumulative distribution function. Its evaluation at a cutpoint is inconsistent with either the evaluation at the lower or upper bound.
IMSLS_INCORRECT_CDF_5 | An error has occurred when inverting the cumulative distribution function. This function must be continuous and defined over the whole real line.
IMSLS_TOO_MANY_CELL_DELETIONS | There are more observations deleted from the cell than added.
IMSLS_NO_BOUND_AFTER_100_TRYS | After 100 attempts, a bound for the inverse cannot be determined. Try again with a different initial estimate.
IMSLS_NO_UNIQUE_INVERSE_EXISTS | No unique inverse exists.
IMSLS_CONVERGENCE_ASSUMED | Over 100 iterations have occurred without convergence. Convergence is assumed.
IMSLS_BAD_IDO_6 | “ido” = #. Initial allocations must be performed by invoking the function with “ido” = 1.
IMSLS_BAD_IDO_7 | “ido” = #. A new analysis may not begin until the previous analysis is terminated by invoking the function with “ido” = 3.
IMSLS_BAD_N_CATEGORIES | “n_categories” = #. The number of categories variable, “n_categories”, must be the same in separate function calls.