Sorts observations by specified keys, with option to tally cases into a multi-way frequency table.
#include <imsls.h>
void imsls_f_sort_data (int n_observations, int n_variables, float x[], int n_keys, ..., 0)
The type double function is imsls_d_sort_data.
int
n_observations (Input)
Number of observations (rows) in
x.
int
n_variables (Input)
Number of variables (columns) in x.
float x[]
(Input/Output)
An n_observations × n_variables matrix
containing the observations to be sorted. The sorted matrix is returned in x (exception: see
optional argument IMSLS_PASSIVE).
int n_keys
(Input)
Number of columns of x on which to sort.
The first n_keys
columns of x are
used as the sorting keys (exception: see optional argument IMSLS_INDICES_KEYS).
#include <imsls.h>
void imsls_f_sort_data (int n_observations, int n_variables, float x[], int n_keys,
IMSLS_X_COL_DIM, int x_col_dim,
IMSLS_INDICES_KEYS, int indices_keys[],
IMSLS_FREQUENCIES, float frequencies[],
IMSLS_ASCENDING, or
IMSLS_DESCENDING,
IMSLS_ACTIVE, or
IMSLS_PASSIVE,
IMSLS_PERMUTATION, int **permutation,
IMSLS_PERMUTATION_USER, int permutation[],
IMSLS_TABLE, int **n_values, float **values, float **table,
IMSLS_TABLE_USER, int n_values[], float values[], float table[],
IMSLS_LIST_CELLS, int *n_cells, float **list_cells, float **table_unbalanced,
IMSLS_LIST_CELLS_USER, int *n_cells, float list_cells[], float table_unbalanced[],
IMSLS_N, int *n_cells, int **n,
IMSLS_N_USER, int *n_cells, int n[],
0)
IMSLS_X_COL_DIM, int x_col_dim
(Input)
Column dimension of x.
Default: x_col_dim = n_variables
IMSLS_INDICES_KEYS, int
indices_keys[] (Input)
Array of length n_keys giving the
column numbers of x which are to be used
in the sort.
Default: indices_keys [ ] = 0, 1, …, n_keys − 1
IMSLS_FREQUENCIES, float
frequencies[] (Input)
Array of length n_observations
containing the frequency for each observation in x.
Default: frequencies [ ] = 1
IMSLS_ASCENDING, or
IMSLS_DESCENDING
By default, or if IMSLS_ASCENDING is
specified, the sort is in ascending order. If IMSLS_DESCENDING is
specified, the sort is in descending order.
IMSLS_ACTIVE, or
IMSLS_PASSIVE
By
default, or if IMSLS_ACTIVE is
specified, the sorted matrix is returned in x. If IMSLS_PASSIVE is
specified, x is
unchanged by imsls_f_sort_data
(i.e., x becomes
input only).
IMSLS_PERMUTATION, int
**permutation (Output)
Address of a pointer to an
internally allocated array of length n_observations
specifying the rearrangement (permutation) of the observations (rows).
IMSLS_PERMUTATION_USER, int
permutation[] (Output)
Storage for array permutation is
provided by the user. See IMSLS_PERMUTATION.
IMSLS_TABLE, int **n_values, float **values, float **table
(Output)
Argument n_values is the
address of a pointer to an internally allocated array of length n_keys containing in
its i-th element (i = 0, 1, …, n_keys − 1), the
number of levels or categories of the i-th classification variable
(column).
Argument values is the address of a pointer to an internally allocated array of length n_values [0] + n_values [1] + … + n_values [n_keys − 1] containing the values of the classification variables. The first n_values [0] elements of values contain the values for the first classification variable. The next n_values [1] contain the values for the second variable. The last n_values [n_keys − 1] positions contain the values for the last classification variable.
Argument table is the address of a pointer to an internally allocated array of length n_values [0] × n_values [1] × … × n_values [n_keys − 1] containing the frequencies in the cells of the table to be fit.
Empty cells are included in table, and each element of table is nonnegative. The cells of table are sequenced so that the first variable cycles through its n_values [0] categories one time, the second variable cycles through its n_values [1] categories n_values [0] times, the third variable cycles through its n_values [2] categories n_values [0] × n_values [1] times, etc., up to the n_keys-th variable, which cycles through its n_values [n_keys − 1] categories n_values [0] × n_values [1] × … × n_values [n_keys − 2] times.
IMSLS_TABLE_USER, int n_values[], float values[], float table[]
(Output)
Storage for arrays n_values, values, and table is provided by
the user. If the length of table is not known in
advance, the upper bound for this length can be taken to be the product of the
number of distinct values taken by all of the classification variables (since
table includes
the empty cells).
IMSLS_LIST_CELLS, int *n_cells, float **list_cells, float **table_unbalanced
(Output)
Number of nonempty cells is returned by n_cells. Argument
list_cells is an
internally allocated array of size n_cells × n_keys containing, for
each row, a list of the levels of n_keys corresponding
classification variables that describe a cell.
Argument table_unbalanced is the address of a pointer to an array of length n_cells containing the frequency for each cell.
IMSLS_LIST_CELLS_USER, int *n_cells, float list_cells[],
float table_unbalanced[]
(Output)
Storage for arrays list_cells and table_unbalanced is
provided by the user. See IMSLS_LIST_CELLS.
IMSLS_N, int *n_cells, int **n
(Output)
The integer n_cells returns the
number of groups of different observations. A group contains observations (rows)
in x that are
equal with respect to the method of comparison.
Argument n is the address of the pointer to an internally allocated array of length n_cells containing the number of observations (rows) in each group.
The first n [0] rows of the sorted x are group number 1. The next n [1]rows of the sorted x are group number 2, etc. The last n [n_cells − 1] rows of the sorted x are group number n_cells.
IMSLS_N_USER, int *n_cells, int n[]
(Output)
Storage for array n_cells is provided by
the user. If the value of n_cells is not known,
n_observations
can be used as an upper bound for the length of n. See IMSLS_N.
Function imsls_f_sort_data can perform both a key sort and/or tabulation of frequencies into a multi-way frequency table.
Function imsls_f_sort_data sorts the rows of real matrix x using a particular row in x as the keys. The sort is algebraic with the first key as the most significant, the second key as the next most significant, etc. When x is sorted in ascending order, the resulting sorted array is such that the following is true:
• For i = 0, 1, …, n_observations − 2, x [i] [indices_keys [0]] ≤ x [i + 1] [indices_keys [0]]
• For k = 1, …, n_keys − 1, if x [i] [indices_keys [j]] = x [i + 1] [indices_keys [j]] for j = 0, 1, …, k − 1, then x [i] [indices_keys [k]] = x [i + 1] [indices_keys [k]]
The observations also can be sorted in descending order.
The rows of x containing the missing value code NaN in at least one of the specified columns are considered as an additional group. These rows are moved to the end of the sorted x.
The sorting algorithm is based on a quicksort method given by Singleton (1969) with modifications by Griffen and Redish (1970) and Petro (1970).
Function imsls_f_sort_data determines the distinct values in multivariate data and computes frequencies for the data. This function accepts the data in the matrix x, but performs computations only for the variables (columns) in the first n_keys columns of x (Exception: see optional argument IMSLS_INDICES_KEYS). In general, the variables for which frequencies should be computed are discrete; they should take on a relatively small number of different values. Variables that are continuous can be grouped first. The imsls_f_table_oneway function can be used to group variables and determine the frequencies of groups.
When IMSLS_TABLE is specified, imsls_f_sort_data fills the vector values with the unique values of the variables and tallies the number of unique values of each variable in the vector table. Each combination of one value from each variable forms a cell in a multi-way table. The frequencies of these cells are entered in table so that the first variable cycles through its values exactly once, and the last variable cycles through its values most rapidly. Some cells cannot correspond to any observations in the data; in other words, “missing cells” are included in table and have a value of 0.
When IMSLS_LIST_CELLS is specified, the frequency of each cell is entered in table_unbalanced so that the first variable cycles through its values exactly once and the last variable cycles through its values most rapidly. All cells have a frequency of at least 1, i.e., there is no “missing cell.” The array list_cells can be considered “parallel” to table_unbalanced because row i of list_cells is the set of n_keys values that describes the cell for which row i of table_unbalanced contains the corresponding frequency.
The rows of a 10 × 3 matrix x are sorted in ascending order using Columns 0 and 1 as the keys. There are two missing values (NaNs) in the keys. The observations containing these values are moved to the end of the sorted array.
#include <imsls.h>
#define N_OBSERVATIONS 10
#define N_VARIABLES 3
int main()
{
int n_keys=2;
float x[N_OBSERVATIONS][N_VARIABLES] = {1.0, 1.0, 1.0,
2.0, 1.0, 2.0,
1.0, 1.0, 3.0,
1.0, 1.0, 4.0,
2.0, 2.0, 5.0,
1.0, 2.0, 6.0,
1.0, 2.0, 7.0,
1.0, 1.0, 8.0,
2.0, 2.0, 9.0,
1.0, 1.0, 9.0};
x[4][1]=imsls_f_machine(6);
x[6][0]=imsls_f_machine(6);
imsls_f_sort_data (N_OBSERVATIONS, N_VARIABLES, x, n_keys, 0);
imsls_f_write_matrix("sorted x", N_OBSERVATIONS, N_VARIABLES,
(float *)x, 0);
}
sorted x
1 2 3
1 1 1 1
2 1 1 9
3 1 1 3
4 1 1 4
5 1 1 8
6 1 2 6
7 2 1 2
8 2 2 9
9 ......... 2 7
10 2 ......... 5
This example uses the same data as the previous example. The permutation of the rows is output in the array permutation.
#include <imsls.h>
#define N_OBSERVATIONS 10
#define N_VARIABLES 3
int main()
{
int n_keys=2;
int n_cells;
int *n;
int *permutation;
float x[N_OBSERVATIONS][N_VARIABLES]={1.0, 1.0, 1.0,
2.0, 1.0, 2.0,
1.0, 1.0, 3.0,
1.0, 1.0, 4.0,
2.0, 2.0, 5.0,
1.0, 2.0, 6.0,
1.0, 2.0, 7.0,
1.0, 1.0, 8.0,
2.0. 2.0, 9.0,
1.0, 1.0, 9.0};
x[4][1]=imsls_f_machine(6);
x[6][0]=imsls_f_machine(6);
imsls_f_sort_data (N_OBSERVATIONS, N_VARIABLES,
(float *)x, n_keys,
IMSLS_PASSIVE,
IMSLS_PERMUTATION, &permutation,
IMSLS_N, &n_cells, &n, 0};
imsls_f_write_matrix("unchanged x ", N_OBSERVATIONS, N_VARIABLES,
(float *)x, 0);
imsls_i_write_matrix("permutation", 1, N_OBSERVATIONS, permutation,
0);
imsls_i_write_matrix("n", 1, n_cells, n, 0);
}
unchanged x
1 2 3
1 1 1 1
2 2 1 2
3 1 1 3
4 1 1 4
5 2 .......... 5
6 1 2 6
7 .......... 2 7
8 1 1 8
9 2 2 9
10 1 1 9
permutation
1 2 3 4 5 6 7 8 9 10
0 9 2 3 7 5 1 8 6 4
n
1 2 3 4
5 1 1 1
The table of frequencies for a data matrix of size 30 × 2 is output in the array table.
#include <imsls.h>
int main()
{
int n_observations=30;
int n_variables=2;
int n_keys=2;
int *n_values;
int n_rows, n_columns;
float *values;
float *table;
float x[] = {0.5, 1.5,
1.5, 3.5,
0.5, 3.5,
1.5, 2.5,
1.5, 3.5,
1.5, 4.5,
0.5, 1.5,
1.5, 3.5,
3.5, 6.5,
2.5, 3.5,
2.5, 4.5,
3.5, 6.5,
1.5, 2.5,
2.5, 4.5,
0.5, 3.5,
1.5, 2.5,
1.5, 3.5,
0.5, 3.5,
0.5, 1.5,
0.5, 2.5,
2.5, 5.5,
1.5, 2.5,
1.5, 3.5,
1.5, 4.5,
4.5, 5.5,
2.5, 4.5,
0.5, 3.5,
1.5, 2.5,
0.5, 2.5,
2.5, 5.5};
imsls_f_sort_data (n_observations, n_variables, x, n_keys,
IMSLS_PASSIVE,
IMSLS_TABLE, &n_values, &values, &table,
0);
imsls_f_write_matrix("unchanged x", n_observations, n_variables,
x, 0);
n_rows = n_values[0];
n_columns = n_values[1];s
imsls_f_write_matrix("row values", 1, n_rows, values, 0);
imsls_f_write_matrix("column values", 1, n_columns, &values[n_rows],
0);
imsls_f_write_matrix("table", n_rows, n_columns, table, 0);
}
unchanged x
1 2
1 0.5 1.5
2 1.5 3.5
3 0.5 3.5
4 1.5 2.5
5 1.5 3.5
6 1.5 4.5
7 0.5 1.5
8 1.5 3.5
9 3.5 6.5
10 2.5 3.5
11 2.5 4.5
12 3.5 6.5
13 1.5 2.5
14 2.5 4.5
15 0.5 3.5
16 1.5 2.5
17 1.5 3.5
18 0.5 3.5
19 0.5 1.5
20 0.5 2.5
21 2.5 5.5
22 1.5 2.5
23 1.5 3.5
24 1.5 4.5
25 4.5 5.5
26 2.5 4.5
27 0.5 3.5
28 1.5 2.5
29 0.5 2.5
30 2.5 5.5
row values
1 2 3 4 5
0.5 1.5 2.5 3.5 4.5
column values
1 2 3 4 5 6
1.5 2.5 3.5 4.5 5.5 6.5
Table
1 2 3 4 5 6
1 3 2 4 0 0 0
2 0 5 5 2 0 0
3 0 0 1 3 2 0
4 0 0 0 0 0 2
5 0 0 0 0 1 0