unsupervised_nominal_filter
Converts nominal data into a series of binary encoded columns for input to a neural network. Optionally, it can also reverse the binary encoding, accepting a series of binary encoded columns and returning a single column of nominal classes.
Synopsis
#include <imsls.h>
int *imsls_unsupervised_nominal_filter (int n_patterns, int *n_classes, int x[], …, 0)
Required Arguments
int n_patterns (Input)
Number of observations.
int *n_classes (Input/Output)
A pointer to the number of classes in x[]. n_classes is output for IMSLS_ENCODE and input for IMSLS_DECODE.
int x[] (Input)
A one- or two-dimensional array, depending upon whether encoding or decoding is requested. If encoding is requested, x is an array of length n_patterns containing the categories for a nominal variable, numbered from 1 to n_classes. If decoding is requested, x is an array of size n_patterns by n_classes. In this case, the columns contain only zeros and ones, which are interpreted as the binary encoded representation of a single nominal variable.
Return Value
A pointer to an internally allocated array, z[]. The values in z are either the encoded or decoded values for x, depending upon whether IMSLS_ENCODE or IMSLS_DECODE is requested. If errors are encountered, NULL is returned.
Synopsis with Optional Arguments
#include <imsls.h>
int *imsls_unsupervised_nominal_filter (int n_patterns, int *n_classes, int x[],
IMSLS_ENCODE, or
IMSLS_DECODE,
IMSLS_RETURN_USER, int z[],
0)
Optional Arguments
IMSLS_ENCODE, (Input)
Specifies binary encoding. Classes must be numbered sequentially from 1 to n_classes. Optional Arguments IMSLS_ENCODE and IMSLS_DECODE are mutually exclusive.
Default: IMSLS_ENCODE.
or
IMSLS_DECODE, (Input)
Specifies that x will be decoded. The values in each column should be zeros and ones. The values in the i-th column of x are associated with the i-th class of the nominal variable. Optional Arguments IMSLS_ENCODE and IMSLS_DECODE are mutually exclusive.
Default: IMSLS_ENCODE.
IMSLS_RETURN_USER, int z[] (Output)
A user-supplied array that receives the result. If IMSLS_ENCODE is specified, z must be of size n_patterns by n_classes. If IMSLS_DECODE is specified, z must be of length n_patterns. For encoding, row i of z holds the binary encoded value of x[i]; for decoding, z[i] holds the class decoded from row i of x.
Description
The function imsls_unsupervised_nominal_filter is designed to either encode or decode nominal variables using a simple binary mapping.
Binary Encoding: IMSLS_ENCODE
In this case, x[] is an input array to which a binary filter is applied. Binary encoding creates one column in the output matrix z[] for each category in x[]. Each column contains only zeros and ones: a value of one indicates that the category is present in that observation, and a value of zero indicates that it is not.
For example, if x[]={2,1,3,4,2,4} then n_classes=4, and z[] is the 6 by 4 matrix

    0 1 0 0
    1 0 0 0
    0 0 1 0
    0 0 0 1
    0 1 0 0
    0 0 0 1

Notice that the number of columns in z is equal to the number of distinct classes in x. The number of rows in z is equal to the length of x.
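The encoding rule described above can be sketched in plain C. This is an illustration only, not the library's implementation: encode_nominal is a hypothetical helper, the output is assumed to be stored row by row, and the class count is taken to be the largest label in x, which matches the number of distinct classes only when the labels run sequentially from 1.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the IMSL API: binary encodes x[0..n-1].
 * Assumption: labels run sequentially from 1, so the largest label equals
 * the number of classes. Returns a zero-initialized, row-major
 * n by (*n_classes) array, or NULL on invalid input or allocation failure. */
int *encode_nominal(int n, const int x[], int *n_classes)
{
    int i, k = 0;
    int *z;

    if (n <= 0 || x == NULL || n_classes == NULL) return NULL;

    for (i = 0; i < n; i++) {
        if (x[i] < 1) return NULL;   /* labels must start at 1 */
        if (x[i] > k) k = x[i];      /* track the largest label */
    }
    assert(k >= 1);                  /* holds because n > 0 and all x[i] >= 1 */

    z = calloc((size_t)n * (size_t)k, sizeof *z);
    if (z == NULL) return NULL;

    for (i = 0; i < n; i++)
        z[i * k + (x[i] - 1)] = 1;   /* row i, column for class x[i] */

    *n_classes = k;
    return z;
}
```

Called with x[]={2,1,3,4,2,4}, this sketch sets *n_classes to 4 and places a single one in each row of the result, in the column of that observation's class. The caller releases the array with free.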
Binary Decoding: IMSLS_DECODE
Binary decoding takes each column in x[], and returns the appropriate class in z[].
For example, if x[] is the binary encoded matrix from the encoding example above:

    0 1 0 0
    1 0 0 0
    0 0 1 0
    0 0 0 1
    0 1 0 0
    0 0 0 1

then z[] would be returned as z[]={2, 1, 3, 4, 2, 4}. Notice this is the same as the original array because classes are numbered sequentially from 1 to n_classes. This ensures that the i-th column of x[] is associated with the i-th class in the output array.
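The reverse mapping can be sketched the same way. Again this is only an illustration under stated assumptions: decode_nominal is a hypothetical helper, the input is assumed to be stored row by row, and a row that does not contain exactly one 1 is treated as an error. How the library itself handles such rows is not stated here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the IMSL API: maps each row of a
 * row-major n by n_classes array of zeros and ones to a label 1..n_classes.
 * Assumption: a valid row holds exactly one 1; anything else returns NULL. */
int *decode_nominal(int n, int n_classes, const int x[])
{
    int i, j;
    int *z;

    if (n <= 0 || n_classes <= 0 || x == NULL) return NULL;

    z = malloc((size_t)n * sizeof *z);
    if (z == NULL) return NULL;

    for (i = 0; i < n; i++) {
        z[i] = 0;                               /* 0 means no class found yet */
        for (j = 0; j < n_classes; j++) {
            int v = x[i * n_classes + j];
            if ((v != 0 && v != 1) || (v == 1 && z[i] != 0)) {
                free(z);                        /* non-binary value or second 1 */
                return NULL;
            }
            if (v == 1) z[i] = j + 1;           /* column j encodes class j+1 */
        }
        if (z[i] == 0) {                        /* all-zero row */
            free(z);
            return NULL;
        }
        assert(z[i] >= 1 && z[i] <= n_classes);
    }
    return z;
}
```

Applied to the 6 by 4 matrix of the example, the sketch recovers the labels 2, 1, 3, 4, 2, 4, because column i always stands for class i.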
Example
This example illustrates nominal binary encoding and decoding for x = {3, 3, 1, 2, 2, 1, 2}.
 
#include <imsls.h>
#include <stdio.h>

#define N_PATTERNS 7

int main()
{
    int x[N_PATTERNS] = {3, 3, 1, 2, 2, 1, 2};
    int *x2;
    int *z, n_classes;

    /* Binary filtering: encode x into an N_PATTERNS by n_classes array. */
    z = imsls_unsupervised_nominal_filter(N_PATTERNS, &n_classes, x,
        0);

    printf("n_classes = %d\n", n_classes);

    imsls_i_write_matrix("X", N_PATTERNS, 1, x,
        0);
    imsls_i_write_matrix("Z", N_PATTERNS, n_classes, z,
        0);

    /* Binary unfiltering: decode z back into class labels. */
    x2 = imsls_unsupervised_nominal_filter(N_PATTERNS, &n_classes, z,
        IMSLS_DECODE,
        0);

    imsls_i_write_matrix("Unfiltering result", N_PATTERNS, 1, x2,
        0);

    /* Release the arrays allocated by the library. */
    imsls_free(z);
    imsls_free(x2);
    return 0;
}
Output
n_classes = 3

    X
1   3
2   3
3   1
4   2
5   2
6   1
7   2

       Z
    1  2  3
1   0  0  1
2   0  0  1
3   1  0  0
4   0  1  0
5   0  1  0
6   1  0  0
7   0  1  0

Unfiltering result
1   3
2   3
3   1
4   2
5   2
6   1
7   2