unsupervised_nominal_filter

Converts nominal data into a series of binary encoded columns for input to a neural network. Optionally, it can also reverse the binary encoding, accepting a series of binary encoded columns and returning a single column of nominal classes.

Synopsis

#include <imsls.h>

int *imsls_unsupervised_nominal_filter (int n_patterns, int *n_classes, int x[], …, 0)

Required Arguments

int n_patterns (Input)
Number of observations.

int *n_classes (Input/Output)
A pointer to the number of classes in x[]. n_classes is output for IMSLS_ENCODE and input for IMSLS_DECODE.

int x[] (Input)
A one- or two-dimensional array, depending upon whether encoding or decoding is requested. If encoding is requested, x is an array of length n_patterns containing the categories for a nominal variable, numbered from 1 to n_classes. If decoding is requested, x is an array of size n_patterns by n_classes. In this case, the columns contain only zeros and ones, which are interpreted as the binary encoded representation of a single nominal variable.

Return Value

A pointer to an internally allocated array, z[]. The values in z are either the encoded or decoded values for x, depending upon whether IMSLS_ENCODE or IMSLS_DECODE is requested. If errors are encountered, NULL is returned.

Synopsis with Optional Arguments

#include <imsls.h>

int *imsls_unsupervised_nominal_filter (int n_patterns, int *n_classes, int x[],

IMSLS_ENCODE, or

IMSLS_DECODE,

IMSLS_RETURN_USER, int z[],

0)

Optional Arguments

IMSLS_ENCODE, (Input)
Specifies binary encoding. Classes must be numbered sequentially from 1 to n_classes. Optional Arguments IMSLS_ENCODE and IMSLS_DECODE are mutually exclusive.

Default: IMSLS_ENCODE.

or

IMSLS_DECODE, (Input)
Specifies that x will be decoded. The values in each column should be zeros and ones. The values in the i-th column of x are associated with the i-th class of the nominal variable. Optional Arguments IMSLS_ENCODE and IMSLS_DECODE are mutually exclusive.

Default: IMSLS_ENCODE.

IMSLS_RETURN_USER, int z[] (Output)
A user-supplied array that receives the result. If IMSLS_ENCODE is specified, z should be of size n_patterns by n_classes. If IMSLS_DECODE is specified, z should be of length n_patterns. Row i of z (encoding) or element z[i] (decoding) holds the encoded or decoded value for the i-th observation in x.
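Because n_classes is an output when encoding, a caller who supplies z must know the class count before the call in order to size the buffer. The following is a minimal sketch of that bookkeeping in plain C. The helpers sketch_max_class and sketch_z_length are hypothetical names used for illustration and are not part of IMSL; the sketch assumes that classes are numbered 1 to n_classes, so that the largest value in x equals the class count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of IMSL: returns the largest class label
 * in x. Assumes classes are numbered sequentially from 1, so this value
 * is also the number of classes. */
int sketch_max_class(int n_patterns, const int x[])
{
    assert(n_patterns > 0 && x != NULL);

    int max_class = 0;
    for (int i = 0; i < n_patterns; i++) {
        if (x[i] > max_class)
            max_class = x[i];
    }
    return max_class;
}

/* Hypothetical helper, not part of IMSL: number of int elements a
 * user-supplied z needs. Encoding fills an n_patterns by n_classes
 * array; decoding fills a vector of length n_patterns. */
size_t sketch_z_length(int n_patterns, int n_classes, int decoding)
{
    assert(n_patterns > 0 && n_classes > 0);

    if (decoding)
        return (size_t)n_patterns;
    return (size_t)n_patterns * (size_t)n_classes;
}
```

For the seven observations {3, 3, 1, 2, 2, 1, 2}, this gives three classes, so an encoding buffer needs 21 elements and a decoding buffer needs 7.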

Description

The function imsls_unsupervised_nominal_filter is designed to either encode or decode nominal variables using a simple binary mapping.

Binary Encoding: IMSLS_ENCODE

In this case, x[] is an input array to which a binary filter is applied. Binary encoding creates one column in the output matrix z[] for each category in x[]. Each column contains only zeros and ones: for a given observation, a value of one indicates that the observation belongs to that category, and a value of zero indicates that it does not.

For example, if x[]={2,1,3,4,2,4} then n_classes=4, and

z =
    0 1 0 0
    1 0 0 0
    0 0 1 0
    0 0 0 1
    0 1 0 0
    0 0 0 1

Notice that the number of columns in z is equal to the number of distinct classes in x. The number of rows in z is equal to the length of x.
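The mapping described above can be sketched in plain C. This is an illustration of the encoding rule only, not the library's implementation. The function name sketch_encode is hypothetical, and the sketch assumes that z is stored row by row (row-major) and that the class count equals the largest label in x.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of IMSL: binary encodes x[0..n_patterns-1].
 * Returns a newly allocated row-major n_patterns by n_classes array of
 * zeros and ones and stores the class count in *n_classes. Returns NULL
 * if a label is below 1 or if allocation fails. The caller releases the
 * result with free(). */
int *sketch_encode(int n_patterns, const int x[], int *n_classes)
{
    assert(n_patterns > 0 && x != NULL && n_classes != NULL);

    /* Assumption: classes are numbered 1..n_classes, so the largest
     * label is the number of classes. */
    int max_class = 0;
    for (int i = 0; i < n_patterns; i++) {
        if (x[i] < 1)
            return NULL;
        if (x[i] > max_class)
            max_class = x[i];
    }

    /* calloc zero-fills, so only the "present" entries need to be set. */
    int *z = calloc((size_t)n_patterns * (size_t)max_class, sizeof *z);
    if (z == NULL)
        return NULL;

    /* Row i gets a single one, in the column of class x[i]. */
    for (int i = 0; i < n_patterns; i++)
        z[i * max_class + (x[i] - 1)] = 1;

    *n_classes = max_class;
    return z;
}
```

Applied to x[]={2,1,3,4,2,4}, the sketch sets n_classes to 4 and produces one row per observation, each with a single one in the column of its class.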

Binary Decoding: IMSLS_DECODE

Binary decoding takes each column in x[], and returns the appropriate class in z[].

For example, if x[] is the 6 by 4 binary matrix obtained by encoding {2, 1, 3, 4, 2, 4}:

x =
    0 1 0 0
    1 0 0 0
    0 0 1 0
    0 0 0 1
    0 1 0 0
    0 0 0 1

then z[] would be returned as z[]={2, 1, 3, 4, 2, 4}. Notice this is the same as the original array because classes are numbered sequentially from 1 to n_classes. This ensures that the i-th column of x[] is associated with the i-th class in the output array.
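The reverse mapping can be sketched the same way. The function name sketch_decode is hypothetical and is not part of IMSL. The sketch assumes row-major storage for x and, as a design choice of the sketch rather than documented library behavior, rejects any row that does not contain exactly one 1.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of IMSL: decodes a row-major
 * n_patterns by n_classes array of zeros and ones into class labels
 * 1..n_classes. Returns a newly allocated array of length n_patterns,
 * or NULL if an entry is not 0 or 1, if a row does not contain exactly
 * one 1, or if allocation fails. The caller releases it with free(). */
int *sketch_decode(int n_patterns, int n_classes, const int x[])
{
    assert(n_patterns > 0 && n_classes > 0 && x != NULL);

    int *z = malloc((size_t)n_patterns * sizeof *z);
    if (z == NULL)
        return NULL;

    for (int i = 0; i < n_patterns; i++) {
        int ones = 0;
        for (int j = 0; j < n_classes; j++) {
            int v = x[i * n_classes + j];
            if (v != 0 && v != 1) {
                free(z);
                return NULL;
            }
            if (v == 1) {
                ones++;
                z[i] = j + 1;   /* column j corresponds to class j + 1 */
            }
        }
        if (ones != 1) {
            free(z);
            return NULL;
        }
    }
    return z;
}
```

Decoding the 6 by 4 matrix obtained by encoding {2, 1, 3, 4, 2, 4} with this sketch recovers {2, 1, 3, 4, 2, 4}.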

Example

This example illustrates nominal binary encoding and decoding for x = {3, 3, 1, 2, 2, 1, 2}.

 

#include <imsls.h>
#include <stdio.h>

#define N_PATTERNS 7

int main ()
{
    int x[N_PATTERNS] = {3, 3, 1, 2, 2, 1, 2};
    int *x2;
    int *z, n_classes;

    /* Binary Filtering. */
    z = imsls_unsupervised_nominal_filter(N_PATTERNS, &n_classes, x,
        0);

    printf("n_classes = %d\n", n_classes);

    imsls_i_write_matrix("X", N_PATTERNS, 1, (int*)x,
        0);
    imsls_i_write_matrix("Z", N_PATTERNS, n_classes, z,
        0);

    /* Binary Unfiltering. */
    x2 = imsls_unsupervised_nominal_filter(N_PATTERNS, &n_classes, z,
        IMSLS_DECODE,
        0);

    imsls_i_write_matrix("Unfiltering result", N_PATTERNS, 1, x2,
        0);

    return 0;
}

Output

n_classes = 3

    X
1   3
2   3
3   1
4   2
5   2
6   1
7   2

      Z
    1  2  3
1   0  0  1
2   0  0  1
3   1  0  0
4   0  1  0
5   0  1  0
6   1  0  0
7   0  1  0

Unfiltering result
1   3
2   3
3   1
4   2
5   2
6   1
7   2