unsupervisedNominalFilter

Converts nominal data into a series of binary encoded columns for input to a neural network. Optionally, it can also reverse the binary encoding, accepting a series of binary encoded columns and returning a single column of nominal classes.

Synopsis

unsupervisedNominalFilter (nClasses, x)

Required Arguments

int nClasses (Input/Output)
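The number of classes in x[]. nClasses is output when encoding and input when decoding. As the example below shows, for encoding an empty list is passed and the class count is returned in its first element; for decoding an integer is passed.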
int x[] (Input)
A one- or two-dimensional array, depending upon whether encoding or decoding is requested. If encoding is requested, x is an array of length nPatterns containing the categories of a nominal variable, numbered from 1 to nClasses. If decoding is requested, x is an array of size nPatterns by nClasses whose columns contain only zeros and ones; these are interpreted as the binary encoded representation of a single nominal variable.

Return Value

An array, z[]. The values in z are either the encoded or decoded values for x, depending upon whether encode or decode is requested. If errors are encountered, None is returned.

Optional Arguments

encode (Input)

Specifies binary encoding. Classes must be numbered sequentially from 1 to nClasses. Optional Arguments encode and decode are mutually exclusive.

Default: encode.

or

decode (Input)

Specifies that x will be decoded. The values in each column should be zeros and ones. The values in the i-th column of x are associated with the i-th class of the nominal variable. Optional Arguments encode and decode are mutually exclusive.

Default: encode.

Description

The function unsupervisedNominalFilter is designed to either encode or decode nominal variables using a simple binary mapping.

Binary Encoding: encode

In this case, x[] is an input array to which a binary filter is applied. Binary encoding creates one column in the output matrix z[] for each category in x[]; every entry of z[] is zero or one. In row i, a one in column j indicates that x[i] belongs to category j, and a zero indicates that it does not.

For example, if x[]={2,1,3,4,2,4} then nClasses=4, and

\[\begin{split}z = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}\end{split}\]

Notice that the number of columns in z is equal to the number of distinct classes in x. The number of rows in z is equal to the length of x.
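The mapping described above can be sketched in plain NumPy. This illustrates the encoding rule only and is not the library's implementation. The helper name encode_nominal is hypothetical, and the sketch assumes that classes are numbered sequentially from 1, so that the largest value in x can be taken as nClasses.

```python
import numpy as np


def encode_nominal(x):
    """Hypothetical helper: binary encoding of 1-based nominal classes."""
    x = np.asarray(x, dtype=int)
    # Assumption: classes are numbered sequentially from 1, so the largest
    # value in x is taken as the number of classes.
    n_classes = int(x.max())
    z = np.zeros((x.size, n_classes), dtype=int)
    # Row i gets a one in column x[i] - 1 (NumPy columns are 0-based).
    z[np.arange(x.size), x - 1] = 1
    return n_classes, z


n_classes, z = encode_nominal([2, 1, 3, 4, 2, 4])
print(n_classes)
print(z)
```

For the input used in the text, the sketch gives a 6 by 4 matrix with exactly one 1 in each row.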

Binary Decoding: decode

Binary decoding examines the columns of x[] and, for each row, returns in z[] the class whose column contains a one.

For example, if x[] is the encoded matrix z from the example above:

\[\begin{split}x = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}\end{split}\]

then z[] would be returned as z[]={2, 1, 3, 4, 2, 4}. Notice this is the same as the original array because classes are numbered sequentially from 1 to nClasses. This ensures that the i-th column of x[] is associated with the i-th class in the output array.
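The decoding rule can likewise be sketched in plain NumPy. This is an illustration only, not the library's implementation. The helper name decode_nominal is hypothetical, and the check that each row holds exactly one 1 (returning None otherwise) is an assumption of this sketch, not documented behavior of unsupervisedNominalFilter.

```python
import numpy as np


def decode_nominal(x):
    """Hypothetical helper: map each 0/1 row back to its 1-based class."""
    x = np.asarray(x, dtype=int)
    # Assumption of this sketch: every entry is 0 or 1 and every row
    # contains exactly one 1; anything else is rejected with None.
    if not (np.isin(x, (0, 1)).all() and (x.sum(axis=1) == 1).all()):
        return None
    # argmax gives the 0-based column of the single one; add 1 for the class.
    return x.argmax(axis=1) + 1


x = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 1, 0, 0],
     [0, 0, 0, 1]]
z = decode_nominal(x)
print(z)
```

Because the i-th column corresponds to the i-th class, the column index plus one is the class label.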

Example

This example illustrates nominal binary encoding and decoding for \(x=\{3,3,1,2,2,1,2\}\).

from __future__ import print_function
from numpy import array
from pyimsl.stat.unsupervisedNominalFilter import unsupervisedNominalFilter
from pyimsl.stat.writeMatrix import writeMatrix

x = array([3, 3, 1, 2, 2, 1, 2])

# Binary filtering
nClasses = []  # Output for encode, input for decode
z = unsupervisedNominalFilter(nClasses, x, encode=True)

print("nClasses = ", nClasses[0])
writeMatrix("x", x, writeFormat="%5i", column=True)
writeMatrix("z", z, writeFormat="%5i")

# Binary Unfiltering.
nClasses = nClasses[0]  # Unwrap the count returned by encode; decode takes it as input
x2 = unsupervisedNominalFilter(nClasses, z, decode=True)
writeMatrix("Unfiltering result", x2, writeFormat="%5i", column=True)

Output

nClasses =  3
 
    x
1      3
2      3
3      1
4      2
5      2
6      1
7      2
 
           z
       1      2      3
1      0      0      1
2      0      0      1
3      1      0      0
4      0      1      0
5      0      1      0
6      1      0      0
7      0      1      0
 
Unfiltering result
     1      3
     2      3
     3      1
     4      2
     5      2
     6      1
     7      2