CNL Stat : Multivariate Analysis : cluster_number
cluster_number
Computes cluster membership for a hierarchical cluster tree.
Synopsis
#include <imsls.h>
int *imsls_cluster_number (int npt, int iclson[], int icrson[], int k, 0)
Required Arguments
int npt (Input)
Number of data points to be clustered.
int iclson[] (Input)
An array of length npt - 1 containing the left son cluster numbers.
Cluster npt + i is formed by merging clusters iclson[i-1] and icrson[i-1].
int icrson[] (Input)
An array of length npt - 1 containing the right son cluster numbers.
Cluster npt + i is formed by merging clusters iclson[i-1] and icrson[i-1].
int k (Input)
Desired number of clusters.
Return Value
An array of length npt containing the cluster membership of each observation.
Synopsis with Optional Arguments
#include <imsls.h>
int *imsls_cluster_number (int npt, int iclson[], int  icrson[], int k,
IMSLS_OBS_PER_CLUSTER, int **nclus,
IMSLS_OBS_PER_CLUSTER_USER, int nclus[],
IMSLS_RETURN_USER, int iclus[],
0)
Optional Arguments
IMSLS_OBS_PER_CLUSTER, int **nclus (Output)
Address of a pointer to an internally allocated array of length k containing the number of observations in each cluster.
IMSLS_OBS_PER_CLUSTER_USER, int nclus[] (Output)
Storage for array nclus is provided by the user. See IMSLS_OBS_PER_CLUSTER.
IMSLS_RETURN_USER, float iclus[] (Output)
User allocated array of length npt containing the cluster membership of each observation.
Description
Given a fixed number of clusters (K) and the cluster tree (vectors icrson and iclson) produced by the hierarchical clustering algorithm (see function imsls_f_cluster_hierarchical, function imsls_cluster_number determines the cluster membership of each observation. The function imsls_cluster_number first determines the root nodes for the K distinct subtrees forming the K clusters and then traverses each subtree to determine the cluster membership of each observation. The function imsls_cluster_number also returns the number of observations found in each cluster.
Examples
Example 1
In the following example, cluster membership for K = 2 clusters is found for the displayed cluster tree. The output vector iclus contains the cluster numbers for each observation.
 
#include <imsls.h>
 
int main()
{
int k = 2, npt = 5, *iclus;
int iclson[] = {5, 6, 4, 7};
int icrson[] = {3, 1, 2, 8};
 
iclus = imsls_cluster_number(npt, iclson, icrson, k, 0);
imsls_i_write_matrix("iclus", 1, 5, iclus, 0);
}
Output
 
iclus
1 2 3 4 5
1 2 1 2 1
Example 2
This example illustrates the typical usage of imsls_cluster_number. The Fisher Iris data (see function imsls_f_data_sets, Utilities.) is clustered. First the distance between the irises is computed using function imsls_f_dissimilarities. The resulting distance matrix is then clustered using function imsls_f_cluster_hierarchical. The cluster membership for 5 clusters is then obtained via function imsls_cluster_number using the output from imsls_f_cluster_hierarchical. The need for 5 clusters can be obtained either by theoretical means or by examining a cluster tree. The cluster membership for each of the iris observations is printed.
 
#include <imsls.h>
#include <stdlib.h>
 
#define MAX(A,B) ((A)>(B)?(A): (B))
 
int main()
{
int ncol = 5, nrow = 150, nvar = 4, npt = 150, k = 5;
int i, j, *iclson, *icrson, *iclus, *nclus;
int ind[4] = {1, 2, 3, 4};
float *clevel, dist[150][150], *x, f_rand;
int *p_iclus = NULL, *p_nclus = NULL;
 
x = imsls_f_data_sets (3,
0);
 
imsls_f_dissimilarities(nrow, ncol, x,
IMSLS_INDEX, nvar, ind,
IMSLS_RETURN_USER, dist,
0);
 
imsls_random_seed_set (4);
 
for (i = 0; i < npt; i++)
{
for (j = i + 1; j < npt; j++)
{
 
imsls_f_random_uniform (1,
IMSLS_RETURN_USER, &f_rand,
0);
 
dist[i][j] = MAX (0.0, dist[i][j] + .001 * f_rand);
dist[j][i] = dist[i][j];
}
dist[i][i] = 0.;
}
 
imsls_f_cluster_hierarchical (npt, (float*)dist,
IMSLS_CLUSTERS, &clevel, &iclson, &icrson,
0);
 
iclus = imsls_cluster_number (npt, iclson, icrson, k,
IMSLS_OBS_PER_CLUSTER, &nclus,
0);
 
imsls_i_write_matrix ("iclus", 25, 5, iclus,
0);
imsls_i_write_matrix ("nclus", 1, 5, nclus,
0);
}
Output
 
iclus
1 2 3 4 5
1 5 5 5 5 5
2 5 5 5 5 5
3 5 5 5 5 5
4 5 5 5 5 5
5 5 5 5 5 5
6 5 5 5 5 5
7 5 5 5 5 5
8 5 5 5 5 5
9 5 5 5 5 5
10 5 5 5 5 5
11 2 2 2 2 2
12 2 2 1 2 2
13 1 2 2 2 2
14 2 2 2 2 2
15 2 2 2 2 2
16 2 2 2 2 2
17 2 2 2 2 2
18 2 2 2 2 2
19 2 2 2 1 2
20 2 2 2 1 2
21 2 2 2 2 2
22 2 3 2 2 2
23 2 2 2 2 2
24 2 2 4 2 2
25 2 2 2 2 2
 
nclus
1 2 3 4 5
4 93 1 2 50