cluster_number

Computes cluster membership for a hierarchical cluster tree.

Synopsis

#include <imsls.h>

int *imsls_cluster_number (int npt, int iclson[], int icrson[], int k, 0)

Required Arguments

int npt (Input)
Number of data points to be clustered.

int iclson[] (Input)
An array of length npt - 1 containing the left son cluster numbers.

Cluster npt + i is formed by merging clusters iclson[i-1] and icrson[i-1].

int icrson[] (Input)
An array of length npt - 1 containing the right son cluster numbers.

Cluster npt + i is formed by merging clusters iclson[i-1] and icrson[i-1].

int k (Input)
Desired number of clusters.

Return Value

An array of length npt containing the cluster membership of each observation.

Synopsis with Optional Arguments

#include <imsls.h>

int *imsls_cluster_number (int npt, int iclson[], int  icrson[], int k,

IMSLS_OBS_PER_CLUSTER, int **nclus,

IMSLS_OBS_PER_CLUSTER_USER, int nclus[],

IMSLS_RETURN_USER, int iclus[],

0)

Optional Arguments

IMSLS_OBS_PER_CLUSTER, int **nclus (Output)
Address of a pointer to an internally allocated array of length k containing the number of observations in each cluster.

IMSLS_OBS_PER_CLUSTER_USER, int nclus[] (Output)
Storage for array nclus is provided by the user. See IMSLS_OBS_PER_CLUSTER.

IMSLS_RETURN_USER, float iclus[] (Output)
User allocated array of length npt containing the cluster membership of each observation.

Description

Given a fixed number of clusters (K) and the cluster tree (vectors icrson and iclson) produced by the hierarchical clustering algorithm (see function imsls_f_cluster_hierarchical, function imsls_cluster_number determines the cluster membership of each observation. The function imsls_cluster_number first determines the root nodes for the K distinct subtrees forming the K clusters and then traverses each subtree to determine the cluster membership of each observation. The function imsls_cluster_number also returns the number of observations found in each cluster.

Examples

Example 1

In the following example, cluster membership for K = 2 clusters is found for the displayed cluster tree. The output vector iclus contains the cluster numbers for each observation.

 

 

 

#include <imsls.h>

 

int main()

{

int k = 2, npt = 5, *iclus;

int iclson[] = {5, 6, 4, 7};

int icrson[] = {3, 1, 2, 8};

 

iclus = imsls_cluster_number(npt, iclson, icrson, k, 0);

imsls_i_write_matrix("iclus", 1, 5, iclus, 0);

}

Output

 

iclus

1 2 3 4 5

1 2 1 2 1

Example 2

This example illustrates the typical usage of imsls_cluster_number. The Fisher Iris data (see function imsls_f_data_sets, Utilities.) is clustered. First the distance between the irises is computed using function imsls_f_dissimilarities. The resulting distance matrix is then clustered using function imsls_f_cluster_hierarchical. The cluster membership for 5 clusters is then obtained via function imsls_cluster_number using the output from imsls_f_cluster_hierarchical. The need for 5 clusters can be obtained either by theoretical means or by examining a cluster tree. The cluster membership for each of the iris observations is printed.

 

#include <imsls.h>

#include <stdlib.h>

 

#define MAX(A,B) ((A)>(B)?(A): (B))

 

int main()

{

int ncol = 5, nrow = 150, nvar = 4, npt = 150, k = 5;

int i, j, *iclson, *icrson, *iclus, *nclus;

int ind[4] = {1, 2, 3, 4};

float *clevel, dist[150][150], *x, f_rand;

int *p_iclus = NULL, *p_nclus = NULL;

 

x = imsls_f_data_sets (3,

0);

 

imsls_f_dissimilarities(nrow, ncol, x,

IMSLS_INDEX, nvar, ind,

IMSLS_RETURN_USER, dist,

0);

 

imsls_random_seed_set (4);

 

for (i = 0; i < npt; i++)

{

for (j = i + 1; j < npt; j++)

{

 

imsls_f_random_uniform (1,

IMSLS_RETURN_USER, &f_rand,

0);

 

dist[i][j] = MAX (0.0, dist[i][j] + .001 * f_rand);

dist[j][i] = dist[i][j];

}

dist[i][i] = 0.;

}

 

imsls_f_cluster_hierarchical (npt, (float*)dist,

IMSLS_CLUSTERS, &clevel, &iclson, &icrson,

0);

 

iclus = imsls_cluster_number (npt, iclson, icrson, k,

IMSLS_OBS_PER_CLUSTER, &nclus,

0);

 

imsls_i_write_matrix ("iclus", 25, 5, iclus,

0);

imsls_i_write_matrix ("nclus", 1, 5, nclus,

0);

}

Output

 

iclus

1 2 3 4 5

1 5 5 5 5 5

2 5 5 5 5 5

3 5 5 5 5 5

4 5 5 5 5 5

5 5 5 5 5 5

6 5 5 5 5 5

7 5 5 5 5 5

8 5 5 5 5 5

9 5 5 5 5 5

10 5 5 5 5 5

11 2 2 2 2 2

12 2 2 1 2 2

13 1 2 2 2 2

14 2 2 2 2 2

15 2 2 2 2 2

16 2 2 2 2 2

17 2 2 2 2 2

18 2 2 2 2 2

19 2 2 2 1 2

20 2 2 2 1 2

21 2 2 2 2 2

22 2 3 2 2 2

23 2 2 2 2 2

24 2 2 4 2 2

25 2 2 2 2 2

 

nclus

1 2 3 4 5

4 93 1 2 50