CNUMB

Computes cluster membership for a hierarchical cluster tree.

Required Arguments

NODE — Number of data points clustered. (Input)

ICLSON — Vector of length NODE  1 containing the left son cluster numbers. (Input)
Cluster NODE + I is formed by merging clusters ICLSON(I) and ICRSON(I).

ICRSON — Vector of length NODE  1 containing the right son cluster numbers. (Input)
Cluster NODE + I is formed by merging clusters ICLSON(I) and ICRSON(I).

K — Desired number of clusters. (Input)

ICLUS — Vector of length NODE containing the cluster membership of each observation. (Output)
Observation I is in cluster ICLUS(I) when K clusters are specified.

NCLUS — Vector of length K containing the number of observations in each cluster. (Output)

FORTRAN 90 Interface

Generic: CALL CNUMB (NODE, ICLSON, ICRSON, K, ICLUS, NCLUS)

Specific: The specific interface name is CNUMB.

FORTRAN 77 Interface

Single: CALL CNUMB (NODE, ICLSON, ICRSON, K, ICLUS, NCLUS)

Description

Given a fixed number of clusters (K) and the cluster tree (vectors ICRSON and ICLSON) produced by the hierarchical clustering algorithm (see routine CLINK), routine CNUMB determines the cluster membership of each observation. The routine CNUMB first determines the root nodes for the K distinct subtrees forming the K clusters and then traverses each subtree to determine the cluster membership of each observation. The routine CNUMB also returns the number of observations found in each cluster.

Comments

1. Workspace may be explicitly provided, if desired, by use of C2UMB. The reference is:

CALL C2UMB (NODE, ICLSON, ICRSON, K, ICLUS, NCLUS, IPT)

The additional argument is:

IPT — Work vector of length 2 * NODE.

2. Informational errors

 

Type

Code

Description

4

1

The tree structure specified by ICLSON and ICRSON is invalid because an attempt to assign an observation to more than one cluster is being made.

4

2

The tree structure specified by ICLSON and ICRSON is incorrect because an observation is not assigned to a cluster.

Examples

Example 1

In the following example, cluster membership for K = 2 clusters is found for the displayed cluster tree. The output vector ICLUS contains the cluster numbers for each observation.

 

 

USE CNUMB_INT

USE WRIRN_INT

 

IMPLICIT NONE

INTEGER K, NODE

PARAMETER (K=2, NODE=5)

!

INTEGER ICLSON(NODE-1), ICLUS(NODE), ICRSON(NODE-1), NCLUS(K)

!

DATA ICLSON/5, 6, 4, 7/

DATA ICRSON/3, 1, 2, 8/

! Compute cluster membership

CALL CNUMB (NODE, ICLSON, ICRSON, K, ICLUS, NCLUS)

! Print output

CALL WRIRN ('ICLUS', ICLUS, 1, NODE, 1)

CALL WRIRN ('NCLUS', NCLUS, 1, K, 1)

!

END

Output

 

ICLUS

1 2 3 4 5

1 2 1 2 1

 

NCLUS

1 2

3 2

Example 2

This example illustrates the typical usage of CNUMB. The Fisher iris data (see routine GDATA, Chapter 19, “Utilities”, is clustered. First the distance between the irises are computed using routine CDIST. The resulting distance matrix is then clustered using routine CLINK. The cluster membership for 5 clusters is then obtained via routine CNUMB using the output from CLINK. The need for 5 clusters can be obtained either by theoretical means or by examining a cluster tree. Because the cluster tree is too large to be included in this example, the call to routine TREEP (see Chapter 16, “Line Printer Graphics”) that would ordinarily print the cluster tree has been commented in the example code. The cluster membership for each of the iris observations is printed.

 

USE IMSL_LIBRARIES

 

IMPLICIT NONE

INTEGER IDATA, IPRINT, IROW, K, &

LDDIST, LDX, NCOL, NODE, NODEX, NROW, NVAR

PARAMETER (IDATA=3, IPRINT=0, IROW=1, K=5, LDDIST=150, &

LDX=150, NCOL=5, NODE=150, NODEX=5, NROW=150, NVAR=4)

!

INTEGER I, ICLSON(NROW-1), ICLUS(NODE), ICRSON(NROW-1), &

IND(4), J, NCLUS(K), NSCALE, NXCOL, NXROW

REAL AMAX1, CLEVEL(NROW-1), DIST(LDDIST,LDDIST), RN, &

SCALE(2), X(LDX,NCOL)

CHARACTER NODENM(NODE)*7

INTRINSIC AMAX1

!

DATA IND/2, 3, 4, 5/

DATA NSCALE/1/

DATA SCALE/0.0, 3.5/

! Get IRIS data.

CALL GDATA (IDATA, X, NXROW, NXCOL)

! Compute the dissimilarities.

CALL CDIST (X, DIST, IND=IND)

! Make sure each distance is unique,

! then copy the upper triangle matrix

! to the lower triangle matrix.

CALL RNSET (4)

DO 20 I=1, NODE

DO 10 J=I + 1, NODE

RN = RNUNF()

DIST(I,J) = AMAX1(0.0,DIST(I,J)+(0.001*RN))

10 CONTINUE

DIST(I,I) = 0.0

CALL SCOPY (I-1, DIST(1:,I), 1, DIST(I:,1), LDDIST)

20 CONTINUE

! The initial clustering

CALL CLINK (DIST, CLEVEL, ICLSON, ICRSON)

! Print the tree.

NODENM(1) = 'DEFAULT'

! CALL TREEP (ICLSON, ICRSON, CLEVEL, NSCALE, SCALE, NODENM)

! Compute membership for 5 clusters

CALL CNUMB (NODE, ICLSON, ICRSON, K, ICLUS, NCLUS)

! Print output

CALL WRIRN ('ICLUS', ICLUS, 1, NODE, 1)

CALL WRIRN ('NCLUS', NCLUS, 1, K, 1)

!

END

Output

 

ICLUS

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5

 

21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5

 

41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

5 5 5 5 5 5 5 5 5 5 2 2 2 2 2 2 2 1 2 2

 

61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80

1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

 

81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99

2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 1

 

100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115

2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2

 

116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131

2 2 4 2 2 2 2 2 2 2 2 2 2 2 2 2

 

132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147

4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

 

148 149 150

2 2 2

 

NCLUS

1 2 3 4 5

4 93 1 2 50