CNUMB
Computes cluster membership for a hierarchical cluster tree.
Required Arguments
NODE — Number of data points clustered. (Input)
ICLSON — Vector of length NODE ‑ 1 containing the left son cluster numbers. (Input)
Cluster NODE + I is formed by merging clusters ICLSON(I) and ICRSON(I).
ICRSON — Vector of length NODE ‑ 1 containing the right son cluster numbers. (Input)
Cluster NODE + I is formed by merging clusters ICLSON(I) and ICRSON(I).
K — Desired number of clusters. (Input)
ICLUS — Vector of length NODE containing the cluster membership of each observation. (Output)
Observation I is in cluster ICLUS(I) when K clusters are specified.
NCLUS — Vector of length K containing the number of observations in each cluster. (Output)
FORTRAN 90 Interface
Generic: CALL CNUMB (NODE, ICLSON, ICRSON, K, ICLUS, NCLUS)
Specific: The specific interface name is CNUMB.
FORTRAN 77 Interface
Single: CALL CNUMB (NODE, ICLSON, ICRSON, K, ICLUS, NCLUS)
Description
Given a fixed number of clusters (K) and the cluster tree (vectors ICRSON and ICLSON) produced by the hierarchical clustering algorithm (see routine CLINK), routine CNUMB determines the cluster membership of each observation. The routine CNUMB first determines the root nodes for the K distinct subtrees forming the K clusters and then traverses each subtree to determine the cluster membership of each observation. The routine CNUMB also returns the number of observations found in each cluster.
Comments
1. Workspace may be explicitly provided, if desired, by use of C2UMB. The reference is:
CALL C2UMB (NODE, ICLSON, ICRSON, K, ICLUS, NCLUS, IPT)
The additional argument is:
IPT — Work vector of length 2 * NODE.
2. Informational errors
Type |
Code |
Description |
4 |
1 |
The tree structure specified by ICLSON and ICRSON is invalid because an attempt to assign an observation to more than one cluster is being made. |
4 |
2 |
The tree structure specified by ICLSON and ICRSON is incorrect because an observation is not assigned to a cluster. |
Examples
Example 1
In the following example, cluster membership for K = 2 clusters is found for the displayed cluster tree. The output vector ICLUS contains the cluster numbers for each observation.
USE CNUMB_INT
USE WRIRN_INT
IMPLICIT NONE
INTEGER K, NODE
PARAMETER (K=2, NODE=5)
!
INTEGER ICLSON(NODE-1), ICLUS(NODE), ICRSON(NODE-1), NCLUS(K)
!
DATA ICLSON/5, 6, 4, 7/
DATA ICRSON/3, 1, 2, 8/
! Compute cluster membership
CALL CNUMB (NODE, ICLSON, ICRSON, K, ICLUS, NCLUS)
! Print output
CALL WRIRN ('ICLUS', ICLUS, 1, NODE, 1)
CALL WRIRN ('NCLUS', NCLUS, 1, K, 1)
!
END
Output
ICLUS
1 2 3 4 5
1 2 1 2 1
NCLUS
1 2
3 2
Example 2
This example illustrates the typical usage of CNUMB. The Fisher iris data (see routine GDATA, Chapter 19, “Utilities”, is clustered. First the distance between the irises are computed using routine CDIST. The resulting distance matrix is then clustered using routine CLINK. The cluster membership for 5 clusters is then obtained via routine CNUMB using the output from CLINK. The need for 5 clusters can be obtained either by theoretical means or by examining a cluster tree. Because the cluster tree is too large to be included in this example, the call to routine TREEP (see Chapter 16, “Line Printer Graphics”) that would ordinarily print the cluster tree has been commented in the example code. The cluster membership for each of the iris observations is printed.
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER IDATA, IPRINT, IROW, K, &
LDDIST, LDX, NCOL, NODE, NODEX, NROW, NVAR
PARAMETER (IDATA=3, IPRINT=0, IROW=1, K=5, LDDIST=150, &
LDX=150, NCOL=5, NODE=150, NODEX=5, NROW=150, NVAR=4)
!
INTEGER I, ICLSON(NROW-1), ICLUS(NODE), ICRSON(NROW-1), &
IND(4), J, NCLUS(K), NSCALE, NXCOL, NXROW
REAL AMAX1, CLEVEL(NROW-1), DIST(LDDIST,LDDIST), RN, &
SCALE(2), X(LDX,NCOL)
CHARACTER NODENM(NODE)*7
INTRINSIC AMAX1
!
DATA IND/2, 3, 4, 5/
DATA NSCALE/1/
DATA SCALE/0.0, 3.5/
! Get IRIS data.
CALL GDATA (IDATA, X, NXROW, NXCOL)
! Compute the dissimilarities.
CALL CDIST (X, DIST, IND=IND)
! Make sure each distance is unique,
! then copy the upper triangle matrix
! to the lower triangle matrix.
CALL RNSET (4)
DO 20 I=1, NODE
DO 10 J=I + 1, NODE
RN = RNUNF()
DIST(I,J) = AMAX1(0.0,DIST(I,J)+(0.001*RN))
10 CONTINUE
DIST(I,I) = 0.0
CALL SCOPY (I-1, DIST(1:,I), 1, DIST(I:,1), LDDIST)
20 CONTINUE
! The initial clustering
CALL CLINK (DIST, CLEVEL, ICLSON, ICRSON)
! Print the tree.
NODENM(1) = 'DEFAULT'
! CALL TREEP (ICLSON, ICRSON, CLEVEL, NSCALE, SCALE, NODENM)
! Compute membership for 5 clusters
CALL CNUMB (NODE, ICLSON, ICRSON, K, ICLUS, NCLUS)
! Print output
CALL WRIRN ('ICLUS', ICLUS, 1, NODE, 1)
CALL WRIRN ('NCLUS', NCLUS, 1, K, 1)
!
END
Output
ICLUS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
5 5 5 5 5 5 5 5 5 5 2 2 2 2 2 2 2 1 2 2
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 1
100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115
2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2
116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131
2 2 4 2 2 2 2 2 2 2 2 2 2 2 2 2
132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147
4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
148 149 150
2 2 2
NCLUS
1 2 3 4 5
4 93 1 2 50