Performs a hierarchical cluster analysis given a distance matrix.
void imsls_f_cluster_hierarchical (int npt, float *dist, …, 0)
The type double function is imsls_d_cluster_hierarchical.
int npt
(Input)
Number of data points to be clustered.
void *dist
(Input/Ouput)
An npt by npt symmetric matrix
containing the distance (or similarity) matrix.
dist is a symmetric
matrix. On input, only the upper triangular part needs to be present. The
function imsls_f_cluster_hierarchical
saves the upper triangular part of dist in the lower
triangle. On return from imsls_f_cluster_hierarchical,
the upper triangular part of dist is restored, and
the matrix is made symmetric.
float *imsls_f_cluster_hierarchical (int npt, float *dist,
IMSLS_METHOD, int imeth,
IMSLS_TRANSFORMATION, int
itrans,
IMSLS_CLUSTERS, float
**clevel,
int **iclson,
int **icrson,
IMSLS_CLUSTERS_USER, float
clevel[], int
iclson[], int
icrson[],
0)
IMSLS_METHOD,
int imeth
(Input)
Option giving the clustering method to be used.
Default: imeth = 0.
IMSLS_TRANSFORMATION,
int itrans (Input)
Option giving the method to be used
for clustering.
Default: itrans = 0.
IMSLS_CLUSTERS,
float **clevel, int **iclson, int **icrson
(Output)
Argument clevel is the address
of an array of length npt - 1 containing the
level at which the clusters are joined. clevel[k-1]
contains the distance (or similarity) level at which cluster npt + k was
formed. If the original data in dist was transformed
via the optional argument IMSLS_TRANSFORMATION,
the inverse transformation is applied to the values in clevel prior to exit
from imsls_f_cluster_hierarchical.
Argument iclson is the address
of an array of length npt - 1 containing the
left sons of each merged cluster. Argument icrson is the address
of an array of length npt - 1 containing the
right sons of each merged cluster. Cluster
npt + k is
formed by merging clusters iclson[k-1] and
icrson[k-1].
IMSLS_CLUSTERS_USER,
float clevel[], int iclson[], int icrson[]
(Output)
Storage for arrays clevel, iclson, and icrson is provided by
the user. See IMSLS_CLUSTERS.
Function imsls_f_cluster_hierarchical conducts a hierarchical cluster analysis based upon the distance matrix, or by appropriate use of the IMSLS_TRANSFORMATION optional argument, based upon a similarity matrix. Only the upper triangular part of the matrix dist is required as input to imsls_f_cluster_hierarchical.
Hierarchical clustering in imsls_f_cluster_hierarchical
proceeds as follows. Initially, each data point is considered to be a cluster,
numbered 1 to
n = npt.
1. If the data matrix contains similarities, they are converted to distances by the method specified by IMSLS_TRANSFORMATION. Set k = 1.
2. A search is made of the distance matrix to find the two closest clusters. These clusters are merged to form a new cluster, numbered n + k. The cluster numbers of the two clusters joined at this stage are saved in icrson and iclson, and the distance measure between the two clusters is stored in clevel.
3. Based upon the method of clustering, updating of the distance measure in the row and column of dist corresponding to the new cluster is performed.
4. Set k = k + 1. If k < n, go to Step 2.
The five methods differ primarily in how the distance matrix is updated after two clusters have been joined. The IMSLS_METHOD optional argument specifies how the distance of the cluster just merged with each of the remaining clusters will be updated. Function imsls_f_cluster_hierarchical allows five methods for computing the distances. To understand these measures, suppose in the following discussion that clusters “A” and “B” have just been joined to form cluster “Z”, and interest is in computing the distance of Z with another cluster called “C”.
In general, single linkage will yield long thin clusters while complete linkage will yield clusters that are more spherical. Average linkage and Ward’s linkage tend to yield clusters that are similar to those obtained with complete linkage.
Function imsls_f_cluster_hierarchical produces a unique representation of the binary cluster tree via the following three conventions; the fact that the tree is unique should aid in interpreting the clusters. First, when two clusters are joined and each cluster contains two or more data points, the cluster that was initially formed with the smallest level (in clevel) becomes the left son. Second, when a cluster containing more than one data point is joined with a cluster containing a single data point, the cluster with the single data point becomes the right son. Finally, when two clusters containing only one object are joined, the cluster with the smallest cluster number becomes the right son.
1. The clusters corresponding to the original data points are numbered from 1 to npt. The npt - 1 clusters formed by merging clusters are numbered npt + 1 to npt + (npt - 1).
2. Raw correlations, if used as similarities, should be made positive and transformed to a distance measure. One such transformation can be performed by specifying optional argument IMSLS_TRANSFORMATION, with itrans = 2 in imsls_f_cluster_hierarchical.
3. The user may cluster either variables or observations in imsls_f_cluster_hierarchical since a dissimilarity matrix, not the original data, is used. Function imsls_f_dissimilarities (page 585) may be used to compute the matrix dist for either the variables or observations.
In the following example, the average distance within clusters method is used to perform a hierarchical cluster analysis of the Fisher iris data. Function imsls_f_data_sets (see Chapter 15, “Utilities”;) is first used to obtain the Fisher iris data. The example is typical in that after the program obtains the data, function imsls_f_dissimilarities computes the distance matrix (dist) prior to calling imsls_f_cluster_hierarchical.
int iscale=1, ncol=5, nrow=150, nvar=4, npt = 150;
int i, iclson[149], icrson[149], ind[4] = {1, 2, 3, 4};
dist = imsls_f_dissimilarities(nrow, ncol, x,
imsls_f_cluster_hierarchical(npt, dist,
IMSLS_CLUSTERS_USER, clevel, iclson, icrson,
for (i=0;i<149;i+=15) printf("%6.2f\t", clevel[i]);
for (i=0;i<149;i+=15) printf("%6d\t", iclson[i]);
for (i=0;i<149;i+=15) printf("%6d\t", icrson[i]);
0.00 0.17 0.23 0.27 0.31 0.37 0.41 0.48 0.60 0.78
143 153 17 140 53 198 186 218 261 249
102 29 6 113 51 91 212 243 266 262
Visual Numerics, Inc. PHONE: 713.784.3131 FAX:713.781.9260 |