public class Dissimilarities extends Object implements Serializable, Cloneable
Class Dissimilarities computes an upper triangular matrix
(excluding the diagonal) of dissimilarities (or similarities) between the
columns (or rows) of a matrix. Nine different distance measures can be
computed. For the first three measures, three different scaling options can
be employed. The distance matrix computed is generally used as input to
clustering or multidimensional scaling functions.
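The pairing implied by an upper triangular result can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: the class `UpperTriangleSketch`, its method names, and the choice to return a square array whose entries on and below the diagonal stay at 0.0 are all assumptions made for the example. It uses an unscaled Euclidean distance between columns.

```java
final class UpperTriangleSketch {
    /** Unscaled Euclidean distance between columns i and j of x (hypothetical helper). */
    static double columnDistance(double[][] x, int i, int j) {
        double sum = 0.0;
        for (double[] row : x) {
            double diff = row[i] - row[j];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Fills only the entries above the diagonal; all other entries stay 0.0. */
    static double[][] upperTriangle(double[][] x) {
        int ncol = x[0].length;
        double[][] d = new double[ncol][ncol];
        for (int i = 0; i < ncol; i++) {
            for (int j = i + 1; j < ncol; j++) {
                d[i][j] = columnDistance(x, i, j);
            }
        }
        return d;
    }

    public static void main(String[] args) {
        double[][] x = {{1.0, 4.0, 1.0}, {5.0, 1.0, 5.0}, {2.0, 2.0, 2.0}};
        System.out.println(java.util.Arrays.deepToString(upperTriangle(x)));
    }
}
```

Each unordered pair of columns is visited once, which is why only the entries with j greater than i carry information.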
The following discussion assumes that the distance measure is being
computed between the columns of the matrix. If distances between the rows of
the matrix are desired, use row = true in the setRow method.
The distance method and scaling option used by Dissimilarities can be
set via methods setDistanceMethod and setScalingOption, respectively.
For distance methods L2_NORM, L1_NORM, or INFINITY_NORM, each row of x
is first scaled according to the value specified by the setScalingOption
method. The scaling parameters are obtained from the values in the row
being scaled, as either the standard deviation of the row or the row range;
the standard deviation is computed from the unbiased estimate of the
variance. If no scaling is performed, the scaling parameters in the
following discussion are all 1.0 (see setScalingOption). Once the scaling
values (if any) have been computed, the distance between column i and
column j is computed via the difference vector
\(z_k=\frac{x_k-y_k}{s_k},\ k=1,\ldots,ndstm\), where \(x_k\) denotes the
k-th element in the i-th column, \(y_k\) denotes the corresponding element
in the j-th column, \(s_k\) is the scaling parameter for the k-th row, and
ndstm is the number of rows if differencing columns and the number of
columns if differencing rows. For given \(z_k\),
the distance methods that allow scaling are defined as:
distanceMethod | Metric |
---|---|
L2_NORM | Euclidean distance (\(L_2\) norm) |
L1_NORM | Sum of the absolute differences (\(L_1\) norm) |
INFINITY_NORM | Maximum difference (\(L_\infty\) norm) |
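The three scalable metrics can be illustrated with a short, self-contained sketch. It is not the library's code: the class `ScaledNormSketch` and its method names are hypothetical, the scaling parameters are passed in as a plain array `s`, and "maximum difference" is assumed here to mean the largest absolute scaled difference.

```java
final class ScaledNormSketch {
    /** Scaled differences z_k = (x_k - y_k) / s_k for every position k. */
    static double[] scaledDifference(double[] x, double[] y, double[] s) {
        double[] z = new double[x.length];
        for (int k = 0; k < x.length; k++) {
            z[k] = (x[k] - y[k]) / s[k];
        }
        return z;
    }

    /** Euclidean (L2) norm: square root of the sum of squared differences. */
    static double l2Norm(double[] z) {
        double sum = 0.0;
        for (double v : z) {
            sum += v * v;
        }
        return Math.sqrt(sum);
    }

    /** L1 norm: sum of the absolute differences. */
    static double l1Norm(double[] z) {
        double sum = 0.0;
        for (double v : z) {
            sum += Math.abs(v);
        }
        return sum;
    }

    /** Infinity norm: largest absolute difference (assumed reading of "maximum difference"). */
    static double infinityNorm(double[] z) {
        double max = 0.0;
        for (double v : z) {
            max = Math.max(max, Math.abs(v));
        }
        return max;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 5.0, 2.0};
        double[] y = {4.0, 1.0, 2.0};
        double[] s = {1.0, 1.0, 1.0}; // all ones corresponds to no scaling
        double[] z = scaledDifference(x, y, s);
        System.out.println(l2Norm(z) + " " + l1Norm(z) + " " + infinityNorm(z));
    }
}
```

With all scaling parameters equal to 1.0, the three values reduce to the ordinary unscaled norms of the difference between the two columns.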
The remaining distance methods do not use the scaling option:
distanceMethod | Metric |
---|---|
MAHALANOBIS | Mahalanobis distance |
ABS_COSINE | Absolute value of the cosine of the angle between the vectors |
ANGLE_IN_RADIANS | Angle in radians (0, \(\pi\)) between the lines through the origin defined by the vectors |
CORRELATION_COEFFICIENT | Correlation coefficient |
ABS_CORRELATION_COEFFICIENT | Absolute value of the correlation coefficient |
EXACT_MATCHES | Number of exact matches, where \(x_i = y_i\). |
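As a rough illustration of the cosine, angle, correlation, and exact-match measures, the sketch below computes each one for a pair of vectors. It is an assumption-laden example rather than the library's implementation: the class `AngleCorrelationSketch` and its methods are hypothetical, the angle is taken as the arccosine of the signed cosine, the correlation is the ordinary Pearson coefficient, and an `ArithmeticException` stands in for the library's own exception when a vector has zero Euclidean norm.

```java
final class AngleCorrelationSketch {
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            sum += a[k] * b[k];
        }
        return sum;
    }

    /** Signed cosine of the angle between x and y, clamped to [-1, 1]. */
    static double cosine(double[] x, double[] y) {
        double nx = Math.sqrt(dot(x, x));
        double ny = Math.sqrt(dot(y, y));
        if (nx == 0.0 || ny == 0.0) {
            throw new ArithmeticException("Euclidean norm of a vector is zero");
        }
        double c = dot(x, y) / (nx * ny);
        return Math.max(-1.0, Math.min(1.0, c)); // guard against rounding
    }

    static double absCosine(double[] x, double[] y) {
        return Math.abs(cosine(x, y));
    }

    static double angleInRadians(double[] x, double[] y) {
        return Math.acos(cosine(x, y));
    }

    /** Returns a copy of v with its mean subtracted. */
    static double[] centered(double[] v) {
        double mean = 0.0;
        for (double e : v) {
            mean += e;
        }
        mean /= v.length;
        double[] c = new double[v.length];
        for (int k = 0; k < v.length; k++) {
            c[k] = v[k] - mean;
        }
        return c;
    }

    /** Pearson correlation: cosine of the mean-centered vectors. */
    static double correlation(double[] x, double[] y) {
        return cosine(centered(x), centered(y));
    }

    /** Number of positions k where x[k] equals y[k] exactly. */
    static int exactMatches(double[] x, double[] y) {
        int count = 0;
        for (int k = 0; k < x.length; k++) {
            if (x[k] == y[k]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {3.0, 2.0, 1.0};
        System.out.println(absCosine(x, y) + " " + angleInRadians(x, y)
                + " " + correlation(x, y) + " " + exactMatches(x, y));
    }
}
```

Several of these quantities grow as the vectors become more alike, which is why the class description speaks of similarities as well as dissimilarities.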
For the Mahalanobis distance, any variable used in computing the distance
measure that is (numerically) linearly dependent upon the previous variables
in the indexArray vector from the setIndex method is omitted from the
distance measure.
Modifier and Type | Class and Description |
---|---|
static class | Dissimilarities.NoPositiveVarianceException: No variable has positive variance. |
static class | Dissimilarities.ScaleFactorZeroException: The computations cannot continue because a scale factor is zero. |
static class | Dissimilarities.ZeroNormException: The computations cannot continue because the Euclidean norm of the column is equal to zero. |
Modifier and Type | Field and Description |
---|---|
static int | ABS_CORRELATION_COEFFICIENT: Indicates the absolute value of the correlation coefficient distance method. |
static int | ABS_COSINE: Indicates the absolute value of the cosine of the angle between the vectors distance method. |
static int | ANGLE_IN_RADIANS: Indicates the angle in radians (0, \(\pi\)) between the lines through the origin defined by the vectors distance method. |
static int | CORRELATION_COEFFICIENT: Indicates the correlation coefficient distance method. |
static int | EXACT_MATCHES: Indicates the number of exact matches distance method. |
static int | INFINITY_NORM: Indicates the maximum difference (\(L_\infty\) norm) distance method. |
static int | L1_NORM: Indicates the sum of the absolute differences (\(L_1\) norm) distance method. |
static int | L2_NORM: Indicates the Euclidean distance method (\(L_2\) norm). |
static int | MAHALANOBIS: Indicates the Mahalanobis distance method. |
static int | NO_SCALING: Indicates no scaling. |
static int | RANGE: Indicates scaling by the range. |
static int | STD_DEV: Indicates scaling by the standard deviation. |
Constructor and Description |
---|
Dissimilarities(double[][] x): Constructor for Dissimilarities. |
Dissimilarities(double[][] x, int distanceMethod, int distanceScale, int iRow): Deprecated. Use Dissimilarities.Dissimilarities(double[][]) instead. |
Dissimilarities(double[][] x, int distanceMethod, int distanceScale, int iRow, int[] indexArray): Deprecated. Use Dissimilarities.Dissimilarities(double[][]) instead. |
Modifier and Type | Method and Description |
---|---|
void | compute(): Computes a matrix of dissimilarities (or similarities) between the columns (or rows) of a matrix. |
double[][] | getDistanceMatrix(): Returns the distance matrix. |
int | getDistanceMethod(): Returns the method used in computing the dissimilarities or similarities. |
int[] | getIndex(): Returns the indices of the rows (columns) used in computing the distance measure. |
boolean | getRow(): Returns a boolean indicating whether distances are computed between rows or columns of x. |
int | getScalingOption(): Returns the scaling option. |
void | setDistanceMethod(int distanceMethod): Sets the method to be used in computing the dissimilarities or similarities. |
void | setIndex(int[] indexArray): Sets the indices of the rows (columns). |
void | setRow(boolean row): Identifies whether distances are computed between rows or columns of x. |
void | setScalingOption(int distanceScale): Sets the scaling option used if the L2_NORM, L1_NORM, or INFINITY_NORM distance methods are specified. |
public static final int L2_NORM
public static final int L1_NORM
public static final int INFINITY_NORM
public static final int MAHALANOBIS
public static final int ABS_COSINE
public static final int ANGLE_IN_RADIANS
public static final int CORRELATION_COEFFICIENT
public static final int ABS_CORRELATION_COEFFICIENT
public static final int EXACT_MATCHES
public static final int NO_SCALING
public static final int STD_DEV
public static final int RANGE
public Dissimilarities(double[][] x, int distanceMethod, int distanceScale, int iRow) throws Dissimilarities.ScaleFactorZeroException, Dissimilarities.ZeroNormException, Dissimilarities.NoPositiveVarianceException
Deprecated. Use Dissimilarities.Dissimilarities(double[][]) instead.
Constructor for Dissimilarities.
x - A double matrix containing the input data.
distanceMethod - An int identifying the method to be used in computing the dissimilarities or similarities. Acceptable values of distanceMethod are 0, 1, 2, ..., 8. See above for a description of these methods.
distanceScale - An int containing the scaling option.
distanceScale | Method |
---|---|
0 | No scaling is performed. |
1 | Scale each column (row if iRow=1) by the standard deviation of the column (row). |
2 | Scale each column (row if iRow=1) by the range of the column (row). |
iRow - An int identifying whether distances are computed between rows or columns of x. If iRow = 1, distances are computed between the rows of x. Otherwise, distances between the columns of x are computed.
Dissimilarities.ScaleFactorZeroException - thrown when computations cannot continue because a scale factor is zero
Dissimilarities.NoPositiveVarianceException - thrown when no variable has positive variance
Dissimilarities.ZeroNormException - thrown when the Euclidean norm of a column is equal to zero
public Dissimilarities(double[][] x, int distanceMethod, int distanceScale, int iRow, int[] indexArray) throws Dissimilarities.ScaleFactorZeroException, Dissimilarities.ZeroNormException, Dissimilarities.NoPositiveVarianceException
Deprecated. Use Dissimilarities.Dissimilarities(double[][]) instead.
Constructor for Dissimilarities.
x - A double matrix containing the input data.
distanceMethod - An int identifying the method to be used in computing the dissimilarities or similarities. Acceptable values of distanceMethod are 0, 1, 2, ..., 8. See above for a description of these methods.
distanceScale - An int containing the scaling option.
distanceScale | Method |
---|---|
0 | No scaling is performed. |
1 | Scale each column (row if iRow=1) by the standard deviation of the column (row). |
2 | Scale each column (row if iRow=1) by the range of the column (row). |
iRow - An int identifying whether distances are computed between rows or columns of x. If iRow = 1, distances are computed between the rows of x. Otherwise, distances between the columns of x are computed.
indexArray - An int array containing the indices of the rows (columns if iRow is 1) to be used in computing the distance measure.
Dissimilarities.ScaleFactorZeroException - thrown when computations cannot continue because a scale factor is zero
Dissimilarities.NoPositiveVarianceException - thrown when no variable has positive variance
Dissimilarities.ZeroNormException - thrown when the Euclidean norm of a column is equal to zero
public Dissimilarities(double[][] x)
Constructor for Dissimilarities.
x - A double matrix containing the input data.
public void setDistanceMethod(int distanceMethod)
distanceMethod - An int identifying the method to be used in computing the dissimilarities or similarities. Acceptable values of distanceMethod are:
distanceMethod | Metric |
---|---|
L2_NORM | Euclidean distance (\(L_2\) norm) |
L1_NORM | Sum of the absolute differences (\(L_1\) norm) |
INFINITY_NORM | Maximum difference (\(L_\infty\) norm) |
MAHALANOBIS | Mahalanobis distance |
ABS_COSINE | Absolute value of the cosine of the angle between the vectors |
ANGLE_IN_RADIANS | Angle in radians (0, \(\pi\)) between the lines through the origin defined by the vectors |
CORRELATION_COEFFICIENT | Correlation coefficient |
ABS_CORRELATION_COEFFICIENT | Absolute value of the correlation coefficient |
EXACT_MATCHES | Number of exact matches, where \(x_i = y_i\). |
By default, distanceMethod = L2_NORM.
public int getDistanceMethod()
public void setScalingOption(int distanceScale)
Sets the scaling option used if the L2_NORM, L1_NORM, or INFINITY_NORM distance methods are specified. See setDistanceMethod.
distanceScale - An int containing the scaling option. By default, distanceScale = NO_SCALING.
distanceScale | Method |
---|---|
NO_SCALING | No scaling is performed. |
STD_DEV | If setRow(false), scale each column by the standard deviation of the column. If setRow(true), scale each row by the standard deviation of the row. |
RANGE | If setRow(false), scale each column by the range of the column. If setRow(true), scale each row by the range of the row. |
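The two non-trivial scale factors can be sketched in a few lines. This is an illustrative example under stated assumptions, not the library's code: the class `ScaleFactorSketch` and its methods are hypothetical, the standard deviation uses the unbiased variance estimate (divisor n - 1, so at least two values are required), and an `ArithmeticException` stands in for the library's own exception when a scale factor is zero.

```java
final class ScaleFactorSketch {
    /** Standard deviation from the unbiased variance estimate (divisor n - 1). */
    static double standardDeviation(double[] v) {
        int n = v.length;
        double mean = 0.0;
        for (double e : v) {
            mean += e;
        }
        mean /= n;
        double sumSquares = 0.0;
        for (double e : v) {
            sumSquares += (e - mean) * (e - mean);
        }
        return Math.sqrt(sumSquares / (n - 1));
    }

    /** Range: largest value minus smallest value. */
    static double range(double[] v) {
        double min = v[0];
        double max = v[0];
        for (double e : v) {
            min = Math.min(min, e);
            max = Math.max(max, e);
        }
        return max - min;
    }

    /** Rejects a zero scale factor, since dividing by it is undefined. */
    static double requireNonZero(double scale) {
        if (scale == 0.0) {
            throw new ArithmeticException("scale factor is zero");
        }
        return scale;
    }

    public static void main(String[] args) {
        double[] v = {1.0, 2.0, 3.0, 4.0, 5.0};
        System.out.println(requireNonZero(standardDeviation(v)));
        System.out.println(requireNonZero(range(v)));
    }
}
```

A constant vector has both a zero standard deviation and a zero range, so either scaling option would fail the non-zero check for such data.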
public int getScalingOption()
public void setRow(boolean row)
Identifies whether distances are computed between rows or columns of x.
row - A boolean identifying whether distances are computed between rows or columns of x. If row = true, distances are computed between the rows of x. Otherwise, distances between the columns of x are computed. By default, row = true.
public boolean getRow()
Returns a boolean indicating whether distances are computed between rows or columns of x.
public void setIndex(int[] indexArray)
indexArray - An int array containing the indices of the columns (rows if row = false) to be used in computing the distance measure. By default, if row = true, indexArray = 0, 1, ..., x[0].length-1. If row = false, indexArray = 0, 1, ..., x.length-1; see setRow.
public int[] getIndex()
public void compute() throws Dissimilarities.ScaleFactorZeroException, Dissimilarities.ZeroNormException, Dissimilarities.NoPositiveVarianceException
Dissimilarities.ScaleFactorZeroException - thrown when computations cannot continue because a scale factor is zero
Dissimilarities.NoPositiveVarianceException - thrown when no variable has positive variance
Dissimilarities.ZeroNormException - thrown when the Euclidean norm of a column is equal to zero
public final double[][] getDistanceMatrix()
Returns a double matrix containing the distance matrix.
Copyright © 2020 Rogue Wave Software. All rights reserved.