Package com.imsl.stat

Class Dissimilarities

java.lang.Object
com.imsl.stat.Dissimilarities
All Implemented Interfaces:
Serializable, Cloneable

public class Dissimilarities extends Object implements Serializable, Cloneable
Computes a matrix of dissimilarities (or similarities) between the columns (or rows) of a matrix.

Class Dissimilarities computes an upper triangular matrix (excluding the diagonal) of dissimilarities (or similarities) between the columns (or rows) of a matrix. Nine distance measures are available, and for the first three of them (L2_NORM, L1_NORM, and INFINITY_NORM) one of three scaling options can be applied. The resulting distance matrix is typically used as input to clustering or multidimensional scaling routines.

The following discussion assumes that distances are computed between the columns of the matrix. If distances between the rows of the matrix are desired instead, call setRow(true).
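To illustrate what the row/column choice means, the sketch below computes unscaled Euclidean distances between every pair of columns, or between every pair of rows by working on the transpose, and stores them above the diagonal as described above. PairwiseEuclidean and its methods are hypothetical names for this example, not part of com.imsl.stat, and leaving the diagonal and lower triangle at 0.0 is an assumption of the sketch rather than documented library behavior.

```java
// Hypothetical sketch, not the com.imsl.stat implementation.
final class PairwiseEuclidean {

    // Returns the transpose of a rectangular matrix.
    static double[][] transpose(double[][] x) {
        double[][] t = new double[x[0].length][x.length];
        for (int r = 0; r < x.length; r++) {
            for (int c = 0; c < x[0].length; c++) {
                t[c][r] = x[r][c];
            }
        }
        return t;
    }

    // Unscaled Euclidean distances between all pairs of columns of x (or of rows,
    // if byRow is true), stored above the diagonal; the rest stays 0.0 here.
    static double[][] upperTriangle(double[][] x, boolean byRow) {
        double[][] a = byRow ? transpose(x) : x; // rows of x are the columns of its transpose
        int n = a[0].length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double sum = 0.0;
                for (double[] row : a) {
                    double diff = row[i] - row[j];
                    sum += diff * diff;
                }
                d[i][j] = Math.sqrt(sum);
            }
        }
        return d;
    }
}
```

For x = {{0, 3}, {4, 0}, {1, 1}}, the column result is a 2 by 2 matrix with d[0][1] = 5.0, while the row result is 3 by 3 with, for example, d[1][2] equal to the square root of 10.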

The distance method and scaling option used by Dissimilarities are set with the setDistanceMethod and setScalingOption methods, respectively. For the distance methods L2_NORM, L1_NORM, and INFINITY_NORM, each row of x is first scaled according to the option specified with setScalingOption. The scale factor \(s_k\) for row k is either the standard deviation of the row or the range of the row; the standard deviation is computed from the unbiased estimate of the variance. If no scaling is performed, all scale factors \(s_k\) are 1.0 (see setScalingOption). Once the scale factors have been computed, the distance between column i and column j is computed from the difference vector \(z_k=\frac{x_k-y_k}{s_k},\ k=1, \ldots, ndstm\), where \(x_k\) denotes the k-th element of the i-th column, \(y_k\) denotes the corresponding element of the j-th column, and ndstm is the number of rows if differencing columns and the number of columns if differencing rows. Given \(z_k\), the distance methods that allow scaling are defined as:
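The scaling step can be sketched as follows, assuming the rows of x are the variables and the columns are the items being compared (the column-differencing case described above). ScaledDifference is a hypothetical helper, not part of com.imsl.stat; it throws ArithmeticException as a stand-in for Dissimilarities.ScaleFactorZeroException.

```java
// Hypothetical sketch, not the com.imsl.stat implementation. Rows of x are the
// variables; columns are the items whose pairwise distances are wanted.
final class ScaledDifference {

    // STD_DEV option: standard deviation of a row from the unbiased (n - 1) variance.
    static double stdDev(double[] row) {
        double mean = 0.0;
        for (double v : row) mean += v;
        mean /= row.length;
        double ss = 0.0;
        for (double v : row) ss += (v - mean) * (v - mean);
        return Math.sqrt(ss / (row.length - 1));
    }

    // RANGE option: largest minus smallest value in a row.
    static double range(double[] row) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : row) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return max - min;
    }

    // z_k = (x_k - y_k) / s_k for columns i and j; pass s_k = 1.0 for NO_SCALING.
    // ArithmeticException stands in for Dissimilarities.ScaleFactorZeroException.
    static double[] difference(double[][] x, int i, int j, double[] s) {
        double[] z = new double[x.length];
        for (int k = 0; k < x.length; k++) {
            if (s[k] == 0.0) {
                throw new ArithmeticException("scale factor for row " + k + " is zero");
            }
            z[k] = (x[k][i] - x[k][j]) / s[k];
        }
        return z;
    }
}
```

With range scaling on {{1, 3, 5}, {10, 10, 40}}, the scale factors are 4 and 30, so columns 0 and 2 give z = (-1, -1): both variables contribute equally even though their raw spreads differ by a factor of more than seven.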

distanceMethod    Metric
L2_NORM           Euclidean distance (\(L_2\) norm): \(\sqrt{\sum_k z_k^2}\)
L1_NORM           Sum of the absolute differences (\(L_1\) norm): \(\sum_k |z_k|\)
INFINITY_NORM     Maximum difference (\(L_\infty\) norm): \(\max_k |z_k|\)
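Given a difference vector z (for example, one produced with the scaling described above), the three scalable metrics reduce to the standard vector norms. This is a minimal sketch using those textbook definitions; ScaledNorms is a hypothetical class name, not part of com.imsl.stat.

```java
// Hypothetical sketch: the three metrics that accept scaling, applied to z.
final class ScaledNorms {

    // L2_NORM: Euclidean length of z.
    static double l2(double[] z) {
        double sum = 0.0;
        for (double v : z) sum += v * v;
        return Math.sqrt(sum);
    }

    // L1_NORM: sum of the absolute differences.
    static double l1(double[] z) {
        double sum = 0.0;
        for (double v : z) sum += Math.abs(v);
        return sum;
    }

    // INFINITY_NORM: largest absolute difference.
    static double lInf(double[] z) {
        double max = 0.0;
        for (double v : z) max = Math.max(max, Math.abs(v));
        return max;
    }
}
```

For z = (3, -4, 0) the three metrics are 5.0, 7.0, and 4.0, respectively.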

The following distance measures do not allow for scaling.

distanceMethod                 Metric
MAHALANOBIS                    Mahalanobis distance
ABS_COSINE                     Absolute value of the cosine of the angle between the vectors
ANGLE_IN_RADIANS               Angle in radians (0, \(\pi\)) between the lines through the origin defined by the vectors
CORRELATION_COEFFICIENT        Correlation coefficient
ABS_CORRELATION_COEFFICIENT    Absolute value of the correlation coefficient
EXACT_MATCHES                  Number of exact matches, where \(x_k = y_k\)
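The unscaled measures in the table can be sketched for two vectors x and y (two columns of the input) as below. The formulas are the usual textbook ones, and two details are assumptions rather than documented behavior: the angle is taken as the arccosine of the signed cosine, which gives values in [0, \(\pi\)] and so matches the range quoted in the table, and exact matches compare doubles with ==. UnscaledMeasures is a hypothetical class; its ArithmeticException for a zero-length vector stands in for Dissimilarities.ZeroNormException.

```java
// Hypothetical sketch, not the com.imsl.stat implementation.
final class UnscaledMeasures {

    static double dot(double[] x, double[] y) {
        double s = 0.0;
        for (int k = 0; k < x.length; k++) s += x[k] * y[k];
        return s;
    }

    // Signed cosine of the angle between x and y. A zero vector (zero norm, or zero
    // variance in the correlation case) raises ArithmeticException.
    static double cosine(double[] x, double[] y) {
        double nx = Math.sqrt(dot(x, x));
        double ny = Math.sqrt(dot(y, y));
        if (nx == 0.0 || ny == 0.0) {
            throw new ArithmeticException("vector has zero Euclidean norm");
        }
        return dot(x, y) / (nx * ny);
    }

    static double absCosine(double[] x, double[] y) {
        return Math.abs(cosine(x, y));
    }

    // Clamp to [-1, 1] so rounding cannot push acos outside its domain.
    static double angleInRadians(double[] x, double[] y) {
        double c = Math.max(-1.0, Math.min(1.0, cosine(x, y)));
        return Math.acos(c);
    }

    // Pearson correlation: the cosine of the mean-centered vectors.
    static double correlation(double[] x, double[] y) {
        int n = x.length;
        double mx = 0.0;
        double my = 0.0;
        for (int k = 0; k < n; k++) {
            mx += x[k];
            my += y[k];
        }
        mx /= n;
        my /= n;
        double[] cx = new double[n];
        double[] cy = new double[n];
        for (int k = 0; k < n; k++) {
            cx[k] = x[k] - mx;
            cy[k] = y[k] - my;
        }
        return cosine(cx, cy);
    }

    static double absCorrelation(double[] x, double[] y) {
        return Math.abs(correlation(x, y));
    }

    // Number of positions k where x_k equals y_k exactly.
    static int exactMatches(double[] x, double[] y) {
        int count = 0;
        for (int k = 0; k < x.length; k++) {
            if (x[k] == y[k]) count++;
        }
        return count;
    }
}
```

Note that ABS_COSINE and ABS_CORRELATION_COEFFICIENT treat perfectly opposed vectors, such as (1, 2, 3) and (-2, -4, -6), as identical (value 1), whereas ANGLE_IN_RADIANS and CORRELATION_COEFFICIENT keep the sign information.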

For the Mahalanobis distance, any variable that is (numerically) linearly dependent on the variables preceding it in the index array (see setIndex) is omitted from the distance measure.
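One way to realize the rule just stated is sketched below, assuming the textbook definition \(d = \sqrt{(x-y)^\top S^{-1}(x-y)}\) with S the unbiased covariance matrix of the selected variables (rows of x) estimated from all columns. The sketch factors S by a Cholesky decomposition in index order and skips any variable whose remaining pivot falls below an assumed relative tolerance, that is, any variable that is numerically a linear combination of the variables before it. MahalanobisSketch, TOL, and the ArithmeticException (a stand-in for Dissimilarities.NoPositiveVarianceException) are hypothetical; the library's exact formula, tolerance, and covariance estimate are not documented here.

```java
// Hypothetical sketch, not the com.imsl.stat implementation.
final class MahalanobisSketch {

    static final double TOL = 1e-10; // assumed relative tolerance for "numerically dependent"

    // Unbiased covariance of the rows listed in index, using every column as an observation.
    static double[][] covariance(double[][] x, int[] index) {
        int m = index.length;
        int n = x[0].length;
        double[] mean = new double[m];
        for (int p = 0; p < m; p++) {
            for (double v : x[index[p]]) mean[p] += v;
            mean[p] /= n;
        }
        double[][] s = new double[m][m];
        for (int p = 0; p < m; p++) {
            for (int q = 0; q <= p; q++) {
                double sum = 0.0;
                for (int c = 0; c < n; c++) {
                    sum += (x[index[p]][c] - mean[p]) * (x[index[q]][c] - mean[q]);
                }
                s[p][q] = s[q][p] = sum / (n - 1);
            }
        }
        return s;
    }

    // Mahalanobis distance between columns i and j; d'S^{-1}d = |w|^2 where L w = d.
    static double distance(double[][] x, int[] index, int i, int j) {
        int m = index.length;
        double[][] s = covariance(x, index);
        double[][] l = new double[m][m]; // Cholesky factor over the kept variables
        boolean[] kept = new boolean[m];
        double[] w = new double[m];
        double dist2 = 0.0;
        boolean anyKept = false;
        for (int p = 0; p < m; p++) {
            for (int q = 0; q < p; q++) {
                if (!kept[q]) continue;
                double sum = s[p][q];
                for (int r = 0; r < q; r++) {
                    if (kept[r]) sum -= l[p][r] * l[q][r];
                }
                l[p][q] = sum / l[q][q];
            }
            double residual = s[p][p];
            for (int r = 0; r < p; r++) {
                if (kept[r]) residual -= l[p][r] * l[p][r];
            }
            if (residual <= TOL * s[p][p]) continue; // dependent or zero variance: omit
            kept[p] = true;
            anyKept = true;
            l[p][p] = Math.sqrt(residual);
            // Forward substitution for w_p using d_p = x[index[p]][i] - x[index[p]][j].
            double dp = x[index[p]][i] - x[index[p]][j];
            for (int q = 0; q < p; q++) {
                if (kept[q]) dp -= l[p][q] * w[q];
            }
            w[p] = dp / l[p][p];
            dist2 += w[p] * w[p];
        }
        if (!anyKept) throw new ArithmeticException("no variable has positive variance");
        return Math.sqrt(dist2);
    }
}
```

For x = {{0, 1, 2}, {0, 2, 4}, {1, 0, 1}}, row 1 is exactly twice row 0, so it is dropped and the distance between columns 0 and 1 equals the one computed from rows 0 and 2 alone (about 2.0).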

  • Nested Class Summary

    Nested Classes
    Modifier and Type    Class                                          Description
    static class         Dissimilarities.NoPositiveVarianceException    No variable has positive variance.
    static class         Dissimilarities.ScaleFactorZeroException       The computations cannot continue because a scale factor is zero.
    static class         Dissimilarities.ZeroNormException              The computations cannot continue because the Euclidean norm of a column is equal to zero.
  • Field Summary

    Fields
    Modifier and Type    Field                          Description
    static final int     ABS_CORRELATION_COEFFICIENT    Indicates the absolute value of the correlation coefficient distance method.
    static final int     ABS_COSINE                     Indicates the absolute value of the cosine of the angle between the vectors distance method.
    static final int     ANGLE_IN_RADIANS               Indicates the angle in radians (0, \(\pi\)) between the lines through the origin defined by the vectors distance method.
    static final int     CORRELATION_COEFFICIENT        Indicates the correlation coefficient distance method.
    static final int     EXACT_MATCHES                  Indicates the number of exact matches distance method.
    static final int     INFINITY_NORM                  Indicates the maximum difference (\(L_\infty\) norm) distance method.
    static final int     L1_NORM                        Indicates the sum of the absolute differences (\(L_1\) norm) distance method.
    static final int     L2_NORM                        Indicates the Euclidean distance method (\(L_2\) norm).
    static final int     MAHALANOBIS                    Indicates the Mahalanobis distance method.
    static final int     NO_SCALING                     Indicates no scaling.
    static final int     RANGE                          Indicates scaling by the range.
    static final int     STD_DEV                        Indicates scaling by the standard deviation.
  • Constructor Summary

    Constructors
    Constructor
    Description
    Dissimilarities(double[][] x)
    Constructor for Dissimilarities.
    Dissimilarities(double[][] x, int distanceMethod, int distanceScale, int iRow)
    Deprecated.
    Dissimilarities(double[][] x, int distanceMethod, int distanceScale, int iRow, int[] indexArray)
    Deprecated.
  • Method Summary

    Modifier and Type    Method                                   Description
    void                 compute()                                Computes a matrix of dissimilarities (or similarities) between the columns (or rows) of a matrix.
    final double[][]     getDistanceMatrix()                      Returns the distance matrix.
    int                  getDistanceMethod()                      Returns the method used in computing the dissimilarities or similarities.
    int[]                getIndex()                               Returns the indices of the rows (columns) used in computing the distance measure.
    boolean              getRow()                                 Returns a boolean indicating whether distances are computed between rows or columns of x.
    int                  getScalingOption()                       Returns the scaling option.
    void                 setDistanceMethod(int distanceMethod)    Sets the method to be used in computing the dissimilarities or similarities.
    void                 setIndex(int[] indexArray)               Sets the indices of the rows (columns).
    void                 setRow(boolean row)                      Identifies whether distances are computed between rows or columns of x.
    void                 setScalingOption(int distanceScale)      Sets the scaling option used if the L2_NORM, L1_NORM, or INFINITY_NORM distance methods are specified.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • L2_NORM

      public static final int L2_NORM
      Indicates the Euclidean distance method (\(L_2\) norm).
    • L1_NORM

      public static final int L1_NORM
      Indicates the sum of the absolute differences (\(L_1\) norm) distance method.
    • INFINITY_NORM

      public static final int INFINITY_NORM
      Indicates the maximum difference (\(L_\infty\) norm) distance method.
    • MAHALANOBIS

      public static final int MAHALANOBIS
      Indicates the Mahalanobis distance method.
    • ABS_COSINE

      public static final int ABS_COSINE
      Indicates the absolute value of the cosine of the angle between the vectors distance method.
    • ANGLE_IN_RADIANS

      public static final int ANGLE_IN_RADIANS
      Indicates the angle in radians (0, \(\pi\)) between the lines through the origin defined by the vectors distance method.
    • CORRELATION_COEFFICIENT

      public static final int CORRELATION_COEFFICIENT
      Indicates the correlation coefficient distance method.
    • ABS_CORRELATION_COEFFICIENT

      public static final int ABS_CORRELATION_COEFFICIENT
      Indicates the absolute value of the correlation coefficient distance method.
    • EXACT_MATCHES

      public static final int EXACT_MATCHES
      Indicates the number of exact matches distance method.
    • NO_SCALING

      public static final int NO_SCALING
      Indicates no scaling.
    • STD_DEV

      public static final int STD_DEV
      Indicates scaling by the standard deviation.
    • RANGE

      public static final int RANGE
      Indicates scaling by the range.
  • Constructor Details

    • Dissimilarities

      public Dissimilarities(double[][] x, int distanceMethod, int distanceScale, int iRow) throws Dissimilarities.ScaleFactorZeroException, Dissimilarities.ZeroNormException, Dissimilarities.NoPositiveVarianceException
      Deprecated.
      Constructor for Dissimilarities.
      Parameters:
      x - A double matrix containing the data input matrix.
      distanceMethod - An int identifying the method to be used in computing the dissimilarities or similarities. Acceptable values of distanceMethod are 0, 1, 2, ..., 8. See above for a description of these methods.
      distanceScale - An int containing the scaling option.

      distanceScale    Method
      0                No scaling is performed.
      1                Scale each column (row if iRow=1) by the standard deviation of the column (row).
      2                Scale each column (row if iRow=1) by the range of the column (row).

      iRow - An int identifying whether distances are computed between rows or columns of x. If iRow = 1, distances are computed between the rows of x. Otherwise, distances between the columns of x are computed.
      Throws:
      Dissimilarities.ScaleFactorZeroException - thrown when computations cannot continue because a scale factor is zero
      Dissimilarities.NoPositiveVarianceException - thrown when no variable has positive variance
      Dissimilarities.ZeroNormException - thrown when the Euclidean norm of a column is equal to zero
    • Dissimilarities

      public Dissimilarities(double[][] x, int distanceMethod, int distanceScale, int iRow, int[] indexArray) throws Dissimilarities.ScaleFactorZeroException, Dissimilarities.ZeroNormException, Dissimilarities.NoPositiveVarianceException
      Deprecated.
      Constructor for Dissimilarities.
      Parameters:
      x - A double matrix containing the data input matrix.
      distanceMethod - An int identifying the method to be used in computing the dissimilarities or similarities. Acceptable values of distanceMethod are 0, 1, 2, ..., 8. See above for a description of these methods.
      distanceScale - An int containing the scaling option.

      distanceScale    Method
      0                No scaling is performed.
      1                Scale each column (row if iRow=1) by the standard deviation of the column (row).
      2                Scale each column (row if iRow=1) by the range of the column (row).

      iRow - An int identifying whether distances are computed between rows or columns of x. If iRow=1, distances are computed between the rows of x. Otherwise, distances between the columns of x are computed.
      indexArray - An int array containing the indices of the rows (columns if iRow is 1) to be used in computing the distance measure.
      Throws:
      Dissimilarities.ScaleFactorZeroException - thrown when computations cannot continue because a scale factor is zero
      Dissimilarities.NoPositiveVarianceException - thrown when no variable has positive variance
      Dissimilarities.ZeroNormException - thrown when the Euclidean norm of a column is equal to zero
    • Dissimilarities

      public Dissimilarities(double[][] x)
      Constructor for Dissimilarities.
      Parameters:
      x - A double matrix containing the data input matrix.
  • Method Details

    • setDistanceMethod

      public void setDistanceMethod(int distanceMethod)
      Sets the method to be used in computing the dissimilarities or similarities.
      Parameters:
      distanceMethod - An int identifying the method to be used in computing the dissimilarities or similarities. Acceptable values of distanceMethod are:

      distanceMethod                 Metric
      L2_NORM                        Euclidean distance (\(L_2\) norm)
      L1_NORM                        Sum of the absolute differences (\(L_1\) norm)
      INFINITY_NORM                  Maximum difference (\(L_\infty\) norm)
      MAHALANOBIS                    Mahalanobis distance
      ABS_COSINE                     Absolute value of the cosine of the angle between the vectors
      ANGLE_IN_RADIANS               Angle in radians (0, \(\pi\)) between the lines through the origin defined by the vectors
      CORRELATION_COEFFICIENT        Correlation coefficient
      ABS_CORRELATION_COEFFICIENT    Absolute value of the correlation coefficient
      EXACT_MATCHES                  Number of exact matches, where \(x_k = y_k\)

      See class description for more details. By default, distanceMethod = L2_NORM.
    • getDistanceMethod

      public int getDistanceMethod()
      Returns the method used in computing the dissimilarities or similarities.
    • setScalingOption

      public void setScalingOption(int distanceScale)
      Sets the scaling option used if the L2_NORM, L1_NORM, or INFINITY_NORM distance methods are specified. See setDistanceMethod.
      Parameters:
      distanceScale - An int containing the scaling option. By default, distanceScale = NO_SCALING.

      distanceScale    Method
      NO_SCALING       No scaling is performed.
      STD_DEV          If setRow(false), scale each column by the standard deviation of the column.
                       If setRow(true), scale each row by the standard deviation of the row.
      RANGE            If setRow(false), scale each column by the range of the column.
                       If setRow(true), scale each row by the range of the row.

    • getScalingOption

      public int getScalingOption()
      Returns the scaling option.
    • setRow

      public void setRow(boolean row)
      Identifies whether distances are computed between rows or columns of x.
      Parameters:
      row - A boolean identifying whether distances are computed between rows or columns of x. If row = true, distances are computed between the rows of x. Otherwise, distances between the columns of x are computed. By default, row = true.
    • getRow

      public boolean getRow()
      Returns a boolean indicating whether distances are computed between rows or columns of x.
    • setIndex

      public void setIndex(int[] indexArray)
      Sets the indices of the rows (columns).
      Parameters:
      indexArray - An int array containing the indices of the columns (rows if row = false) to be used in computing the distance measure. By default, if row = true, indexArray = 0, 1, ..., x[0].length-1; if row = false, indexArray = 0, 1, ..., x.length-1. See setRow.
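As a sketch of what the index array selects, assuming setRow(false) (distances between columns, so the indices pick rows of x): only the listed rows contribute to the distance, and the default index lists every row. IndexedDistance is a hypothetical helper, not part of com.imsl.stat, and it uses the unscaled L1 norm for brevity.

```java
import java.util.stream.IntStream;

// Hypothetical sketch: distances between columns of x that use only the rows in index.
final class IndexedDistance {

    // Default index for column distances: every row, 0, 1, ..., x.length - 1.
    static int[] defaultIndex(double[][] x) {
        return IntStream.range(0, x.length).toArray();
    }

    // Unscaled L1 distance between columns i and j over the selected rows only.
    static double l1(double[][] x, int[] index, int i, int j) {
        double sum = 0.0;
        for (int k : index) sum += Math.abs(x[k][i] - x[k][j]);
        return sum;
    }
}
```

For x = {{1, 4}, {2, 2}, {10, 0}}, the distance between the two columns is 13.0 with the default index but only 3.0 with index {0, 1}, because the large difference in row 2 is excluded.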
    • getIndex

      public int[] getIndex()
      Returns the indices of the rows (columns) used in computing the distance measure.
    • compute

      public void compute() throws Dissimilarities.ScaleFactorZeroException, Dissimilarities.ZeroNormException, Dissimilarities.NoPositiveVarianceException
      Computes a matrix of dissimilarities (or similarities) between the columns (or rows) of a matrix.
      Throws:
      Dissimilarities.ScaleFactorZeroException - thrown when computations cannot continue because a scale factor is zero
      Dissimilarities.NoPositiveVarianceException - thrown when no variable has positive variance
      Dissimilarities.ZeroNormException - thrown when the Euclidean norm of a column is equal to zero
    • getDistanceMatrix

      public final double[][] getDistanceMatrix()
      Returns the distance matrix.
      Returns:
      A double matrix containing the distance matrix.
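Because the class description states that only the upper triangle (excluding the diagonal) is computed, a caller that needs d(a, b) for arbitrary a and b, or a full symmetric matrix for another routine, has to mirror the upper triangle. The following is a minimal sketch, assuming the returned array is square with one row and column per compared column (or row) of x. DistanceMatrixView is a hypothetical helper, and the diagonal value is supplied by the caller because it differs between dissimilarities (0) and similarities such as correlations (1).

```java
// Hypothetical helper for reading a matrix that is filled only above the diagonal.
final class DistanceMatrixView {

    // Value for items a and b (a != b), read from the upper triangle only.
    static double get(double[][] upper, int a, int b) {
        if (a == b) {
            throw new IllegalArgumentException("the diagonal is not computed");
        }
        return a < b ? upper[a][b] : upper[b][a];
    }

    // Full symmetric copy with a caller-chosen diagonal value.
    static double[][] symmetric(double[][] upper, double diagonal) {
        int n = upper.length;
        double[][] full = new double[n][n];
        for (int a = 0; a < n; a++) {
            full[a][a] = diagonal;
            for (int b = a + 1; b < n; b++) {
                full[a][b] = upper[a][b];
                full[b][a] = upper[a][b];
            }
        }
        return full;
    }
}
```

Reading only the upper triangle means the helper does not depend on whatever values the lower triangle or diagonal of the returned array happen to hold.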