com.imsl.stat.MultidimensionalScaling

All Implemented Interfaces: Serializable

public class MultidimensionalScaling extends Object implements Serializable

Performs metric multidimensional scaling using the Euclidean or individual differences model.

Overview
Class MultidimensionalScaling performs multidimensional scaling analysis. Its input consists of symmetric similarity or dissimilarity matrices that measure distances between pairs of objects. In multidimensional scaling, optimized (dis)similarities, generally also called proximities, are used to position the objects within an ndim-dimensional space, where ndim is specified by the user. Optionally, in the individual differences scaling model, the weight assigned to each dimension may differ for each subject.

The Input Data
The input similarity or dissimilarity data are stored in a three-dimensional array as a sequence of symmetric matrices. Each matrix uses the same group of objects but refers to a specific subject (or individual).
Missing values can be indicated by Double.NaN or a negative matrix entry. In either case, missing values are estimated as the mean dissimilarity for the subject and used as such when computing initial estimates, and they are omitted from the criterion function when optimal estimates are computed.
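The missing-value convention above can be illustrated with a small, self-contained sketch. The class and method names below are hypothetical and are not part of MultidimensionalScaling. The sketch assumes that the subject mean is taken over the non-missing entries of the strict upper triangle; the library's exact computation is not shown here.

```java
final class MissingValueSketch {

    /** A value is treated as missing when it is NaN or negative. */
    static boolean isMissing(double v) {
        return Double.isNaN(v) || v < 0.0;
    }

    /** Mean of the non-missing entries in the strict upper triangle of one subject's matrix. */
    static double subjectMean(double[][] d) {
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i < d.length; i++) {
            for (int j = i + 1; j < d.length; j++) {
                if (!isMissing(d[i][j])) {
                    sum += d[i][j];
                    count++;
                }
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    /**
     * Returns a symmetric copy in which missing off-diagonal entries are replaced
     * by the subject mean (an assumption about how initial estimates treat them).
     */
    static double[][] imputeForInitialEstimates(double[][] d) {
        int n = d.length;
        double mean = subjectMean(d);
        double[][] out = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double v = isMissing(d[i][j]) ? mean : d[i][j];
                out[i][j] = v;
                out[j][i] = v;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] d = {
            {0.0, 2.0, Double.NaN},
            {2.0, 0.0, 4.0},
            {Double.NaN, 4.0, 0.0}
        };
        double[][] filled = imputeForInitialEstimates(d);
        System.out.println("imputed entry (0,2): " + filled[0][2]);
    }
}
```

When the criterion function itself is evaluated, the same isMissing test would be used to skip an entry rather than to replace it.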
Class MultidimensionalScaling assumes a metric scaling model. When no transformation is applied (that is, transformationFormula = 1), each datum (after conversion to dissimilarities) is a measure of distance plus a constant, $ \alpha_m $. In this case, the constant (always called the "intercept") is assumed to vary with subject and must first be added to the observed dissimilarities in order to obtain a metric. When a transformation is applied (that is, transformationFormula $ \ne $ 1), the meaning of $ \alpha_m $ changes (with respect to metrics). Thus, the data are assumed to be interval data when transformationFormula = 1 and ratio data when transformationFormula $ \ne $ 1. A scaling factor, the "slope", is also always estimated for each subject.

The Criterion (Stress) Function
When stressFormula = 1 or 2, the criterion or stress function in class MultidimensionalScaling is given as $$\phi = \sum_m\nu_m \sum_{i,j}\left(f(\delta^\ast_{ijm})- \alpha_m-\beta_mf(\delta_{ijm})\right)^2$$ where $\delta_{ijm}$ denotes the predicted distance between objects i and j on subject m, $\delta^\ast_{ijm}$ denotes the corresponding dissimilarity (the observed distance), $\nu_m$ is the stress weight assigned to the m-th subject, $f$ is one of the transformations $f(x)=x^2, f(x) = x,$ or $f(x)=\ln(x)$ specified by method setTransformationFormula, $\alpha_m$ is the intercept added to the transformed observations within each subject, and $\beta_m$ is the slope for the subject.
For stressFormula = 0, the criterion function is given as $$\phi = \sum_m n_m \ln\left(\sum_{i,j}\left(f(\delta^\ast_{ijm})- \alpha_m-\beta_mf(\delta_{ijm})\right)^2\right)$$ where $n_m$ is the number of non-missing observations on the m-th subject. Assuming fixed weights, the first derivatives of the criterion for stressFormula = 0 are identical to the first derivatives of the criterion when stressFormula = 1 or 2, but with weights $$\nu_m^{-1} = \sum_{i,j} \left(f(\delta^\ast_{ijm})- \alpha_m-\beta_mf(\delta_{ijm})\right)^2/n_m $$ Method setStressFormula can, thus, be thought of as changing the weighting to be used in the criterion function.
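The equivalence of the first derivatives can be written out in one line. As a shorthand introduced only for this illustration, let $S_m = \sum_{i,j}\left(f(\delta^\ast_{ijm})- \alpha_m-\beta_mf(\delta_{ijm})\right)^2$ denote the inner sum of squares for the m-th subject, and let $\theta$ be any model parameter. For stressFormula = 0, the chain rule gives $$\frac{\partial \phi}{\partial \theta} = \sum_m \frac{n_m}{S_m}\,\frac{\partial S_m}{\partial \theta} = \sum_m \nu_m \frac{\partial S_m}{\partial \theta}, \qquad \nu_m^{-1} = S_m/n_m,$$ which is the derivative of the weighted criterion $\sum_m \nu_m S_m$ when the weights $\nu_m$ are held fixed at their current values.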
The transformation $f(x)$ specified by method setTransformationFormula is used to obtain constant within-subject variance of the subject dissimilarities. If the variance of the log of the observed dissimilarities (about the predicted dissimilarities) is constant within subject, then the log transformation should be used. In this case, the variance of a dissimilarity should be proportional to its magnitude. Alternatively, the within-subject variance may be constant when distances (or squared distances) are used.
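The two criterion functions and the transformation $f$ can be sketched in plain Java. This is an illustration of the formulas only, not the library's implementation; the class and method names are hypothetical, and the sums are assumed to run over the strict upper triangle with missing (NaN or negative) observations skipped.

```java
final class StressSketch {

    /** f(x) for transformationFormula 0 (square), 1 (identity) or 2 (natural log). */
    static double transform(int transformationFormula, double x) {
        switch (transformationFormula) {
            case 0: return x * x;
            case 1: return x;
            case 2: return Math.log(x);
            default:
                throw new IllegalArgumentException("transformationFormula must be 0, 1 or 2");
        }
    }

    /**
     * Residual sum of squares for one subject over the strict upper triangle.
     * Returns {sum of squared residuals, number of non-missing terms}.
     */
    static double[] sumOfSquares(double[][] observed, double[][] predicted,
                                 double alpha, double beta, int formula) {
        double sum = 0.0;
        int n = 0;
        for (int i = 0; i < observed.length; i++) {
            for (int j = i + 1; j < observed.length; j++) {
                double obs = observed[i][j];
                if (Double.isNaN(obs) || obs < 0.0) {
                    continue; // missing observation: omitted from the criterion
                }
                double r = transform(formula, obs) - alpha
                        - beta * transform(formula, predicted[i][j]);
                sum += r * r;
                n++;
            }
        }
        return new double[] {sum, n};
    }

    /** stressFormula 1 or 2: sum over subjects of nu[m] times the subject's sum of squares. */
    static double weightedCriterion(double[][][] observed, double[][][] predicted,
                                    double[] alpha, double[] beta, double[] nu, int formula) {
        double phi = 0.0;
        for (int m = 0; m < observed.length; m++) {
            phi += nu[m] * sumOfSquares(observed[m], predicted[m], alpha[m], beta[m], formula)[0];
        }
        return phi;
    }

    /** stressFormula 0: sum over subjects of n_m times the log of the subject's sum of squares. */
    static double logCriterion(double[][][] observed, double[][][] predicted,
                               double[] alpha, double[] beta, int formula) {
        double phi = 0.0;
        for (int m = 0; m < observed.length; m++) {
            double[] s = sumOfSquares(observed[m], predicted[m], alpha[m], beta[m], formula);
            phi += s[1] * Math.log(s[0]);
        }
        return phi;
    }

    public static void main(String[] args) {
        double[][][] obs = {{{0, 1, 2}, {1, 0, 3}, {2, 3, 0}}};
        double[][][] pred = {{{0, 1, 2}, {1, 0, 2}, {2, 2, 0}}};
        double[] alpha = {0.0};
        double[] beta = {1.0};
        double[] nu = {2.0};
        System.out.println(weightedCriterion(obs, pred, alpha, beta, nu, 1));
        System.out.println(logCriterion(obs, pred, alpha, beta, 1));
    }
}
```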

The Distance Models and Subject Weights
The following distance models for $\delta_{ijm}$ are available in class MultidimensionalScaling:

Euclidean model $$ \delta^2_{ijm} = \sum_{k=1}^d(\lambda_{ik} - \lambda_{jk})^2 $$
Individual differences model $$ \delta^2_{ijm} = \sum_{k=1}^d w_{mk}(\lambda_{ik} - \lambda_{jk})^2 $$

Here, $\lambda_{ik}$ is the location of the i-th object in the k-th dimension, d is the number of dimensions, and $w_{mk}$ is the subject weight assigned by the m-th subject to the k-th dimension. The matrix $\Lambda = (\lambda_{ik})$ containing the points which represent the objects in an ndim-dimensional space is called the point configuration.
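As a minimal illustration of the two distance models, the following sketch evaluates $\delta_{ijm}$ directly from a configuration matrix. The class and method names are hypothetical; lambda is an nstim by ndim array and w holds the weights $w_{mk}$ of a single subject.

```java
final class DistanceModelSketch {

    /** Euclidean model: square root of the summed squared coordinate differences. */
    static double euclidean(double[][] lambda, int i, int j) {
        double sum = 0.0;
        for (int k = 0; k < lambda[i].length; k++) {
            double diff = lambda[i][k] - lambda[j][k];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Individual differences model: each dimension k is scaled by the subject weight w[k]. */
    static double individualDifferences(double[][] lambda, double[] w, int i, int j) {
        double sum = 0.0;
        for (int k = 0; k < lambda[i].length; k++) {
            double diff = lambda[i][k] - lambda[j][k];
            sum += w[k] * diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[][] lambda = {{0.0, 0.0}, {3.0, 4.0}};
        System.out.println(euclidean(lambda, 0, 1));
        System.out.println(individualDifferences(lambda, new double[] {4.0, 0.0}, 0, 1));
    }
}
```

With all subject weights equal to 1, the individual differences distance reduces to the Euclidean distance.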

The Stress Weights
Weights that are inversely proportional to the estimated variance of the dissimilarities (about their predicted values) within each subject may be preferred because such weights lead to normal distribution theory maximum likelihood estimates (when it is assumed that the dissimilarities are independently normally distributed with constant residual variance). When stressFormula = 0, the estimated (conditional) variance used as the inverse of the weight $\nu_m$ for the m-th subject is computed as $$\nu_m^{-1} = \sum_{i,j} \left(f(\delta^\ast_{ijm})- \alpha_m-\beta_mf(\delta_{ijm})\right)^2/n_m, $$ where the sum is over the observations for the subject and where $n_m$ is the number of observed non-missing dissimilarities for the subject. These weights are used in the first derivatives of the criterion function.
When stressFormula = 1, the within-subject average of the squared dissimilarities is used as the inverse of the weight. It is computed as $$\nu_m^{-1} = \sum_{i,j}f(\delta^\ast_{ijm})^2/n_m.$$ Finally, when stressFormula = 2, the within-subject variance of the dissimilarities is used as the inverse of the weight. It is computed as $$\nu_m^{-1} = \sum_{i,j} \left(f(\delta^\ast_{ijm})- \overline{f(\delta^\ast_{\cdot \cdot m})}\right)^2/n_m,$$ where $\overline{f(\delta^\ast_{\cdot \cdot m})}$ denotes the average of the transformed dissimilarities for the m-th subject.
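The three choices of $\nu_m^{-1}$ can be compared side by side in a short sketch. The names are hypothetical; each method takes the values for one subject as a flat array, either the residuals (stressFormula = 0) or the transformed, non-missing dissimilarities $f(\delta^\ast_{ijm})$ (stressFormula = 1 or 2).

```java
final class StressWeightSketch {

    /** stressFormula 0: mean squared residual about the predicted values. */
    static double inverseWeightFormula0(double[] residuals) {
        double sum = 0.0;
        for (double r : residuals) {
            sum += r * r;
        }
        return sum / residuals.length;
    }

    /** stressFormula 1: mean of the squared transformed dissimilarities. */
    static double inverseWeightFormula1(double[] transformed) {
        double sum = 0.0;
        for (double t : transformed) {
            sum += t * t;
        }
        return sum / transformed.length;
    }

    /** stressFormula 2: variance of the transformed dissimilarities about their mean (divisor n_m). */
    static double inverseWeightFormula2(double[] transformed) {
        double mean = 0.0;
        for (double t : transformed) {
            mean += t;
        }
        mean /= transformed.length;
        double sum = 0.0;
        for (double t : transformed) {
            sum += (t - mean) * (t - mean);
        }
        return sum / transformed.length;
    }

    public static void main(String[] args) {
        double[] t = {1.0, 2.0, 3.0};
        System.out.println(1.0 / inverseWeightFormula1(t)); // weight nu_m for stressFormula 1
        System.out.println(1.0 / inverseWeightFormula2(t)); // weight nu_m for stressFormula 2
    }
}
```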

The Optimization Procedure
Initial estimates for the configuration matrix $\Lambda$ are obtained through methods of classical scaling, as discussed in Cox and Cox (2001), chapter 2. In the case of the individual differences model, initial estimates for the matrix $W$ of subject weights are computed by a method described in De Leeuw and Pruzansky (1978). After obtaining initial estimates, a modified Gauss-Newton algorithm is used to obtain estimates for the parameters that optimize the criterion function. The parameters are optimized sequentially as follows:

1. Optimize the configuration estimates $\Lambda$.
2. If required, estimate the optimal subject weights $w_{m k}, k=1,\ldots,$ndim, one subject at a time.
3. Optimize the intercept parameters $\alpha_m$ and the slope parameters $\beta_m$, one subject at a time.
4. If convergence has not been reached, continue at step 1.

An iteration is defined to be all of the steps 1, 2 and 3. Convergence is assumed when the maximum absolute change in any parameter during an iteration is less than $10^{-4}$ or if there is no change in the criterion function during an iteration.
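For intuition about step 3 and the stopping rule, note that with the predicted distances and the stress weight held fixed, the inner sum of squares for a subject is an ordinary least-squares problem in $\alpha_m$ and $\beta_m$. The sketch below solves that simplified problem in closed form and encodes the convergence test described above. It is an illustration under these assumptions, with hypothetical names, and not the library's own update, which uses a modified Gauss-Newton algorithm.

```java
final class SubjectRegressionSketch {

    /**
     * Returns {alpha, beta} minimizing sum (y[k] - alpha - beta * x[k])^2, where
     * x holds f(predicted distance) and y holds f(observed dissimilarity).
     */
    static double[] fitInterceptAndSlope(double[] x, double[] y) {
        int n = x.length;
        double xBar = 0.0;
        double yBar = 0.0;
        for (int k = 0; k < n; k++) {
            xBar += x[k];
            yBar += y[k];
        }
        xBar /= n;
        yBar /= n;
        double sxx = 0.0;
        double sxy = 0.0;
        for (int k = 0; k < n; k++) {
            sxx += (x[k] - xBar) * (x[k] - xBar);
            sxy += (x[k] - xBar) * (y[k] - yBar);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        double beta = sxy / sxx;
        double alpha = yBar - beta * xBar;
        return new double[] {alpha, beta};
    }

    /** Stopping rule: largest absolute parameter change below 1e-4, or no change in the criterion. */
    static boolean converged(double[] oldParams, double[] newParams,
                             double oldCriterion, double newCriterion) {
        double maxChange = 0.0;
        for (int k = 0; k < oldParams.length; k++) {
            maxChange = Math.max(maxChange, Math.abs(newParams[k] - oldParams[k]));
        }
        return maxChange < 1e-4 || newCriterion == oldCriterion;
    }

    public static void main(String[] args) {
        double[] ab = fitInterceptAndSlope(new double[] {1, 2, 3}, new double[] {3, 5, 7});
        System.out.println("alpha = " + ab[0] + ", beta = " + ab[1]);
    }
}
```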
A modified Gauss-Newton algorithm is used in the estimation of all parameters. This algorithm, which is discussed in detail by Merle and Späth (1974), uses iteratively reweighted least squares on a Taylor series linearization of the parameters in $\delta_{ijm}$. During each iteration, the stress weights, which may depend upon the parameters in the model, are assumed to be fixed.
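The classical-scaling starting point mentioned at the beginning of this section is built on a double-centered matrix of squared dissimilarities, whose leading eigenvectors supply the initial coordinates. The sketch below shows only the textbook double-centering step, $b_{ij} = -\frac{1}{2}\left(d_{ij}^2 - \bar{d^2_{i\cdot}} - \bar{d^2_{\cdot j}} + \bar{d^2_{\cdot\cdot}}\right)$, for a single complete symmetric matrix. The class name is hypothetical, and how the library combines subjects or handles missing entries is not shown.

```java
final class DoubleCenteringSketch {

    /** Double-centers the squared entries of a complete symmetric dissimilarity matrix. */
    static double[][] doubleCenter(double[][] d) {
        int n = d.length;
        double[][] squared = new double[n][n];
        double[] rowMean = new double[n];
        double grandMean = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                squared[i][j] = d[i][j] * d[i][j];
                rowMean[i] += squared[i][j] / n;
                grandMean += squared[i][j] / ((double) n * n);
            }
        }
        double[][] b = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                // By symmetry, the column mean for j equals the row mean for j.
                b[i][j] = -0.5 * (squared[i][j] - rowMean[i] - rowMean[j] + grandMean);
            }
        }
        return b;
    }

    public static void main(String[] args) {
        // Distances between three points placed at 0, 1 and 2 on a line.
        double[][] d = {{0, 1, 2}, {1, 0, 1}, {2, 1, 0}};
        double[][] b = doubleCenter(d);
        System.out.println(b[0][0] + " " + b[0][1] + " " + b[0][2]);
    }
}
```

For exact Euclidean distances, the result equals the matrix of inner products of the centered coordinates, so the number of positive eigenvalues bounds the number of dimensions that can be recovered. This is the quantity behind NotEnoughPositiveEigenvaluesException.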

Standardization
Both available models are over-parameterized, so the resulting parameter estimates are not uniquely defined. For example, in the Euclidean model, the columns of the configuration matrix $\Lambda$ can be translated or "rotated" (multiplied by an orthonormal matrix) without changing the resulting stress. To eliminate the lack of uniqueness due to translation, model estimates for the configuration are centered in both models. No attempt is made to eliminate the rotation problem, but note that rotation invariance is usually not a problem in the models given.
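A small numerical check of this invariance, applied to a two-dimensional configuration: translating and rotating the points leaves every Euclidean distance unchanged, and centering the columns removes the translation ambiguity. The class and method names are hypothetical and serve only as an illustration.

```java
final class ConfigurationInvarianceSketch {

    /** Euclidean distance between rows i and j of a configuration. */
    static double distance(double[][] lambda, int i, int j) {
        double sum = 0.0;
        for (int k = 0; k < lambda[i].length; k++) {
            double diff = lambda[i][k] - lambda[j][k];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Adds shift[k] to every entry of column k. */
    static double[][] translate(double[][] lambda, double[] shift) {
        double[][] out = new double[lambda.length][];
        for (int i = 0; i < lambda.length; i++) {
            out[i] = new double[lambda[i].length];
            for (int k = 0; k < lambda[i].length; k++) {
                out[i][k] = lambda[i][k] + shift[k];
            }
        }
        return out;
    }

    /** Rotates a two-column configuration by the angle theta (in radians). */
    static double[][] rotate2d(double[][] lambda, double theta) {
        double c = Math.cos(theta);
        double s = Math.sin(theta);
        double[][] out = new double[lambda.length][2];
        for (int i = 0; i < lambda.length; i++) {
            out[i][0] = c * lambda[i][0] - s * lambda[i][1];
            out[i][1] = s * lambda[i][0] + c * lambda[i][1];
        }
        return out;
    }

    /** Subtracts each column mean so that every column sums to zero. */
    static double[][] centerColumns(double[][] lambda) {
        int n = lambda.length;
        int d = lambda[0].length;
        double[] shift = new double[d];
        for (double[] row : lambda) {
            for (int k = 0; k < d; k++) {
                shift[k] -= row[k] / n;
            }
        }
        return translate(lambda, shift);
    }

    public static void main(String[] args) {
        double[][] lambda = {{0, 0}, {3, 4}, {1, -2}};
        double[][] moved = rotate2d(translate(lambda, new double[] {5, -7}), 0.7);
        System.out.println(distance(lambda, 0, 1) + " vs " + distance(moved, 0, 1));
    }
}
```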

References

Cox, T. F., and M. A. A. Cox (2001), Multidimensional Scaling, Second Edition, Chapman & Hall/CRC, Boca Raton, Florida.
De Leeuw, Jan and Sandra Pruzansky (1978), A new computational method to fit the weighted Euclidean distance model, Psychometrika, 43, 479-490.
Merle, G., and H. Späth (1974), Computational Experiences with Discrete Lp-Approximation, Computing, 12, 315-321.

See Also:

Serialized Form

Nested Class Summary

Nested Classes

Modifier and Type	Class	Description
static class	MultidimensionalScaling.IllDefinedHessianException	A Hessian matrix is ill-defined.
static class	MultidimensionalScaling.NotEnoughPositiveEigenvaluesException	The number of positive eigenvalues of the double-centered distance matrix is too small.

Constructor Summary

Constructors

Constructor	Description
MultidimensionalScaling(double[][][] dissimilarities, int ndim)	Constructor for class MultidimensionalScaling.
MultidimensionalScaling(MultidimensionalScaling mds)	Copy constructor for class MultidimensionalScaling.

Method Summary

Modifier and Type	Method	Description
void	compute()	Performs the multidimensional scaling.
double[][]	getConfiguration()	Returns the configuration matrix.
double[]	getCriterionFunctionWeights()	Returns the criterion function weight for each subject.
double[][][]	getDistances()	Returns the predicted distances.
double[]	getIntercepts()	Returns the intercept for each subject.
double[][][]	getResiduals()	Returns the observation residuals.
double[]	getSlopes()	Returns the slope for each subject.
double[][]	getSubjectWeights()	Returns the subject weights.
double	getWeightedOptimizedCriterionFunctionValue()	Returns the value of the summed optimized stress function.
double[]	getWeightedOptimizedCriterionValues()	Returns the value of the optimized stress function for each subject.
void	setDissimilarityConversion(int conversionType)	Sets the option for conversion of input similarity matrices to dissimilarity matrices.
void	setModel(int model)	Sets the model option parameter.
void	setPrintLevel(int printLevel)	Sets the print level.
void	setStressFormula(int stressFormula)	Sets the stress formula option.
void	setTransformationFormula(int transformationFormula)	Defines the transformation used when computing the criterion function.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- MultidimensionalScaling
  
  public MultidimensionalScaling(double[][][] dissimilarities, int ndim)
  
  Constructor for class MultidimensionalScaling.
  
  Parameters:
  
  dissimilarities - a double, 3-dimensional array containing the dissimilarity or similarity matrices. The array has format nsub by nstim by nstim, where nsub is the number of subjects and nstim is the number of stimuli (or objects) in each (dis)similarity matrix. Each matrix is assumed to be symmetric, and only the strictly upper triangular part is used in the computations.
  Missing values can be indicated either by the standard missing value indicator, Double.NaN, or by a negative entry in array dissimilarities.
  
  ndim - an int scalar, the dimension of the point configuration
  
  Throws:
  
  IllegalArgumentException - if one of the input arguments is not feasible
- MultidimensionalScaling
  
  public MultidimensionalScaling(MultidimensionalScaling mds)
  
  Copy constructor for class MultidimensionalScaling.
  
  Parameters:
  
  mds - the MultidimensionalScaling object to be copied

Method Details

compute

public void compute() throws MultidimensionalScaling.NotEnoughPositiveEigenvaluesException, MultidimensionalScaling.IllDefinedHessianException

Performs the multidimensional scaling.

Throws:

MultidimensionalScaling.NotEnoughPositiveEigenvaluesException - if the number of positive eigenvalues of the average product moment matrix is smaller than the number of columns of the configuration matrix

MultidimensionalScaling.IllDefinedHessianException - if one of the Hessians occurring during the optimization of the stress function is not positive semidefinite

setDissimilarityConversion

public void setDissimilarityConversion(int conversionType)

Sets the option for conversion of input similarity matrices to dissimilarity matrices.

Parameters:

conversionType - an int indicating which type of conversion of the input matrices has to be performed

conversionType	Conversion
0	Input data contain dissimilarities and no conversion is performed.
1	Input data are converted from similarity to dissimilarity data by subtracting each similarity from the largest similarity for the subject.
2	Input data are converted to dissimilarities by reciprocating each similarity.

Default: conversionType = 0
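The conversion rules in the table can be sketched as follows. The class and method names are hypothetical. The sketch assumes that the largest similarity is taken over the non-missing entries of the strict upper triangle and that missing entries (NaN or negative) are passed through as NaN; the library's handling of these details is not shown.

```java
final class DissimilarityConversionSketch {

    /** Converts one subject's similarity matrix according to conversionType 0, 1 or 2. */
    static double[][] convert(double[][] s, int conversionType) {
        if (conversionType < 0 || conversionType > 2) {
            throw new IllegalArgumentException("conversionType must be 0, 1 or 2");
        }
        int n = s.length;
        // Largest non-missing similarity for the subject (needed only for type 1).
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (!Double.isNaN(s[i][j]) && s[i][j] >= 0.0) {
                    max = Math.max(max, s[i][j]);
                }
            }
        }
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double v = s[i][j];
                double out;
                if (Double.isNaN(v) || v < 0.0) {
                    out = Double.NaN;      // missing stays missing
                } else if (conversionType == 1) {
                    out = max - v;         // subtract from the largest similarity
                } else if (conversionType == 2) {
                    out = 1.0 / v;         // reciprocal; a zero similarity gives infinity
                } else {
                    out = v;               // already a dissimilarity
                }
                d[i][j] = out;
                d[j][i] = out;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        double[][] s = {{0, 1, 2}, {1, 0, 4}, {2, 4, 0}};
        System.out.println(convert(s, 1)[0][1]);
        System.out.println(convert(s, 2)[1][2]);
    }
}
```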

setModel

public void setModel(int model)

Sets the model option parameter.

Parameters:

model - an int indicating which model to use. If model = 0, the Euclidean model is used; if model = 1, the individual differences model is used.
Default: model = 0

setStressFormula

public void setStressFormula(int stressFormula)

Sets the stress formula option.

Parameters:

stressFormula - an int indicating which stress formula to use in the computations. The stress formulas differ in the weighting given to each subject.

stressFormula	Weighting
0	Inverse of within-subject variance of observed dissimilarities about the predicted distances.
1	Inverse of within-subject sum of squared dissimilarities.
2	Inverse of within-subject variance of dissimilarities about the subject mean.

Default: stressFormula = 0

setTransformationFormula

public void setTransformationFormula(int transformationFormula)

Defines the transformation used when computing the criterion function.

Parameters:

transformationFormula - an int indicating which transformation to use on the observed and predicted dissimilarities when computing the criterion function

transformationFormula	Transformation
0	Squared distances.
1	Distances (that is, no transformation is performed).
2	Log of the distances.

Default: transformationFormula = 0

setPrintLevel

public void setPrintLevel(int printLevel)

Sets the print level.

Parameters:

printLevel - an int indicating which output is to be printed

printLevel	Output
0	No printing is performed.
1	Printing is performed but the output is abbreviated.
2	All printing is performed.

Default: printLevel = 0

getDistances

public double[][][] getDistances()

Returns the predicted distances.

Returns:

a double array of size nsub by nstim by nstim containing the distances as predicted by the estimated parameters in the model

getConfiguration

public double[][] getConfiguration()

Returns the configuration matrix.

Returns:

a double array of size nstim by ndim containing the point configuration obtained by multidimensional scaling

getIntercepts

public double[] getIntercepts()

Returns the intercept for each subject.

Returns:

a double array containing the intercepts for the subjects

getSlopes

public double[] getSlopes()

Returns the slope for each subject.

Returns:

a double array containing the slopes for the subjects

getCriterionFunctionWeights

public double[] getCriterionFunctionWeights()

Returns the criterion function weight for each subject.

Returns:

a double array containing the stress function weights for the subjects

getWeightedOptimizedCriterionValues

public double[] getWeightedOptimizedCriterionValues()

Returns the value of the optimized stress function for each subject.

Returns:

a double array containing the value of the weighted optimized criterion within each subject

getWeightedOptimizedCriterionFunctionValue

public double getWeightedOptimizedCriterionFunctionValue()

Returns the value of the summed optimized stress function.

Returns:

a double scalar containing the value of the weighted optimized criterion function, summed over all subjects

getResiduals

public double[][][] getResiduals()

Returns the observation residuals.

Returns:

a double array of size nsub by nstim by nstim containing the observation residuals for each subject

getSubjectWeights

public double[][] getSubjectWeights()

Returns the subject weights.

Returns:

a double array of size nsub by ndim containing the subject weights
