Given an real matrix , and an integer , compute a factorization . The matrix factors are computed to minimize the Frobenius, or sum of squares, norm of the error matrix: .
IMSL_WEIGHT, floatw[] (Input) An array of length m×n containing the matrix W≥ 0 of weights that will be applied to the entries of A≥ 0 during the solution sweeps. The factorization obtained is FG≅W ∘ A, where the weights are applied element-wise. Default: Weights are not applied, or equivalently, the weights all have value 1.
IMSL_INITIAL_FACTORS, int factors (Input) A flag that signifies if the matrix F is given an input estimate. If factors = 0, start sweeps using
Otherwise, use initial values in f as the matrix F to start the sweeps. Default: factors = 0
IMSL_ITMAX, intitmax (Input) The maximum number of sweeps allowed for alternately solving for G≥ 0, then F≥ 0. Default: itmax = 2 * (m + n + 1)
IMSL_RESIDUAL_ERROR, floaterr (Input) A scalar that will stop the sweeps at the first one satisfying error≤err. Default: err = 0
IMSL_RELATIVE_ERROR, floatrerr (Input) A scalar that will stop the sweeps at the first one satisfying
erroriter‑2‑erroriter‑1≤rerr × erroriter, iter > 2.
This test is made after three values of the error matrix norm have been computed. The sequence {erroriter} is decreasing with increasing values of the iteration counter, iter. If erroriter≥erroriter‑1 occurs, the sweeps stop. Default: rerr = (imsl_f_machine(3))0.4.
IMSL_STOPPING_CRITERION, int *reason (Output) This flag has the value 0,1,2 or 3 depending on which of the following conditions stopped the sweeps:
reason
Description
0
Errors in user input occurred
1
Reached maximum iterations
2
Residual norm is small
3
Relative error convergence
IMSL_NSTEPS_TAKEN, int*nsteps (Output) The last value of the iteration count, , that gives the number of sweeps.
Description
Function imsl_f_nonneg_matrix_factorization computes an approximation , or with weights, W ∘ A ≅ FG; the factors are constrained: . The matrix factors are computed to minimize the Frobenius or sum of squares, norm of the error matrix: .
The algorithm is based on Alternating Least Squares, presented by P. Paatero and U. Tapper, “Positive Matrix Factorization, etc.” Environmetrics, (5), p. 111-126 (1994).
Each constrained least squares problem is solved using imsl_f_nonneg_least_squares. This process alternates between computing the batch of columns of and then the batch of rows of . This constitutes a “sweep.”
There is no restriction on the relative sizes of and . The restrictions on the integer are . When an initial matrix is to be used, instead of an initial , repose the factorization in transposed form , or with weights, AT∘ WT≅ GTFT.
The matrix factors are not unique. In the function, the output rows of are scaled to have sum equal to the value 1. The scaled columns of are sorted so the column sums are non-increasing. This sort order is then applied to the rows of .
Example
Five customers, Beth, Dick, Fred, Joe and Kay make purchases at a convenience store.
Flour
Balloons
Beer
Sugar
Chips
Beth
3
8
1
Dick
2
5
1
Fred
5
1
10
Joe
20
40
2
1
Kay
10
1
10
1
This matrix of customers versus items purchased is approximated by a non-negative matrix factorization, using : . The example is taken from one due to H. Jin and M. Saunders, “Exploring Nonnegative Matrix Factorization,” A Workshop on Algorithms for Massive Data Sets, Stanford University, June 25-28, (2008).
#include <imsl.h>
#include <stdio.h>
#define M 5
#define N 5
#define K 2
int main() {
float a[M][N]= {
{ 0, 3, 8, 0, 1},
{ 0, 2, 5, 1, 0},
{ 5, 0, 1, 10, 0},
{ 0, 20, 40, 2, 1},
{10, 0, 1, 10, 1}
};
float error, f[M*K], g[K*N];
int nsteps, reason;
/* Solve for factors, constraining values to be non-negative.
Get reason for stopping and number of sweeps. */
error = imsl_f_nonneg_matrix_factorization(M, N, K, &a[0][0], f, g,
IMSL_STOPPING_CRITERION, &reason,
IMSL_NSTEPS_TAKEN, &nsteps,
0);
imsl_f_write_matrix("Matrix Factor F", M, K, f, 0);
imsl_f_write_matrix("Matrix Factor G", K, N, g, 0);
printf("\nFrobenius Norm of E=A-F*G is %e\n", error);
printf("Reason for stopping sweeps: %d\n", reason);