nonnegMatrixFactorization

../../_images/OpenMP.png

Given an m×n real matrix A0 , and an integer kmin , compute a factorization A \cong FG. The matrix factors F_{m \times k} \geq 0, G_{k \times n} \geq 0 are computed to minimize the Frobenius, or sum of squares, norm of the error matrix: E = \left\{ e_{i,j} \right\} = A - FG .

Synopsis

nonnegMatrixFactorization (a, f, g)

Required Arguments

float a[[]] (Input)
An array of length m × n containing the A matrix.
float f[[]] (Input/Output)
An array of length m × k containing the F matrix. If initialFactors is used, the sweeps begin using the input values for F_{m \times k} \geq 0 .
float g[[]] (Output)
An array of length k × n containing the G matrix.

Return Value

A scalar containing the Frobenius norm of the error matrix

E:\mathit{error} = \left( \sum_{i,j} e_{i,j}^2 \right)^{1/2}

Optional Arguments

weight, float (Input)

An array of length m × n containing the matrix W ≥ 0 of weights that will be applied to the entries of A ≥ 0 during the solution sweeps. The factorization obtained is FGWA, where the weights are applied element-wise.

Default: Weights are not applied, or equivalently, the weights all have value 1.

initialFactors, int (Input)

A flag that signifies if the matrix F is given an input estimate. If initialFactors = 0, start sweeps using

\begin{split}F = \begin{bmatrix} I_k \\ 0 \end{bmatrix}\end{split}

Otherwise, use initial values in f as the matrix F to start the sweeps.

Default: initialFactors = 0

itmax, int (Input)

The maximum number of sweeps allowed for alternately solving for G ≥ 0, then F ≥ 0.

Default: itmax = 2 * (m + n + 1)

residualError, float (Input)

A scalar that will stop the sweeps at the first one satisfying errorresidualError.

Default: residualError = 0

relativeError, float (Input)

A scalar that will stop the sweeps at the first one satisfying

\mathit{error}_{iter‑2} ‑ \mathit{error}_{iter‑1} \leq \mathit{relativeError} \times \mathit{error}_{iter}, iter > 2.

This test is made after three values of the error matrix norm have been computed. The sequence {error_{iter}} is decreasing with increasing values of the iteration counter, iter. If error_{iter} \geq error_{iter‑1} occurs, the sweeps stop.

Default: relativeError = (machine(3))^{0.4}.

stoppingCriterion (Output)
This flag has the value 0,1,2 or 3 depending on which of the following conditions stopped the sweeps:
stoppingCriterion Description
0 Errors in user input occurred
1 Reached maximum iterations
2 Residual norm is small
3 Relative error convergence
nstepsTaken (Output)
The last value of the iteration count that gives the number of sweeps.

Description

Function nonnegMatrixFactorization computes an approximation A \cong FG , or with weights, WAFG; the factors are constrained: F_{m \times k} \geq 0, G_{k \times n} \geq 0 . The matrix factors F_{m \times k} \geq 0, G_{k \times n} \geq 0 are computed to minimize the Frobenius or sum of squares, norm of the error matrix: E = \left\{ e_{i,j} \right\} = A - FG .

The algorithm is based on Alternating Least Squares, presented by P. Paatero and U. Tapper, “Positive Matrix Factorization, etc.” Environmetrics, (5), p. 111-126 (1994).

Each constrained least squares problem is solved using nonnegLeastSquares. This process alternates between computing the batch of n columns of G and then the batch of m rows of F. This constitutes a “sweep.”

There is no restriction on the relative sizes of m and n . The restrictions on the integer k are 0 < k \leq \min (m,n). When an initial matrix G is to be used, instead of an initial F, repose the factorization in transposed form A^T \cong G^T F^T, or with weights, A^T \circ W^T \cong G^T F^T.

The matrix factors F, G are not unique. In the function, the output rows of G are scaled to have sum equal to the value 1. The scaled columns of F are sorted so the column sums are non-increasing. This sort order is then applied to the rows of G .

Example

Five customers, Beth, Dick, Fred, Joe and Kay make purchases at a convenience store.

  Flour Balloons Beer Sugar Chips
Beth   3 8   1
Dick   2 5 1  
Fred 5   1 10  
Joe   20 40 2 1
Kay 10   1 10 1

This matrix A_{5 \times 5} of customers versus items purchased is approximated by a non-negative matrix factorization, using k = 2 : A \cong FG . The example is taken from one due to H. Jin and M. Saunders, “Exploring Nonnegative Matrix Factorization,” A Workshop on Algorithms for Massive Data Sets, Stanford University, June 25-28, (2008).

from __future__ import print_function
from numpy import *
from pyimsl.math.nonnegMatrixFactorization import nonnegMatrixFactorization
from pyimsl.math.writeMatrix import writeMatrix

m = 5
n = 5
k = 2
a = array([[0, 3, 8, 0, 1],
           [0, 2, 5, 1, 0],
           [5, 0, 1, 10, 0],
           [0, 20, 40, 2, 1],
           [10, 0, 1, 10, 1]])
f = zeros((m, k))
g = zeros((k, n))
reason = []
nsteps = []

# Solve for factors, constraining values to be non-negative.
# Get reason for stopping and number of sweeps.
error = nonnegMatrixFactorization(a, f, g,
                                  stoppingCriterion=reason,
                                  nstepsTaken=nsteps)

writeMatrix("Matrix Factor F", f)
writeMatrix("Matrix Factor G", g)
print("\nFrobenius Norm of E=A-F*G is %e" % error)
print("Reason for stopping sweeps: %d" % reason[0])
print("Number of sweeps taken: %d" % nsteps[0])

Output

Frobenius Norm of E=A-F*G is 3.195012e+00
Reason for stopping sweeps: 3
Number of sweeps taken: 18
 
      Matrix Factor F
             1            2
1        11.98         0.00
2         7.52         0.93
3         0.34        16.60
4        62.97         0.04
5         0.00        21.35
 
                          Matrix Factor G
             1            2            3            4            5
1       0.0000       0.3146       0.6366       0.0309       0.0179
2       0.4049       0.0000       0.0472       0.5191       0.0288