nonnegMatrixFactorization¶

Given an m×n real matrix A≥0 , and an integer k≤min , compute a factorization A \cong FG. The matrix factors F_{m \times k} \geq 0, G_{k \times n} \geq 0 are computed to minimize the Frobenius, or sum of squares, norm of the error matrix: E = \left\{ e_{i,j} \right\} = A - FG .
Synopsis¶
nonnegMatrixFactorization (a, f, g)
Required Arguments¶
- float
a[[]]
(Input) - An array of length
m
×n
containing the A matrix. - float
f[[]]
(Input/Output) - An array of length
m
×k
containing the F matrix. IfinitialFactors
is used, the sweeps begin using the input values for F_{m \times k} \geq 0 . - float
g[[]]
(Output) - An array of length
k
×n
containing the G matrix.
Return Value¶
A scalar containing the Frobenius norm of the error matrix
Optional Arguments¶
weight
, float (Input)An array of length
m
×n
containing the matrix W ≥ 0 of weights that will be applied to the entries of A ≥ 0 during the solution sweeps. The factorization obtained is FG ≅W ∘ A, where the weights are applied element-wise.Default: Weights are not applied, or equivalently, the weights all have value 1.
initialFactors
, int (Input)A flag that signifies if the matrix F is given an input estimate. If
initialFactors
= 0, start sweeps using\begin{split}F = \begin{bmatrix} I_k \\ 0 \end{bmatrix}\end{split}Otherwise, use initial values in
f
as the matrix F to start the sweeps.Default:
initialFactors
= 0itmax
, int (Input)The maximum number of sweeps allowed for alternately solving for G ≥ 0, then F ≥ 0.
Default:
itmax
=2 * (m + n + 1)
residualError
, float (Input)A scalar that will stop the sweeps at the first one satisfying error ≤ residualError.
Default:
residualError
= 0relativeError
, float (Input)A scalar that will stop the sweeps at the first one satisfying
\mathit{error}_{iter‑2} ‑ \mathit{error}_{iter‑1} \leq \mathit{relativeError} \times \mathit{error}_{iter}, iter > 2.This test is made after three values of the error matrix norm have been computed. The sequence {error_{iter}} is decreasing with increasing values of the iteration counter, iter. If error_{iter} \geq error_{iter‑1} occurs, the sweeps stop.
Default:
relativeError
= (machine
(3))^{0.4}.stoppingCriterion
(Output)- This flag has the value 0,1,2 or 3 depending on which of the following conditions stopped the sweeps:
stoppingCriterion | Description |
---|---|
0 | Errors in user input occurred |
1 | Reached maximum iterations |
2 | Residual norm is small |
3 | Relative error convergence |
nstepsTaken
(Output)- The last value of the iteration count that gives the number of sweeps.
Description¶
Function nonnegMatrixFactorization
computes an approximation
A \cong FG , or with weights, W ∘ A ≅ FG; the factors are
constrained: F_{m \times k} \geq 0, G_{k \times n} \geq 0 . The
matrix factors F_{m \times k} \geq 0, G_{k \times n} \geq 0 are
computed to minimize the Frobenius or sum of squares, norm of the error
matrix: E = \left\{ e_{i,j} \right\} = A - FG .
The algorithm is based on Alternating Least Squares, presented by P. Paatero and U. Tapper, “Positive Matrix Factorization, etc.” Environmetrics, (5), p. 111-126 (1994).
Each constrained least squares problem is solved using nonnegLeastSquares. This process alternates between computing the batch of n columns of G and then the batch of m rows of F. This constitutes a “sweep.”
There is no restriction on the relative sizes of m and n . The restrictions on the integer k are 0 < k \leq \min (m,n). When an initial matrix G is to be used, instead of an initial F, repose the factorization in transposed form A^T \cong G^T F^T, or with weights, A^T \circ W^T \cong G^T F^T.
The matrix factors F, G are not unique. In the function, the output rows of G are scaled to have sum equal to the value 1. The scaled columns of F are sorted so the column sums are non-increasing. This sort order is then applied to the rows of G .
Example¶
Five customers, Beth, Dick, Fred, Joe and Kay make purchases at a convenience store.
Flour | Balloons | Beer | Sugar | Chips | |
---|---|---|---|---|---|
Beth | 3 | 8 | 1 | ||
Dick | 2 | 5 | 1 | ||
Fred | 5 | 1 | 10 | ||
Joe | 20 | 40 | 2 | 1 | |
Kay | 10 | 1 | 10 | 1 |
This matrix A_{5 \times 5} of customers versus items purchased is approximated by a non-negative matrix factorization, using k = 2 : A \cong FG . The example is taken from one due to H. Jin and M. Saunders, “Exploring Nonnegative Matrix Factorization,” A Workshop on Algorithms for Massive Data Sets, Stanford University, June 25-28, (2008).
from __future__ import print_function
from numpy import *
from pyimsl.math.nonnegMatrixFactorization import nonnegMatrixFactorization
from pyimsl.math.writeMatrix import writeMatrix
m = 5
n = 5
k = 2
a = array([[0, 3, 8, 0, 1],
[0, 2, 5, 1, 0],
[5, 0, 1, 10, 0],
[0, 20, 40, 2, 1],
[10, 0, 1, 10, 1]])
f = zeros((m, k))
g = zeros((k, n))
reason = []
nsteps = []
# Solve for factors, constraining values to be non-negative.
# Get reason for stopping and number of sweeps.
error = nonnegMatrixFactorization(a, f, g,
stoppingCriterion=reason,
nstepsTaken=nsteps)
writeMatrix("Matrix Factor F", f)
writeMatrix("Matrix Factor G", g)
print("\nFrobenius Norm of E=A-F*G is %e" % error)
print("Reason for stopping sweeps: %d" % reason[0])
print("Number of sweeps taken: %d" % nsteps[0])
Output¶
Frobenius Norm of E=A-F*G is 3.195012e+00
Reason for stopping sweeps: 3
Number of sweeps taken: 18
Matrix Factor F
1 2
1 11.98 0.00
2 7.52 0.93
3 0.34 16.60
4 62.97 0.04
5 0.00 21.35
Matrix Factor G
1 2 3 4 5
1 0.0000 0.3146 0.6366 0.0309 0.0179
2 0.4049 0.0000 0.0472 0.5191 0.0288