Matrix Storage Modes

In this section, the word matrix is used to refer to a mathematical object and the word array is used to refer to its representation as a C data structure. In the following list of array types, the PyIMSL Stat Library functions require input consisting of matrix dimension values and all values for the matrix entries. These values are stored in row-major order in the arrays.

Each function processes the input array and typically returns a “result.” For example, in solving linear algebraic systems, the pointer is to the solution. For general, real eigenvalue problems, the pointer is to the eigenvalues. Normally, the input array values are not changed by the functions.

General Mode

A general matrix is a square n × n matrix. The data type of a general array can be float or complex.

Rectangular Mode

A rectangular matrix is an m × n matrix. The data type of a rectangular array can be float or complex.

Symmetric Mode

A symmetric matrix is a square n × n matrix A, such that \(A^T=A\). (The matrix \(A^T\) is the transpose of A.) The data type of a symmetric array can be float.

Hermitian Mode

A Hermitian matrix is a square n × n matrix A, such that

\[A^H = A^{-T} = A\]

The matrix Ā is the complex conjugate of A, and

\[A^H \equiv A^{-T}\]

is the conjugate transpose of A. For Hermitian matrices \(A^H=A\). The data type of a Hermitian array must be complex.

Sparse Coordinate Storage Format

Only the nonzero elements of a sparse matrix need to be communicated to a function. Sparse coordinate storage format stores the value of each matrix entry along with that entry’s row and column index. In Python these matrices are represented using a list or NumPy ndarray object with dimensions 3 by N, where N is the number of nonzero elements in the array. The first dimension (3) is used to describe the row, column and value of the element.

A sparse matrix is passed to functions that accept sparse coordinate format by forming an array of one of these data types. The number of elements in that array will be equal to the number of nonzeros in the sparse matrix.

As an example consider the 6 × 6 matrix:

\[\begin{split}A = \begin{bmatrix} 2 & 0 & 0 & 0 & 0 & 0 \\ 0 & 9 & -3 & -1 & 0 & 0 \\ 0 & 0 & 5 & 0 & 0 & 0 \\ -2 & 0 & 0 & -7 & -1 & 0 \\ -1 & 0 & 0 & -5 & 1 & -3 \\ -1 & -2 & 0 & 0 & 0 & 6 \\ \end{bmatrix}\end{split}\]

The matrix A has 15 nonzero elements, and the sparse coordinate representation would be

row 0 1 1 1 2 3 3 3 4 4 4 4 5 5 5
col 0 1 2 3 2 0 3 4 0 3 4 5 0 1 5
val 2 9 -3 -1 5 -2 -7 -1 -1 -5 1 -3 -1 -2 6

Since this representation does not rely on order, an equivalent form would be

row 5 4 3 0 5 1 2 1 4 3 1 4 3 5 4
col 0 0 0 0 1 1 2 2 3 3 3 4 4 5 5
val -1 -1 -2 2 -2 9 5 -3 -5 -7 -1 1 -1 6 -3

The following variable represents the above sparse matrix A:

a = [[0, 0, 2.0],
     [1, 1, 9.0],
     [1, 2, -3.0],
     [1, 3, -1.0],
     [1, 3, -1.0],
     [3, 0, -2.0],
     [3, 3, -7.0],
     [3, 4, -1.0],
     [4, 0, -1.0],
     [4, 3, -5.0],
     [4, 4, 1.0],
     [4, 5, -3.0],
     [5, 0, -1.0],
     [5, 1, -2.0],
     [5, 5, 6.0]]

A sparse symmetric or Hermitian matrix is a special case, since it is only necessary to store the diagonal and either the upper or lower triangle. As an example, consider the 5 × 5 linear system:

\[\begin{split}H = \begin{bmatrix} (4,0) & (1,-1) & 0 & 0 \\ (1,1) & (4,0) & (1,-1) & 0 \\ 0 & (1,1) & (4,0) & (1,-1) \\ 0 & 0 & (1,1) & (4,0) \\ \end{bmatrix}\end{split}\]

The Hermitian and symmetric positive definite system solvers in this library expect the diagonal and lower triangle to be specified. The sparse coordinate form for the lower triangle is given by

row 0 1 2 3 1 2 3
col 0 1 2 3 0 1 2
val (4,0) (4,0) (4,0) (4,0) (1,1) (1,1) (1,1)

As before, an equivalent form would be

row 0 1 1 2 2 3 3
col 0 0 1 1 2 2 3
val (4,0) (1,1) (4,0) (1,1) (4,0) (1,1) (4,0)

The following program fragment will initialize both a and b to H.

a[] = [[0, 0, 94.0, 0.0j)],
       [1, 1, (4.0, 0.0j)],
       [2, 2, (4.0, 0.0j)],
       [3, 3, (4.0, 0.0j)],
       [1, 0, (1.0, 1.0j)],
       [2, 1, (1.0, 1.0j)],
       [3, 2, (1.0, 1.0j)]]

There are some important points to note here. H is not symmetric, but rather Hermitian. The functions that accept Hermitian data understand this and operate assuming that

\[h_{ij} = \overline{h}_{ij}\]

The PyIMSL Stat Library cannot take advantage of the symmetry in matrices that are not positive definite. The implication here is that a symmetric matrix that happens to be indefinite cannot be stored in this compact symmetric form. Rather, both upper and lower triangles must be specified and the sparse general solver called.

Band Storage Format

A band matrix is an M × N matrix with all of its nonzero elements “close” to the main diagonal. Specifically, values \(A_{ij}=0\) if \(i-j\) > nlca or \(j-i\) > nuca. The integer m = nlca + nuca + 1 is the total band width. The diagonals, other than the main diagonal, are called codiagonals. While any M × N matrix is a band matrix, band storage format is only useful when the number of nonzero codiagonals is much less than N.

In band storage format, the nlca lower codiagonals and the nuca upper codiagonals are stored in the rows of an array of size M × N. The elements are stored in the same column of the array as they are in the matrix. The values \(A_{ij}\) inside the band width are stored in the linear array in positions [(i j + nuca + 1) n + j]. This results in a row-major, one-dimensional mapping from the two-dimensional notion of the matrix.

For example, consider the 5 × 5 matrix A with 1 lower and 2 upper codiagonals:

\[\begin{split}A = \begin{bmatrix} A_{0,0} & A_{0,1} & A_{0,2} & 0 & 0 \\ A_{1,0} & A_{1,1} & A_{1,2} & A_{1,3} & 0 \\ 0 & A_{2,1} & A_{2,2} & A_{2,3} & A_{2,4} \\ 0 & 0 & A_{3,2} & A_{3,3} & A_{3,4} \\ 0 & 0 & 0 & A_{4,3} & A_{4,4} \\ \end{bmatrix}\end{split}\]

In band storage format, the data would be arranged as

\[\begin{split}\begin{bmatrix} 0 & 0 & A_{0,2} & A_{1,3} & A_{2,4} \\ 0 & A_{0,1} & A_{1,2} & A_{2,3} & A_{3,4} \\ A_{0,0} & A_{1,1} & A_{2,2} & A_{3,3} & A_{4,4} \\ A_{1,0} & A_{2,1} & A_{3,2} & A_{4,3} & 0 \\ \end{bmatrix}\end{split}\]

This data would then be stored contiguously, row-major order, in an array of length 20.

As an example, consider the following tridiagonal matrix:

\[\begin{split}A = \begin{bmatrix} 10 & 1 & 0 & 0 & 0 \\ 5 & 20 & 2 & 0 & 0 \\ 0 & 6 & 30 & 3 & 0 \\ 0 & 0 & 7 & 40 & 4 \\ 0 & 0 & 0 & 8 & 50 \\ \end{bmatrix}\end{split}\]

The following declaration will store this matrix in band storage format:

a[] = [0.0, 1.0, 2.0, 3.0, 4.0,
       10.0, 20.0, 30.0, 40.0, 50.0,
       5.0, 6.0, 7.0, 8.0, 0.0]

As in the sparse coordinate representation, there is a space saving symmetric version of band storage. As an example, look at the following 5 × 5 symmetric problem:

\[\begin{split}A = \begin{bmatrix} A_{0,0} & A_{0,1} & A_{0,2} & 0 & 0 \\ A_{0,1} & A_{1,1} & A_{1,2} & A_{1,3} & 0 \\ A_{0,2} & A_{1,2} & A_{2,2} & A_{2,3} & A_{2,4} \\ 0 & A_{1,3} & A_{2,3} & A_{3,3} & A_{3,4} \\ 0 & 0 & A_{2,4} & A_{3,4} & A_{4,4} \\ \end{bmatrix}\end{split}\]

In band symmetric storage format, the data would be arranged as

\[\begin{split}\begin{bmatrix} 0 & 0 & A_{0,2} & A_{1,3} & A_{2,4} \\ 0 & A_{0,1} & A_{1,2} & A_{2,3} & A_{3,4} \\ A_{0,0} & A_{1,1} & A_{2,2} & A_{3,3} & A_{4,4} \\ \end{bmatrix}\end{split}\]

The following Hermitian example illustrates the procedure:

\[\begin{split}H = \begin{bmatrix} (8,0) & (1,1) & (1,1) & 0 & 0 \\ (1,-1) & (8,0) & (1,1) & (1,1) & 0 \\ (1,-1) & (1,-1) & (8,0) & (1,1) & (1,1) \\ 0 & (1,-1) & (1,-1) & (8,0) & (1,1) \\ 0 & 0 & (1,-1) & (1,-1) & (8,0) \\ \end{bmatrix}\end{split}\]

The following program fragments would store H in h, using band symmetric storage format.

h = [(0.0, 0.0j), (0.0, 0.0j), (1.0, 1.0j), (1.0, 1.0j), (1.0, 1.0j),
     (0.0, 0.0j), (1.0, 1.0j), (1.0, 1.0j), (1.0, 1.0j), (1.0, 1.0j),
     (8.0, 0.0j), (8.0, 0.0j), (8.0, 0.0j), (8.0, 0.0j), (8.0, 0.0j)]

Choosing Between Banded and Coordinate Forms

It is clear that any matrix can be stored in either sparse coordinate or band format. The choice depends on the sparsity pattern of the matrix. A matrix with all nonzero data stored in bands close to the main diagonal would probably be a good candidate for band format. If nonzero information is scattered more or less uniformly through the matrix, sparse coordinate format is the best choice. As extreme examples, consider the following two cases: (1) an n × n matrix with all elements on the main diagonal and the \((0,n-1)\) and \((n-1,0)\) entries nonzero. The sparse coordinate vector would be \(n+2\) units long. An array of length \(n(2n-1)\) would be required to store the band representation, nearly twice as much storage as a dense solver might require. (2) a tridiagonal matrix with all diagonal, superdiagonal and subdiagonal entries nonzero. In band format, an array of length \(3n\) is needed. In sparse coordinate, format a vector of length \(3n-2\) is required. But the problem is that, for example, for float precision on a 32-bit machine, each of those \(3n-2\) units in coordinate format requires three times as much storage as any of the \(3n\) units needed for band representation. This is due to carrying the row and column indices in coordinate form. Band storage evades this requirement by being essentially an ordered list, and defining location in the original matrix by position in the list.

Compressed Sparse Column (CSC) Format

Functions that accept data in coordinate format can also accept data stored in the format described in the Users’ Guide for the Harwell-Boeing Sparse Matrix Collection. The scheme is column oriented, with each column held as a sparse vector, represented by a list of the row indices of the entries in an integer array (“rowind” below) and a list of the corresponding values in a separate float or complex array (“values” below). Data for each column are stored consecutively and the columns are stored in order. A third array (“colptr” below) indicates the location in array “values” in which to place the first nonzero value of each succeeding column of the original sparse matrix. So colptr[i] contains the index of the first free location in array “values” in which to place the values from the \(\text{i}^{th}\) column of the original sparse matrix. In other words, values[colptr[i]] holds the first nonzero value of the i-th column of the original sparse matrix. Only entries in the lower triangle and diagonal are stored for symmetric and Hermitian matrices. All arrays are based at zero, which is in contrast to the Harwell-Boeing test suite’s one-based arrays.

As in the Harwell-Boeing user guide (link above), the storage scheme is illustrated with the following example: The 5 × 5 matrix

\[\begin{split}\begin{bmatrix} 1 & -3 & 0 & -1 & 0 \\ 0 & 0 & -2 & 0 & 3 \\ 2 & 0 & 0 & 0 & 0 \\ 0 & 4 & 0 & -4 & 0 \\ 5 & 0 & -5 & 0 & 6 \\ \end{bmatrix}\end{split}\]

would be stored in the arrays colptr (location of first entry), rowind (row indices), and values (nonzero entries) as follows:

Subscripts 0 1 2 3 4 5 6 7 8 9 10
Colptr 0 3 5 7 9 11          
Rowind 0 2 4 0 3 1 4 0 3 1 4
Values 1 2 5 -3 4 -2 -5 -1 -4 3 6

The following program fragment shows the relation between CSC storage format and coordinate representation:

k = 0
for I in range(0,n):
    start = colptr[i]
    stop = colptr[i+1]
    for j in range(start,stop):
        a[0][k] = rowind[j]
        a[1][k] = i
        a[2][k] = values[j]
        k = k+1
nz =k