Matrix Storage Modes¶
In this section, the word matrix is used to refer to a mathematical object and the word array is used to refer to its representation as a C data structure. In the following list of array types, the PyIMSL Stat Library functions require input consisting of matrix dimension values and all values for the matrix entries. These values are stored in row-major order in the arrays.
Each function processes the input array and typically returns a “result.” For example, in solving linear algebraic systems, the pointer is to the solution. For general, real eigenvalue problems, the pointer is to the eigenvalues. Normally, the input array values are not changed by the functions.
General Mode¶
A general matrix is a square n × n matrix. The data type of a general array can be float or complex.
Rectangular Mode¶
A rectangular matrix is an m × n matrix. The data type of a rectangular array can be float or complex.
Symmetric Mode¶
A symmetric matrix is a square n × n matrix A, such that \(A^T=A\). (The matrix \(A^T\) is the transpose of A.) The data type of a symmetric array can be float.
Hermitian Mode¶
A Hermitian matrix is a square n × n matrix A, such that
The matrix Ā is the complex conjugate of A, and
is the conjugate transpose of A. For Hermitian matrices \(A^H=A\). The data type of a Hermitian array must be complex.
Sparse Coordinate Storage Format¶
Only the nonzero elements of a sparse matrix need to be communicated to a function. Sparse coordinate storage format stores the value of each matrix entry along with that entry’s row and column index. In Python these matrices are represented using a list or NumPy ndarray object with dimensions 3 by N, where N is the number of nonzero elements in the array. The first dimension (3) is used to describe the row, column and value of the element.
A sparse matrix is passed to functions that accept sparse coordinate format by forming an array of one of these data types. The number of elements in that array will be equal to the number of nonzeros in the sparse matrix.
As an example consider the 6 × 6 matrix:
The matrix A has 15 nonzero elements, and the sparse coordinate representation would be
row | 0 | 1 | 1 | 1 | 2 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 5 | 5 | 5 |
col | 0 | 1 | 2 | 3 | 2 | 0 | 3 | 4 | 0 | 3 | 4 | 5 | 0 | 1 | 5 |
val | 2 | 9 | -3 | -1 | 5 | -2 | -7 | -1 | -1 | -5 | 1 | -3 | -1 | -2 | 6 |
Since this representation does not rely on order, an equivalent form would be
row | 5 | 4 | 3 | 0 | 5 | 1 | 2 | 1 | 4 | 3 | 1 | 4 | 3 | 5 | 4 |
col | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 2 | 3 | 3 | 3 | 4 | 4 | 5 | 5 |
val | -1 | -1 | -2 | 2 | -2 | 9 | 5 | -3 | -5 | -7 | -1 | 1 | -1 | 6 | -3 |
The following variable represents the above sparse matrix A:
a = [[0, 0, 2.0],
[1, 1, 9.0],
[1, 2, -3.0],
[1, 3, -1.0],
[1, 3, -1.0],
[3, 0, -2.0],
[3, 3, -7.0],
[3, 4, -1.0],
[4, 0, -1.0],
[4, 3, -5.0],
[4, 4, 1.0],
[4, 5, -3.0],
[5, 0, -1.0],
[5, 1, -2.0],
[5, 5, 6.0]]
A sparse symmetric or Hermitian matrix is a special case, since it is only necessary to store the diagonal and either the upper or lower triangle. As an example, consider the 5 × 5 linear system:
The Hermitian and symmetric positive definite system solvers in this library expect the diagonal and lower triangle to be specified. The sparse coordinate form for the lower triangle is given by
row | 0 | 1 | 2 | 3 | 1 | 2 | 3 |
col | 0 | 1 | 2 | 3 | 0 | 1 | 2 |
val | (4,0) | (4,0) | (4,0) | (4,0) | (1,1) | (1,1) | (1,1) |
As before, an equivalent form would be
row | 0 | 1 | 1 | 2 | 2 | 3 | 3 |
col | 0 | 0 | 1 | 1 | 2 | 2 | 3 |
val | (4,0) | (1,1) | (4,0) | (1,1) | (4,0) | (1,1) | (4,0) |
The following program fragment will initialize both a
and b
to H.
a[] = [[0, 0, 94.0, 0.0j)],
[1, 1, (4.0, 0.0j)],
[2, 2, (4.0, 0.0j)],
[3, 3, (4.0, 0.0j)],
[1, 0, (1.0, 1.0j)],
[2, 1, (1.0, 1.0j)],
[3, 2, (1.0, 1.0j)]]
There are some important points to note here. H is not symmetric, but rather Hermitian. The functions that accept Hermitian data understand this and operate assuming that
The PyIMSL Stat Library cannot take advantage of the symmetry in matrices that are not positive definite. The implication here is that a symmetric matrix that happens to be indefinite cannot be stored in this compact symmetric form. Rather, both upper and lower triangles must be specified and the sparse general solver called.
Band Storage Format¶
A band matrix is an M ×
N matrix with all of its nonzero elements
“close” to the main diagonal. Specifically, values \(A_{ij}=0\) if
\(i-j\) > nlca
or \(j-i\) > nuca
. The integer m = nlca
+ nuca
+ 1 is the total band width. The diagonals, other than the main
diagonal, are called codiagonals. While any M ×
N matrix is a band
matrix, band storage format is only useful when the number of nonzero
codiagonals is much less than N.
In band storage format, the nlca
lower codiagonals and the nuca
upper codiagonals are stored in the rows of an array of size M ×
N.
The elements are stored in the same column of the array as they are in the
matrix. The values \(A_{ij}\) inside the band width are stored in the
linear array in positions [(i ‑
j +
nuca
+
1) n +
j]
. This results in a row-major, one-dimensional mapping from the
two-dimensional notion of the matrix.
For example, consider the 5 × 5 matrix A with 1 lower and 2 upper codiagonals:
In band storage format, the data would be arranged as
This data would then be stored contiguously, row-major order, in an array of length 20.
As an example, consider the following tridiagonal matrix:
The following declaration will store this matrix in band storage format:
a[] = [0.0, 1.0, 2.0, 3.0, 4.0,
10.0, 20.0, 30.0, 40.0, 50.0,
5.0, 6.0, 7.0, 8.0, 0.0]
As in the sparse coordinate representation, there is a space saving symmetric version of band storage. As an example, look at the following 5 × 5 symmetric problem:
In band symmetric storage format, the data would be arranged as
The following Hermitian example illustrates the procedure:
The following program fragments would store H in h
, using band
symmetric storage format.
h = [(0.0, 0.0j), (0.0, 0.0j), (1.0, 1.0j), (1.0, 1.0j), (1.0, 1.0j),
(0.0, 0.0j), (1.0, 1.0j), (1.0, 1.0j), (1.0, 1.0j), (1.0, 1.0j),
(8.0, 0.0j), (8.0, 0.0j), (8.0, 0.0j), (8.0, 0.0j), (8.0, 0.0j)]
Choosing Between Banded and Coordinate Forms¶
It is clear that any matrix can be stored in either sparse coordinate or band
format. The choice depends on the sparsity pattern of the matrix. A matrix
with all nonzero data stored in bands close to the main diagonal would
probably be a good candidate for band format. If nonzero information is
scattered more or less uniformly through the matrix, sparse coordinate format
is the best choice. As extreme examples, consider the following two cases:
(1) an n ×
n matrix with all elements on the main diagonal and the
\((0,n-1)\) and \((n-1,0)\) entries nonzero. The sparse coordinate
vector would be \(n+2\) units long. An array of length \(n(2n-1)\)
would be required to store the band representation, nearly twice as much
storage as a dense solver might require. (2) a tridiagonal matrix with all
diagonal, superdiagonal and subdiagonal entries nonzero. In band format, an
array of length \(3n\) is needed. In sparse coordinate, format a vector
of length \(3n-2\) is required. But the problem is that, for example, for
float precision on a 32-bit machine, each of those \(3n-2\) units in
coordinate format requires three times as much storage as any of the
\(3n\) units needed for band representation. This is due to carrying the
row and column indices in coordinate form. Band storage evades this
requirement by being essentially an ordered list, and defining location in
the original matrix by position in the list.
Compressed Sparse Column (CSC) Format¶
Functions that accept data in coordinate format can also accept data stored
in the format described in the
Users’ Guide for the Harwell-Boeing Sparse Matrix Collection.
The scheme is column oriented, with each column held as a sparse vector,
represented by a list of the row indices of the entries in an integer array
(“rowind
” below) and a list of the corresponding values in a
separate float or complex array (“values
” below). Data for each
column are stored consecutively and the columns are stored in order. A third
array (“colptr
” below) indicates the location in array
“values
” in which to place the first nonzero value of each
succeeding column of the original sparse matrix. So colptr[i]
contains
the index of the first free location in array “values
” in which to
place the values from the \(\text{i}^{th}\) column of the original
sparse matrix. In other words, values[colptr[i]]
holds the first nonzero
value of the i
-th column of the original sparse matrix. Only entries in
the lower triangle and diagonal are stored for symmetric and Hermitian
matrices. All arrays are based at zero, which is in contrast to the
Harwell-Boeing test suite’s one-based arrays.
As in the Harwell-Boeing user guide (link above), the storage scheme is illustrated with the following example: The 5 × 5 matrix
would be stored in the arrays colptr
(location of first entry),
rowind
(row indices), and values
(nonzero entries) as follows:
Subscripts | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|---|
Colptr |
0 | 3 | 5 | 7 | 9 | 11 | |||||
Rowind |
0 | 2 | 4 | 0 | 3 | 1 | 4 | 0 | 3 | 1 | 4 |
Values |
1 | 2 | 5 | -3 | 4 | -2 | -5 | -1 | -4 | 3 | 6 |
The following program fragment shows the relation between CSC storage format and coordinate representation:
k = 0
for I in range(0,n):
start = colptr[i]
stop = colptr[i+1]
for j in range(start,stop):
a[0][k] = rowind[j]
a[1][k] = i
a[2][k] = values[j]
k = k+1
nz =k