superluSmp¶
Computes the LU factorization of a general sparse matrix by a left-looking column method using OpenMP parallelism, and solves the real sparse linear system of equations \(Ax = b\) .
Synopsis¶
superluSmp (a, b)
Required Arguments¶
a[]
(Input)- An array of length
nz
containing the location and value of each nonzero entry in the matrix. See the explanation of the sparse element array in the section Matrix Storage Modes in the “Introduction” chapter of this manual. - float
b[]
(Input) - An array of length
n
containing the right-hand side.
Return Value¶
The solution x of the sparse linear system \(Ax = b\). If no solution was
computed, then None
is returned.
Optional Arguments¶
equilibrate
(Input)Specifies if the input matrix A should be equilibrated before factorization.
equilibrate
Description 0 Do not equilibrate A before factorization 1 Equilibrate A before factorization. Default:
equilibrate
= 0.columnOrderingMethod
(Input)The column ordering method used to preserve sparsity prior to the factorization process. Select the ordering method by setting
columnOrderingMethod
to one of the following:columnOrderingMethod
Description NATURAL
Natural ordering, i.e.the column ordering of the input matrix. MMD_ATA
Minimum degree ordering on the structure of \(A^TA\). MMD_AT_PLUS_A
Minimum degree ordering on the structure of \(A^TA\). COLAMD
Column approximate minimum degree ordering. PERMC
Use ordering given in permutation vector colpermVector
, which is input by the user through the optional argumentcolpermVector
. VectorcolpermVector
is a permutation of the numbers 0,1,…,n
-1.Default:
columnOrderingMethod
=COLAMD
.colpermVector
, (Input)- Array of length
n
that defines the permutation matrix \(P_c\) before postordering. This argument is required ifcolumnOrderingMethod
=PERMC
is used. Otherwise, it is ignored. transpose
(Input)Indicates if the transposed problem \(A^Tx = b\) is to be solved. This option can be used in conjunction with either of the options that supply the factorization.
transpose
Description 0 Solve \(Ax = b\). 1 Solve \(A^Tx = b\). Default:
transpose
= 0.iterativeRefinement
(Input)Indicates if iterative refinement is desired.
iterativeRefinement
Description 0 No iterative refinement. 1 Do iterative refinement. Default:
iterativeRefinement
= 1.factorSolve
, int (Input)Indicates if the LU factorization, the solution of a linear system, or both are to be computed.
factorSolve
Description 0 Compute the LU factorization of the input matrix A and solve the system \(Ax = b\). 1 Only compute the LU factorization of the input matrix and return.
The LU factorization is returned via the optional argument
returnSparseLuFactor
.Input argument
b
is ignored.2 Only solve \(Ax = b\) given the LU factorization of A.
The LU factorization of A must be supplied via the optional argument
supplySparseLuFactor
.Input argument
a
is ignored unless iterative refinement, computation of the condition number, or computation of the reciprocal pivot growth factor is required.Default:
factorSolve
= 0.diagPivotThresh
, float (Input)Specifies the threshold used for a diagonal entry to be an acceptable pivot, 0.0 ≤
diagPivotThresh
≤ 1.0.Default:
diagPivotThresh
= 1.0.snodePrediction
, int (Input)Indicates which scheme is used to predict the number of nonzeros in the L supernodes.
snodePrediction
Description 0 Use static scheme for the prediction of the number of nonzeros in the L supernodes. 1 Use dynamic scheme for the prediction of the number of nonzeros in the L supernodes. Default:
snodePrediction
= 0.performanceTuning
, int (Input)- Array of length 8 containing parameters that allow the user to tune the
performance of the matrix factorization algorithm. The elements
performanceTuning[i]
must be positive fori
= 0,…,4 and different from zero fori
= 5,6,7.
i |
Description of performanceTuning[i] |
---|---|
0 | The panel size. Default: |
1 | The relaxation parameter to control supernode amalgamation. Default: |
2 | The maximum allowable size for a supernode. Default: |
3 | The minimum row dimension to be used for 2D blocking. Default: |
4 | The minimum column dimension to be used for 2D blocking. Default: |
5 | The size of the array This element of array Default: |
6 | The size of the arrays Default: |
7 | The size of the array Default: |
cscFormat
, inthbColPtr[]
, inthbRowInd[]
, floathbValues[]
(Input)- Accept the coefficient matrix in compressed sparse column (CSC) format, as described in the Compressed Sparse Column (CSC) Format section of the “Introduction” chapter of this manual.
supplySparseLuFactor
, structure (Input)- A structure containing the LU factors of the input matrix computed with
the
returnSparseLuFactor
option. returnSparseLuFactor
(Output)- A structure containing the LU factorization of the input matrix.
condition
(Output)- The estimate of the reciprocal condition number of matrix
a
after equilibration (if done). pivotGrowthFactor
(Output)The reciprocal pivot growth factor
\[\min_j \left\{ \|\left(P_rD_rAD_cP_c\right)_j\|_{\infty} / \| U_j \|_{\infty} \right\} \cdot\]If
pivotGrowthFactor
is much less than 1, the stability of the LU factorization could be poor.forwardErrorBound
(Output)- The estimated forward error bound for the solution vector x. This
option requires argument
iterativeRefinement
set to 1. backwardError
(Output)- The componentwise relative backward error of the solution vector x.
This option requires argument
iterativeRefinement
set to 1.
Description¶
The steps superluSmp
uses to solve linear systems are identical to the
steps described in the documentation of the serial version superlu
.
Function superluSmp
uses a supernodal storage scheme for the LU
factorization of matrix A. In contrast to the sequential version, the
consecutive columns and supernodes of the L and U factors might not be
stored contiguously in memory. Thus, in addition to the pointers to the
beginning of each column or supernode, also pointers to the end of each
column or supernode are needed. The factorization is contained in a returned
structure and its two sub-structures.
In contrast to the sequential version, the numerical factorization phase of
the LU decomposition is parallelized. Since a dynamic memory expansion as
in the serial case is difficult to implement for the parallel code, the
estimated sizes of array rowind
for the L and of arrays rowind
and
nzval
for the U factor must be predetermined by the user via elements
6 and 7 of the performance tuning array performanceTuning
.
In order to ensure that the columns of each L supernode are stored
contiguously in memory, a static or dynamic prediction scheme for the size of
the L supernodes can be used. The static version, which function
superluSmp
uses by default, exploits the observation that for any row
permutation P in \(PA = LU\), the nonzero structure of L is contained
in that of the Householder matrix H from the Householder sparse QR
factorization \(A = QR\). Furthermore, it can be shown that each
fundamental supernode in L is always contained in a fundamental supernode
of H. Therefore, the storage requirement for the L supernodes and array
nzval
in the L factor respectively can be estimated and allocated prior
to the factorization based on the size of the H supernodes. The algorithm
used to compute the supernode partition and the size of the supernodes in H
is almost linear in the number of nonzeros of matrix A.
In practice, the above static prediction scheme is quite tight for most
problems. However, if the number of nonzeros in H greatly exceeds the
number of nonzeros in L, the user can try a dynamic prediction scheme by
setting optional argument snodePrediction
to 1. This scheme still uses
the supernode partition in H, but dynamically searches the supernodal
graph of L to obtain a much tighter upper bound for the required storage.
Use of the dynamic scheme requires the user to define the size of array
nzval
in the L factor via element 5 of the performance tuning array
performanceTuning
.
For a complete description of the parallel algorithm, see Demmel et al. (1999c).
Copyright (c) 2003, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from U.S. Dept. of Energy).
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
(1) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
(2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
(3) Neither the name of Lawrence Berkeley National Laboratory, U.S. Dept. of Energy nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Examples¶
Example 1¶
The LU factorization of the sparse 6×6 matrix
is computed.
Let \(y = (1,2,3,4,5,6)^T\), so that \(b_1 := Ay = (10,7,45,33,-34,31)^T\) and \(b_2 := A^Ty = (-9,8,39,13,1,21)^T\).
The LU factorization of A is used to solve the sparse linear systems \(Ax = b_1\) and \(A^Tx = b_2\).
from numpy import *
from pyimsl.math.superluSmp import superluSmp
from pyimsl.math.writeMatrix import writeMatrix
a = [[0, 0, 10.0],
[1, 1, 10.0],
[1, 2, -3.0],
[1, 3, -1.0],
[2, 2, 15.0],
[3, 0, -2.0],
[3, 3, 10.0],
[3, 4, -1.0],
[4, 0, -1.0],
[4, 3, -5.0],
[4, 4, 1.0],
[4, 5, -3.0],
[5, 0, -1.0],
[5, 1, -2.0],
[5, 5, 6.0]]
b1 = [10.0, 7.0, 45.0, 33.0, -34.0, 31.0]
b2 = [-9.0, 8.0, 39.0, 13.0, 1.0, 21.0]
x = superluSmp(a, b1)
writeMatrix("solution to A*x = b1", x)
x = superluSmp(a, b2, transpose=True)
writeMatrix("solution to A^T*x = b2", x)
Output¶
solution to A*x = b1
1 2 3 4 5 6
1 2 3 4 5 6
solution to A^T*x = b2
1 2 3 4 5 6
1 2 3 4 5 6
Example 2¶
This example uses the matrix \(A = E(1000,10)\) to show how the LU factorization of A can be used to solve a linear system with the same coefficient matrix A but different right-hand side. Maximum absolute errors are printed.
from __future__ import print_function
from numpy import *
from pyimsl.math.ctime import ctime
from pyimsl.math.generateTestCoordinate import generateTestCoordinate
from pyimsl.math.superluSmp import superluSmp
from pyimsl.math.superluSmpFactorFree import superluSmpFactorFree
from pyimsl.math.matMulRectCoordinate import matMulRectCoordinate
from pyimsl.math.vectorNorm import vectorNorm
n = 1000
c = 10
# Get the coefficient matrix
a = generateTestCoordinate(n, c)
# Set two different predetermined solutions
mod_five = empty((n), dtype=double)
mod_ten = empty((n), dtype=double)
for i in range(0, n, 1):
mod_five[i] = (float)(i % 5)
mod_ten[i] = (float)(i % 10)
# Choose b so that x will approximate mod_five
b = matMulRectCoordinate("A*x",
aMatrix={'nrowa': n, 'ncola': n, 'a': a},
xVector=mod_five)
# Time the factor/solve
time_factor_solve = ctime()
lu_factor = []
x = superluSmp(a, b, returnSparseLuFactor=lu_factor)
time_factor_solve = ctime() - time_factor_solve
# Compute max abolute error
index = []
error_factor_solve = vectorNorm(x, secondVector=mod_five, infNorm=index)
# Get new right hand side -- b = A * mod_ten
b = matMulRectCoordinate("A*x",
aMatrix={'nrowa': n, 'ncola': n, 'a': a},
xVector=mod_ten)
# Use the previously computed factorization to solve Ax = b
time_solve = ctime()
x = superluSmp(a, b,
supplySparseLuFactor=lu_factor[0],
factorSolve=2)
time_solve = ctime() - time_solve
index = []
error_solve = vectorNorm(x,
secondVector=mod_ten,
infNorm=index)
# Print errors and ratio of execution times
print("absolute error (factor/solve) = %e" % error_factor_solve)
print("absolute error (solve) = %e" % error_solve)
superluSmpFactorFree(lu_factor[0])
Output¶
absolute error (factor/solve) = 5.329071e-14
absolute error (solve) = 6.394885e-14
Warning Errors¶
IMSL_ILL_CONDITIONED |
The input matrix is too ill-conditioned.
An estimate of the reciprocal of its
\(L_1\) condition number is
“rcond ” = #. The solution might
not be accurate. |
Fatal Errors¶
IMSL_SINGULAR_MATRIX |
The input matrix is singular. |