superlu_smp
Computes the LU factorization of a general sparse matrix by a left-looking column method using OpenMP parallelism, and solves the real sparse linear system of equations Ax = b .
Synopsis
#include <imsl.h>
float *imsl_f_superlu_smp (int n, int nz, Imsl_f_sparse_elem a[], float b[],…,0)
void imsl_f_superlu_smp_factor_free (Imsl_f_super_lu_smp_factor *factor)
The type double functions are imsl_d_superlu_smp and imsl_d_superlu_smp_factor_free.
Required Arguments
int n (Input)
The order of the input matrix.
int nz (Input)
Number of nonzeros in the matrix.
Imsl_f_sparse_elem a[] (Input)
An array of length
nz containing the location and value of each nonzero entry in the matrix. See the explanation of the
Imsl_f_sparse_elem structure in the section
Matrix Storage Modes in the “Introduction” chapter of this manual.
float b[] (Input)
An array of length n containing the right-hand side.
Return Value
A pointer to the solution
x of the sparse linear system
Ax = b. To release this space, use
imsl_free. If no solution was computed, then
NULL is returned.
Synopsis with Optional Arguments
#include <imsl.h>
float *imsl_f_superlu_smp (int n, int nz, Imsl_f_sparse_elem a[], float b[],
IMSL_EQUILIBRATE, int equilibrate,
IMSL_COLUMN_ORDERING_METHOD, Imsl_col_ordering method,
IMSL_COLPERM_VECTOR, int permc[],
IMSL_TRANSPOSE, int transpose,
IMSL_ITERATIVE_REFINEMENT, int refine,
IMSL_FACTOR_SOLVE, int factsol,
IMSL_DIAG_PIVOT_THRESH, float diag_pivot_thresh,
IMSL_SNODE_PREDICTION, int snode_prediction,
IMSL_PERFORMANCE_TUNING, int sp_ienv[],
IMSL_CSC_FORMAT, int HB_col_ptr[], int HB_row_ind, float HB_values[],
IMSL_SUPPLY_SPARSE_LU_FACTOR, Imsl_f_super_lu_smp_factor *lu_factor_supplied,
IMSL_RETURN_SPARSE_LU_FACTOR, Imsl_f_super_lu_smp_factor *lu_factor_returned,
IMSL_CONDITION, float *condition,
IMSL_PIVOT_GROWTH_FACTOR, float *recip_pivot_growth,
IMSL_FORWARD_ERROR_BOUND, float *ferr,
IMSL_BACKWARD_ERROR, float *berr,
IMSL_RETURN_USER, float x[],
0)
Optional Arguments
IMSL_EQUILIBRATE, int equilibrate (Input)
Specifies if the input matrix A should be equilibrated before factorization.
equilibrate | Description |
---|
0 | Do not equilibrate A before factorization |
1 | Equilibrate A before factorization. |
Default: equilibrate = 0.
IMSL_COLUMN_ORDERING_METHOD, Imsl_col_ordering method (Input)
The column ordering method used to preserve sparsity prior to the factorization process. Select the ordering method by setting method to one of the following:
method | Description |
---|
IMSL_NATURAL | Natural ordering, i.e.the column ordering of the input matrix. |
IMSL_MMD_ATA | Minimum degree ordering on the structure of AT A. |
IMSL_MMD_AT_PLUS_A | Minimum degree ordering on the structure of AT + A. |
IMSL_COLAMD | Column approximate minimum degree ordering. |
IMSL_PERMC | Use ordering given in permutation vector permc, which is input by the user through the optional argument IMSL_COLPERM_VECTOR. Vector permc is a permutation of the numbers 0,1,…, n-1. |
Default: method = IMSL_COLAMD.
IMSL_COLPERM_VECTOR, int permc[] (Input)
Array of length n that defines the permutation matrix Pc before postordering. This argument is required if IMSL_COLUMN_ORDERING_METHOD with method = IMSL_PERMC is used. Otherwise, it is ignored.
IMSL_TRANSPOSE, int transpose (Input)
Indicates if the transposed problem ATx = b is to be solved. This option can be used in conjunction with either of the options that supply the factorization.
transpose | Description |
---|
0 | Solve Ax = b. |
1 | Solve ATx = b. |
Default: transpose = 0.
IMSL_ITERATIVE_REFINEMENT, int refine (Input)
Indicates if iterative refinement is desired.
refine | Description |
---|
0 | No iterative refinement. |
1 | Do iterative refinement. |
Default: refine = 1.
IMSL_FACTOR_SOLVE, int factsol (Input)
Indicates if the LU factorization, the solution of a linear system, or both are to be computed.
factsol | Description |
---|
0 | Compute the LU factorization of the input matrix A and solve the system Ax = b. |
1 | Only compute the LU factorization of the input matrix and return. The LU factorization is returned via the optional argument IMSL_RETURN_SPARSE_LU_FACTOR. Input argument b is ignored. |
2 | Only solve Ax = b given the LU factorization of A. The LU factorization of A must be supplied via the optional argument IMSL_SUPPLY_SPARSE_LU_FACTOR. Input argument a is ignored unless iterative refinement, computation of the condition number, or computation of the reciprocal pivot growth factor is required. |
Default: factsol = 0.
IMSL_DIAG_PIVOT_THRESH, float diag_pivot_thresh (Input)
Specifies the threshold used for a diagonal entry to be an acceptable pivot, 0.0 ≤ diag_pivot_thresh ≤ 1.0.
Default: diag_pivot_thresh = 1.0.
IMSL_SNODE_PREDICTION, int snode_prediction (Input)
Indicates which scheme is used to predict the number of nonzeros in the L supernodes.
snode_prediction | Description |
---|
0 | Use static scheme for the prediction of the number of nonzeros in the L supernodes. |
1 | Use dynamic scheme for the prediction of the number of nonzeros in the L supernodes. |
Default: snode_prediction = 0.
IMSL_PERFORMANCE_TUNING, int sp_ienv[] (Input)
Array of length 8 containing parameters that allow the user to tune the performance of the matrix factorization algorithm. The elements sp_ienv[i] must be positive for i = 0,…,4 and different from zero for i = 5,6,7.
i | Description of sp_ienv[i] |
---|
0 | The panel size. Default: sp_ienv[0] = 10. |
1 | The relaxation parameter to control supernode amalgamation. Default: sp_ienv[1] = 5. |
2 | The maximum allowable size for a supernode. Default: sp_ienv[2] = 100. |
3 | The minimum row dimension to be used for 2D blocking. Default: sp_ienv[3] = 200. |
4 | The minimum column dimension to be used for 2D blocking. Default: sp_ienv[4] = 40. |
5 | The size of the array nzval to store the values of the L supernodes. A negative number represents the fills growth factor, i.e. the product of its absolute magnitude and the number of nonzeros in the original matrix A will be used to allocate storage. A positive number represents the number of nonzeros for which storage will be allocated. This element of array sp_ienv is used only if a dynamic scheme for the prediction of the sizes of the L supernodes is used, i.e. if snode_prediction = 1. Default: sp_ienv[5] = -20. |
6 | The size of the arrays rowind and nzval to store the columns in U. A negative number represents the fills growth factor, i.e. the product of its absolute magnitude and the number of nonzeros in the original matrix A will be used to allocate storage. A positive number represents the number of nonzeros for which storage will be allocated. Default: sp_ienv[6] = -20. |
7 | The size of the array rowind to store the subscripts of the L supernodes. A negative number represents the fills growth factor, i.e. the product of its absolute magnitude and the number of nonzeros in the original matrix A will be used to allocate storage. A positive number represents the number of nonzeros for which storage will be allocated. Default: sp_ienv[7] = -10. |
IMSL_CSC_FORMAT,
int HB_col_ptr[],
int HB_row_ind[],
float HB_values[] (Input)
Accept the coefficient matrix in compressed sparse column (CSC) format, as described in the
Compressed Sparse Column (CSC) Format section of the “Introduction” chapter of this manual.
IMSL_SUPPLY_SPARSE_LU_FACTOR,
Imsl_f_super_lu_smp_factor *lu_factor_supplied (Input)
The address of a structure of type
Imsl_f_super_lu_smp_factor containing the
LU factors of the input matrix computed with the
IMSL_RETURN_SPARSE_LU_FACTOR option. See the
Description section for a definition of this structure. To free the memory allocated within this structure, use function
imsl_f_superlu_smp_factor_free. IMSL_RETURN_SPARSE_LU_FACTOR,
Imsl_f_super_lu_smp_factor *lu_factor_returned (Output)
The address of a structure of type
Imsl_f_super_lu_smp_factor containing the
LU factorization of the input matrix. See the
Description section for a definition of this structure. To free the memory allocated within this structure, use function
imsl_f_superlu_smp_factor_free. IMSL_CONDITION, float *condition (Output)
The estimate of the reciprocal condition number of matrix a after equilibration (if done).
IMSL_PIVOT_GROWTH_FACTOR, float *recip_pivot_growth (Output)
The reciprocal pivot growth factor
If recip_pivot_growth is much less than 1, the stability of the LU factorization could be poor.
IMSL_FORWARD_ERROR_BOUND, float *ferr (Output)
The estimated forward error bound for the solution vector x. This option requires argument IMSL_ITERATIVE_REFINEMENT set to 1.
IMSL_BACKWARD_ERROR, float *berr (Output)
The componentwise relative backward error of the solution vector x. This option requires argument IMSL_ITERATIVE_REFINEMENT set to 1.
IMSL_RETURN_USER, float x[] (Output)
A user-allocated array of length n containing the solution x of the linear system.
Description
The steps imsl_f_superlu_smp uses to solve linear systems are identical to the steps described in the documentation of the serial version imsl_f_superlu.
Function imsl_f_superlu_smp uses a supernodal storage scheme for the LU factorization of matrix A. In contrast to the sequential version, the consecutive columns and supernodes of the L and U factors might not be stored contiguously in memory. Thus, in addition to the pointers to the beginning of each column or supernode, also pointers to the end of each column or supernode are needed. The factorization is contained in structure Imsl_f_super_lu_smp_factor and its two sub-structures Imsl_f_hbp_format and Imsl_f_scp_format. Following is a short description of these structures:
Table 1 – Structure Imsl_f_hbp_format
Parameter | Data Type | Description |
nnz | int | The number of nonzeros in the matrix. |
nzval | float * | Array of nonzero values packed by column. |
rowind | int * | Array of row indices of the nonzeros. |
colbeg | int * | Array of size ncol+1; colbeg[j] stores the location in nzval[] and rowind[], which starts column j. Element colbeg[ncol] points to the first free location in arrays nzval[] and rowind[]. |
colend | int * | Array of size ncol; colend[j] stores the location in nzval[] and rowind[] which is one past the last element of column j. |
Table 2 – Structure Imsl_f_scp_format
Parameter | Data Type | Description |
nnz | int | The number of nonzeros in the supernodal matrix. |
nsuper | int | The number of supernodes minus one. |
nzval | float * | Array of nonzero values packed by column. |
nzval_colbeg | int * | Array of size ncol+1; nzval_colbeg[j] points to the beginning of column j in nzval[]. Entry nzval_colbeg[ncol] points to the first free location in nzval[]. |
nzval_colend | int * | Array of size ncol; nzval_colend[j] points to one past the last element of column j in nzval[]. |
rowind | int * | Array of compressed row indices of the rectangular supernodes. |
rowind_colbeg | int * | Array of size ncol+1; rowind_colbeg[j] points to the beginning of column j in rowind[]. Element rowind_colbeg[ncol] points to the first free location in rowind[]. |
rowind_colend | int * | Array of size ncol; rowind_colend[j] points to one past the last element of column j in rowind[]. |
col_to_sup | int * | Array of size ncol+1; col_to_sup[j] is the supernode number to which column j belongs. Only the first ncol entries in col_to_sup[] are defined. |
sup_to_colbeg | int * | Array of size ncol+1; sup_to_colbeg[s] points to the first column of the s-th supernode; only the first nsuper+1 locations of this array are used. |
sup_to_colend | int * | Array of size ncol; sup_to_colend[s] points to one past the last column of the s-th supernode. Only the first nsuper+1 locations of this array are used. |
Table 3 – Structure Imsl_f_super_lu_smp_factor
Parameter | Data Type | Description |
nrow | int | The number of rows of matrix A. |
ncol | int | The number of columns of matrix A. |
equilibration_method | int | The method used to equilibrate A: 0 – No equilibration. 1 – Row equilibration. 2 – Column equilibration. 3 – Both row and column equilibration. |
rowscale | float * | Array of size nrow containing the row scale factors for A. |
columnscale | float * | Array of size ncol containing the column scale factors for A. |
rowperm | int * | Row permutation array of size nrow describing the row permutation matrix Pr. |
colperm | int * | Column permutation array of size ncol describing the column permutation matrix Pc. |
U | Imsl_f_hbp_format * | The part of the U factor of A outside the supernodal blocks, stored in Harwell-Boeing format. |
L | Imsl_f_scp_format * | The L factor of A, stored in supernodal format as block lower triangular matrix. |
Structure Imsl_d_super_lu_smp_factor and its two sub-structures are defined similarly by replacing float with double, Imsl_f_hbp_format with Imsl_d_hbp_format, and Imsl_f_scp_format with Imsl_d_scp_format in their respective definitions.
In contrast to the sequential version, the numerical factorization phase of the LU decomposition is parallelized. Since a dynamic memory expansion as in the serial case is difficult to implement for the parallel code, the estimated sizes of array rowind for the L and of arrays rowind and nzval for the U factor (see structures Imsl_f_scp_format and Imsl_f_hbp_format above) must be predetermined by the user via elements 6 and 7 of the performance tuning array sp_ienv.
In order to ensure that the columns of each L supernode are stored contiguously in memory, a static or dynamic prediction scheme for the size of the L supernodes can be used. The static version, which function imsl_f_superlu_smp uses by default, exploits the observation that for any row permutation P in PA = LU, the nonzero structure of L is contained in that of the Householder matrix H from the Householder sparse QR factorization A = QR. Furthermore, it can be shown that each fundamental supernode in L is always contained in a fundamental supernode of H. Therefore, the storage requirement for the L supernodes and array nzval in the L factor respectively can be estimated and allocated prior to the factorization based on the size of the H supernodes. The algorithm used to compute the supernode partition and the size of the supernodes in H is almost linear in the number of nonzeros of matrix A.
In practice, the above static prediction scheme is quite tight for most problems. However, if the number of nonzeros in H greatly exceeds the number of nonzeros in L, the user can try a dynamic prediction scheme by setting optional argument IMSL_SNODE_PREDICTION to 1. This scheme still uses the supernode partition in H, but dynamically searches the supernodal graph of L to obtain a much tighter upper bound for the required storage. Use of the dynamic scheme requires the user to define the size of array nzval in the L factor via element 5 of the performance tuning array sp_ienv.
For a complete description of the parallel algorithm, see Demmel et al. (1999c).
Copyright (c) 2003, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from U.S. Dept. of Energy).
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
(1) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
(2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
(3) Neither the name of Lawrence Berkeley National Laboratory, U.S. Dept. of Energy nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Examples
Example 1
The LU factorization of the sparse 6×6 matrix
is computed.
Let y = (1,2,3,4,5,6)T, so that b1 := Ay = (10,7,45,33,-34,31)T and b2 := ATy = (-9,8,39,13,1,21)T.
The LU factorization of A is used to solve the sparse linear systems Ax = b1 and ATx = b2.
#include <imsl.h>
int main(){
Imsl_f_sparse_elem a[] = { 0, 0, 10.0,
1, 1, 10.0,
1, 2, -3.0,
1, 3, -1.0,
2, 2, 15.0,
3, 0, -2.0,
3, 3, 10.0,
3, 4, -1.0,
4, 0, -1.0,
4, 3, -5.0,
4, 4, 1.0,
4, 5, -3.0,
5, 0, -1.0,
5, 1, -2.0,
5, 5, 6.0};
float b1[] = {10.0, 7.0, 45.0, 33.0, -34.0, 31.0};
float b2[] = { -9.0, 8.0, 39.0, 13.0, 1.0, 21.0 };
int n = 6, nz = 15;
float *x = NULL;
x = imsl_f_superlu_smp (n, nz, a, b1, 0);
imsl_f_write_matrix ("solution to A*x = b1", 1, n, x, 0);
imsl_free (x);
x = imsl_f_superlu_smp (n, nz, a, b2, IMSL_TRANSPOSE, 1, 0);
imsl_f_write_matrix ("solution to A^T*x = b2", 1, n, x, 0);
imsl_free (x);
}
Output
solution to A*x = b1
1 2 3 4 5 6
1 2 3 4 5 6
solution to A^T*x = b2
1 2 3 4 5 6
1 2 3 4 5 6
Example 2
This example uses the matrix A = E(1000,10) to show how the LU factorization of A can be used to solve a linear system with the same coefficient matrix A but different right-hand side. Maximum absolute errors are printed. After the computations, the space allocated for the LU factorization is freed via function imsl_f_superlu_smp_factor_free.
#include <imsl.h>
#include <stdlib.h>
#include <stdio.h>
int main(){
Imsl_f_sparse_elem *a = NULL;
Imsl_f_super_lu_smp_factor lu_factor;
float *b = NULL, *x = NULL, *mod_five = NULL, *mod_ten = NULL;
float error_factor_solve, error_solve;
int n = 1000, c = 10;
int i, nz, index;
/* Get the coefficient matrix */
a = imsl_f_generate_test_coordinate (n, c, &nz, 0);
/* Set two different predetermined solutions */
mod_five = (float*) malloc (n*sizeof(*mod_five));
mod_ten = (float*) malloc (n*sizeof(*mod_ten));
for (i=0; i<n; i++) {
mod_five[i] = (float) (i % 5);
mod_ten[i] = (float) (i % 10);
}
/* Choose b so that x will approximate mod_five */
b = (float *) imsl_f_mat_mul_rect_coordinate ("A*x",
IMSL_A_MATRIX, n, n, nz, a,
IMSL_X_VECTOR, n, mod_five, 0);
/* Solve Ax = b */
x = imsl_f_superlu_smp (n, nz, a, b,
IMSL_RETURN_SPARSE_LU_FACTOR, &lu_factor, 0);
/* Compute max absolute error */
error_factor_solve = imsl_f_vector_norm (n, x,
IMSL_SECOND_VECTOR, mod_five,
IMSL_INF_NORM, &index,
0);
free (mod_five);
imsl_free (b);
imsl_free (x);
/* Get new right hand side -- b = A * mod_ten */
b = (float *) imsl_f_mat_mul_rect_coordinate ("A*x",
IMSL_A_MATRIX, n, n, nz, a,
IMSL_X_VECTOR, n, mod_ten,
0);
/* Use the previously computed factorization
to solve Ax = b */
x = imsl_f_superlu_smp (n, nz, a, b,
IMSL_SUPPLY_SPARSE_LU_FACTOR, &lu_factor,
IMSL_FACTOR_SOLVE, 2,
0);
error_solve = imsl_f_vector_norm (n, x,
IMSL_SECOND_VECTOR, mod_ten,
IMSL_INF_NORM, &index,
0);
free (mod_ten);
imsl_free (b);
imsl_free (x);
imsl_free (a);
/* Free sparse LU structure */
imsl_f_superlu_smp_factor_free (&lu_factor);
/* Print errors */
printf ("absolute error (factor/solve) = %e\n",
error_factor_solve);
printf ("absolute error (solve) = %e\n", error_solve);
}
Output
absolute error (factor/solve) = 1.096725e-005
absolute error (solve) = 5.435944e-005
Warning Errors
IMSL_ILL_CONDITIONED | The input matrix is too ill-conditioned. An estimate of the reciprocal of its L1 condition number is “rcond” = #. The solution might not be accurate. |
Fatal Errors
IMSL_SINGULAR_MATRIX | The input matrix is singular. |