Using ScaLAPACK Enhanced Routines
General Remarks
Use of the ScaLAPACK enhanced routines allows a user to solve large linear systems of algebraic equations by performing the work in parallel across multiple computers, at a performance level that might not be achievable on one computer. One might also use these routines on linear systems that are too large for the address space of a single target computer. IMSL has tried to facilitate parallel computing in these situations by providing interfaces to ScaLAPACK routines that accomplish the task. The IMSL Library solver interface has the same look and feel whether one is using the routine on a single computer or across multiple computers.
The basic steps required to use the IMSL routines that interface with ScaLAPACK are:
1. Initialize MPI
2. Initialize the processor grid
3. Define any necessary array descriptors
4. Allocate space for the local arrays
5. Set up local matrices across the processor grid
6. Call the IMSL routine which interfaces with ScaLAPACK
7. Gather the results from across the processor grid
8. Release the processor grid
9. Exit MPI
Utilities are provided in the IMSL Library that facilitate these steps for the user. Each of these utilities is documented in Chapter 11, “Utilities”. We visit the steps briefly here:
1. Initialize MPI
The user should call MP_SETUP() in this step. This function is described in detail in “Getting Started with Modules MPI_setup_int and MPI_node_int” in Chapter 10, “Linear Algebra Operators and Generic Functions”. For ScaLAPACK usage, suffice it to say that following a call to the function MP_SETUP(), the module MPI_node_int will contain information about the number of processors, the rank of a processor, and the communicator for the application. A call to this function will return the number of processors available to the program. Since the module MPI_node_int is used by MPI_setup_int, it is not necessary to explicitly use the module MPI_node_int. If MP_SETUP() is not called, the program computes entirely on one node. No routine from MPI is called.
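As a minimal illustration of this step (a fragment of the illustrative program sketch at the end of this section), the following initializes MPI and records the number of available processors in MP_NPROCS, which the module MPI_node_int provides:

      USE MPI_SETUP_INT
!     Initialize MPI.  MP_NPROCS, MP_RANK, and the communicator are
!     afterwards available from the module MPI_node_int.
      MP_NPROCS = MP_SETUP()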
2. Initialize the processor grid
SCALAPACK_SETUP (see Chapter 11, “Utilities”) is called at this step. This call will set up the processor grid for the user, define the context ID variable, MP_ICTXT, for the processor grid, and place MP_ICTXT into the module GRIDINFO_INT. Use of SCALAPACK_SUPPORT will make the information in MPI_NODE_INT and GRIDINFO_INT available to the user’s program.
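Continuing the sketch, the call below sets up the grid for an N-by-N problem; the two logical arguments follow the argument pattern used in the ScaLAPACK examples elsewhere in this manual, and N denotes the order of the global matrix:

      USE SCALAPACK_SUPPORT
!     Set up the processor grid and define its context ID, MP_ICTXT,
!     which is then available from the module GRIDINFO_INT.
!     (Logical arguments follow the pattern of the manual's examples.)
      CALL SCALAPACK_SETUP(N, N, .TRUE., .TRUE.)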
3. Define any necessary array descriptors
Consider a generic matrix A which is to be carved up and distributed across the processors in the processor grid. In ScaLAPACK parlance, we refer to A as the “global” array A, which is to be distributed across the processor grid in 2D block cyclic fashion (see Chapter 11, “Utilities”). Each processor in the grid will then have access to a subset of the global array A. We refer to the subset array to which an individual processor has access as the “local” array A0. Just as it is sometimes necessary for a program to be aware of the leading dimension of the global array A, it is also necessary for the program to be aware of other critical information about the local array A0. This information can be obtained by calling the IMSL utility SCALAPACK_GETDIM. The ScaLAPACK Library utility DESCINIT is then used to store this information in a vector. (For more information, see the Usage Notes section of Chapter 11, “Utilities”.)
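The fragment below sketches this step for an N-by-N global matrix A, assuming the block sizes MP_MB and MP_NB and the context MP_ICTXT have been placed in GRIDINFO_INT by SCALAPACK_SETUP. SCALAPACK_GETDIM returns the local dimensions MXLDA and MXCOL, and DESCINIT records the distribution in the nine-element descriptor DESCA:

      INTEGER DESCA(9), MXLDA, MXCOL, INFO
!     Obtain the dimensions of the local piece, A0, of the global A.
      CALL SCALAPACK_GETDIM(N, N, MP_MB, MP_NB, MXLDA, MXCOL)
!     Record the distribution information in the array descriptor DESCA.
      CALL DESCINIT(DESCA, N, N, MP_MB, MP_NB, 0, 0, MP_ICTXT, MXLDA, INFO)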
4. Allocate space for the local arrays
The array dimensions, obtained in the previous step, are used at this point to allocate space for any local arrays that will be used in the call to the IMSL routine.
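For example, continuing the sketch, local arrays for the matrix, a right-hand side, and a solution might be allocated as follows:

      REAL, ALLOCATABLE :: A0(:,:), B0(:), X0(:)
!     Allocate local arrays with the dimensions from SCALAPACK_GETDIM.
      ALLOCATE (A0(MXLDA,MXCOL), B0(MXLDA), X0(MXLDA))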
5. Set up local matrices across the processor grid
If the matrices to be used by the solvers have not been distributed across the processor grid, IMSL provides utility routines SCALAPACK_READ and SCALAPACK_MAP to help in the distribution of global arrays across processors. SCALAPACK_READ will read data from a file while SCALAPACK_MAP will map a global array to the processor grid. Users may choose to distribute the arrays themselves as long as they distribute the arrays in 2D block cyclic fashion consistent with the array descriptors that have been defined.
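As a fragment of the closing sketch, a global matrix A and right-hand side B defined on the rank-0 node can be mapped to the local arrays; DESCX is assumed to be a descriptor for the right-hand side built as in step 3, and SCALAPACK_READ would be used instead if the data resided in a file:

!     Define the global arrays on one node only.
      IF (MP_RANK == 0) THEN
!        ... fill A and B here ...
      END IF
!     Distribute the global arrays in 2D block cyclic fashion.
      CALL SCALAPACK_MAP(A, DESCA, A0)
      CALL SCALAPACK_MAP(B, DESCX, B0)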
6. Call the IMSL routine which interfaces with ScaLAPACK
The IMSL routines which interface with ScaLAPACK are listed in Table 5.
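As a representative example, taking LSARG (a real general solver with a ScaLAPACK interface) as the routine of interest, the call is made with the local arrays set up in the previous steps:

!     Solve A*x = b with a ScaLAPACK-interfaced IMSL routine, passing
!     the local (distributed) arrays.  LSARG is used here only as a
!     representative routine from Table 5.
      CALL LSARG (A0, B0, X0)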
7. Gather the results from across the processor grid
IMSL provides utility routines SCALAPACK_WRITE and SCALAPACK_UNMAP to help in the gathering of results from across processors to a global array or file. SCALAPACK_WRITE will write data to a file while SCALAPACK_UNMAP will map local arrays from the processor grid to a global array.
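Continuing the sketch, the distributed solution X0 can be gathered into a global array X (SCALAPACK_WRITE would be used instead to send it to a file):

!     Gather the distributed solution into the global array X.
!     After the unmap, only the rank-0 node holds the entire X.
      CALL SCALAPACK_UNMAP(X0, DESCX, X)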
8. Release the processor grid
This is accomplished by a call to SCALAPACK_EXIT.
9. Exit MPI
A call to MP_SETUP with the argument ‘FINAL’ will shut down MPI and set the value of MP_NPROCS = 0. This flags that MPI has been initialized and terminated; it cannot be initialized again in the same program execution. No MPI routine is defined when MP_NPROCS has this value.
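The skeleton below assembles the fragments from the preceding steps into one program. It is a sketch patterned after the ScaLAPACK examples in this manual, not a definitive implementation: LSARG is used only as a representative ScaLAPACK-interfaced solver, the logical arguments to SCALAPACK_SETUP follow the pattern of those examples, and the code that actually fills the global A and B is omitted.

!     Sketch only: patterned after the ScaLAPACK examples in this
!     manual; LSARG stands in for any routine listed in Table 5.
      USE LSARG_INT
      USE MPI_SETUP_INT
      USE SCALAPACK_SUPPORT
      IMPLICIT NONE
      INTEGER, PARAMETER :: N=9
      INTEGER :: DESCA(9), DESCX(9), INFO, MXLDA, MXCOL
      REAL, ALLOCATABLE :: A(:,:), B(:), X(:)
      REAL, ALLOCATABLE :: A0(:,:), B0(:), X0(:)
!     1. Initialize MPI.
      MP_NPROCS = MP_SETUP()
!     2. Set up the processor grid; MP_ICTXT is placed in GRIDINFO_INT.
      CALL SCALAPACK_SETUP(N, N, .TRUE., .TRUE.)
!     3. Get local dimensions and define the array descriptors.
      CALL SCALAPACK_GETDIM(N, N, MP_MB, MP_NB, MXLDA, MXCOL)
      CALL DESCINIT(DESCA, N, N, MP_MB, MP_NB, 0, 0, MP_ICTXT, MXLDA, INFO)
      CALL DESCINIT(DESCX, N, 1, MP_MB, 1, 0, 0, MP_ICTXT, MXLDA, INFO)
!     4. Allocate space for the local arrays.
      ALLOCATE (A0(MXLDA,MXCOL), B0(MXLDA), X0(MXLDA))
!     5. Define the global arrays on node 0 and distribute them.
      IF (MP_RANK == 0) THEN
         ALLOCATE (A(N,N), B(N), X(N))
!        ... fill A and B here ...
      END IF
      CALL SCALAPACK_MAP(A, DESCA, A0)
      CALL SCALAPACK_MAP(B, DESCX, B0)
!     6. Call the ScaLAPACK-interfaced solver with the local arrays.
      CALL LSARG (A0, B0, X0)
!     7. Gather the solution; only node 0 then holds all of X.
      CALL SCALAPACK_UNMAP(X0, DESCX, X)
!     8. Release the processor grid.
      CALL SCALAPACK_EXIT(MP_ICTXT)
!     9. Shut down MPI.
      MP_NPROCS = MP_SETUP('FINAL')
      END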