MP_SETUP

Initializes or finalizes MPI.
Function Return Value
Number of nodes, MP_NPROCS, in the communicator MP_LIBRARY_WORLD. (Output)
This value is returned when MP_SETUP is called with no arguments.
Required Argument
None.
Optional Arguments
NOTE — Character string ‘Final’. (Input)
With ‘Final’ all pending error messages are sent from the nodes to the root and printed. If any node should STOP after printing messages, then MPI_Finalize() and a STOP are executed. Otherwise, only MPI_Finalize() is called. The character string ‘Final’ is the only valid string for this argument.
N — Size of array to be allocated for timing. (Input)
When this argument is supplied, the array MPI_NODE_PRIORITY is allocated with MP_NPROCS components. The matrix product A .x. B is timed individually at each node, where A and B are N by N matrices initialized with random data. The elapsed times are sorted to determine the node priority order, which is then broadcast to the other nodes.
FORTRAN 90 Interface
MP_SETUP ([NOTE, N])
Following a call to the function MP_SETUP(), the module MPI_node_int contains information about the number of processors, the rank of a processor, the communicator for the IMSL Fortran Numerical Library, and the usage priority order of the node machines.
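The data described in this section can be pictured with a small stand-in module. This is a sketch only, not the library's source: the module name node_info_sketch is hypothetical, the variable names and the defaults huge(1) and .TRUE. are taken from the description in this section, and the defaults MP_RANK = 0 and MP_NPROCS = 1 are assumed from the statement that, without MPI, the root node is the only node.

```fortran
! Illustrative stand-in for the node-information data described in the text.
! The module name is hypothetical. MP_RANK = 0 and MP_NPROCS = 1 are assumed
! defaults; huge(1) and .true. are the defaults stated in the description.
module node_info_sketch
  implicit none
  integer, allocatable, save :: MPI_NODE_PRIORITY(:)  ! unallocated by default
  integer, save :: MP_LIBRARY_WORLD = huge(1)          ! marker for "not yet set"
  logical, save :: MPI_ROOT_WORKS = .true.             ! root computes by default
  integer, save :: MP_RANK = 0, MP_NPROCS = 1          ! assumed single-node defaults
end module node_info_sketch

program show_defaults
  use node_info_sketch
  implicit none
  ! Report the state a program would see before any setup call is made.
  write (*,*) 'priority allocated: ', allocated(MPI_NODE_PRIORITY)
  write (*,*) 'communicator unset: ', MP_LIBRARY_WORLD == huge(1)
  write (*,*) 'root works:         ', MPI_ROOT_WORKS
  write (*,*) 'rank, nprocs:       ', MP_RANK, MP_NPROCS
end program show_defaults
```

The sketch compiles without MPI or the library, which makes it convenient for checking how code should behave in the single-node case.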
When the function MP_SETUP() is called with no arguments, the following events occur:
*If MPI has not been initialized, it is first initialized. This step uses the routines MPI_Initialized() and possibly MPI_Init(). Users who choose not to call MP_SETUP() must make the required initialization call before using any IMSL Fortran Numerical Library code that relies on MPI for its execution. If the user’s code calls an IMSL Fortran Numerical Library function utilizing the box data type and MPI has not been initialized, then the computations are performed on the root node. The only MPI routine always called in this context is MPI_Initialized(). The name MP_SETUP is pushed onto the subprogram or call stack.
*If MP_LIBRARY_WORLD equals its initial value (=huge(1)) then MPI_COMM_WORLD, the default MPI communicator, is duplicated and becomes its handle. This uses the routine MPI_Comm_dup(). Users can change the handle of MP_LIBRARY_WORLD as required by their application code. Often this issue can be ignored.
*The integers MP_RANK and MP_NPROCS are, respectively, the node’s rank and the number of nodes in the communicator MP_LIBRARY_WORLD. Their values require the routines MPI_Comm_size() and MPI_Comm_rank(). The default values are important when MPI is not initialized and a box data type is computed. In this case the root node is the only node and it does all the work. When MP_NPROCS = 1, no calls to MPI communication routines are made while computing the box data type functions. A program can temporarily assign MP_NPROCS = 1 to force box data type computation entirely at the root node. This is desirable for problems where using many nodes would be less efficient than using the root node exclusively.
*The array MPI_NODE_PRIORITY(:) is not allocated unless the user allocates it. The IMSL Fortran Numerical Library codes use this array for assigning tasks to processors, if it is allocated. If it is not allocated, the default priority of the nodes is (0,1,...,MP_NPROCS-1). Use of the function call MP_SETUP(N) allocates the array, as explained below. Once the array is allocated its size is MP_NPROCS. The contents of the array are a permutation of the integers 0,...,MP_NPROCS-1. Nodes appearing at the start of the list are used first for parallel computing. A node other than the root can avoid any computing, except receiving the schedule, by setting the value MPI_NODE_PRIORITY(I) < 0. This means that node |MPI_NODE_PRIORITY(I)| will be sent the task schedule but will not perform any significant work as part of box data type function evaluations.
*The LOGICAL flag MPI_ROOT_WORKS designates whether or not the root node participates in the major computation of the tasks. The root node communicates with the other nodes to complete the tasks but can be designated to do no other work. Since there may be only one processor, this flag has the default value .TRUE., assuring that one node exists to do work. When more than one processor is available, users can consider the assignment MPI_ROOT_WORKS=.FALSE., which is desirable when the alternate nodes have equal or greater computational resources compared with the root node. Parallel Example 4 illustrates this usage. A single problem is given a box data type, with one rack. The computing is done at the node, other than the root, with highest priority. This example requires more than one processor since the root does no work.
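The combined effect of the priority array and the root flag described in the list above can be illustrated with a short, library-independent sketch. The module priority_sketch and the function working_nodes are hypothetical names introduced here. The selection rule is one reading of the text (negative entries and, optionally, the root are skipped), not the library's actual scheduling code.

```fortran
module priority_sketch
  implicit none
contains
  ! Return the ranks that would do significant work, in priority order.
  ! priority   : permutation of 0..nprocs-1; a negative entry -r marks rank r
  !              as receiving the schedule without computing.
  ! root_works : plays the role of the documented MPI_ROOT_WORKS flag.
  function working_nodes(priority, root_works) result(workers)
    integer, intent(in) :: priority(:)
    logical, intent(in) :: root_works
    integer, allocatable :: workers(:)
    logical :: keep(size(priority))
    keep = priority >= 0                              ! drop idle (negative) entries
    if (.not. root_works) keep = keep .and. priority /= 0   ! drop the root, rank 0
    workers = pack(priority, keep)                    ! order of the list is preserved
  end function working_nodes
end module priority_sketch

program demo_priority
  use priority_sketch
  implicit none
  integer, parameter :: nprocs = 4
  integer :: i, default_order(nprocs), custom(nprocs)

  default_order = (/ (i, i = 0, nprocs-1) /)   ! default priority (0,1,...,nprocs-1)
  custom = (/ 2, 0, -3, 1 /)                   ! rank 2 first; rank 3 only gets the schedule

  write (*,*) working_nodes(default_order, .true.)   ! ranks 0 1 2 3
  write (*,*) working_nodes(custom, .true.)          ! ranks 2 0 1
  write (*,*) working_nodes(custom, .false.)         ! ranks 2 1
end program demo_priority
```

Rank 0 cannot be marked negative, which matches the statement that only a node other than the root can opt out through the priority array; the root opts out through the flag instead.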
When the generic function MP_SETUP(N) is called, where N is a positive integer, a call to MP_SETUP() is first made, using no argument. This initializes the MPI system and the other parameters described above. Use just one of these two calls to MP_SETUP() in a program. The array MPI_NODE_PRIORITY(:) is allocated with size MP_NPROCS. Then DOUBLE PRECISION matrix products C = AB, where A and B are N by N matrices, are computed at each node and the elapsed time is recorded. All the nodes in the communicator MP_LIBRARY_WORLD are timed. These elapsed times are sorted and the contents of MPI_NODE_PRIORITY(:) are permuted so that the shortest times yield the highest priority. The array MPI_NODE_PRIORITY(:) is then broadcast from the root to the remaining nodes of MP_LIBRARY_WORLD using the routine MPI_Bcast(). Timing matrix products to define the node priority is relevant because the effort to compute C is comparable to that of many linear algebra computations of similar size. Users are free to define their own node priority and broadcast the array MPI_NODE_PRIORITY(:) to the alternate nodes in the communicator.
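The two local pieces of this procedure, timing one product and turning a set of times into a priority order, can be sketched in standard Fortran. The names timing_sketch, time_matrix_product and priority_from_times are hypothetical, and the sample times in the demo are made up. Collecting the times from the nodes and the MPI_Bcast() step are omitted, so this is not the library's implementation.

```fortran
module timing_sketch
  implicit none
  integer, parameter :: dp = kind(1d0), i8 = selected_int_kind(18)
contains
  ! Time one n-by-n double-precision product C = AB with random A and B.
  function time_matrix_product(n) result(seconds)
    integer, intent(in) :: n
    real(dp) :: seconds
    real(dp), allocatable :: a(:,:), b(:,:), c(:,:)
    integer(i8) :: t0, t1, rate
    allocate (a(n,n), b(n,n), c(n,n))
    call random_number(a)
    call random_number(b)
    call system_clock(t0, rate)
    c = matmul(a, b)
    call system_clock(t1)
    seconds = real(t1 - t0, dp) / real(rate, dp)
    ! Reference c so the product cannot be optimized away; entries are >= 0.
    if (c(1,1) < 0.0_dp) error stop 'unexpected negative product'
  end function time_matrix_product

  ! Shortest time gets highest priority. times(i) belongs to rank i-1.
  ! The result is a permutation of 0..size(times)-1 (insertion sort by time).
  function priority_from_times(times) result(priority)
    real(dp), intent(in) :: times(:)
    integer :: priority(size(times))
    integer :: i, j, key
    priority = (/ (i - 1, i = 1, size(times)) /)
    do i = 2, size(times)
      key = priority(i)
      j = i - 1
      do while (j >= 1)
        if (times(priority(j) + 1) <= times(key + 1)) exit
        priority(j + 1) = priority(j)
        j = j - 1
      end do
      priority(j + 1) = key
    end do
  end function priority_from_times
end module timing_sketch

program demo_timing
  use timing_sketch
  implicit none
  real(dp) :: times(4)
  write (*,*) 'local product time (s):', time_matrix_product(64)
  ! Hypothetical times, as if collected from ranks 0..3 at the root.
  times = (/ 0.30_dp, 0.10_dp, 0.40_dp, 0.20_dp /)
  write (*,*) priority_from_times(times)    ! ranks 1 3 0 2
end program demo_timing
```

A user-defined priority, which the text explicitly allows, could be produced the same way from any other cost measure before being broadcast.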
To print any IMSL Fortran Numerical Library error messages that have occurred at any node, and to finalize MPI, use the function call MP_SETUP('Final'). The case of the string 'Final' is not important. Any pending error messages are discarded after they are printed on the root node. Printing is triggered by popping the name 'MP_SETUP' from the subprogram stack or by returning to Level 1 in the stack. Users can obtain error messages by popping the stack to Level 1 and still continue with MPI calls. This requires executing call e1pop('MP_SETUP'). To continue after summarizing errors, execute call e1psh('MP_SETUP'). More details about the error processor are found in the Reference Material chapter of this manual.
Messages are printed by the nodes in order from the largest rank down to the smallest, which is the root node. MP_SETUP('Final') calls the routine MPI_Finalize(), which shuts down MPI. After MPI_Finalize() is called, MP_NPROCS has the value 0. This flags that MPI has been initialized and then terminated. It cannot be initialized again during the same program execution. No MPI routine is defined when MP_NPROCS has this value.
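The MPI life cycle that MP_SETUP manages can be sketched with plain MPI calls. The following is a minimal sketch, not the library's code: it assumes an MPI installation that provides the standard mpi module, uses only standard MPI routines, and uses local variables (comm, nprocs, rank) as stand-ins for MP_LIBRARY_WORLD, MP_NPROCS and MP_RANK.

```fortran
program mpi_lifecycle_sketch
  use mpi
  implicit none
  integer :: ierr, comm, rank, nprocs
  logical :: started, finished

  ! Initialize MPI only if it has not been initialized already.
  call MPI_Initialized(started, ierr)
  if (.not. started) call MPI_Init(ierr)

  ! Work on a private duplicate of the default communicator.
  call MPI_Comm_dup(MPI_COMM_WORLD, comm, ierr)
  call MPI_Comm_size(comm, nprocs, ierr)
  call MPI_Comm_rank(comm, rank, ierr)
  if (rank == 0) write (*,*) 'nodes in communicator:', nprocs

  ! Shut down once. MPI cannot be initialized again in this run.
  call MPI_Comm_free(comm, ierr)
  call MPI_Finalize(ierr)
  nprocs = 0        ! local flag mirroring the documented MP_NPROCS = 0

  ! MPI_Finalized is one of the few routines callable after MPI_Finalize.
  call MPI_Finalized(finished, ierr)
  if (finished .and. nprocs == 0 .and. rank == 0) &
     write (*,*) 'MPI finalized; no further MPI calls are made.'
end program mpi_lifecycle_sketch
```

With a typical MPI installation this would be built with the MPI compiler wrapper and started through the MPI launcher; the exact command names depend on the installation.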
Parallel Example (parallel_ex01.f90)
use linear_operators
use mpi_setup_int
implicit none
! This is the equivalent of Parallel Example 1 for .ix., with box data types
! and functions.
integer, parameter :: n=32, nr=4
real(kind(1e0)) :: one=1e0
real(kind(1e0)), dimension(n,n,nr) :: A, b, x, err(nr)
! Setup for MPI.
MP_NPROCS = MP_SETUP()
! Generate random matrices for A and b:
A = rand(A); b=rand(b)
! Compute the box solution matrix of Ax = b.
x = A .ix. b
! Check the results.
err = norm(b - (A .x. x))/(norm(A)*norm(x)+norm(b))
if (ALL(err <= sqrt(epsilon(one))) .and. MP_RANK == 0) &
write (*,*) 'Parallel Example 1 is correct.'
! See to any error messages and quit MPI.
MP_NPROCS = MP_SETUP('Final')
end
Parallel Example (parallel_ex04.f90)
Here an alternate node is used to compute the majority of a single application, and the user does not need to make any explicit calls to MPI routines. The time-consuming parts are the evaluation of the eigenvalue-eigenvector expansion, the solving step, and the residuals. To run these parts in parallel, the rank‑2 arrays are changed to a box data type with a unit third dimension. The node priority order is established by the initial function call, MP_SETUP(n). The root is restricted from working on the box data type by the assignment MPI_ROOT_WORKS=.false., so the most efficient node other than the root is expected to perform the heavy computing. At least two nodes are required to execute this example.
use linear_operators
use mpi_setup_int
implicit none
! This is the equivalent of Parallel Example 4 for matrix exponential.
! The box dimension has a single rack.
integer, parameter :: n=32, k=128, nr=1
integer i
real(kind(1e0)), parameter :: one=1e0, t_max=one, delta_t=t_max/(k-1)
real(kind(1e0)) err(nr), sizes(nr), A(n,n,nr)
real(kind(1e0)) t(k), y(n,k,nr), y_prime(n,k,nr)
complex(kind(1e0)), dimension(n,nr) :: x(n,n,nr), z_0, &
Z_1(n,nr,nr), y_0, d
! Setup for MPI. Establish a node priority order.
! Restrict the root from significant computing.
! Illustrates using the 'best' performing node that
! is not the root for a single task.
MP_NPROCS = MP_SETUP(n)
MPI_ROOT_WORKS = .false.
! Generate a random coefficient matrix.
A = rand(A)
! Compute the eigenvalue-eigenvector decomposition
! of the system coefficient matrix on an alternate node.
D = EIG(A, W=X)
! Generate a random initial value for the ODE system.
y_0 = rand(y_0)
! Solve complex data system that transforms the initial
! values, X z_0=y_0.
z_1= X .ix. y_0 ; z_0(:,nr) = z_1(:,nr,nr)
! The grid of points where a solution is computed:
t = (/(i*delta_t,i=0,k-1)/)
! Compute y and y' at the values t(1:k).
! With the eigenvalue-eigenvector decomposition AX = XD, this
! is an evaluation of EXP(A t)y_0 = y(t).
y = X .x.exp(spread(d(:,nr),2,k)*spread(t,1,n))*spread(z_0(:,nr),2,k)
! This is y', derived by differentiating y(t).
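y_prime = X .x. &
   spread(d(:,nr),2,k)*exp(spread(d(:,nr),2,k)*spread(t,1,n))* &
   spread(z_0(:,nr),2,k)
! Check results. Is y' - Ay = 0?
! Scale for the residual test (assumed form; the source listing does not show it).
sizes = norm(y_prime) + norm(A)*norm(y)
err = norm(y_prime-(A .x. y))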
if (ALL(err <= sqrt(epsilon(one))*sizes) .and. MP_RANK == 0) &
write (*,*) 'Parallel Example 4 is correct.'
! See to any error messages and quit MPI.
MP_NPROCS = MP_SETUP('Final')
end