A set of parallel benchmark programs is listed in Table D. These main programs call Fortran 90 box data type functions, in single and double precision, and compare our parallel allocation algorithm to a scalar sequential method. Each main program reads single lines of input of the form:
NSIZE NTRIES NRACKS PREC ROOT_WORKS “Description”
Two initial lines of output echo the “Description” field, state whether or not the root is working, and give the number of processors in the MPI communicator. The parameters NSIZE, NTRIES and NRACKS appear in the summary tables. The parameter PREC takes the value 1, 2 or 3, according to whether the single precision version, the double precision version, or both versions are to be timed. The array functions return a 7 × 2 summary table of values. The (1:6,1) and (1:6,2) elements of this array hold the results and parameters of the benchmark for the parallel and non-parallel versions, respectively. The (7,1) and (7,2) elements hold the ratio of the parallel to the scalar times and a first-order approximation to the variation in that ratio.
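The ratio and its first-order variation in row 7 can be illustrated with a short sketch. The text does not state which formula the benchmark functions use, so the code below assumes the standard first-order error propagation for a quotient of two independent timings. The function name and its arguments are hypothetical and are not part of the library.

```python
import math


def ratio_with_variation(mean_par, sd_par, mean_seq, sd_seq):
    """Return (ratio, variation) for parallel time / scalar time.

    The variation uses first-order error propagation for a quotient,
    assuming the two timings are independent. This formula is an
    assumption for illustration; the documentation does not state
    which one the benchmark functions use.
    """
    ratio = mean_par / mean_seq
    # Relative spread of each timing.
    rel_par = sd_par / mean_par
    rel_seq = sd_seq / mean_seq
    # First-order estimate: relative errors add in quadrature.
    variation = abs(ratio) * math.sqrt(rel_par**2 + rel_seq**2)
    return ratio, variation
```

Under these assumptions, a parallel time of 2.0 seconds against a scalar time of 4.0 seconds gives a ratio of 0.5. A ratio below 1 means the parallel version was faster.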
As an example, the program time_parallel_i is compiled and linked with the single and double precision timing functions s_parallel_i_bench and d_parallel_i_bench.
This routine measures the time to compute 4 inverse matrices of size 600 by 600 using the defined operator .i. The “Average” is the mean of the individual elapsed times for 5 calls to the routines, with 4 inverses obtained in each call. The “St. Dev.” is the standard deviation for that “Average” and indicates its variability. For this value to be informative, |NTRIES| > 1 is required. The value |NTRIES| = 1 is acceptable, but then only one time sample is obtained and no standard deviation is available. Values of NTRIES > 0 result in the printing of results as shown in Table C. The numbers in the table will vary depending on the machine and on other factors that affect the performance of Fortran codes. If NTRIES < 0, the 7 × 2 functions return the tabular values shown, based on |NTRIES| samples, and no printing is performed.
Single precision benchmark of parallel .i. and non-parallel .i.:
Double precision benchmark of parallel .i. and non-parallel .i.:
Table C: Performance Summary: Box operator .i.
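The “Average” and “St. Dev.” entries, together with the sign convention for NTRIES, can be sketched as follows. This is an illustration only, not the library's code: the names `summarize` and `run_benchmark` are hypothetical, the sample (n - 1) form of the standard deviation is an assumption, and rejecting NTRIES = 0 is also an assumption, since the text does not cover that case.

```python
import statistics
import time


def summarize(samples):
    """Return (average, st_dev) for a list of elapsed times in seconds.

    st_dev is None when there is only one sample, because no spread
    can be estimated from a single measurement. The sample (n - 1)
    standard deviation is an assumption made for this sketch.
    """
    average = statistics.mean(samples)
    st_dev = statistics.stdev(samples) if len(samples) > 1 else None
    return average, st_dev


def run_benchmark(work, ntries):
    """Time abs(ntries) calls to work(); print only when ntries > 0."""
    if ntries == 0:
        # Assumption: the documentation does not define NTRIES = 0.
        raise ValueError("ntries must be nonzero")
    samples = []
    for _ in range(abs(ntries)):
        start = time.perf_counter()
        work()
        samples.append(time.perf_counter() - start)
    average, st_dev = summarize(samples)
    if ntries > 0:
        print(f"Average: {average:.6f} s  St. Dev.: {st_dev}")
    return average, st_dev
```

Calling `run_benchmark(work, -5)` would collect 5 samples and return the summary without printing, which mirrors the behavior described for NTRIES < 0.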
Below is a list of the performance evaluation programs that time the box data computations using parallel and non-parallel resources.
Table D: Parallel and non-Parallel Box Comparisons
Visual Numerics, Inc. PHONE: 713.784.3131 FAX: 713.781.9260