Parallel Program Descriptions

Appendix D: Benchmarking or Timing Programs

Table D lists a set of parallel benchmark programs. Each main program calls timing functions for the Fortran 90 box data type, in single and double precision, and compares our parallel allocation algorithm with a scalar, sequential method. Each main program reads input one line at a time, in the form:

NSIZE NTRIES NRACKS PREC ROOT_WORKS “Description”

QUIT to Stop
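The input format above can be illustrated with a short sketch. It is written in Python purely for illustration and is not part of the benchmark programs. The names `BenchRequest` and `parse_request` are hypothetical, and the encoding of ROOT_WORKS is not specified in this appendix, so the sketch keeps that token as a raw string.

```python
import shlex
from dataclasses import dataclass


@dataclass
class BenchRequest:
    """One input line: NSIZE NTRIES NRACKS PREC ROOT_WORKS "Description"."""
    nsize: int
    ntries: int
    nracks: int
    prec: int          # 1, 2 or 3
    root_works: str    # kept as raw text; its encoding is an assumption
    description: str


def parse_request(line):
    """Return a BenchRequest, or None when the line is the QUIT sentinel."""
    # Accept typographic quotes as well as plain double quotes.
    cleaned = line.replace("\u201c", '"').replace("\u201d", '"')
    tokens = shlex.split(cleaned)
    if not tokens:
        raise ValueError("empty input line")
    if tokens[0].upper() == "QUIT":
        return None
    if len(tokens) < 6:
        raise ValueError("expected five parameters and a description")
    nsize, ntries, nracks, prec = (int(t) for t in tokens[:4])
    if prec not in (1, 2, 3):
        raise ValueError("PREC must be 1, 2 or 3")
    description = " ".join(tokens[5:]).strip()
    return BenchRequest(nsize, ntries, nracks, prec, tokens[4], description)
```

A driver loop would call `parse_request` on each line and stop at the first `None`. The value `T` used for ROOT_WORKS in any test input is only a placeholder.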

Two initial lines of output echo the “Description” field and report whether or not the root is working and the number of processors in the MPI communicator. The parameters NSIZE, NTRIES and NRACKS appear in the summary tables. The parameter PREC takes the value 1, 2 or 3, according to whether the user wants the single precision version, the double precision version, or both versions timed. The array functions return a 7 × 2 summary table of values. Elements (1:6,1) and (1:6,2) of this array hold the results and parameters of the benchmark for the parallel and non-parallel versions, respectively. Elements (7,1) and (7,2) hold the ratio of the scalar to the parallel average times, printed as “Non-parallel/parallel averages”, and a first-order approximation to the variation in that ratio.

Parallel Box Version	Scalar Box Equivalent
1. Average time	Average time
2. Standard deviation	Standard deviation
3. Total Seconds	Total Seconds
4. nsize	nsize
5. nracks	nracks
6. ntries	ntries
7. Scalar/Parallel Ratio	Variation in Ratio
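The layout of the 7 × 2 summary table can be sketched as follows, in Python for illustration only. The function name `summarize` is hypothetical. This appendix does not give the formula for the variation in the ratio, so the sketch assumes standard first-order error propagation for a quotient; it also assumes a sample standard deviation. The ratio is taken as non-parallel over parallel, following the label printed in the sample output.

```python
import math
import statistics


def summarize(parallel_times, scalar_times, nsize, nracks):
    """Build a 7 x 2 table: column 0 is parallel, column 1 is scalar."""

    def column(samples):
        n = len(samples)
        average = statistics.fmean(samples)
        # With a single sample no spread can be estimated; report 0.0.
        st_dev = statistics.stdev(samples) if n > 1 else 0.0
        return [average, st_dev, sum(samples),
                float(nsize), float(nracks), float(n)]

    par = column(parallel_times)
    sca = column(scalar_times)

    # Row 7: ratio of the averages and its assumed first-order variation.
    ratio = sca[0] / par[0]
    relative = math.sqrt((par[1] / par[0]) ** 2 + (sca[1] / sca[0]) ** 2)
    variation = ratio * relative

    rows = [[p, s] for p, s in zip(par, sca)]
    rows.append([ratio, variation])
    return rows
```

Because the variation formula is an assumption, this sketch is not expected to reproduce the variation figures printed by the benchmark programs.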

As an example, the program time_parallel_i is compiled and linked with the single and double precision timing functions s_parallel_i_bench and d_parallel_i_bench.

This routine evaluates the time to compute 4 inverse matrices of size 600 by 600 using the defined operator .i. The “Average” is the mean of the individual elapsed times for 5 calls to the routines, with 4 inverses obtained in each call. The “St. Dev.” is the standard deviation of the samples behind that “Average” and indicates its variability. This value is informative only when |NTRIES| > 1. The value |NTRIES| = 1 is acceptable, but it yields a single time sample and no standard deviation. When NTRIES > 0, the results are printed as shown in Table C. The numbers in the table will vary depending on the machine and other factors that affect the performance of Fortran codes. When NTRIES < 0, the functions return the 7 × 2 table of values shown, computed from |NTRIES| samples, and nothing is printed.
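The sign convention for NTRIES can be summarized in a small timing harness, sketched here in Python for illustration. The function `time_samples` and its `work` callback are hypothetical names. The behavior for NTRIES = 0 is not described in this appendix, so rejecting it is an assumption.

```python
import statistics
import time


def time_samples(work, ntries):
    """Time abs(ntries) calls of work(); print a summary only if ntries > 0."""
    if ntries == 0:
        # Assumption: zero samples is treated as an error.
        raise ValueError("NTRIES must be nonzero")

    samples = []
    for _ in range(abs(ntries)):
        start = time.perf_counter()
        work()
        samples.append(time.perf_counter() - start)

    average = statistics.fmean(samples)
    # A standard deviation needs at least two samples.
    st_dev = statistics.stdev(samples) if len(samples) > 1 else None

    if ntries > 0:
        print(f"{average:.4E}  Average")
        print(f"{st_dev:.4E}  St. Dev." if st_dev is not None
              else "n/a  St. Dev.")
        print(f"{sum(samples):.4E}  Total Seconds")

    return samples, average, st_dev
```

Calling `time_samples(work, -5)` collects five samples silently, while `time_samples(work, 5)` collects the same number and also prints the summary lines.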

Single precision benchmark of parallel .i. and non-parallel .i.: Date of benchmark, (Y, Mo, D, H, M, S): 2006 5 11 8 58 58
1	1.5815E+00	4.0241E+00	Average
2	2.5031E-01	1.8035E-02	St. Dev.
3	7.9077E+00	2.0121E+01	Total Seconds
4	5.0000E+01	5.0000E+01	Size
5	5.0000E+00	5.0000E+00	Racks per box
6	5.0000E+00	5.0000E+00	Repeats
Non-parallel/parallel averages and variation:
	2.5444E+00	3.9129E-01

Double precision benchmark of parallel .i. and non-parallel .i.: Date of benchmark, (Y, Mo, D, H, M, S): 2006 5 11 8 58 59
1	1.6985D+00	4.0372D+00	Average
2	9.8576D-01	2.3836D-02	St. Dev.
3	8.4923D+00	2.0186D+01	Total Seconds
4	5.0000D+01	5.0000D+01	Size
5	5.0000D+00	5.0000D+00	Racks per box
6	5.0000D+00	5.0000D+00	Repeats
Non-parallel/parallel averages and variation:
	2.3770D+00	1.2392D-01

Table C: Performance Summary: Box operator .i.

Below is a list of the performance evaluation programs that time the box data computations using parallel and non-parallel resources.

Number	Program Units	Function Timed
1	time_parallel_i.f90, s_parallel_i_bench.f90, d_parallel_i_bench.f90	.i. A
2	time_parallel_ix.f90, s_parallel_ix_bench.f90, d_parallel_ix_bench.f90	A .ix. B
3	time_parallel_xi.f90, s_parallel_xi_bench.f90, d_parallel_xi_bench.f90	B .xi. A
4	time_parallel_x.f90, s_parallel_x_bench.f90, d_parallel_x_bench.f90	A .x. B
5	time_parallel_tx.f90, s_parallel_tx_bench.f90, d_parallel_tx_bench.f90	A .tx. B
6	time_parallel_xt.f90, s_parallel_xt_bench.f90, d_parallel_xt_bench.f90	A .xt. B
7	time_parallel_hx.f90, s_parallel_hx_bench.f90, d_parallel_hx_bench.f90	A .hx. B
8	time_parallel_xh.f90, s_parallel_xh_bench.f90, d_parallel_xh_bench.f90	A .xh. B
9	time_parallel_chol.f90, s_parallel_chol_bench.f90, d_parallel_chol_bench.f90	CHOL(A)
10	time_parallel_cond.f90, s_parallel_cond_bench.f90, d_parallel_cond_bench.f90	COND(A)
11	time_parallel_rank.f90, s_parallel_rank_bench.f90, d_parallel_rank_bench.f90	RANK(A)
12	time_parallel_det.f90, s_parallel_det_bench.f90, d_parallel_det_bench.f90	DET(A)
13	time_parallel_orth.f90, s_parallel_orth_bench.f90, d_parallel_orth_bench.f90	ORTH(A,R=R)
14	time_parallel_svd.f90, s_parallel_svd_bench.f90, d_parallel_svd_bench.f90	SVD(A,U=U,V=V)
15	time_parallel_norm.f90, s_parallel_norm_bench.f90, d_parallel_norm_bench.f90	NORM(A,TYPE=I)
16	time_parallel_eig.f90, s_parallel_eig_bench.f90, d_parallel_eig_bench.f90	EIG(A,W=W)
17	time_parallel_fft.f90, s_parallel_fft_bench.f90, d_parallel_fft_bench.f90	FFT_BOX(A) IFFT_BOX(A)

Table D: Parallel and non-Parallel Box Comparisons
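The program units in Table D follow a regular naming pattern: one main program plus a single and a double precision timing function per operation. A minimal sketch of that pattern, in Python for illustration only, with the hypothetical helper name `program_units`:

```python
def program_units(suffix):
    """Return the three file names for one benchmark, given its suffix."""
    return [
        f"time_parallel_{suffix}.f90",      # main program
        f"s_parallel_{suffix}_bench.f90",   # single precision timing function
        f"d_parallel_{suffix}_bench.f90",   # double precision timing function
    ]
```

For example, the suffix `i` gives the three units listed for the operator .i. in the first row of the table.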

Visual Numerics, Inc.