A(1:M,:) — (Input/Output) Columns of the matrix with limits given by entries in the array IPART(1:2,1:max(1,MP_NPROCS)). On output the local piece A_k is replaced by the product QA_k, where Q is an orthogonal matrix. The value SIZE(A,1) defines the value of M. Each processor starts and exits with its piece of the partitioned matrix.
B(1:M) — (Input/Output) Assumed-size array of length M containing the right-hand side vector, b. On output b is replaced by the product Qb, where Q is the orthogonal matrix applied to A. All processors in the communicator start and exit with the same vector.
X(1:N) — (Output) Assumed-size array of length N containing the solution, x≥0. The value SIZE(X) defines the value of N. All processors exit with the same vector.
RNORM — (Output) Scalar that contains the Euclidean or least-squares length of the residual vector, ||b - Ax||. All processors exit with the same value.
W(1:N) — (Output) Assumed-size array of length N containing the dual vector, w = A^T(b - Ax). All processors exit with the same vector.
INDEX(1:N) — (Output) Assumed-size array of length N containing the NSETP indices of the columns in the positive solution, followed by the indices of the remaining columns, which are at their constraint. The number of positive components in the solution x is given by the Fortran intrinsic function value, NSETP=COUNT(X > 0). All processors exit with the same array.
IPART(1:2,1:max(1,MP_NPROCS)) — (Input) Assumed-size array containing the partitioning that describes the matrix A. The value MP_NPROCS is the number of processors in the communicator, except when MPI has been finalized with a call to the routine MP_SETUP('Final'), which causes MP_NPROCS to be assigned 0. Normally users will give the partitioning for the processor of rank MP_RANK by setting IPART(1,MP_RANK+1) = first column index and IPART(2,MP_RANK+1) = last column index. The number of columns per node is typically based on the nodes' relative computing power. To have a node with rank MP_RANK do no work except communication, set IPART(1,MP_RANK+1) = 0 and IPART(2,MP_RANK+1) = -1. In this exceptional case there is no reference to the array A(:,:) at that node.
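The bookkeeping behind IPART can be sketched in Python (illustrative only; the routine itself takes IPART as a Fortran integer array). This sketch builds an even partition of contiguous column blocks, with any leftover columns given to the highest-rank processor, as in Example 1 below:

```python
def make_ipart(n_cols, n_procs):
    """Partition columns 1..n_cols into contiguous blocks, one per
    processor, mirroring the IPART(1:2, 1:n_procs) convention:
    entry 1 holds the first column index, entry 2 the last.
    Leftover columns go to the highest-rank processor."""
    per = n_cols // n_procs          # columns per processor (floor)
    ipart = []
    first = 1
    for p in range(n_procs):
        last = first + per - 1
        ipart.append((first, last))
        first = last + 1
    # Give any remaining columns to the last processor.
    f, _ = ipart[-1]
    ipart[-1] = (f, n_cols)
    return ipart
```

For example, `make_ipart(10, 3)` assigns columns (1,3), (4,6), and (7,10). A communication-only node would instead receive the pair (0, -1), as described above.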
Optional Argument
IOPT(:)— (Input) Assumed-size array of derived type S_OPTIONS or D_OPTIONS. This argument is used to change internal parameters of the algorithm. Normally users will not be concerned about this argument, so they would not include it in the argument list for the routine.
Packaged Options for PARALLEL_NONNEGATIVE_LSQ

Option Name                        Option Value
PNLSQ_SET_TOLERANCE                     1
PNLSQ_SET_MAX_ITERATIONS                2
PNLSQ_SET_MIN_RESIDUAL                  3
IOPT(IO)=?_OPTIONS(PNLSQ_SET_TOLERANCE, TOLERANCE) Replaces the default rank tolerance for using a column, from EPSILON(TOLERANCE) to TOLERANCE. Increasing the value of TOLERANCE will cause fewer columns to be moved from their constraints, and may cause the minimum residual RNORM to increase.
IOPT(IO)=?_OPTIONS(PNLSQ_SET_MIN_RESIDUAL, RESID) Replaces the default target for the minimum residual vector length from 0 to RESID. Increasing the value of RESID can result in fewer iterations and thus increased efficiency. The descent in the optimization will stop at the first point where the minimum residual RNORM is smaller than RESID. Using this option may result in the dual vector not satisfying its optimality conditions, as noted above.
IOPT(IO)= PNLSQ_SET_MAX_ITERATIONS
IOPT(IO+1)= NEW_MAX_ITERATIONS Replaces the default maximum number of iterations from 3*N to NEW_MAX_ITERATIONS. Note that this option requires two entries in the derived type array.
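The packing rules above can be sketched as a Python analogue (hypothetical helper; the actual IOPT entries are of derived type S_OPTIONS or D_OPTIONS, and the names below merely mirror the option codes in the table):

```python
# Option codes mirroring the table above.
PNLSQ_SET_TOLERANCE = 1
PNLSQ_SET_MAX_ITERATIONS = 2
PNLSQ_SET_MIN_RESIDUAL = 3

def pack_options(tolerance=None, max_iterations=None, min_residual=None):
    """Build a flat options list.  Most options carry their value in
    the same entry, but PNLSQ_SET_MAX_ITERATIONS uses two consecutive
    entries: the option code, then the new iteration limit."""
    iopt = []
    if tolerance is not None:
        iopt.append((PNLSQ_SET_TOLERANCE, tolerance))
    if min_residual is not None:
        iopt.append((PNLSQ_SET_MIN_RESIDUAL, min_residual))
    if max_iterations is not None:
        iopt.append(PNLSQ_SET_MAX_ITERATIONS)
        iopt.append(max_iterations)   # second entry holds the value
    return iopt
```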
Specific: The specific interface names are S_PARALLEL_NONNEGATIVE_LSQ and D_PARALLEL_NONNEGATIVE_LSQ.
Description
Subroutine PARALLEL_NONNEGATIVE_LSQ solves the linear least-squares system Ax ≅ b, subject to x ≥ 0, using the algorithm NNLS found in Lawson and Hanson (1995), pages 160-161. The code now updates the dual vector w of Step 2, page 161. The remaining new steps involve the exchange of required data, using MPI.
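The optimality test that drives the NNLS iteration can be sketched in pure Python (an illustrative check, not the library's Fortran internals): at a solution of the constrained least-squares problem, the dual vector w = A^T(b - Ax) vanishes on the positive components of x and is nonpositive on the components held at their constraint.

```python
def dual_vector(a, b, x):
    """w = A^T (b - A x), with a given as a list of rows."""
    m, n = len(a), len(a[0])
    r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(m)]
    return [sum(a[i][j] * r[i] for i in range(m)) for j in range(n)]

def is_kkt_point(a, b, x, tol=1e-12):
    """Check the Kuhn-Tucker conditions for min ||Ax - b||, x >= 0:
    w[j] == 0 where x[j] > 0, and w[j] <= 0 where x[j] == 0."""
    w = dual_vector(a, b, x)
    return all(
        (abs(w[j]) <= tol) if x[j] > 0 else (w[j] <= tol)
        for j in range(len(x))
    )
```

For A = I and b = (3, -2), the point x = (3, 0) passes the test, while x = (3, 1) fails it because its second dual component is nonzero on a positive variable.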
Examples
Example 1: Distributed Linear Inequality Constraint Solver
The program PNLSQ_EX1 illustrates the computation of the minimum Euclidean length solution of an M×N system of linear inequality constraints, Gy ≥ h. The solution algorithm is based on Algorithm LDP, pages 165-166, loc. cit. The rows of the constraint system are partitioned among the processors and assigned random values. When the minimum Euclidean length solution to the inequalities has been calculated, the residuals r = Gy - h ≥ 0 are computed, with the dual variables to the NNLS problem indicating the entries of r that are precisely zero.
Because matrix products involving both E and E^T are needed to compute the constrained solution y and the residuals r, message passing is required. This communication occurs after the NNLS solution is computed.
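The communication pattern can be sketched serially in Python (names are illustrative): each "processor" multiplies its block of columns by its slice of the solution vector, and the partial results are then summed, which is the role MPI_REDUCE with MPI_SUM plays in the program below.

```python
def partitioned_matvec(cols, x, ipart):
    """cols[p] holds processor p's columns (each a list of length m);
    ipart[p] = (first, last) gives 1-based column indices, as in the
    IPART array.  Each processor forms y_p = sum_j cols[p][j] * x[j];
    the reduction y = sum_p y_p reproduces the full product A x."""
    m = len(cols[0][0])
    pieces = []
    for p, (first, last) in enumerate(ipart):
        y = [0.0] * m
        for j in range(first, last + 1):
            col = cols[p][j - first]
            for i in range(m):
                y[i] += col[i] * x[j - 1]
        pieces.append(y)
    # Serial stand-in for MPI_REDUCE with operation MPI_SUM:
    return [sum(y[i] for y in pieces) for i in range(m)]
```

With a 2×3 matrix split as two columns on processor 0 and one on processor 1, the reduced result equals the ordinary product Ax.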
PROGRAM PNLSQ_EX1
! Use Parallel_nonnegative_LSQ to solve an inequality
! constraint problem, Gy >= h. This algorithm uses
! Algorithm LDP of Solving Least Squares Problems,
! page 165. The constraints are allocated to the
! processors, by rows, in columns of the array A(:,:).
USE PNLSQ_INT
USE MPI_SETUP_INT
USE RAND_INT
USE SHOW_INT
IMPLICIT NONE
INCLUDE "mpif.h"
! Problem sizes: MP constraints in NP unknowns.
INTEGER, PARAMETER :: MP=500, NP=400, M=NP+1, N=MP
REAL(KIND(1D0)), PARAMETER :: ZERO=0D0, ONE=1D0
REAL(KIND(1D0)), ALLOCATABLE :: &
A(:,:), ASAVE(:,:), B(:), X(:), Y(:), W(:)
REAL(KIND(1D0)) RNORM
INTEGER, ALLOCATABLE :: INDEX(:), IPART(:,:)
INTEGER J, JSHIFT, K, L, DN, IERROR
LOGICAL :: PRINT=.false.
! Setup for MPI:
MP_NPROCS=MP_SETUP()
DN=N/max(1,MP_NPROCS)-1
ALLOCATE(IPART(2,max(1,MP_NPROCS)))
! Spread constraint rows evenly to the processors.
IPART(1,1)=1
DO L=2,MP_NPROCS
IPART(2,L-1)=IPART(1,L-1)+DN
IPART(1,L)=IPART(2,L-1)+1
END DO
IPART(2,MP_NPROCS)=N
! Define the constraint data using random values.
K=max(0,IPART(2,MP_RANK+1)-IPART(1,MP_RANK+1)+1)
ALLOCATE(A(M,K), ASAVE(M,K), X(N), W(N), &
B(M), Y(M), INDEX(N))
! The use of ASAVE can be removed by regenerating
! the data for A(:,:) after the return from
! Parallel_nonnegative_LSQ.
A=rand(A); ASAVE=A
IF(MP_RANK == 0 .and. PRINT) &
CALL SHOW(IPART, &
"Partition of the constraints to be solved")
! Set the right-hand side to be one in the last component, zero elsewhere.
B=ZERO;B(M)=ONE
! Solve the dual problem.
CALL Parallel_nonnegative_LSQ &
(A, B, X, RNORM, W, INDEX, IPART)
! Each processor multiplies its block times the part of
! the dual corresponding to that part of the partition.
Y=ZERO
DO J=IPART(1,MP_RANK+1),IPART(2,MP_RANK+1)
JSHIFT=J-IPART(1,MP_RANK+1)+1
Y=Y+ASAVE(:,JSHIFT)*X(J)
END DO
! Accumulate the pieces from all the processors. Put sum into B(:)
! on rank 0 processor.
B=Y
IF(MP_NPROCS > 1) &
CALL MPI_REDUCE(Y, B, M, MPI_DOUBLE_PRECISION,&
MPI_SUM, 0, MP_LIBRARY_WORLD, IERROR)
IF(MP_RANK == 0) THEN
! Compute constrained solution at the root.
! The constraints will have no solution if B(M) = ONE.
! All of these example problems have solutions.
B(M)=B(M)-ONE;B=-B/B(M)
END IF
! Send the inequality constraint solution to all nodes.
IF(MP_NPROCS > 1) &
CALL MPI_BCAST(B, M, MPI_DOUBLE_PRECISION, &
0, MP_LIBRARY_WORLD, IERROR)
! For large problems this printing needs to be removed.
IF(MP_RANK == 0 .and. PRINT) &
CALL SHOW(B(1:NP), &
"Minimal length solution of the constraints")
! Compute residuals of the individual constraints.
! If only the solution is desired, the program ends here.
X=ZERO
DO J=IPART(1,MP_RANK+1),IPART(2,MP_RANK+1)
JSHIFT=J-IPART(1,MP_RANK+1)+1
X(J)=dot_product(B,ASAVE(:,JSHIFT))
END DO
! This cleans up residuals that are about rounding
! error unit (times) the size of the constraint
! equation and right-hand side. They are replaced
! by exact zero.
WHERE(W == ZERO) X=ZERO; W=X
! Each group of residuals is disjoint, per processor.
! We add all the pieces together for the total set of
! constraints.
IF(MP_NPROCS > 1) &
CALL MPI_REDUCE(X, W, N, MPI_DOUBLE_PRECISION,&
MPI_SUM, 0, MP_LIBRARY_WORLD, IERROR)
IF(MP_RANK == 0 .and. PRINT) &
CALL SHOW(W, "Residuals for the constraints")
! See to any errors and shut down MPI.
MP_NPROCS=MP_SETUP('Final')
IF(MP_RANK == 0) THEN
IF(COUNT(W < ZERO) == 0) WRITE(*,*)&
" Example 1 for PARALLEL_NONNEGATIVE_LSQ is correct."
END IF
END
Output
Example 1 for PARALLEL_NONNEGATIVE_LSQ is correct.
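The back-substitution performed at the root (the lines B(M)=B(M)-ONE; B=-B/B(M) in the program above) recovers the LDP solution from the NNLS residual r = Eu - f, where f is the last unit vector: y_j = -r_j / r_{n+1}, valid when r_{n+1} is nonzero. A Python rendering of just that arithmetic, as an illustrative sketch:

```python
def ldp_recover(eu, n):
    """Given the product E u (length n+1) for the NNLS solution u of
    E u ~= f with f = e_{n+1}, form the residual r = E u - f and
    return y with y[j] = -r[j] / r[n].  A zero r[n] would mean the
    inequality constraints G y >= h have no solution."""
    r = list(eu)
    r[n] -= 1.0               # r = E u - f, with f the unit vector e_{n+1}
    if r[n] == 0.0:
        raise ValueError("constraints have no solution")
    return [-r[j] / r[n] for j in range(n)]
```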
Example 2: Distributed Non-negative Least-Squares
The program PNLSQ_EX2 illustrates the computation of the solution to a system of linear least-squares equations with simple constraints: Ax ≅ b, subject to x ≥ 0. In this example we write the row vectors of the system to a file. This illustrates reading the data by rows and arranging the data by columns, as required by PARALLEL_NONNEGATIVE_LSQ. After reading the data, the right-hand side vector is broadcast to the group before computing a solution, x. The block size is chosen so that each participating processor receives the same number of columns, except that any remaining columns are sent to the processor with the largest rank. This processor contains the right-hand side before the broadcast.
This example illustrates connecting a BLACS ‘context’ handle and the Fortran Library MPI communicator, MP_LIBRARY_WORLD, described in Chapter 10.
PROGRAM PNLSQ_EX2
! Use Parallel_Nonnegative_LSQ to solve a least-squares
! problem, A x = b, with x >= 0. This algorithm uses a
! distributed version of NNLS, found in the book
! Solving Least Squares Problems, page 165. The data is
! read from a file, by rows, and sent to the processors,