FNLMath : Linear Systems : PARALLEL_BOUNDED_LSQ
PARALLEL_BOUNDED_LSQ more...
 NOTE: For a detailed description of MPI Requirements see Dense Matrix Parallelism Using MPI in Chapter 10 of this manual.
Solves a linear least-squares system with bounds on the unknowns.
Usage Notes
CALL PARALLEL_BOUNDED_LSQ (A, B, BND, X, RNORM, W, INDEX, IPART, NSETP, NSETZ, IOPT=IOPT)
Required Arguments
A(1:M,:)— (Input/Output) Columns of the matrix with limits given by entries in the array IPART(1:2,1:max(1,MP_NPROCS)). On output is replaced by the product , where is an orthogonal matrix. The value SIZE(A,1) defines the value of M. Each processor starts and exits with its piece of the partitioned matrix.
B(1:M) — (Input/Output) Assumed-size array of length M containing the right-hand side vector, . On output is replaced by the product , where is the orthogonal matrix applied to and is a set of active bounds for the solution. All processors in the communicator start and exit with the same vector.
BND(1:2,1:N) — (Input) Assumed-size array containing the bounds for . The lower bound is in BND(1,J), and the upper bound is in BND(2,J).
X(1:N) — (Output) Assumed-size array of length N containing the solution, . The value SIZE(X) defines the value of N. All processors exit with the same vector.
RNORM — (Output) Scalar that contains the Euclidean or least-squares length of the residual vector, . All processors exit with the same value.
W(1:N) — (Output) Assumed-size array of length N containing the dual vector, . At a solution exactly one of the following is true for each
αj = xj = βj, and wj arbitrary
αj = xj, and w≤ 0
xj = βj, and wj ≥ 0
αj < xj < βj, and wj = 0
All processors exit with the same vector.
INDEX(1:N) — (Output) Assumed-size array of length N containing the NSETP indices of columns in the solution interior to bounds, and the remainder that are at a constraint. All processors exit with the same array.
IPART(1:2,1:max(1,MP_NPROCS)) — (Input) Assumed-size array containing the partitioning describing the matrix A. The value MP_NPROCS is the number of processors in the communicator, except when MPI has been finalized with a call to the routine MP_SETUP(‘Final’). This causes MP_NPROCS to be assigned 0. Normally users will give the partitioning to processor of rank = MP_RANK by setting IPART(1,MP_RANK+1)= first column index, and IPART(2,MP_RANK+1)= last column index. The number of columns per node is typically based on their relative computing power. To avoid a node with rank MP_RANK doing any work except communication, set IPART(1,MP_RANK+1) = 0 and IPART(2,MP_RANK+1)= -1. In this exceptional case there is no reference to the array A(:,:) at that node.
NSETP— (Output) An INTEGER indicating the number of solution components not at constraints. The column indices are output in the array INDEX(:).
NSETZ— (Output) An INTEGER indicating the solution components held at fixed values. The column indices are output in the array INDEX(:).
Optional Argument
IOPT(:)— (Input) Assumed-size array of derived type S_OPTIONS or D_OPTIONS. This argument is used to change internal parameters of the algorithm. Normally users will not be concerned about this argument, so they would not include it in the argument list for the routine.
 Packaged Options for PARALLEL_BOUNDED_LSQ Option Name Option Value PBLSQ_SET_TOLERANCE 1 PBLSQ_SET_MAX_ITERATIONS 2 PBLSQ_SET_MIN_RESIDUAL 3
IOPT(IO)=?_OPTIONS(PBLSQ_SET_TOLERANCE, TOLERANCE) Replaces the default rank tolerance for using a column, from EPSILON(TOLERANCE) to TOLERANCE. Increasing the value of TOLERANCE will cause fewer columns to be increased from their constraints, and may cause the minimum residual RNORM to increase.
IOPT(IO)=?_OPTIONS(PBLSQ_SET_MIN_RESIDUAL, RESID) Replaces the default target for the minimum residual vector length from 0 to RESID. Increasing the value of RESID can result in fewer iterations and thus increased efficiency. The descent in the optimization will stop at the first point where the minimum residual RNORM is smaller than RESID. Using this option may result in the dual vector not satisfying its optimality conditions, as noted above.
IOPT(IO)= PBLSQ_SET_MAX_ITERATIONS
IOPT(IO+1)= NEW_MAX_ITERATIONS Replaces the default maximum number of iterations from 3*N to NEW_MAX_ITERATIONS. Note that this option requires two entries in the derived type array.
FORTRAN 90 Interface
Generic: CALL PARALLEL_BOUNDED_LSQ (A, B, X [])
Specific: The specific interface names are S_PARALLEL_BOUNDED_LSQ and D_PARALLEL_BOUNDED_LSQ.
Description
Subroutine PARALLEL_BOUNDED_LSQ solves the least-squares linear system , using the algorithm BVLS found in Lawson and Hanson, (1995), pages 279-283. The new steps involve updating the dual vector and exchange of required data, using MPI. The optional changes to default tolerances, minimum residual, and the number of iterations are new features.
Examples
Example 1: Distributed Equality and Inequality Constraint Solver
The program PBLSQ_EX1 illustrates the computation of the minimum Euclidean length solution of an system of linear inequality constraints , . Additionally the first of the constraints are equalities. The solution algorithm is based on Algorithm LDP, page 165-166, loc. cit. By allowing the dual variables to be free, the constraints become equalities. The rows of are partitioned and assigned random values. When the minimum Euclidean length solution to the inequalities has been calculated, the residuals are computed, with the dual variables to the BVLS problem indicating the entries of that are exactly zero.

PROGRAM PBLSQ_EX1
! Use Parallel_bounded_LSQ to solve an inequality
! constraint problem, Gy >= h. Force F of the constraints
! to be equalities. This algorithm uses LDP of
! Solving Least Squares Problems, page 165.
! Forcing equality constraints by freeing the dual is
! new here. The constraints are allocated to the
! processors, by rows, in columns of the array A(:,:).
USE PBLSQ_INT
USE MPI_SETUP_INT
USE RAND_INT
USE SHOW_INT

IMPLICIT NONE
INCLUDE "mpif.h"

INTEGER, PARAMETER :: MP=500, NP=400, M=NP+1, &
N=MP, F=NP/10

REAL(KIND(1D0)), PARAMETER :: ZERO=0D0, ONE=1D0
REAL(KIND(1D0)), ALLOCATABLE :: &
A(:,:), B(:), BND(:,:), X(:), Y(:), &
W(:), ASAVE(:,:)
REAL(KIND(1D0)) RNORM
INTEGER, ALLOCATABLE :: INDEX(:), IPART(:,:)

INTEGER K, L, DN, J, JSHIFT, IERROR, NSETP, NSETZ
LOGICAL :: PRINT=.false.

! Setup for MPI:
MP_NPROCS=MP_SETUP()

DN=N/max(1,max(1,MP_NPROCS))-1
ALLOCATE(IPART(2,max(1,MP_NPROCS)))

! Spread constraint rows evenly to the processors.
IPART(1,1)=1
DO L=2,MP_NPROCS
IPART(2,L-1)=IPART(1,L-1)+DN
IPART(1,L)=IPART(2,L-1)+1
END DO
IPART(2,MP_NPROCS)=N

! Define the constraints using random data.
K=max(0,IPART(2,MP_RANK+1)-IPART(1,MP_RANK+1)+1)
ALLOCATE(A(M,K), ASAVE(M,K), BND(2,N), &
X(N), W(N), B(M), Y(M), INDEX(N))

! The use of ASAVE can be replaced by regenerating the
! data for A(:,:) after the return from
! Parallel_bounded_LSQ
A=rand(A); ASAVE=A
IF(MP_RANK == 0 .and. PRINT) &
call show(IPART,&
"Partition of the constraints to be solved")

! Set the right-hand side to be one in the last
! component, zero elsewhere.
B=ZERO;B(M)=ONE
! Solve the dual problem. Letting the dual variable
! have no constraint forces an equality constraint
! for the primal problem.
BND(1,1:F)=-HUGE(ONE); BND(1,F+1:)=ZERO
BND(2,:)=HUGE(ONE)
CALL Parallel_bounded_LSQ &
(A, B, BND, X, RNORM, W, INDEX, IPART, &
NSETP, NSETZ)

! Each processor multiplies its block times the part
! of the dual corresponding to that partition.
Y=ZERO
DO J=IPART(1,MP_RANK+1),IPART(2,MP_RANK+1)
JSHIFT=J-IPART(1,MP_RANK+1)+1
Y=Y+ASAVE(:,JSHIFT)*X(J)
END DO

! Accumulate the pieces from all the processors.
! Put sum into B(:) on rank 0 processor.
B=Y
IF(MP_NPROCS > 1) &
CALL MPI_REDUCE(Y, B, M, MPI_DOUBLE_PRECISION,&
MPI_SUM, 0, MP_LIBRARY_WORLD, IERROR)
IF(MP_RANK == 0) THEN

! Compute constraint solution at the root.
! The constraints will have no solution if B(M) = ONE.
! All of these example problems have solutions.
B(M)=B(M)-ONE;B=-B/B(M)
END IF

! Send the inequality constraint or primal solution to all nodes.
IF(MP_NPROCS > 1) &
CALL MPI_BCAST(B, M, MPI_DOUBLE_PRECISION, 0, &
MP_LIBRARY_WORLD, IERROR)

! For large problems this printing may need to be removed.
IF(MP_RANK == 0 .and. PRINT) &
call show(B(1:NP), &
"Minimal length solution of the constraints")

! Compute residuals of the individual constraints.
X=ZERO
DO J=IPART(1,MP_RANK+1),IPART(2,MP_RANK+1)
JSHIFT=J-IPART(1,MP_RANK+1)+1
X(J)=dot_product(B,ASAVE(:,JSHIFT))
END DO

! This cleans up residuals that are about rounding error
! unit (times) the size of the constraint equation and
! right-hand side. They are replaced by exact zero.
WHERE(W == ZERO) X=ZERO
W=X

! Each group of residuals is disjoint, per processor.
! We add all the pieces together for the total set of
! constraints.
IF(MP_NPROCS > 1) &
CALL MPI_REDUCE(X, W, N, MPI_DOUBLE_PRECISION, &
MPI_SUM, 0, MP_LIBRARY_WORLD, IERROR)
IF(MP_RANK == 0 .and. PRINT) &
call show(W, "Residuals for the constraints")

! See to any errors and shut down MPI.
MP_NPROCS=MP_SETUP('Final')
IF(MP_RANK == 0) THEN
IF(COUNT(W < ZERO) == 0 .and.&
COUNT(W == ZERO) >= F) WRITE(*,*)&
" Example 1 for PARALLEL_BOUNDED_LSQ is correct."
END IF
END
Output

Example 1 for PARALLEL_BOUNDED_LSQ is correct.
Example 2: Distributed Newton-Raphson Method with Step Control
The program PBLSQ_EX2 illustrates the computation of the solution of a non-linear system of equations. We use a constrained Newton-Raphson method.
This algorithm works with the problem chosen for illustration. The step-size control used here, employing only simple bounds, may not work on other non-linear systems of equations. Therefore we do not recommend the simple non-linear solving technique illustrated here for an arbitrary problem. The test case is Brown’s Almost Linear Problem, Moré, et al. (1982). The components are given by:
The functions are zero at the point , where is a particular root of the polynomial equation . To avoid convergence to the local minimum , we start at the standard point and develop the Newton method using the linear terms , where is the Jacobian matrix. The update is constrained so that the first components satisfy , or . The last component is bounded from both sides, , or . These bounds avoid the local minimum and allow us to replace the last equation by , which is better scaled than the original. The positive lower bound for is replaced by the strict bound, EPSILON(1D0), the arithmetic precision, which restricts the relative accuracy of . The input for routine PARALLEL_BOUNDED_LSQ expects each processor to obtain that part of it owns. Those columns of the Jacobian matrix correspond to the partition given in the array IPART(:,:). Here the columns of the matrix are evaluated, in parallel, on the nodes where they are required.

PROGRAM PBLSQ_EX2
! Use Parallel_bounded_LSQ to solve a non-linear system
! of equations. The example is an ACM-TOMS test problem,
! except for the larger size. It is "Brown's Almost Linear
! Function."
USE ERROR_OPTION_PACKET
USE PBLSQ_INT
USE MPI_SETUP_INT
USE SHOW_INT
USE Numerical_Libraries, ONLY : N1RTY

IMPLICIT NONE
INTEGER, PARAMETER :: N=200, MAXIT=5

REAL(KIND(1D0)), PARAMETER :: ZERO=0D0, ONE=1D0,&
HALF=5D-1, TWO=2D0
REAL(KIND(1D0)), ALLOCATABLE :: &
A(:,:), B(:), BND(:,:), X(:), Y(:), W(:)
REAL(KIND(1D0)) RNORM
INTEGER, ALLOCATABLE :: INDEX(:), IPART(:,:)

INTEGER K, L, DN, J, JSHIFT, IERROR, NSETP, &
NSETZ, ITER
LOGICAL :: PRINT=.false.
TYPE(D_OPTIONS) IOPT(3)

! Setup for MPI:
MP_NPROCS=MP_SETUP()

DN=N/max(1,max(1,MP_NPROCS))-1
ALLOCATE(IPART(2,max(1,MP_NPROCS)))

! Spread Jacobian matrix columns evenly to the processors.
IPART(1,1)=1
DO L=2,MP_NPROCS
IPART(2,L-1)=IPART(1,L-1)+DN
IPART(1,L)=IPART(2,L-1)+1
END DO
IPART(2,MP_NPROCS)=N

K=max(0,IPART(2,MP_RANK+1)-IPART(1,MP_RANK+1)+1)
ALLOCATE(A(N,K), BND(2,N), &
X(N), W(N), B(N), Y(N), INDEX(N))

! This is Newton's method on "Brown's almost
! linear function."
X=HALF
ITER=0

! Turn off messages and stopping for FATAL class errors.
CALL ERSET (4, 0, 0)

NEWTON_METHOD: DO

! Set bounds for the values after the step is taken.
! All variables are positive and bounded below by HALF,
! except for variable N, which has an upper bound of HALF.
BND(1,1:N-1)=-HUGE(ONE)
BND(2,1:N-1)=X(1:N-1)-HALF
BND(1,N)=X(N)-HALF
BND(2,N)=X(N)-EPSILON(ONE)

! Compute the residual function.
B(1:N-1)=SUM(X)+X(1:N-1)-(N+1)
B(N)=LOG(PRODUCT(X))
if(mp_rank == 0 .and. PRINT) THEN
CALL SHOW(B, &
"Developing non-linear function residual")
END IF
IF (MAXVAL(ABS(B(1:N-1))) <= SQRT(EPSILON(ONE)))&
EXIT NEWTON_METHOD

! Compute the derivatives local to each processor.
A(1:N-1,:)=ONE
DO J=1,N-1
IF(J < IPART(1,MP_RANK+1)) CYCLE
IF(J > IPART(2,MP_RANK+1)) CYCLE
JSHIFT=J-IPART(1,MP_RANK+1)+1
A(J,JSHIFT)=TWO
END DO
A(N,:)=ONE/X(IPART(1,MP_RANK+1):IPART(2,MP_RANK+1))

! Reset the linear independence tolerance.
IOPT(1)=D_OPTIONS(PBLSQ_SET_TOLERANCE,&
sqrt(EPSILON(ONE)))
IOPT(2)=PBLSQ_SET_MAX_ITERATIONS

! If N iterations was not enough on a previous iteration, reset to 2*N.
IF(N1RTY(1) == 0) THEN
IOPT(3)=N
ELSE
IOPT(3)=2*N
CALL E1POP('MP_SETUP')
CALL E1PSH('MP_SETUP')
END IF

CALL parallel_bounded_LSQ &
(A, B, BND, Y, RNORM, W, INDEX, IPART, NSETP, &
NSETZ,IOPT=IOPT)

! The array Y(:) contains the constrained Newton step.
! Update the variables.
X=X-Y
IF(mp_rank == 0 .and. PRINT) THEN
CALL show(BND, "Bounds for the moves")
CALL SHOW(X, "Developing Solution")
CALL SHOW((/RNORM/), &
"Linear problem residual norm")
END IF

! This is a safety measure for not taking too many steps.
ITER=ITER+1
IF(ITER > MAXIT) EXIT NEWTON_METHOD
END DO NEWTON_METHOD

IF(MP_RANK == 0) THEN
IF(ITER <= MAXIT) WRITE(*,*)&
" Example 2 for PARALLEL_BOUNDED_LSQ is correct."
END IF

! See to any errors and shut down MPI.
MP_NPROCS=MP_SETUP('Final')
END