PARALLEL_BOUNDED_LSQ

For a detailed description of MPI Requirements see “Dense Matrix Parallelism Using MPI” in Chapter 10 of this manual.  

Solves a linear least-squares system with bounds on the unknowns.

Usage Notes

CALL PARALLEL_BOUNDED_LSQ & (A, B, BND, X, RNORM, W, INDEX, IPART, & NSETP, NSETZ, IOPT=IOPT)

Required Arguments

A(1:M,:)— (Input/Output)  Columns of the matrix with limits given by entries in the array IPART(1:2,1:max(1,MP_NPROCS)).  On output  is replaced by the product , where is an orthogonal matrix.  The value SIZE(A,1) defines the value of M.  Each processor starts and exits with its piece of the partitioned matrix.

B(1:M) — (Input/Output)  Assumed-size array of length M containing the right-hand side vector, . On output  is replaced by the product , where is the orthogonal matrix applied to and  is a set of active bounds for the solution.  All processors in the communicator start and exit with the same vector.

BND(1:2,1:N) — (Input)  Assumed-size array containing the bounds for .  The lower bound  is in BND(1,J), and the upper bound  is in BND(2,J).

X(1:N) — (Output)  Assumed-size array of length N containing the solution, .  The value SIZE(X) defines the value of  N.  All processors exit with the same vector.

RNORM — (Output) Scalar that contains the Euclidean or least-squares length of the residual vector, .  All processors exit with the same value.

W(1:N) — (Output)  Assumed-size array of length N containing the dual vector, .  At a solution exactly one of the following is true for each

All processors exit with the same vector.

INDEX(1:N) — (Output)  Assumed-size array of length N containing the NSETP indices of columns in the solution interior to bounds, and the remainder that are at a constraint. All processors exit with the same array.

IPART(1:2,1:max(1,MP_NPROCS)) — (Input)  Assumed-size array containing the partitioning describing the matrix.  The value MP_NPROCS is the number of processors in the communicator, except when MPI has been finalized with a call to the routine MP_SETUP(‘Final').  This causes MP_NPROCS to be assigned 0.   Normally users will give the partitioning to processor of rank = MP_RANK by setting IPART(1,MP_RANK+1)= first column index, and IPART(2,MP_RANK+1)= last column index.   The number of columns per node is typically based on their relative computing power.  To avoid a node with rank MP_RANK doing any work except communication, set IPART(1,MP_RANK+1) = 0 and IPART(2,MP_RANK+1)= -1.  In this exceptional case there is no reference to the array A(:,:) at that node.

NSETP— (Output) An INTEGER indicating the number of solution  components not at constraints.  The column indices are output in the array INDEX(:).

NSETZ— (Output) An INTEGER indicating the solution  components held at fixed values.  The column indices are output in the array INDEX(:).

Optional Argument

IOPT(:)— (Input)  Assumed-size array of derived type S_OPTIONS or D_OPTIONS.  This argument is used to change internal parameters of the algorithm.  Normally users will not be concerned about this argument, so they would not include it in the argument list for the routine. 

 

Packaged Options for PARALLEL_BOUNDED_LSQ

Option Name

Option Value

PBLSQ_SET_TOLERANCE

1

PBLSQ_SET_MAX_ITERATIONS

2

PBLSQ_SET_MIN_RESIDUAL

3

 

IOPT(IO)=?_OPTIONS(PBLSQ_SET_TOLERANCE, TOLERANCE) Replaces the default rank tolerance for using a column, from EPSILON(TOLERANCE) to TOLERANCE.  Increasing the value of TOLERANCE will cause fewer columns to be increased from their constraints, and may  cause the minimum residual RNORM to increase.

IOPT(IO)=?_OPTIONS(PBLSQ_SET_MIN_RESIDUAL, RESID) Replaces the default target for the minimum residual vector length from 0 to RESID.  Increasing the value of RESID can result in fewer iterations and thus increased efficiency. The descent in the optimization will stop at the first point where the minimum residual RNORM is smaller than RESID.  Using this option may result in the dual vector not satisfying its optimality conditions, as noted above.

IOPT(IO)= PBLSQ_SET_MAX_ITERATIONS

IOPT(IO+1)= NEW_MAX_ITERATIONS Replaces the default maximum number of iterations from 3*N to NEW_MAX_ITERATIONSNote that this option requires two entries in the derived type array.

FORTRAN 90 Interface

Generic:                              CALL PARALLEL_BOUNDED_LSQ (A, B, X [,…])

Specific:         The specific interface names are S_PARALLEL_BOUNDED_LSQ and D_PARALLEL_BOUNDED_LSQ.

Description

Subroutine PARALLEL_BOUNDED_LSQ solves the least-squares linear system , using the algorithm BVLS found in Lawson and Hanson, (1995), pages 279-283.  The new steps involve updating the dual vector and exchange of required data, using MPI.  The optional changes to default tolerances, minimum residual, and the number of iterations are new features.

Example 1: Distributed Equality and Inequality Constraint Solver

The program PBLSQ_EX1 illustrates the computation of the minimum Euclidean length solution of an  system of linear inequality constraints , .  Additionally the first  of the constraints are equalities.  The solution algorithm is based on Algorithm LDP, page 165-166, loc. cit.  By allowing the dual variables to be free,  the constraints become equalities.  The rows of are partitioned and assigned random values.  When the minimum Euclidean length solution to the inequalities has been calculated, the residuals  are computed, with the dual variables to the BVLS problem indicating  the entries of   that are exactly zero.

 

      PROGRAM PBLSQ_EX1

! Use Parallel_bounded_LSQ to solve an inequality

! constraint problem, Gy >= h. Force F of the constraints

! to be equalities. This algorithm uses LDP of

! Solving Least Squares Problems, page 165.

! Forcing equality constraints by freeing the dual is

! new here. The constraints are allocated to the

! processors, by rows, in columns of the array A(:,:).

        USE PBLSQ_INT

        USE MPI_SETUP_INT

        USE RAND_INT

        USE SHOW_INT

 

        IMPLICIT NONE

        INCLUDE "mpif.h"

 

        INTEGER, PARAMETER :: MP=500, NP=400, M=NP+1, &

          N=MP, F=NP/10

 

        REAL(KIND(1D0)), PARAMETER :: ZERO=0D0, ONE=1D0

        REAL(KIND(1D0)), ALLOCATABLE :: &

         A(:,:), B(:), BND(:,:), X(:), Y(:), &

          W(:), ASAVE(:,:)

        REAL(KIND(1D0)) RNORM

        INTEGER, ALLOCATABLE :: INDEX(:), IPART(:,:)

 

        INTEGER K, L, DN, J, JSHIFT, IERROR, NSETP, NSETZ

        LOGICAL :: PRINT=.false.

 

! Setup for MPI:

        MP_NPROCS=MP_SETUP()

 

        DN=N/max(1,max(1,MP_NPROCS))-1

        ALLOCATE(IPART(2,max(1,MP_NPROCS)))

 

! Spread constraint rows evenly to the processors.

        IPART(1,1)=1

        DO L=2,MP_NPROCS

           IPART(2,L-1)=IPART(1,L-1)+DN

           IPART(1,L)=IPART(2,L-1)+1

        END DO

        IPART(2,MP_NPROCS)=N

 

! Define the constraints using random data.

        K=max(0,IPART(2,MP_RANK+1)-IPART(1,MP_RANK+1)+1)

        ALLOCATE(A(M,K), ASAVE(M,K), BND(2,N), &

          X(N), W(N), B(M), Y(M), INDEX(N))

 

! The use of ASAVE can be replaced by regenerating the

! data for A(:,:) after the return from

! Parallel_bounded_LSQ

        A=rand(A); ASAVE=A

        IF(MP_RANK == 0 .and. PRINT) &

          call show(IPART,&

            "Partition of the constraints to be solved")

 

! Set the right-hand side to be one in the last

! component, zero elsewhere.

        B=ZERO;B(M)=ONE

       

! Solve the dual problem. Letting the dual variable

! have no constraint forces an equality constraint

! for the primal problem.

        BND(1,1:F)=-HUGE(ONE); BND(1,F+1:)=ZERO

        BND(2,:)=HUGE(ONE)

        CALL Parallel_bounded_LSQ &

          (A, B, BND, X, RNORM, W, INDEX, IPART, &

            NSETP, NSETZ)

 

! Each processor multiplies its block times the part

! of the dual corresponding to that partition.

        Y=ZERO

        DO J=IPART(1,MP_RANK+1),IPART(2,MP_RANK+1)

           JSHIFT=J-IPART(1,MP_RANK+1)+1

           Y=Y+ASAVE(:,JSHIFT)*X(J)

        END DO

 

! Accumulate the pieces from all the processors.

! Put sum into B(:) on rank 0 processor.

        B=Y

        IF(MP_NPROCS > 1) &

          CALL MPI_REDUCE(Y, B, M, MPI_DOUBLE_PRECISION,&

           MPI_SUM, 0, MP_LIBRARY_WORLD, IERROR)

        IF(MP_RANK == 0) THEN

 

! Compute constraint solution at the root.

! The constraints will have no solution if B(M) = ONE.

! All of these example problems have solutions.

           B(M)=B(M)-ONE;B=-B/B(M)

        END IF

 

! Send the inequality constraint or primal solution to all nodes.

  IF(MP_NPROCS > 1) &

    CALL MPI_BCAST(B, M, MPI_DOUBLE_PRECISION, 0, &

      MP_LIBRARY_WORLD, IERROR)

 

! For large problems this printing may need to be removed.

        IF(MP_RANK == 0 .and. PRINT) &

          call show(B(1:NP), &

            "Minimal length solution of the constraints")

 

! Compute residuals of the individual constraints.

        X=ZERO

        DO J=IPART(1,MP_RANK+1),IPART(2,MP_RANK+1)

           JSHIFT=J-IPART(1,MP_RANK+1)+1

           X(J)=dot_product(B,ASAVE(:,JSHIFT))

        END DO

 

! This cleans up residuals that are about rounding error

! unit (times) the size of the constraint equation and

! right-hand side.  They are replaced by exact zero.

        WHERE(W == ZERO) X=ZERO

        W=X

 

! Each group of residuals is disjoint, per processor.

! We add all the pieces together for the total set of

! constraints.

      IF(MP_NPROCS > 1) &

        CALL MPI_REDUCE(X, W, N, MPI_DOUBLE_PRECISION, &

          MPI_SUM, 0, MP_LIBRARY_WORLD, IERROR)

        IF(MP_RANK == 0 .and. PRINT) &

          call show(W, "Residuals for the constraints")

 

! See to any errors and shut down MPI.

        MP_NPROCS=MP_SETUP('Final')

        IF(MP_RANK == 0) THEN

          IF(COUNT(W < ZERO) == 0 .and.&

           COUNT(W == ZERO) >= F) WRITE(*,*)&

            " Example 1 for PARALLEL_BOUNDED_LSQ is correct."

        END IF

     END

Output

 

Example 1 for PARALLEL_BOUNDED_LSQ is correct.

Additional Examples

Example 2: Distributed Newton-Raphson Method with Step Control

The program PBLSQ_EX2 illustrates the computation of the solution of a non-linear system of equations.  We use a constrained Newton-Raphson method. 
This algorithm works with the problem chosen for illustration.  The step-size control used here, employing only simple bounds, may not work on other non-linear systems of equations.  Therefore we do not recommend the simple non-linear solving technique illustrated here for an arbitrary problem.  The test case is Brown's Almost Linear Problem, Morι, et al. (1982).  The components are given by:

The functions are zero at the point, where  is a particular root of the polynomial equation .   To avoid convergence to the local minimum , we start at the standard point and develop the Newton method using the linear terms , where is the Jacobian matrix.  The update is constrained so that the first  components satisfy , or .  The last component is bounded from both sides, ,  or .  These bounds avoid the local minimum and allow us to replace the last equation by  , which is better scaled than the original.   The positive lower bound for  is replaced by the strict bound, EPSILON(1D0), the arithmetic precision, which restricts the relative accuracy of .  The input for routine PARALLEL_BOUNDED_LSQ expects each processor to obtain that part of  it owns.  Those columns of the Jacobian matrix correspond to the partition given in the array IPART(:,:).  Here the columns of the matrix are evaluated, in parallel, on the nodes where they are required.

 

      PROGRAM PBLSQ_EX2

! Use Parallel_bounded_LSQ to solve a non-linear system

! of equations. The example is an ACM-TOMS test problem,

! except for the larger size.  It is "Brown's Almost Linear

! Function."

        USE ERROR_OPTION_PACKET

        USE PBLSQ_INT

        USE MPI_SETUP_INT

        USE SHOW_INT

        USE Numerical_Libraries, ONLY : N1RTY

 

        IMPLICIT NONE

       

        INTEGER, PARAMETER :: N=200, MAXIT=5

 

        REAL(KIND(1D0)), PARAMETER :: ZERO=0D0, ONE=1D0,&

          HALF=5D-1, TWO=2D0

        REAL(KIND(1D0)), ALLOCATABLE :: &

         A(:,:), B(:), BND(:,:), X(:), Y(:), W(:)

        REAL(KIND(1D0)) RNORM

        INTEGER, ALLOCATABLE :: INDEX(:), IPART(:,:)

 

        INTEGER K, L, DN, J, JSHIFT, IERROR, NSETP, &

          NSETZ, ITER

        LOGICAL :: PRINT=.false.

        TYPE(D_OPTIONS) IOPT(3)

 

! Setup for MPI:

        MP_NPROCS=MP_SETUP()

 

        DN=N/max(1,max(1,MP_NPROCS))-1

        ALLOCATE(IPART(2,max(1,MP_NPROCS)))

 

! Spread Jacobian matrix columns evenly to the processors.

        IPART(1,1)=1

        DO L=2,MP_NPROCS

           IPART(2,L-1)=IPART(1,L-1)+DN

           IPART(1,L)=IPART(2,L-1)+1

        END DO

        IPART(2,MP_NPROCS)=N

 

        K=max(0,IPART(2,MP_RANK+1)-IPART(1,MP_RANK+1)+1)

        ALLOCATE(A(N,K), BND(2,N), &

          X(N), W(N), B(N), Y(N), INDEX(N))

 

! This is Newton's method on "Brown's almost

! linear function."

        X=HALF

       ITER=0

 

! Turn off messages and stopping for FATAL class errors.

        CALL ERSET (4, 0, 0)            

 

NEWTON_METHOD: DO

 

! Set bounds for the values after the step is taken.

! All variables are positive and bounded below by HALF,

! except for variable N, which has an upper bound of HALF.

        BND(1,1:N-1)=-HUGE(ONE)

        BND(2,1:N-1)=X(1:N-1)-HALF

       BND(1,N)=X(N)-HALF

        BND(2,N)=X(N)-EPSILON(ONE)

 

! Compute the residual function.

        B(1:N-1)=SUM(X)+X(1:N-1)-(N+1)

       B(N)=LOG(PRODUCT(X))

       if(mp_rank == 0 .and. PRINT) THEN

         CALL SHOW(B, &

            "Developing non-linear function residual")

       END IF

        IF (MAXVAL(ABS(B(1:N-1))) <= SQRT(EPSILON(ONE)))&

          EXIT NEWTON_METHOD

             

! Compute the derivatives local to each processor.

        A(1:N-1,:)=ONE

        DO J=1,N-1

          IF(J < IPART(1,MP_RANK+1)) CYCLE

          IF(J > IPART(2,MP_RANK+1)) CYCLE

         JSHIFT=J-IPART(1,MP_RANK+1)+1

         A(J,JSHIFT)=TWO

       END DO

        A(N,:)=ONE/X(IPART(1,MP_RANK+1):IPART(2,MP_RANK+1))

 

! Reset the linear independence tolerance.

        IOPT(1)=D_OPTIONS(PBLSQ_SET_TOLERANCE,&

          sqrt(EPSILON(ONE)))

       IOPT(2)=PBLSQ_SET_MAX_ITERATIONS

 

! If N iterations was not enough on a previous iteration, reset to 2*N.

       IF(N1RTY(1) == 0) THEN

         IOPT(3)=N

        ELSE

          IOPT(3)=2*N

         CALL E1POP('MP_SETUP')

          CALL E1PSH('MP_SETUP')

        END IF

               

        CALL parallel_bounded_LSQ &

          (A, B, BND, Y, RNORM, W, INDEX, IPART, NSETP, &

            NSETZ,IOPT=IOPT)

 

! The array Y(:) contains the constrained Newton step. 

! Update the variables.

        X=X-Y

                    

              IF(mp_rank == 0 .and. PRINT) THEN

          CALL show(BND, "Bounds for the moves")

          CALL SHOW(X, "Developing Solution")

          CALL SHOW((/RNORM/), &

            "Linear problem residual norm")

        END IF

                

! This is a safety measure for not taking too many steps.                 

       ITER=ITER+1

       IF(ITER > MAXIT) EXIT NEWTON_METHOD

      END DO NEWTON_METHOD

 

      IF(MP_RANK == 0) THEN

        IF(ITER <= MAXIT) WRITE(*,*)&

        " Example 2 for PARALLEL_BOUNDED_LSQ is correct."

      END IF

 

! See to any errors and shut down MPI.

        MP_NPROCS=MP_SETUP('Final')

        

     END


Visual Numerics, Inc.
Visual Numerics - Developers of IMSL and PV-WAVE
http://www.vni.com/
PHONE: 713.784.3131
FAX:713.781.9260