nonlinearOptimization

Fits data to a nonlinear model (possibly with linear constraints) using the successive quadratic programming algorithm (applied to the sum of squared errors, \(sse = \sum_i \left(y_i - f(x_i; \theta)\right)^2\)) and either a finite-difference gradient or a user-supplied gradient.
Synopsis
nonlinearOptimization (fcn, nParameters, x, y)
Required Arguments

float fcn (nIndependent, xi, nParameters, theta)
    User-supplied function to evaluate the function that defines the
    nonlinear regression problem, where xi is an array of length
    nIndependent at which point the function is evaluated, and theta is an
    array of length nParameters containing the current values of the
    regression coefficients. Function fcn returns a predicted value at the
    point xi. In the following, \(f(x_i; \theta)\), or just \(f_i\),
    denotes the value of this function at the point \(x_i\), for a given
    value of θ. (Both \(x_i\) and θ are arrays.)

int nParameters (Input)
    Number of parameters to be estimated.

float x[[]] (Input)
    Array of size nObservations by nIndependent containing the matrix of
    independent (explanatory) variables.

float y[] (Input)
    Array of length nObservations containing the dependent (response)
    variable.
Return Value

An array of length nParameters containing a solution, \(\hat{\theta}\),
for the nonlinear regression coefficients. If no solution can be computed,
then None is returned.
Optional Arguments
thetaGuess, float[] (Input)
    Array with nParameters components containing an initial guess.
    Default: thetaGuess[] = 0

jacobian, fcn (nIndependent, xi, nParameters, theta, fjac) (Input/Output)
    User-supplied function to compute the i-th row of the Jacobian, where
    the nIndependent data values corresponding to the i-th row are input
    in xi. Argument theta is an array of length nParameters containing the
    regression coefficients for which the Jacobian is evaluated, and fjac
    is the computed row of nParameters Jacobian values for observation i
    at theta. Note that each derivative \(\partial f(x_i)/\partial\theta_j\)
    should be returned in fjac[j-1] for \(j = 1, 2, \ldots,\) nParameters.
    Further note that, in order to maintain consistency with the other
    nonlinear solver, nonlinearRegression, the Jacobian values must be
    specified as the negative of the calculated derivatives.

simpleLowerBounds, float[] (Input)
    Vector of length nParameters containing the lower bounds on the
    parameters; choose a very large negative value if a component should
    be unbounded below, or set simpleLowerBounds[i] = simpleUpperBounds[i]
    to freeze the i-th variable.
    Default: All parameters are bounded below by \(-10^6\).

simpleUpperBounds, float[] (Input)
    Vector of length nParameters containing the upper bounds on the
    parameters; choose a very large value if a component should be
    unbounded above, or set simpleLowerBounds[i] = simpleUpperBounds[i]
    to freeze the i-th variable.
    Default: All parameters are bounded above by \(10^6\).
linearConstraints, int nEquality, float a[], float b[] (Input)
    Argument nConstraints is the total number of linear constraints
    (excluding simple bounds). Argument nEquality is the number of these
    constraints which are equality constraints; the remaining
    nConstraints − nEquality constraints are inequality constraints.
    Argument a is an nConstraints by nParameters array containing the
    equality constraint gradients in the first nEquality rows, followed by
    the inequality constraint gradients. Argument b is a vector of length
    nConstraints containing the right-hand sides of the linear
    constraints. Specifically, the constraints on θ are

    \(a_{i1}\theta_1 + \ldots + a_{in}\theta_n = b_i\) for \(i = 1, \ldots,\) nEquality,

    and

    \(a_{k1}\theta_1 + \ldots + a_{kn}\theta_n \leq b_k\) for \(k =\) nEquality \(+\,1, \ldots,\) nConstraints,

    where \(n =\) nParameters.
    Default: There are no linear constraints.
frequencies, float[] (Input)
    Array of length nObservations containing the frequency for each
    observation.
    Default: frequencies[] = 1

weights, float[] (Input)
    Array of length nObservations containing the weight for each
    observation.
    Default: weights[] = 1

acc, float (Input)
    The nonnegative tolerance on the first-order conditions at the
    calculated solution.

maxSseEvaluations, int (Input/Output)
    On input, maxSseEvaluations is the maximum number of sse evaluations
    allowed. On output, maxSseEvaluations contains the actual number of
    sse evaluations needed.
    Default: maxSseEvaluations = 400

printLevel, int (Input)
    Argument printLevel specifies the frequency of printing during
    execution. If printLevel = 0, there is no printing. Otherwise, after
    ensuring feasibility, information is printed every printLevel
    iterations and whenever an internal tolerance (called tol) is reduced.
    The printing provides the values of theta and the sse and gradient at
    the current value of theta. If printLevel is negative, this
    information is augmented by the current values of indicesActive,
    multiplier, and reskt, where reskt is the Kuhn-Tucker residual vector
    at theta.

stopInfo (Output)
    Argument stopInfo will have one of the following integer values to
    indicate the reason for leaving the routine:
| stopInfo | Reason for leaving routine |
|---|---|
| 1 | θ is feasible, and the condition that depends on acc is satisfied. |
| 2 | θ is feasible, and rounding errors are preventing further progress. |
| 3 | θ is feasible, but sse fails to decrease although a decrease is predicted by the current gradient vector. |
| 4 | The calculation cannot begin because a contains fewer than nConstraints constraints or because the lower bound on a variable is greater than the upper bound. |
| 5 | The equality constraints are inconsistent. These constraints include any components of \(\hat{\theta}\) that are frozen by setting simpleLowerBounds[i] equal to simpleUpperBounds[i]. |
| 6 | The equality constraints and the bounds on the variables are found to be inconsistent. |
| 7 | There is no possible θ that satisfies all of the constraints. |
| 8 | Maximum number of sse evaluations (maxSseEvaluations) is exceeded. |
| 9 | θ is determined by the equality constraints. |
activeConstraintsInfo, nActive, indicesActive, multiplier (Output)
    Argument nActive returns the final number of active constraints.
    Argument indicesActive is an integer array of length nActive
    containing the indices of the final active constraints. Argument
    multiplier is a real array of length nActive containing the Lagrange
    multiplier estimates of the final active constraints.

predicted (Output)
    A real array of length nObservations containing the predicted values
    at the approximate solution.

residual (Output)
    A real array of length nObservations containing the residuals at the
    approximate solution.

sse (Output)
    Residual sum of squares.
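To make the linearConstraints layout concrete, the sketch below builds the arrays a and b for a hypothetical two-parameter problem with one equality constraint (\(\theta_1+\theta_2=1\)) and one inequality constraint (\(\theta_1-\theta_2\leq 2\)), and checks a trial point against them with plain NumPy. The constraint values and the helper `is_feasible` are made up for illustration; this is not part of the PyIMSL API.

```python
import numpy as np

# nConstraints = 2, nEquality = 1: the equality row must come first.
a = np.array([[1.0, 1.0],    # theta_1 + theta_2  = 1  (equality)
              [1.0, -1.0]])  # theta_1 - theta_2 <= 2  (inequality)
b = np.array([1.0, 2.0])

def is_feasible(theta, a, b, n_equality, tol=1e-10):
    """Check a trial theta: equality rows first, inequality rows after."""
    lhs = a @ theta
    eq_ok = np.all(np.abs(lhs[:n_equality] - b[:n_equality]) <= tol)
    ineq_ok = np.all(lhs[n_equality:] <= b[n_equality:] + tol)
    return bool(eq_ok and ineq_ok)

print(is_feasible(np.array([0.5, 0.5]), a, b, 1))   # satisfies both rows
print(is_feasible(np.array([2.0, -1.0]), a, b, 1))  # violates the inequality
```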
Description
Function nonlinearOptimization is based on M.J.D. Powell’s TOLMIN, which
solves linearly constrained optimization problems, i.e., problems of the
form min f(θ), θ ∈ ℜⁿ, subject to

\(A_1\theta = b_1\), \(A_2\theta \leq b_2\), \(\theta_l \leq \theta \leq \theta_u\),

given the vectors \(b_1\), \(b_2\), \(\theta_l\), and \(\theta_u\) and the matrices \(A_1\) and \(A_2\).
The algorithm starts by checking the equality constraints for inconsistency and redundancy. If the equality constraints are consistent, the method will revise \(\theta^0\), the initial guess provided by the user, to satisfy

\(A_1\theta^0 = b_1\)
Next, \(\theta^0\) is adjusted to satisfy the simple bounds and inequality constraints. This is done by solving a sequence of quadratic programming subproblems to minimize the sum of the constraint or bound violations.
Now, for each iteration with a feasible \(\theta^k\), let \(J_k\) be the set of indices of inequality constraints that have small residuals. Here, the simple bounds are treated as inequality constraints. Let \(I_k\) be the set of indices of active constraints. The following quadratic programming problem

\(\min_{d \in \mathbb{R}^n} \; d^T \nabla f(\theta^k) + \tfrac{1}{2} d^T B^k d\)

subject to

\(a_j d = 0\) for \(j \in I_k\), and \(a_j d \leq 0\) for \(j \in J_k\),

is solved to get \((d^k,\lambda^k)\), where \(a_j\) is a row vector representing either a constraint in \(A_1\) or \(A_2\), or a bound constraint on θ. In the latter case, \(a_j=e_i\) for the bound constraint \(\theta_i\leq(\theta_u)_i\) and \(a_j=-e_i\) for the constraint \(-\theta_i\leq-(\theta_l)_i\). Here, \(e_i\) is a vector with a 1 as the i‑th component, and zeroes elsewhere. \(\lambda^k\) are the Lagrange multipliers, and \(B^k\) is a positive definite approximation to the second derivative \(\nabla^2f(\theta^k)\).
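For intuition, the equality-constrained core of such a quadratic programming subproblem can be solved directly from its KKT system. The sketch below uses plain NumPy with made-up data and is not the TOLMIN implementation; it minimizes \(d^T g + \frac{1}{2}d^T B d\) subject to \(Ad = 0\):

```python
import numpy as np

def solve_eq_qp(B, g, A):
    """Minimize g.T d + 0.5 d.T B d subject to A d = 0 by solving the
    KKT system  [B  A.T] [d  ]   [-g]
                [A   0 ] [lam] = [ 0]."""
    n, m = B.shape[0], A.shape[0]
    kkt = np.block([[B, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-g, np.zeros(m)])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:n], sol[n:]  # search direction d, multipliers lambda

B = np.array([[2.0, 0.0], [0.0, 2.0]])  # positive definite Hessian approx
g = np.array([1.0, -1.0])               # current gradient
A = np.array([[1.0, 1.0]])              # one constraint row: d_1 + d_2 = 0
d, lam = solve_eq_qp(B, g, A)
```

The returned d is the constrained steepest-feasible step; the multipliers lam play the role of \(\lambda^k\) in the text.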
After the search direction \(d^k\) is obtained, a line search is performed to locate a better point. The new point \(\theta^{k+1}=\theta^k+\alpha^k d^k\) has to satisfy the conditions

\(f(\theta^k + \alpha^k d^k) \leq f(\theta^k) + 0.1\,\alpha^k\,(d^k)^T \nabla f(\theta^k)\)

and

\((d^k)^T \nabla f(\theta^k + \alpha^k d^k) \geq 0.7\,(d^k)^T \nabla f(\theta^k)\)
The main idea in forming the set \(J_k\) is that, if any of the inequality constraints restricts the step-length \(\alpha^k\), then its index is not in \(J_k\). Therefore, small steps are likely to be avoided.
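A much-simplified illustration of the step-length search is a generic backtracking scheme with only a sufficient-decrease test. This is not TOLMIN's actual line search, and the constants mu and shrink are assumptions for the sketch:

```python
import numpy as np

def backtracking_line_search(f, grad_f, theta, d, mu=0.1, shrink=0.5):
    """Halve alpha until f(theta + alpha d) <= f(theta) + mu*alpha*d.T grad,
    i.e. until the step achieves a sufficient decrease in f."""
    alpha = 1.0
    f0 = f(theta)
    slope = d @ grad_f(theta)  # negative for a descent direction
    while f(theta + alpha * d) > f0 + mu * alpha * slope:
        alpha *= shrink
    return alpha

# Example: a quadratic objective with the steepest-descent direction
f = lambda t: float(t @ t)
grad = lambda t: 2.0 * t
theta = np.array([1.0, 1.0])
d = -grad(theta)
alpha = backtracking_line_search(f, grad, theta, d)
```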
Finally, the second derivative approximation, \(B^k\), is updated by the BFGS formula, if the condition

\((d^k)^T \left( \nabla f(\theta^{k+1}) - \nabla f(\theta^k) \right) > 0\)

holds. Let \(\theta^k\leftarrow\theta ^{k+1}\), and start another iteration.
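The BFGS update itself, guarded by the curvature condition that preserves positive definiteness of \(B^k\), can be sketched as follows (a generic textbook implementation, not TOLMIN's internal one):

```python
import numpy as np

def bfgs_update(B, s, y):
    """Standard BFGS update of a positive definite approximation B,
    with s = theta_{k+1} - theta_k and y = grad_{k+1} - grad_k.
    Skipped unless the curvature condition s.T y > 0 holds."""
    sy = s @ y
    if sy <= 0.0:
        return B  # curvature condition fails: keep the old approximation
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy

# One update step starting from the identity
B_new = bfgs_update(np.eye(2), np.array([1.0, 0.0]), np.array([2.0, 0.0]))
```

A useful property to verify is the secant condition: after the update, B_new applied to s reproduces y.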
The iteration repeats until the stopping criterion

\(\left\| \nabla f(\theta^k) - \sum_{i \in I_k} \lambda_i^k a_i^T \right\| \leq \tau\)

is satisfied; here, τ is a user-supplied tolerance. For more details, see Powell (1988, 1989).
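The quantity tested here is the Kuhn-Tucker residual (the reskt vector reported when printLevel is negative): the objective gradient minus the multiplier-weighted combination of the active constraint gradients. Computing it is straightforward; the values below are hypothetical:

```python
import numpy as np

def kt_residual(grad, active_rows, multipliers):
    """Kuhn-Tucker residual at theta: grad(f) minus the sum of
    lambda_i * a_i over the active constraints."""
    return grad - multipliers @ active_rows

grad = np.array([2.0, 1.0])           # gradient of sse at theta
active_rows = np.array([[1.0, 0.0]])  # gradients a_i of active constraints
multipliers = np.array([2.0])         # Lagrange multiplier estimates
reskt = kt_residual(grad, active_rows, multipliers)
# Convergence is declared when the norm of reskt drops below the tolerance
```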
Since a finite-difference method is used to estimate the gradient for some
single precision calculations, an inaccurate estimate of the gradient may
cause the algorithm to terminate at a noncritical point. In such cases, high
precision arithmetic is recommended. Also, whenever the exact gradient can
be easily provided, the gradient should be passed to
nonlinearOptimization using the optional argument jacobian.
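The precision issue can be seen with a small forward-difference sketch (generic, not the routine's internal differencing): the truncation error shrinks with the step size while roundoff error grows, so a finite-difference gradient is inherently less accurate than an exact one.

```python
import numpy as np

def fd_grad(f, theta, eps=1e-7):
    """Forward-difference gradient estimate. Truncation error is O(eps),
    roundoff error is of order machine_epsilon / eps, so accuracy is
    limited -- the motivation for supplying an exact jacobian."""
    theta = np.asarray(theta, dtype=float)
    f0 = f(theta)
    g = np.empty_like(theta)
    for j in range(theta.size):
        t = theta.copy()
        t[j] += eps
        g[j] = (f(t) - f0) / eps
    return g

# Example: f(theta) = sin(theta * x) at x = 0.5; exact derivative x*cos(theta*x)
x = 0.5
f = lambda theta: np.sin(theta[0] * x)
approx = fd_grad(f, [3.0])
exact = x * np.cos(3.0 * x)
```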
Examples

Example 1

In this example, a data set is fitted to the nonlinear model function

\(y_i = \sin(\theta x_i) + \varepsilon_i\)
from __future__ import print_function
from numpy import *
from pyimsl.stat.nonlinearOptimization import nonlinearOptimization
from pyimsl.stat.writeMatrix import writeMatrix


def fcn_nonlin(n_independent, x, n_parameters, theta):
    return sin(theta[0] * x[0])


n_parameters = 1
n_observations = 11
n_independent = 1
x = array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
y = array([0.05, 0.21, 0.67, 0.72, 0.98, 0.94, 1.00, 0.73, 0.44, 0.36, 0.02])

theta_hat = nonlinearOptimization(fcn_nonlin, n_parameters, x, y)
writeMatrix("Theta Hat", theta_hat)
Output
Theta Hat
3.161
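The estimate \(\hat{\theta} \approx 3.161\) is close to π, consistent with data that roughly follow \(\sin(\pi x)\) on [0, 1]. A brute-force scan of the sum of squared errors over this example's data (just a sanity check, not how nonlinearOptimization works internally; it uses successive quadratic programming) locates the same neighborhood:

```python
import numpy as np

x = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
y = np.array([0.05, 0.21, 0.67, 0.72, 0.98, 0.94, 1.00, 0.73, 0.44, 0.36, 0.02])

# Scan sse(theta) = sum (y_i - sin(theta * x_i))^2 over a coarse grid
thetas = np.arange(2.5, 3.5, 0.001)
sse = [np.sum((y - np.sin(t * x)) ** 2) for t in thetas]
best = thetas[int(np.argmin(sse))]
```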
Example 2

Draper and Smith (1981, p. 475) state a problem due to Smith and Dubey [H. Smith and S. D. Dubey (1964), “Some reliability problems in the chemical industry”, Industrial Quality Control, 21(2), pp. 64−70]. A certain product must have 50% available chlorine at the time of manufacture. When it reaches the customer 8 weeks later, the level of available chlorine has dropped to 49%. It was known that the level should stabilize at about 30%. To predict how long the chemical would last at the customer site, samples were analyzed at different times. It was postulated that the following nonlinear model should fit the data:

\(y = \theta_0 + (0.49 - \theta_0) e^{-\theta_1 (x - 8)} + \varepsilon\)

where x is the time in weeks since manufacture and y is the fraction of available chlorine.

Since the chlorine level will stabilize at about 30%, the initial guess for \(\theta_0\) is 0.30. Using the last data point (\(x=42\), \(y=0.39\)), \(\theta_0=0.30\), and the above nonlinear equation, an estimate for \(\theta_1\) of 0.02 is obtained.

The constraints \(\theta_0\geq 0\) and \(\theta_1\geq 0\) are also imposed. These are equivalent to requiring that the level of available chlorine always be positive and never increase with time.

The Jacobian of the nonlinear model equation is also used.
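The back-of-the-envelope value for \(\theta_1\) can be verified by solving the model equation at the last data point with \(\theta_0 = 0.30\):

```python
from math import log

theta0 = 0.30           # assumed long-run chlorine level
x_last, y_last = 42.0, 0.39

# Solve y = theta0 + (0.49 - theta0) * exp(-theta1 * (x - 8)) for theta1
theta1 = -log((y_last - theta0) / (0.49 - theta0)) / (x_last - 8.0)
print(round(theta1, 3))  # about 0.022, rounded to 0.02 for the guess
```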
from __future__ import print_function
from numpy import *
from pyimsl.stat.nonlinearOptimization import nonlinearOptimization
from pyimsl.stat.writeMatrix import writeMatrix


def fcn(n_independent, x, n_parameters, theta):
    return theta[0] + (0.49 - theta[0]) * exp(-theta[1] * (x[0] - 8.0))


def jacobian(n_independent, x, n_parameters, theta, fjac):
    # Jacobian values are the negative of the derivatives, for
    # consistency with nonlinearRegression.
    fjac[0] = -1.0 + exp(-theta[1] * (x[0] - 8.0))
    fjac[1] = (0.49 - theta[0]) * (x[0] - 8.0) * exp(-theta[1] * (x[0] - 8.0))


n_parameters = 2
n_observations = 44
n_independent = 1
x = [8.0, 8.0, 10.0, 10.0, 10.0,
     10.0, 12.0, 12.0, 12.0, 12.0,
     14.0, 14.0, 14.0, 16.0, 16.0,
     16.0, 18.0, 18.0, 20.0, 20.0,
     20.0, 22.0, 22.0, 22.0, 24.0,
     24.0, 24.0, 26.0, 26.0, 26.0,
     28.0, 28.0, 30.0, 30.0, 30.0,
     32.0, 32.0, 34.0, 36.0, 36.0,
     38.0, 38.0, 40.0, 42.0]
y = [.49, .49, .48, .47, .48,
     .47, .46, .46, .45, .43,
     .45, .43, .43, .44, .43,
     .43, .46, .45, .42, .42,
     .43, .41, .41, .4, .42,
     .4, .4, .41, .4, .41,
     .41, .4, .4, .4, .38,
     .41, .4, .4, .41, .38,
     .4, .4, .39, .39]
guess = [0.30, 0.02]
xlb = [0.0, 0.0]
sse = []

theta_hat = nonlinearOptimization(fcn, n_parameters,
                                  x, y, thetaGuess=guess,
                                  simpleLowerBounds=xlb,
                                  jacobian=jacobian, sse=sse)
writeMatrix("Theta Hat", theta_hat)
Output
Theta Hat
1 2
0.3901 0.1016
Fatal Errors

| Error | Description |
|---|---|
| IMSLS_BAD_CONSTRAINTS_1 | The equality constraints are inconsistent. |
| IMSLS_BAD_CONSTRAINTS_2 | The equality constraints and the bounds on the variables are found to be inconsistent. |
| IMSLS_BAD_CONSTRAINTS_3 | No vector “theta” satisfies all of the constraints. Specifically, the current active constraints prevent any change in “theta” that reduces the sum of constraint violations. |
| IMSLS_BAD_CONSTRAINTS_4 | The variables are determined by the equality constraints. |
| IMSLS_TOO_MANY_ITERATIONS_1 | Number of function evaluations exceeded “maxFcn” = #. |
| IMSLS_STOP_USER_FCN | Request from user supplied function to stop algorithm. User flag = “#”. |