nonlinearRegression


Fits a multivariate nonlinear regression model.

Synopsis

nonlinearRegression (fcn, nParameters, x, y)

Required Arguments

float fcn (nIndependent, xi, nParameters, theta)
User-supplied function that evaluates the function defining the nonlinear regression problem, where xi is an array of length nIndependent containing the point at which the function is evaluated, and theta is an array of length nParameters containing the current values of the regression coefficients. Function fcn returns a predicted value at the point xi. In the following, \(f(x_i;\theta)\), or just \(f_i\), denotes the value of this function at the point \(x_i\) for a given value of θ. (Both \(x_i\) and θ are arrays.)
int nParameters (Input)
Number of parameters to be estimated.
float x[[]] (Input)
Array of size nObservations by nIndependent containing the matrix of independent (explanatory) variables.
float y[] (Input)
Array of length nObservations containing the dependent (response) variable.

Return Value

An array of length nParameters containing a solution, \(\hat{\theta}\), for the nonlinear regression coefficients. If no solution can be computed, then None is returned.

Optional Arguments

thetaGuess, float[] (Input)

Array with nParameters components containing an initial guess.

Default: thetaGuess[] = 0.

jacobian, (nIndependent, xi, nParameters, theta, fjac) (Input/Output)
User-supplied function to compute the i‑th row of the Jacobian, where the nIndependent data values corresponding to the i‑th row are input in xi. Argument theta is an array of length nParameters containing the regression coefficients for which the Jacobian is evaluated, and fjac is the computed row of the Jacobian (of length nParameters) for observation i at theta. Note that each derivative \(\partial f(x_i)/\partial\theta_j\) should be returned in fjac[j − 1] for \(j=1,2,\ldots\), nParameters.
thetaScale, float[] (Input)

Array with nParameters components containing the scaling array for θ. Array thetaScale is used mainly in scaling the gradient and the distance between two points. See keywords gradientEps and stepEps for more details.

Default: thetaScale[] = 1.

gradientEps, float (Input)

Scaled gradient tolerance. The j-th component of the scaled gradient at θ is calculated as

\[\frac{\left|g_j\right| * \max\left(\left|\theta_j\right|, 1 / t_j\right)} {\tfrac{1}{2} \|F(\theta)\|_2^2}\]

where \(g=\nabla F(\theta)\), t = thetaScale, and

\[\|F(\theta)\|_2^2 = \sum\nolimits_{i=1}^{n} \left(y_i - f\left(x_i;\theta\right)\right)^2\]

The value \(\|F(\theta)\|_2^2\) is the sum of the squared residuals, SSE, at the point θ.

Convergence is declared if \(|g_i| \max\{|\theta_i|,1.0/t_i \}/\mathit{SSE}\) is less than gradientEps for i = 0, 1, 2, …, nParameters − 1, where \(g_i\) is the i‑th component of an internal intermediate gradient vector.

Default: gradientEps = \(\sqrt{\varepsilon}\) (\(\sqrt[3]{\varepsilon}\) in double), where ɛ is the machine precision

stepEps, float (Input)

Scaled step tolerance. The j-th component of the scaled step from points θ and θʹ is computed as

\[\frac{\left|\theta_j - \theta'_j\right|}{\max\left(\left|\theta_j\right|, 1/t_j\right)}\]

where t = thetaScale.

Convergence is declared if \(|g_{n+i}| / \max\{|\theta_i|,1.0/t_i \}\) is less than stepEps for \(i=0,1,\ldots,n-1\), where \(g_{n+i}\) is the i‑th component of the last step and n = nParameters.

Default: stepEps = \(\varepsilon^{2/3}\), where ɛ is the machine precision

sseRelEps, float (Input)

Relative SSE function tolerance.

Convergence is declared if the change in SSE is less than or equal to sseRelEps × SSE in absolute value.

Default: sseRelEps = \(\max(10^{-10}, \varepsilon^{2/3})\) (\(\max(10^{-20},\varepsilon^{2/3})\) in double), where ɛ is the machine precision

sseAbsEps, float (Input)

Absolute SSE function tolerance.

Convergence is declared if SSE is less than sseAbsEps.

Default: sseAbsEps = \(\max (10^{-20},\varepsilon^2)\), where ɛ is the machine precision

maxStep, float (Input)

Maximum allowable step size.

Default: maxStep = \(1000 \max(\varepsilon_1,\varepsilon_2)\), where \(\varepsilon_1= (t^T \theta_0)^{1/2}\), \(\varepsilon_2=\|t\|_2\), t = thetaScale, and \(\theta_0\) = thetaGuess

initialTrustRegion, float (Input)
Size of initial trust region radius. The default is based on the initial scaled Cauchy step.
goodDigit, int (Input)

Number of good digits in the function.

Default: machine dependent

maxIterations, int (Input)

Maximum number of iterations.

Default: maxIterations = 100

maxSseEvaluations, int (Input)

Maximum number of SSE function evaluations.

Default: maxSseEvaluations = 400

maxJacobianEvaluations, int (Input)

Maximum number of Jacobian evaluations.

Default: maxJacobianEvaluations = 400

tolerance, float (Input)

False convergence tolerance.

Default: tolerance = 100 * eps, where eps = machine(4)

predicted (Output)
A real array of length nObservations containing the predicted values at the approximate solution.
residual (Output)
A real array of length nObservations containing the residuals at the approximate solution.
r (Output)
An array of size nParameters × nParameters containing the R matrix from a QR decomposition of the Jacobian.
rRank (Output)
Rank of r. A value of rRank less than nParameters may indicate that the model is overparameterized.
df (Output)
Degrees of freedom.
sse (Output)
Residual sum of squares.

Description

Function nonlinearRegression fits a nonlinear regression model using least squares. The nonlinear regression model is

\[y_i = f\left(x_i;\theta\right) + \varepsilon_i \phantom{...} i=1,2, \ldots, n\]

where the observed values of the \(y_i\)’s constitute the responses or values of the dependent variable, the known \(x_i\)’s are the vectors of the values of the independent (explanatory) variables, θ is the vector of p regression parameters, and the \(\varepsilon_i\)’s are independently distributed normal errors with mean 0 and variance \(\sigma^2\). For this model, a least-squares estimate of θ is also a maximum likelihood estimate of θ.

The residuals for the model are as follows:

\[e_i(\theta) = y_i - f\left(x_i;\theta\right) \phantom{...} i=1,2, \ldots, n\]

A value of θ that minimizes

\[\sum\nolimits_{i=1}^{n} \left[e_i(\theta)\right]^2\]

is a least-squares estimate of θ. Function nonlinearRegression is designed so that the values of the function \(f(x_i; \theta)\) are computed one at a time by a user-supplied function.
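
For instance, the user function evaluates one observation per call. Here is a minimal sketch for the exponential model \(f(x;\theta)=\theta_1 e^{\theta_2 x}\) used in Example 2 below (names are illustrative):

from math import exp


def fcn(n_independent, xi, n_parameters, theta):
    # xi contains the n_independent values for one observation;
    # theta contains the current n_parameters coefficients.
    return theta[0] * exp(theta[1] * xi[0])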

Function nonlinearRegression is based on MINPACK routines LMDIF and LMDER by Moré et al. (1980), which use a modified Levenberg-Marquardt method to generate a sequence of approximations to a minimum point. Let \(\hat{\theta}_c\) be the current estimate of θ. A new estimate is given by \(\hat{\theta}_c + s_c\), where \(s_c\) is a solution to the following:

\[\left(J\left(\hat{\theta}_c\right)^T J\left(\hat{\theta}_c\right) + \mu_c I\right)s_c = J\left(\hat{\theta}_c\right)^T e\left(\hat{\theta}_c\right)\]

Here, \(J(\hat{\theta}_c)\) is the Jacobian evaluated at \(\hat{\theta}_c\).

The algorithm uses a “trust region” approach with a step bound of \(\delta_c\). A solution of the equations is first obtained for \(\mu_c = 0\). If \(\|s_c\|_2 < \delta_c\), this update is accepted; otherwise, \(\mu_c\) is set to a positive value and another solution is obtained. The method is discussed by Levenberg (1944), Marquardt (1963), and Dennis and Schnabel (1983, pp. 129−147, 218−338).
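
To make the update concrete, here is a minimal numpy sketch of a single Levenberg-Marquardt solve; J, e, and mu are hypothetical inputs, and the library performs this step internally:

import numpy as np


def lm_step(J, e, mu):
    # Solve (J^T J + mu*I) s = J^T e for the candidate step s_c.
    p = J.shape[1]
    return np.linalg.solve(J.T @ J + mu * np.eye(p), J.T @ e)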

If a user-supplied function is specified in jacobian, the Jacobian is computed analytically; otherwise, forward finite differences are used to estimate the Jacobian numerically. In the latter case, the estimate of the Jacobian may be so poor that the algorithm terminates at a noncritical point. In such instances, the user should supply a Jacobian function.

The first stopping criterion for nonlinearRegression occurs when SSE is less than the absolute function tolerance. The second stopping criterion occurs when the norm of the scaled gradient is less than the given gradient tolerance. The third stopping criterion occurs when the scaled distance between the last two steps is less than the step tolerance; this criterion also generates the informational error IMSLS_STEP_TOLERANCE. The fourth stopping criterion occurs when the relative change in SSE is less than sseRelEps; this criterion also generates the warning error IMSLS_LITTLE_FCN_CHANGE. See Dennis and Schnabel (1983, pages 159−161, 278−280, and 347−348) for a discussion of stopping criteria and choice of tolerances.

On some platforms, nonlinearRegression can evaluate the user-supplied functions fcn and jacobian in parallel. This is done only if the function ompOptions is called to flag user-defined functions as thread-safe. A function is thread-safe if there are no dependencies between calls. Such dependencies are usually the result of writing to global or static variables.
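
For example, assuming ompOptions accepts the setFunctionsThreadSafe keyword, user functions might be flagged before the fit; a minimal sketch:

from pyimsl.stat.ompOptions import ompOptions

# Flag user-supplied functions as thread-safe so that fcn and
# jacobian may be evaluated in parallel (keyword assumed).
ompOptions(setFunctionsThreadSafe=True)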

Programming Notes

Nonlinear regression allows substantial flexibility over linear regression because the user can specify the functional form of the model. This added flexibility can cause unexpected convergence problems for users who are unaware of the limitations of the software. Also, in many cases there are possible remedies that may not be immediately obvious. The following is a list of possible convergence problems and some remedies. There is not a one-to-one correspondence between the problems and the remedies; remedies for some problems may also be relevant for other problems.

  1. A local minimum is found. Try a different starting value. Good starting values often can be obtained by fitting simpler models. For example, for the nonlinear function

    \[f(x;\theta) = \theta_1 e^{\theta_2 x}\]

    good starting values can be obtained from the estimated linear regression coefficients \(\hat{\beta}_0\) and \(\hat{\beta}_1\) from a simple linear regression of ln y on x. The starting values for the nonlinear regression in this case would be

    \[\theta_1 = e^{\hat{\beta}_0} \text{ and } \theta_2 = \hat{\beta}_1\]

    If an approximate linear model is not clear, then simplify the model by reducing the number of nonlinear regression parameters. For example, some nonlinear parameters for which good starting values are known could be set to these values in order to simplify the model for computing starting values for the remaining parameters. (A minimal sketch of the log-linear approach to starting values appears after this list.)

  2. The estimate of θ is incorrectly returned as the same or very close to the initial estimate. This occurs often because of poor scaling of the problem, which might result in the residual sum of squares being either very large or very small relative to the precision of the computer. The optional arguments allow control of the scaling.

  3. The model is discontinuous as a function of θ. (The function f(x;θ) can be a discontinuous function of x.)

  4. Overflow occurs during the computations. Make sure the user-supplied functions do not overflow at some value of θ.

  5. The estimate of θ is going to infinity. A parameterization of the problem in terms of reciprocals may help.

  6. Some components of θ are outside known bounds. This can sometimes be handled by making a function that produces artificially large residuals outside of the bounds (even though this introduces a discontinuity in the model function). A sketch of this penalty approach also appears after the list.
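
As mentioned in item 1 above, good starting values for \(f(x;\theta)=\theta_1 e^{\theta_2 x}\) can come from a log-linear fit. A minimal sketch using the first five observations of Example 2, for illustration only:

import numpy as np

# Subset of the Example 2 data, for illustration only
x = np.array([2.0, 5.0, 7.0, 10.0, 14.0])
y = np.array([54.0, 50.0, 45.0, 37.0, 35.0])

# Fit ln y = b0 + b1*x; polyfit returns [slope, intercept]
b1, b0 = np.polyfit(x, np.log(y), 1)
theta_guess = [np.exp(b0), b1]  # theta_1 = e^(b0), theta_2 = b1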
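
And as noted in item 6, known bounds can sometimes be enforced with a penalty. A minimal sketch that wraps a model function; the bounds and penalty value here are hypothetical:

from math import exp


def fcn(n_independent, xi, n_parameters, theta):
    # Unconstrained model function (Example 2's model, for illustration)
    return theta[0] * exp(theta[1] * xi[0])


def bounded_fcn(n_independent, xi, n_parameters, theta):
    # Return an artificially large predicted value (hence a large
    # residual) when theta[0] leaves the hypothetical bounds [0, 100];
    # note this introduces a discontinuity in the model function.
    if not (0.0 <= theta[0] <= 100.0):
        return 1.0e10
    return fcn(n_independent, xi, n_parameters, theta)

The wrapper bounded_fcn would then be passed to nonlinearRegression in place of fcn.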

Examples

Example 1

In this example (Draper and Smith 1981, p. 518), the following nonlinear model is fit:

\[Y = \alpha + (0.49 - \alpha) e^{-\beta(X-8)} + \varepsilon\]

from numpy import *
from pyimsl.stat.nonlinearRegression import nonlinearRegression
from pyimsl.stat.writeMatrix import writeMatrix


# User function: f(x; theta) = theta[0] + (0.49 - theta[0]) * e^(theta[1]*(x - 8))
def fcn(n_independent, x, n_parameters, theta):
    return (theta[0] + (0.49 - theta[0]) * exp(theta[1] * (x[0] - 8)))


n_independent = 1
n_parameters = 2
x = [10.0, 20.0, 30.0, 40.0]
y = [0.48, 0.42, 0.40, 0.39]

# Nonlinear regression
theta_hat = nonlinearRegression(fcn, n_parameters, x, y)

# Print estimates
writeMatrix("estimated coefficients", theta_hat)

Output

 
 estimated coefficients
          1            2
     0.3807      -0.0795

Example 2

Consider the nonlinear regression model and data set discussed by Neter et al. (1983, pp. 475−478):

\[y_i = \theta_1 e^{\theta_2 x_i} + \varepsilon_i\]

There are two parameters and one independent variable. The data set considered consists of 15 observations.

from numpy import *
from pyimsl.stat.nonlinearRegression import nonlinearRegression
from pyimsl.stat.writeMatrix import writeMatrix


def fcn(n_independent, x, n_parameters, theta):
    return (theta[0] * exp(x[0] * theta[1]))


def jacobian(n_independent, x, n_parameters, theta, fjac):
    # Fill fjac in place: fjac[j] = d f(x; theta) / d theta[j]
    fjac[0] = exp(theta[1] * x[0])
    fjac[1] = theta[0] * x[0] * exp(theta[1] * x[0])


n_independent = 1
n_parameters = 2
grad_eps = 1.0e-3
theta_guess = [60.0, -0.03]
y = array([
    54.0, 50.0, 45.0, 37.0, 35.0,
    25.0, 20.0, 16.0, 18.0, 13.0,
    8.0, 11.0, 8.0, 4.0, 6.0])
x = array([
    2.0, 5.0, 7.0, 10.0, 14.0,
    19.0, 26.0, 31.0, 34.0, 38.0,
    45.0, 52.0, 53.0, 60.0, 65.0])
r = []
y_hat = []

# Nonlinear regression
theta_hat = nonlinearRegression(fcn, n_parameters, x, y,
                                thetaGuess=theta_guess,
                                gradientEps=grad_eps,
                                r=r,
                                predicted=y_hat,
                                jacobian=jacobian)

# Print results
writeMatrix("Estimated coefficients", theta_hat)
writeMatrix("Predicted values", y_hat)
writeMatrix("R matrix", r, writeFormat="%10.2f")

Output

 
 Estimated coefficients
          1            2
      58.61        -0.04
 
                              Predicted values
          1            2            3            4            5            6
      54.15        48.08        44.42        39.45        33.67        27.62
 
          7            8            9           10           11           12
      20.94        17.18        15.26        13.02         9.87         7.48
 
         13           14           15
       7.19         5.45         4.47
 
        R matrix
            1           2
1        1.87     1139.93
2        0.00     1139.80

Informational Errors

IMSLS_STEP_TOLERANCE Scaled step tolerance satisfied. The current point may be an approximate local solution, but it is also possible that the algorithm is making very slow progress and is not near a solution or that “stepEps” is too big.

Warning Errors

IMSLS_LITTLE_FCN_CHANGE Both the actual and predicted relative reductions in the function are less than or equal to the relative function tolerance.
IMSLS_TOO_MANY_ITN Maximum number of iterations exceeded.
IMSLS_TOO_MANY_JACOBIAN_EVAL Maximum number of Jacobian evaluations exceeded.
IMSLS_UNBOUNDED Five consecutive steps have been taken with the maximum step length.
IMSLS_FALSE_CONVERGENCE The iterates appear to be converging to a noncritical point.

Fatal Errors

IMSLS_TOO_MANY_FCN_EVAL Maximum number of function evaluations exceeded.
IMSLS_STOP_USER_FCN Request from user-supplied function to stop algorithm. User flag = “#”.