nonlinearRegression¶
Fits a multivariate nonlinear regression model.
Synopsis¶
nonlinearRegression (fcn, nParameters, x, y)
Required Arguments¶
- float fcn(nIndependent, xi, nParameters, theta)
  User-supplied function to evaluate the function that defines the nonlinear regression problem, where xi is an array of length nIndependent containing the point at which the function is evaluated and theta is an array of length nParameters containing the current values of the regression coefficients. Function fcn returns a predicted value at the point xi. In the following, \(f(x_i;\theta)\), or just \(f_i\), denotes the value of this function at the point \(x_i\), for a given value of θ. (Both \(x_i\) and θ are arrays.)
- int nParameters (Input)
  Number of parameters to be estimated.
- float x[[]] (Input)
  Array of size nObservations by nIndependent containing the matrix of independent (explanatory) variables.
- float y[] (Input)
  Array of length nObservations containing the dependent (response) variable.
Return Value¶
An array of length nParameters containing a solution, \(\hat{\theta}\), for the nonlinear regression coefficients. If no solution can be computed, then None is returned.
Optional Arguments¶
thetaGuess, float[] (Input)
  Array with nParameters components containing an initial guess.
  Default: thetaGuess[] = 0.
jacobian, fcn(nIndependent, xi, nParameters, theta, fjac) (Input/Output)
  User-supplied function to compute the i-th row of the Jacobian, where the nIndependent data values corresponding to the i-th row are input in xi. Argument theta is an array of length nParameters containing the regression coefficients for which the Jacobian is evaluated, and fjac is the computed row of the Jacobian (of length nParameters) for observation i at theta. Note that each derivative \(\partial f(x_i)/\partial\theta_j\) should be returned in fjac[j − 1] for \(j = 1, 2, \ldots\), nParameters.
thetaScale, float[] (Input)
  Array with nParameters components containing the scaling array for θ. Array thetaScale is used mainly in scaling the gradient and the distance between two points. See keywords gradientEps and stepEps for more details.
  Default: thetaScale[] = 1.
gradientEps, float (Input)
  Scaled gradient tolerance. The j-th component of the scaled gradient at θ is calculated as
  \[\frac{\left|g_j\right| \max\left(\left|\theta_j\right|, 1/t_j\right)}{\tfrac{1}{2} \|F(\theta)\|_2^2}\]
  where \(g = \nabla F(\theta)\), t = thetaScale, and
  \[\|F(\theta)\|_2^2 = \sum_{i=1}^{n} \left(y_i - f\left(x_i;\theta\right)\right)^2\]
  The value \(\|F(\theta)\|_2^2\) is the sum of the squared residuals, SSE, at the point θ.
  Convergence is declared if \(|g_i| \max\{|\theta_i|, 1.0/t_i\}/\mathit{SSE}\) is less than gradientEps for i = 0, 1, 2, …, nParameters, where \(g_i\) is the i-th component of an internal intermediate gradient vector (see the sketch following this list).
  Default: gradientEps = \(\sqrt{\varepsilon}\) (\(\sqrt[3]{\varepsilon}\) in double), where ɛ is the machine precision.
stepEps, float (Input)
  Scaled step tolerance. The j-th component of the scaled step from points θ and θʹ is computed as
  \[\frac{\left|\theta_j - \theta'_j\right|}{\max\left(\left|\theta_j\right|, 1/t_j\right)}\]
  where t = thetaScale. Convergence is declared if \(|g_{n+i}| \max\{|\theta_i|, 1.0/t_i\}\) is less than stepEps for \(i = 0, 1, 2, \ldots, n\), where \(g_{n+i}\) is the i-th component of the last step and n = nParameters.
  Default: stepEps = \(\varepsilon^{2/3}\), where ɛ is the machine precision.
sseRelEps, float (Input)
  Relative SSE function tolerance. Convergence is declared if the change in SSE is less than or equal to sseRelEps * SSE in absolute value.
  Default: sseRelEps = \(\max(10^{-10}, \varepsilon^{2/3})\), \(\max(10^{-20}, \varepsilon^{2/3})\) in double, where ɛ is the machine precision.
sseAbsEps, float (Input)
  Absolute SSE function tolerance. Convergence is declared if SSE is less than sseAbsEps.
  Default: sseAbsEps = \(\max(10^{-20}, \varepsilon^2)\), where ɛ is the machine precision.
maxStep, float (Input)
  Maximum allowable step size.
  Default: maxStep = \(1000 \max(\varepsilon_1, \varepsilon_2)\), where \(\varepsilon_1 = (t^T \theta_0)^{1/2}\), \(\varepsilon_2 = \|t\|_2\), t = thetaScale, and \(\theta_0\) = thetaGuess.
initialTrustRegion, float (Input)
  Size of the initial trust region radius. The default is based on the initial scaled Cauchy step.
goodDigit, int (Input)
  Number of good digits in the function.
  Default: machine dependent.
maxIterations, int (Input)
  Maximum number of iterations.
  Default: maxIterations = 100.
maxSseEvaluations, int (Input)
  Maximum number of SSE function evaluations.
  Default: maxSseEvaluations = 400.
maxJacobianEvaluations, int (Input)
  Maximum number of Jacobian evaluations.
  Default: maxJacobianEvaluations = 400.
tolerance, float (Input)
  False convergence tolerance.
  Default: tolerance = 100 * eps, where eps = machine(4).
predicted (Output)
  A real array of length nObservations containing the predicted values at the approximate solution.
residual (Output)
  A real array of length nObservations containing the residuals at the approximate solution.
r (Output)
  An array of size nParameters × nParameters containing the R matrix from a QR decomposition of the Jacobian.
rRank (Output)
  Rank of r. Argument rRank less than nParameters may indicate the model is overparameterized.
df (Output)
  Degrees of freedom.
sse (Output)
  Residual sum of squares.
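The convergence tests controlled by gradientEps and stepEps can be illustrated with a short, self-contained sketch. This is not the routine's internal implementation; the helper names and sample numbers below are purely illustrative assumptions.

import numpy as np

# Illustrative sketch (not the library's internal code) of the scaled
# quantities compared against gradientEps and stepEps.
def scaled_gradient(theta, grad_sse, sse, theta_scale):
    # |g_j| * max(|theta_j|, 1/t_j) / SSE, following the gradientEps test
    t = np.asarray(theta_scale, dtype=float)
    return np.abs(grad_sse) * np.maximum(np.abs(theta), 1.0 / t) / sse

def scaled_step(theta_new, theta_old, theta_scale):
    # |theta_j - theta'_j| / max(|theta_j|, 1/t_j), following the stepEps formula
    t = np.asarray(theta_scale, dtype=float)
    return np.abs(theta_new - theta_old) / np.maximum(np.abs(theta_new), 1.0 / t)

# Hypothetical values near the solution of Example 1 below:
theta_old = np.array([0.3810, -0.0796])
theta_new = np.array([0.3807, -0.0795])
grad_sse = np.array([1.0e-6, 2.0e-6])       # gradient of SSE at theta_new
sse = 5.0e-5
theta_scale = np.array([1.0, 1.0])
print(scaled_gradient(theta_new, grad_sse, sse, theta_scale))
print(scaled_step(theta_new, theta_old, theta_scale))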
Description¶
Function nonlinearRegression fits a nonlinear regression model using
least squares. The nonlinear regression model is

\[y_i = f\left(x_i;\theta\right) + \varepsilon_i \qquad i = 1, 2, \ldots, n\]
where the observed values of the \(y_i\)’s constitute the responses or values of the dependent variable, the known \(x_i\)’s are the vectors of the values of the independent (explanatory) variables, θ is the vector of p regression parameters, and the \(\varepsilon_i\)’s are independently distributed normal errors with mean 0 and variance \(\sigma^2\). For this model, a least-squares estimate of θ is also a maximum likelihood estimate of θ.
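To see why the least-squares and maximum likelihood estimates coincide under this error model, note that the log-likelihood of the sample is

\[\ell\left(\theta, \sigma^2\right) = -\frac{n}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - f\left(x_i;\theta\right)\right)^2\]

so, for any fixed \(\sigma^2 > 0\), maximizing over θ is equivalent to minimizing the sum of squared residuals.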
The residuals for the model are

\[e_i(\theta) = y_i - f\left(x_i;\theta\right) \qquad i = 1, 2, \ldots, n\]

A value of θ that minimizes

\[\sum_{i=1}^{n} \left[e_i(\theta)\right]^2\]
is a least-squares estimate of θ. Function nonlinearRegression is
designed so that the values of the function \(f(x_i; \theta)\) are
computed one at a time by a user-supplied function.
Function nonlinearRegression is based on MINPACK routines LMDIF and
LMDER by Moré et al. (1980) that use a modified Levenberg-Marquardt method
to generate a sequence of approximations to a minimum point. Let \(\hat{\theta}_c\) be the current estimate of θ. A new estimate is given by \(\hat{\theta}_c + s_c\), where \(s_c\) is a solution to the following:

\[\left(J\left(\hat{\theta}_c\right)^T J\left(\hat{\theta}_c\right) + \mu_c I\right) s_c = J\left(\hat{\theta}_c\right)^T e\left(\hat{\theta}_c\right)\]

Here, \(J(\hat{\theta}_c)\) is the Jacobian evaluated at \(\hat{\theta}_c\), and \(e(\hat{\theta}_c)\) is the vector of residuals defined above.

The algorithm uses a “trust region” approach with a step bound of \(\delta_c\). A solution of the equations is first obtained for \(\mu_c = 0\). If \(\|s_c\|_2 < \delta_c\), this update is accepted; otherwise, \(\mu_c\) is set to a positive value and another solution is obtained. The method is discussed by Levenberg (1944), Marquardt (1963), and Dennis and Schnabel (1983, pp. 129−147, 218−338).
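As a rough illustration of a single update of this form, the following sketch solves the damped normal equations for one step; it is not the MINPACK/IMSL implementation and ignores the trust-region logic that chooses \(\mu_c\).

import numpy as np

def lm_step(theta, x, y, f, jac_row, mu):
    # One Levenberg-Marquardt-style update: solve
    # (J^T J + mu * I) s = J^T e, then return theta + s,
    # where J is the model Jacobian and e holds the residuals y_i - f_i.
    J = np.array([jac_row(xi, theta) for xi in x])
    e = np.array([yi - f(xi, theta) for xi, yi in zip(x, y)])
    s = np.linalg.solve(J.T @ J + mu * np.eye(len(theta)), J.T @ e)
    return theta + s

# Hypothetical use with the exponential model of Example 2 below:
f = lambda xi, th: th[0] * np.exp(th[1] * xi)
jac_row = lambda xi, th: np.array([np.exp(th[1] * xi), th[0] * xi * np.exp(th[1] * xi)])
x = np.array([2.0, 5.0, 7.0, 10.0, 14.0])
y = np.array([54.0, 50.0, 45.0, 37.0, 35.0])
theta = lm_step(np.array([60.0, -0.03]), x, y, f, jac_row, mu=0.0)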
If a user-supplied function is specified in jacobian, the Jacobian is computed analytically; otherwise, forward finite differences are used to estimate the Jacobian numerically. In the latter case, the estimate of the Jacobian may be so poor that the algorithm terminates at a noncritical point. In such instances, the user should supply a Jacobian function.
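For reference, a forward-difference approximation of one Jacobian row looks roughly like the following. This is an illustrative sketch; the library's actual differencing scheme and step-size choice may differ, and the two-argument fcn(xi, theta) used here is a simplification of the four-argument callback described above.

import numpy as np

def fd_jacobian_row(fcn, xi, theta):
    # Forward differences: d f(xi; theta) / d theta_j ~ (f(theta + h e_j) - f(theta)) / h
    theta = np.asarray(theta, dtype=float)
    h0 = np.sqrt(np.finfo(float).eps)
    f0 = fcn(xi, theta)
    row = np.empty_like(theta)
    for j in range(theta.size):
        h = h0 * max(abs(theta[j]), 1.0)
        tp = theta.copy()
        tp[j] += h
        row[j] = (fcn(xi, tp) - f0) / h
    return row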
The first stopping criterion for nonlinearRegression occurs when SSE is
less than the absolute function tolerance. The second stopping criterion
occurs when the norm of the scaled gradient is less than the given gradient
tolerance. The third stopping criterion occurs when the scaled distance
between the last two steps is less than the step tolerance. The third
stopping criterion also generates error IMSLS_LITTLE_FCN_CHANGE. The
fourth stopping criterion occurs when the relative change in SSE is less
than sseRelEps. The fourth stopping criterion also generates error code
IMSLS_FALSE_CONVERGENCE. See Dennis and Schnabel (1983, pages 159−161,
278−280, and 347−348) for a discussion of stopping criteria and choice of
tolerances.
On some platforms, nonlinearRegression can evaluate the user-supplied
functions fcn and jacobian in parallel. This is done only if the
function ompOptions is called to flag user-defined
functions as thread-safe. A function is thread-safe if there are no
dependencies between calls. Such dependencies are usually the result of
writing to global or static variables.
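For example, a call along the following lines, made before nonlinearRegression, declares the user-supplied functions thread-safe. This is a sketch that assumes ompOptions accepts a setFunctionsThreadSafe keyword; check the ompOptions documentation for your PyIMSL version.

from pyimsl.stat.ompOptions import ompOptions

# Assumed keyword: flag user-supplied functions (fcn, jacobian) as thread-safe
# so they may be evaluated in parallel.
ompOptions(setFunctionsThreadSafe=True)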
Programming Notes¶
Nonlinear regression allows substantial flexibility over linear regression because the user can specify the functional form of the model. This added flexibility can cause unexpected convergence problems for users who are unaware of the limitations of the software. Also, in many cases, there are possible remedies that may not be immediately obvious. The following is a list of possible convergence problems and some remedies. There is not a one-to-one correspondence between the problems and the remedies; remedies for some problems may also be relevant for other problems.
A local minimum is found. Try a different starting value. Good starting values often can be obtained by fitting simpler models. For example, for the nonlinear function

\[f(x;\theta) = \theta_1 e^{\theta_2 x}\]

good starting values can be obtained from the estimated linear regression coefficients \(\hat{\beta}_0\) and \(\hat{\beta}_1\) from a simple linear regression of ln y on x. The starting values for the nonlinear regression in this case would be

\[\theta_1 = e^{\hat{\beta}_0} \text{ and } \theta_2 = \hat{\beta}_1\]

If an approximate linear model is not clear, then simplify the model by reducing the number of nonlinear regression parameters. For example, some nonlinear parameters for which good starting values are known could be set to these values in order to simplify the model for computing starting values for the remaining parameters.
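A minimal sketch of this linearization, using an ordinary least-squares fit of ln y on x (illustrative only; the data are the first eight observations of Example 2 below):

import numpy as np

# Starting values for f(x; theta) = theta_1 * exp(theta_2 * x),
# since ln f = ln(theta_1) + theta_2 * x.
x = np.array([2.0, 5.0, 7.0, 10.0, 14.0, 19.0, 26.0, 31.0])
y = np.array([54.0, 50.0, 45.0, 37.0, 35.0, 25.0, 20.0, 16.0])
beta1, beta0 = np.polyfit(x, np.log(y), 1)    # slope, intercept
theta_guess = [np.exp(beta0), beta1]          # theta_1 = e^beta0, theta_2 = beta1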
The estimate of θ is incorrectly returned as the same or very close to the initial estimate. This occurs often because of poor scaling of the problem, which might result in the residual sum of squares being either very large or very small relative to the precision of the computer. The optional arguments allow control of the scaling.
The model is discontinuous as a function of θ. (The function f(x;θ) can be a discontinuous function of x.)
Overflow occurs during the computations. Make sure the user-supplied functions do not overflow at some value of θ.
The estimate of θ is going to infinity. A parameterization of the problem in terms of reciprocals may help.
Some components of θ are outside known bounds. This can sometimes be handled by making a function that produces artificially large residuals outside of the bounds (even though this introduces a discontinuity in the model function).
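One hedged way to implement that idea in the user-supplied function is sketched below; the upper bound of 0 on theta[1] and the penalty value are arbitrary assumptions for illustration.

from math import exp

def fcn_with_bounds(n_independent, x, n_parameters, theta):
    # Return an artificially large prediction (hence a large residual)
    # whenever theta[1] exceeds its assumed upper bound of 0.
    if theta[1] > 0.0:
        return 1.0e10
    return theta[0] * exp(x[0] * theta[1])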
Examples¶
Example 1¶
In this example (Draper and Smith 1981, p. 518), the following nonlinear model is fit:

\[y = \theta_1 + \left(0.49 - \theta_1\right) e^{\theta_2 (x - 8)} + \varepsilon\]
from numpy import *
from pyimsl.stat.nonlinearRegression import nonlinearRegression
from pyimsl.stat.writeMatrix import writeMatrix

# User function
def fcn(n_independent, x, n_parameters, theta):
    return (theta[0] + (0.49 - theta[0]) * exp(theta[1] * (x[0] - 8)))

n_independent = 1
n_parameters = 2
x = [10.0, 20.0, 30.0, 40.0]
y = [0.48, 0.42, 0.40, 0.39]

# Nonlinear regression
theta_hat = nonlinearRegression(fcn, n_parameters, x, y)

# Print estimates
writeMatrix("estimated coefficients", theta_hat)
Output¶
estimated coefficients
1 2
0.3807 -0.0795
Example 2¶
Consider the nonlinear regression model and data set discussed by Neter et al. (1983, pp. 475−478):

\[y_i = \theta_1 e^{\theta_2 x_i} + \varepsilon_i\]

There are two parameters and one independent variable. The data set considered consists of 15 observations.
from __future__ import print_function
from numpy import *
from pyimsl.stat.nonlinearRegression import nonlinearRegression
from pyimsl.stat.writeMatrix import writeMatrix

def fcn(n_independent, x, n_parameters, theta):
    return (theta[0] * exp(x[0] * theta[1]))

def jacobian(n_independent, x, n_parameters, theta, fjac):
    fjac[0] = exp(theta[1] * x[0])
    fjac[1] = theta[0] * x[0] * exp(theta[1] * x[0])

n_independent = 1
n_parameters = 2
grad_eps = 1.0e-3
theta_guess = [60.0, -0.03]
y = array([
    54.0, 50.0, 45.0, 37.0, 35.0,
    25.0, 20.0, 16.0, 18.0, 13.0,
    8.0, 11.0, 8.0, 4.0, 6.0])
x = array([
    2.0, 5.0, 7.0, 10.0, 14.0,
    19.0, 26.0, 31.0, 34.0, 38.0,
    45.0, 52.0, 53.0, 60.0, 65.0])
r = []
y_hat = []

# Nonlinear regression
theta_hat = nonlinearRegression(fcn, n_parameters, x, y,
                                thetaGuess=theta_guess,
                                gradientEps=grad_eps,
                                r=r,
                                predicted=y_hat,
                                jacobian=jacobian)

# Print results
writeMatrix("Estimated coefficients", theta_hat)
writeMatrix("Predicted values", y_hat)
writeMatrix("R matrix", r, writeFormat="%10.2f")
Output¶
Estimated coefficients
1 2
58.61 -0.04
Predicted values
1 2 3 4 5 6
54.15 48.08 44.42 39.45 33.67 27.62
7 8 9 10 11 12
20.94 17.18 15.26 13.02 9.87 7.48
13 14 15
7.19 5.45 4.47
R matrix
1 2
1 1.87 1139.93
2 0.00 1139.80
Informational Errors¶
IMSLS_STEP_TOLERANCE: Scaled step tolerance satisfied. The current point may be an approximate local solution, but it is also possible that the algorithm is making very slow progress and is not near a solution, or that stepEps is too big.
Warning Errors¶
IMSLS_LITTLE_FCN_CHANGE: Both the actual and predicted relative reductions in the function are less than or equal to the relative function tolerance.
IMSLS_TOO_MANY_ITN: Maximum number of iterations exceeded.
IMSLS_TOO_MANY_JACOBIAN_EVAL: Maximum number of Jacobian evaluations exceeded.
IMSLS_UNBOUNDED: Five consecutive steps have been taken with the maximum step length.
IMSLS_FALSE_CONVERGENCE: The iterates appear to be converging to a noncritical point.
Fatal Errors¶
IMSLS_TOO_MANY_FCN_EVAL: Maximum number of function evaluations exceeded.
IMSLS_STOP_USER_FCN: Request from user-supplied function to stop algorithm. User flag = “#”.