nonlinearRegression¶
Fits a multivariate nonlinear regression model.
Synopsis¶
nonlinearRegression (fcn, nParameters, x, y)
Required Arguments¶
- float fcn (nIndependent, xi, nParameters, theta)
  User-supplied function to evaluate the function that defines the nonlinear regression problem, where xi is an array of length nIndependent containing the point at which the function is evaluated and theta is an array of length nParameters containing the current values of the regression coefficients. Function fcn returns a predicted value at the point xi. In the following, \(f(x_i;\theta)\), or just \(f_i\), denotes the value of this function at the point \(x_i\), for a given value of θ. (Both \(x_i\) and θ are arrays.) A minimal sketch of such a function follows this list.
- int nParameters (Input)
  Number of parameters to be estimated.
- float x[[]] (Input)
  Array of size nObservations by nIndependent containing the matrix of independent (explanatory) variables.
- float y[] (Input)
  Array of length nObservations containing the dependent (response) variable.
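For reference, here is a minimal sketch of a conforming user function and call with a single independent variable. The model, data, and starting guess are illustrative only and are not taken from the examples later on this page.

from numpy import exp
from pyimsl.stat.nonlinearRegression import nonlinearRegression

# Illustrative model: f(x; theta) = theta[0] * exp(theta[1] * x)
def fcn(n_independent, xi, n_parameters, theta):
    # xi has length n_independent; theta has length n_parameters
    return theta[0] * exp(theta[1] * xi[0])

x = [1.0, 2.0, 3.0, 4.0]      # nObservations by nIndependent (here 4 by 1)
y = [2.7, 7.4, 20.1, 54.6]    # length nObservations
theta_hat = nonlinearRegression(fcn, 2, x, y, thetaGuess=[1.0, 0.5])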
Return Value¶
An array of length nParameters containing a solution, \(\hat{\theta}\), for the nonlinear regression coefficients. If no solution can be computed, then None is returned.
Optional Arguments¶
- thetaGuess, float[] (Input)
  Array with nParameters components containing an initial guess.
  Default: thetaGuess[] = 0.
- jacobian, (nIndependent, xi, nParameters, theta, fjac) (Input/Output)
  User-supplied function to compute the i-th row of the Jacobian, where the nIndependent data values corresponding to the i-th row are input in xi. Argument theta is an array of length nParameters containing the regression coefficients for which the Jacobian is evaluated, and fjac is the computed row of the Jacobian (an array of length nParameters) for observation i at theta. Note that each derivative \(\partial f(x_i)/\partial\theta_j\) should be returned in fjac[j − 1] for \(j=1,2,\ldots\), nParameters.
- thetaScale, float[] (Input)
  Array with nParameters components containing the scaling array for θ. Array thetaScale is used mainly in scaling the gradient and the distance between two points. See keywords gradientEps and stepEps for more details.
  Default: thetaScale[] = 1.
- gradientEps, float (Input)
  Scaled gradient tolerance. The j-th component of the scaled gradient at θ is calculated as
  \[\frac{\left|g_j\right| \max\left(\left|\theta_j\right|, 1 / t_j\right)} {\tfrac{1}{2} \|F(\theta)\|_2^2}\]
  where \(g=\nabla F(\theta)\), t = thetaScale, and
  \[\|F(\theta)\|_2^2 = \sum\nolimits_{i=1}^{n} \left(y_i - f\left(x_i;\theta\right)\right)^2\]
  The value \(F(\theta)\) is the sum of the squared residuals, SSE, at the point θ. Convergence is declared if \(|g_i| \max\{|\theta_i|,1.0/t_i \}/SSE\) is less than gradientEps for i = 0, 1, 2, …, nParameters, where \(g_i\) is the i-th component of an internal intermediate gradient vector.
  Default: gradientEps = \(\sqrt{\varepsilon}\) (\(\sqrt[3]{\varepsilon}\) in double), where ɛ is the machine precision.
- stepEps, float (Input)
  Scaled step tolerance. The j-th component of the scaled step from points θ and θʹ is computed as
  \[\frac{\left|\theta_j - \theta'_j\right|}{\max\left(\left|\theta_j\right|, 1/t_j\right)}\]
  where t = thetaScale. Convergence is declared if \(|g_{n+i}| \max\{|\theta_i|,1.0/t_i \}\) is less than stepEps for \(i=0,1,2,\ldots,n\), where \(g_{n+i}\) is the i-th component of the last step and n = nParameters.
  Default: stepEps = \(\varepsilon^{2/3}\), where ɛ is the machine precision.
- sseRelEps, float (Input)
  Relative SSE function tolerance. Convergence is declared if the change in SSE is less than or equal to sseRelEps * SSE in absolute value.
  Default: sseRelEps = \(\max(10^{-10}, \varepsilon^{2/3})\) (\(\max(10^{-20}, \varepsilon^{2/3})\) in double), where ɛ is the machine precision.
- sseAbsEps, float (Input)
  Absolute SSE function tolerance. Convergence is declared if SSE is less than sseAbsEps.
  Default: sseAbsEps = \(\max(10^{-20}, \varepsilon^2)\), where ɛ is the machine precision.
- maxStep, float (Input)
  Maximum allowable step size.
  Default: maxStep = \(1000 \max(\varepsilon_1,\varepsilon_2)\), where \(\varepsilon_1= (t^T \theta_0)^{1/2}\), \(\varepsilon_2=\|t\|_2\), t = thetaScale, and \(\theta_0\) = thetaGuess.
- initialTrustRegion, float (Input)
  Size of the initial trust region radius. The default is based on the initial scaled Cauchy step.
- goodDigit, int (Input)
  Number of good digits in the function.
  Default: machine dependent.
- maxIterations, int (Input)
  Maximum number of iterations.
  Default: maxIterations = 100.
- maxSseEvaluations, int (Input)
  Maximum number of SSE function evaluations.
  Default: maxSseEvaluations = 400.
- maxJacobianEvaluations, int (Input)
  Maximum number of Jacobian evaluations.
  Default: maxJacobianEvaluations = 400.
- tolerance, float (Input)
  False convergence tolerance.
  Default: tolerance = 100 * eps, where eps = machine(4).
- predicted (Output)
  A real array of length nObservations containing the predicted values at the approximate solution.
- residual (Output)
  A real array of length nObservations containing the residuals at the approximate solution.
- r (Output)
  An array of size nParameters × nParameters containing the R matrix from a QR decomposition of the Jacobian.
- rRank (Output)
  Rank of r. Argument rRank less than nParameters may indicate the model is overparameterized.
- df (Output)
  Degrees of freedom.
- sse (Output)
  Residual sum of squares.
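Output optional arguments are passed as empty Python lists that the function fills in place, as shown with r and predicted in Example 2 below. The following hedged sketch illustrates that convention with a few of the arguments documented above; the model, data, and settings are illustrative only:

from numpy import exp
from pyimsl.stat.nonlinearRegression import nonlinearRegression

def fcn(n_independent, xi, n_parameters, theta):
    return theta[0] * exp(theta[1] * xi[0])

x = [2.0, 5.0, 7.0, 10.0, 14.0]
y = [54.0, 50.0, 45.0, 37.0, 35.0]

sse = []        # output: residual sum of squares
df = []         # output: degrees of freedom
residual = []   # output: residuals at the approximate solution
rank = []       # output: rank of the R matrix

theta_hat = nonlinearRegression(fcn, 2, x, y,
                                thetaGuess=[60.0, -0.03],
                                gradientEps=1.0e-3,
                                maxIterations=200,
                                sse=sse, df=df,
                                residual=residual, rRank=rank)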
Description¶
Function nonlinearRegression fits a nonlinear regression model using least squares. The nonlinear regression model is
\[y_i = f\left(x_i;\theta\right) + \varepsilon_i \qquad i = 1, 2, \ldots, n\]
where the observed values of the \(y_i\)’s constitute the responses or values of the dependent variable, the known \(x_i\)’s are the vectors of the values of the independent (explanatory) variables, θ is the vector of p regression parameters, and the \(\varepsilon_i\)’s are independently distributed normal errors with mean 0 and variance \(\sigma^2\). For this model, a least-squares estimate of θ is also a maximum likelihood estimate of θ.
The residuals for the model are as follows:
\[e_i\left(\theta\right) = y_i - f\left(x_i;\theta\right) \qquad i = 1, 2, \ldots, n\]
A value of θ that minimizes
\[\sum_{i=1}^{n} \left[y_i - f\left(x_i;\theta\right)\right]^2\]
is a least-squares estimate of θ. Function nonlinearRegression
is
designed so that the values of the function \(f(x_i; \theta)\) are
computed one at a time by a user-supplied function.
Function nonlinearRegression
is based on MINPACK routines LMDIF and
LMDER by Moré et al. (1980) that use a modified Levenberg-Marquardt method
to generate a sequence of approximations to a minimum point. Let \(\theta_c\)
be the current estimate of θ. A new estimate is given by \(\theta_c + s_c\),
where \(s_c\) is a solution to the following:
\[\left(J\left(\theta_c\right)^T J\left(\theta_c\right) + \mu_c I\right) s_c = J\left(\theta_c\right)^T e\left(\theta_c\right)\]
Here, \(J(\theta_c)\) is the Jacobian evaluated at \(\theta_c\) and \(e(\theta_c)\) is the vector of residuals at \(\theta_c\).
The algorithm uses a “trust region” approach with a step bound of \(\delta_c\). A solution of the equations is first obtained for \(\mu_c = 0\). If \(\|s_c\|_2 < \delta_c\), this update is accepted; otherwise, \(\mu_c\) is set to a positive value and another solution is obtained. The method is discussed by Levenberg (1944), Marquardt (1963), and Dennis and Schnabel (1983, pp. 129−147, 218−338).
If a user-supplied function is specified in jacobian
, the Jacobian is
computed analytically; otherwise, forward finite differences are used to
estimate the Jacobian numerically. In the latter case, the estimate of the
Jacobian may be so poor that the algorithm terminates at a noncritical
point. In such instances, the user should supply a Jacobian function.
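If there is any doubt about an analytic Jacobian, it can be checked against forward differences before it is passed in. The following hedged sketch verifies one Jacobian row for the model used in Example 2 below; it runs outside of nonlinearRegression, and the trial point is illustrative:

from numpy import exp, zeros, allclose

def fcn(n_independent, xi, n_parameters, theta):
    return theta[0] * exp(theta[1] * xi[0])

def jacobian(n_independent, xi, n_parameters, theta, fjac):
    fjac[0] = exp(theta[1] * xi[0])                     # df/dtheta1
    fjac[1] = theta[0] * xi[0] * exp(theta[1] * xi[0])  # df/dtheta2

xi = [10.0]              # one observation of the single independent variable
theta = [60.0, -0.03]    # trial value of the coefficients
h = 1.0e-6               # forward-difference step

analytic = zeros(2)
jacobian(1, xi, 2, theta, analytic)

numeric = zeros(2)
for j in range(2):
    tp = list(theta)
    tp[j] += h
    numeric[j] = (fcn(1, xi, 2, tp) - fcn(1, xi, 2, theta)) / h

print(allclose(analytic, numeric, rtol=1.0e-4))   # expect True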
The first stopping criterion for nonlinearRegression
occurs when SSE is
less than the absolute function tolerance. The second stopping criterion
occurs when the norm of the scaled gradient is less than the given gradient
tolerance. The third stopping criterion occurs when the scaled distance
between the last two steps is less than the step tolerance. The third
stopping criterion also generates error IMSLS_LITTLE_FCN_CHANGE
. The
fourth stopping criterion occurs when the relative change in SSE is less
than sseRelEps
. The fourth stopping criterion also generates error code
IMSLS_FALSE_CONVERGENCE
. See Dennis and Schnabel (1983, pages 159−161,
278−280, and 347−348) for a discussion of stopping criteria and choice of
tolerances.
On some platforms, nonlinearRegression
can evaluate the user-supplied
functions fcn
and jacobian
in parallel. This is done only if the
function ompOptions is called to flag user-defined
functions as thread-safe. A function is thread-safe if there are no
dependencies between calls. Such dependencies are usually the result of
writing to global or static variables.
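As a hedged sketch of that flag, assuming the PyIMSL keyword corresponding to the C library's IMSLS_SET_FUNCTIONS_THREAD_SAFE option is setFunctionsThreadSafe:

from pyimsl.stat.ompOptions import ompOptions

# Assumed keyword (mirrors IMSLS_SET_FUNCTIONS_THREAD_SAFE); declare the
# user-supplied fcn and jacobian thread-safe before calling nonlinearRegression.
ompOptions(setFunctionsThreadSafe=True)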
Programming Notes¶
Nonlinear regression allows substantial flexibility over linear regression because the user can specify the functional form of the model. This added flexibility can cause unexpected convergence problems for users who are unaware of the limitations of the software. Also, in many cases there are possible remedies that may not be immediately obvious. The following is a list of possible convergence problems and some remedies. There is not a one-to-one correspondence between the problems and the remedies; remedies for some problems may also be relevant for other problems.
A local minimum is found. Try a different starting value. Good starting values can often be obtained by fitting simpler models. For example, for the nonlinear function
\[f(x;\theta) = \theta_1 e^{\theta_2 x}\]
good starting values can be obtained from the estimated linear regression coefficients \(\hat{\beta}_0\) and \(\hat{\beta}_1\) from a simple linear regression of ln y on x. The starting values for the nonlinear regression in this case would be
\[\theta_1 = e^{\hat{\beta}_0} \text{ and } \theta_2 = \hat{\beta}_1\]
(see the first sketch following these notes). If an approximate linear model is not clear, then simplify the model by reducing the number of nonlinear regression parameters. For example, some nonlinear parameters for which good starting values are known could be set to these values in order to simplify the model for computing starting values for the remaining parameters.
The estimate of θ is incorrectly returned as the same or very close to the initial estimate. This occurs often because of poor scaling of the problem, which might result in the residual sum of squares being either very large or very small relative to the precision of the computer. The optional arguments allow control of the scaling.
The model is discontinuous as a function of θ. (The function f(x;θ) can be a discontinuous function of x.)
Overflow occurs during the computations. Make sure the user-supplied functions do not overflow at some value of θ.
The estimate of θ is going to infinity. A parameterization of the problem in terms of reciprocals may help.
Some components of θ are outside known bounds. This can sometimes be handled by making a function that produces artificially large residuals outside of the bounds, even though this introduces a discontinuity in the model function (see the second sketch following these notes).
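The next two sketches illustrate the two remedies referenced above. Both are hedged illustrations using hypothetical data and the exponential model from these notes; they are not part of the library's interface.

First, starting values for \(f(x;\theta) = \theta_1 e^{\theta_2 x}\) from a simple linear regression of ln y on x, using only NumPy:

import numpy as np

# Illustrative data only
x = np.array([2.0, 5.0, 7.0, 10.0, 14.0, 19.0, 26.0])
y = np.array([54.0, 50.0, 45.0, 37.0, 35.0, 25.0, 20.0])

# Fit ln y = b0 + b1*x; np.polyfit returns [slope, intercept]
b1, b0 = np.polyfit(x, np.log(y), 1)
theta_guess = [np.exp(b0), b1]   # theta1 = e^{b0}, theta2 = b1

Second, keeping θ inside known bounds by returning an artificially large value, and hence a huge residual, whenever a bound is violated; the bounds here are hypothetical:

from numpy import exp

def fcn_bounded(n_independent, xi, n_parameters, theta):
    # Hypothetical bounds: theta[0] > 0 and theta[1] < 0
    if theta[0] <= 0.0 or theta[1] >= 0.0:
        return 1.0e10   # large value -> large residual pushes the fit back inside the bounds
    return theta[0] * exp(theta[1] * xi[0])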
Examples¶
Example 1¶
In this example (Draper and Smith 1981, p. 518), the following nonlinear model is fit:
\[y = \theta_1 + \left(0.49 - \theta_1\right)e^{\theta_2\left(x - 8\right)} + \varepsilon\]
from numpy import *
from pyimsl.stat.nonlinearRegression import nonlinearRegression
from pyimsl.stat.writeMatrix import writeMatrix
# User function
def fcn(n_independent, x, n_parameters, theta):
return (theta[0] + (0.49 - theta[0]) * exp(theta[1] * (x[0] - 8)))
n_independent = 1
n_parameters = 2
x = [10.0, 20.0, 30.0, 40.0]
y = [0.48, 0.42, 0.40, 0.39]
# Nonlinear regression
theta_hat = nonlinearRegression(fcn, n_parameters, x, y)
# Print estimates
writeMatrix("estimated coefficients", theta_hat)
Output¶
estimated coefficients
1 2
0.3807 -0.0795
Example 2¶
Consider the nonlinear regression model and data set discussed by Neter et al. (1983, pp. 475−478):
\[y_i = \theta_1 e^{\theta_2 x_i} + \varepsilon_i\]
There are two parameters and one independent variable. The data set considered consists of 15 observations.
from __future__ import print_function
from numpy import *
from pyimsl.stat.nonlinearRegression import nonlinearRegression
from pyimsl.stat.writeMatrix import writeMatrix
def fcn(n_independent, x, n_parameters, theta):
return (theta[0] * exp(x[0] * theta[1]))
def jacobian(n_independent, x, n_parameters, theta, fjac):
fjac[0] = exp(theta[1] * x[0])
fjac[1] = theta[0] * x[0] * exp(theta[1] * x[0])
n_independent = 1
n_parameters = 2
grad_eps = 1.0e-3
theta_guess = [60.0, -0.03]
y = array([
54.0, 50.0, 45.0, 37.0, 35.0,
25.0, 20.0, 16.0, 18.0, 13.0,
8.0, 11.0, 8.0, 4.0, 6.0])
x = array([
2.0, 5.0, 7.0, 10.0, 14.0,
19.0, 26.0, 31.0, 34.0, 38.0,
45.0, 52.0, 53.0, 60.0, 65.0])
r = []
y_hat = []
# Nonlinear regression
theta_hat = nonlinearRegression(fcn, n_parameters, x, y,
thetaGuess=theta_guess,
gradientEps=grad_eps,
r=r,
predicted=y_hat,
jacobian=jacobian)
# Print results
writeMatrix("Estimated coefficients", theta_hat)
writeMatrix("Predicted values", y_hat)
writeMatrix("R matrix", r, writeFormat="%10.2f")
Output¶
Estimated coefficients
1 2
58.61 -0.04
Predicted values
1 2 3 4 5 6
54.15 48.08 44.42 39.45 33.67 27.62
7 8 9 10 11 12
20.94 17.18 15.26 13.02 9.87 7.48
13 14 15
7.19 5.45 4.47
R matrix
1 2
1 1.87 1139.93
2 0.00 1139.80
Informational Errors¶
IMSLS_STEP_TOLERANCE
  Scaled step tolerance satisfied. The current point may be an approximate local solution, but it is also possible that the algorithm is making very slow progress and is not near a solution, or that stepEps is too big.
Warning Errors¶
IMSLS_LITTLE_FCN_CHANGE
  Both the actual and predicted relative reductions in the function are less than or equal to the relative function tolerance.
IMSLS_TOO_MANY_ITN
  Maximum number of iterations exceeded.
IMSLS_TOO_MANY_JACOBIAN_EVAL
  Maximum number of Jacobian evaluations exceeded.
IMSLS_UNBOUNDED
  Five consecutive steps have been taken with the maximum step length.
IMSLS_FALSE_CONVERGENCE
  The iterates appear to be converging to a noncritical point.
Fatal Errors¶
IMSLS_TOO_MANY_FCN_EVAL
  Maximum number of function evaluations exceeded.
IMSLS_STOP_USER_FCN
  Request from user-supplied function to stop algorithm. User flag = “#”.