CategoricalGenLinModel Class
Analyzes categorical data using logistic, probit, Poisson, and other linear models.
Inheritance Hierarchy
System.Object
  Imsl.Stat.CategoricalGenLinModel

Namespace: Imsl.Stat
Assembly: ImslCS (in ImslCS.dll) Version: 6.5.2.0
Syntax
[SerializableAttribute]
public class CategoricalGenLinModel

The CategoricalGenLinModel type exposes the following members.

Constructors
CategoricalGenLinModel
    Constructs a new CategoricalGenLinModel.
Methods
Equals
    Determines whether the specified object is equal to the current object. (Inherited from Object.)
Finalize (protected)
    Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
GetHashCode
    Serves as a hash function for a particular type. (Inherited from Object.)
GetType
    Gets the Type of the current instance. (Inherited from Object.)
MemberwiseClone (protected)
    Creates a shallow copy of the current Object. (Inherited from Object.)
SetEffects
    Initializes an index vector to contain the column numbers in x associated with each effect.
SetInitialEstimates
    Sets the initial parameter estimates option.
Solve
    Returns the parameter estimates and associated statistics for a CategoricalGenLinModel object.
ToString
    Returns a string that represents the current object. (Inherited from Object.)
Properties
CaseAnalysis
    The case analysis.
CensorColumn
    The column number in x which contains the interval type for each observation.
ClassificationVariableColumn
    An index vector to contain the column numbers in x that are classification variables.
ClassificationVariableCounts
    The number of values taken by each classification variable.
ClassificationVariableValues
    The distinct values of the classification variables in ascending order.
ConvergenceTolerance
    The convergence criterion.
CovarianceMatrix
    The estimated asymptotic covariance matrix of the coefficients.
DesignVariableMeans
    The means of the design variables.
ExtendedLikelihoodObservations
    A vector indicating which observations are included in the extended likelihood.
FixedParameterColumn
    The column number in x that contains a fixed parameter for each observation that is added to the linear response prior to computing the model parameter.
FrequencyColumn
    The column number in x that contains the frequency of response for each observation.
Hessian
    The Hessian computed at the initial parameter estimates.
InfiniteEstimateMethod
    Specifies the method used for handling infinite estimates.
LastParameterUpdates
    The last parameter updates (excluding step halvings).
LowerEndpointColumn
    The column number in x that contains the lower endpoint of the observation interval for full interval and right interval observations.
MaxIterations
    The maximum number of iterations allowed.
ModelIntercept
    The intercept option.
NRowsMissing
    The number of rows of data in x that contain missing values in one or more specific columns of x.
ObservationMax
    The maximum number of observations that can be handled in the linear programming.
OptimizedCriterion
    The optimized criterion.
OptionalDistributionParameterColumn
    The column number in x that contains an optional distribution parameter for each observation.
Parameters
    Parameter estimates and associated statistics.
Product
    The inverse of the Hessian times the gradient vector computed at the input parameter estimates.
Tolerance
    The tolerance used in determining linear dependence.
UpperBound
    Defines the upper bound on the sum of the number of distinct values taken on by each classification variable.
UpperEndpointColumn
    The column number in x that contains the upper endpoint of the observation interval for full interval and left interval observations.
Remarks

Reweighted least squares is used to compute (extended) maximum likelihood estimates in some generalized linear models involving categorized data. One of several models, including probit, logistic, Poisson, logarithmic, and negative binomial models, may be fit for input point or interval observations. (In the usual case, only point observations are observed.)

Let

\gamma_i = w_i + x_i^T\beta = w_i + \eta_i

be the linear response, where x_i is a design column vector obtained from a row of x, \beta is the column vector of coefficients to be estimated, and w_i is a fixed parameter that may be input in x. When some of the \gamma_i are infinite at the supremum of the likelihood, extended maximum likelihood estimates are computed. Extended maximum likelihood estimates are computed as the finite (but nonunique) estimates \hat{\beta} that optimize the likelihood containing only the observations with finite \hat{\gamma}_i. These estimates, when combined with the set of indices of the observations for which \hat{\gamma}_i is infinite at the supremum of the likelihood, are called extended maximum likelihood estimates. When none of the optimal \hat{\gamma}_i are infinite, extended maximum likelihood estimates are identical to maximum likelihood estimates. Extended maximum likelihood estimation is discussed in more detail by Clarkson and Jennrich (1991). In CategoricalGenLinModel, observations with potentially infinite

\hat{\eta}_i = x_i^T\hat{\beta}

are detected and removed from the likelihood if InfiniteEstimateMethod = 0. See below.
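
In this notation, and for point observations only, the criterion being maximized can be sketched as the frequency-weighted log-likelihood below. This summary formula is added here for orientation and is not in the original text; f_i is the observation frequency and f(y_i \mid \gamma_i) is the response probability from the model table that follows:

\ell(\beta) = \sum_i f_i \ln f(y_i \mid \gamma_i), \qquad \gamma_i = w_i + x_i^T\beta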

The models available in CategoricalGenLinModel are:

Model 0 (Poisson)
    Parameterization: \lambda = N \times e^{w+\eta}
    Response PDF: f(y) = \lambda^y e^{-\lambda}/y!

Model 1 (Negative Binomial)
    Parameterization: \theta = \frac{e^{w+\eta}}{1+e^{w+\eta}}
    Response PDF: f(y) = \binom{S+y-1}{y-1}\,\theta^S(1-\theta)^y

Model 2 (Logarithmic)
    Parameterization: \theta = \frac{e^{w+\eta}}{1+e^{w+\eta}}
    Response PDF: f(y) = -(1-\theta)^y/(y\ln\theta)

Model 3 (Logistic)
    Parameterization: \theta = \frac{e^{w+\eta}}{1+e^{w+\eta}}
    Response PDF: f(y) = \binom{N}{y}\,\theta^y(1-\theta)^{N-y}

Model 4 (Probit)
    Parameterization: \theta = \Phi(w+\eta)
    Response PDF: f(y) = \binom{N}{y}\,\theta^y(1-\theta)^{N-y}

Model 5 (Log-log)
    Parameterization: \theta = 1-e^{-e^{w+\eta}}
    Response PDF: f(y) = \binom{N}{y}\,\theta^y(1-\theta)^{N-y}

Here \Phi denotes the cumulative normal distribution, N and S are known parameters specified for each observation via column OptionalDistributionParameterColumn of x, and w is an optional fixed parameter specified for each observation via column FixedParameterColumn of x. (By default N is taken to be 1 for model = 0, 3, 4, and 5, and S is taken to be 1 for model = 1. By default w is taken to be 0.) Since the log-log model (model = 5) probabilities are not symmetric with respect to 0.5, quantitatively, as well as qualitatively, different models result when the definitions of "success" and "failure" are interchanged in this distribution. In this model and all other models involving \theta, \theta is taken to be the probability of a "success."
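
As a quick numerical illustration (added here; not part of the original text), consider the logistic model (model = 3) with w = 0, \eta = 0.5, and N = 3. Then

\theta = \frac{e^{0.5}}{1+e^{0.5}} \approx 0.6225, \qquad f(2) = \binom{3}{2}\,\theta^2(1-\theta) \approx 3(0.3875)(0.3775) \approx 0.439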

Note that each row vector in the data matrix can represent a single observation; or, through the use of column FrequencyColumn of the matrix x, each vector can represent several observations.

Computational Details

For interval observations, the probability of the observation is computed by summing the probability distribution function over the range of values in the observation interval. For right-interval observations, \Pr(Y\ge{y}) is computed as a sum based upon the equality \Pr(Y\ge{y})=1-\Pr(Y\lt{y}). Derivatives are similarly computed. CategoricalGenLinModel allows three types of interval observations. In full interval observations, both the lower and the upper endpoints of the interval must be specified. For right-interval observations, only the lower endpoint need be given while for left-interval observations, only the upper endpoint is given.
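
For instance, instantiating this equality in the Poisson model (an added illustration, not part of the original text), a right-interval observation with lower endpoint y contributes

\Pr(Y \ge y) = 1 - \Pr(Y < y) = 1 - \sum_{k=0}^{y-1} \frac{\lambda^k e^{-\lambda}}{k!}

to the likelihood, so only finitely many terms are required even though the support is unbounded.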

The computations proceed as follows:

  1. The input parameters are checked for consistency and validity.
  2. Estimates of the means of the "independent" or design variables are computed. The frequency of the observation in all but the binomial distribution model is taken from column FrequencyColumn of the data matrix x. In binomial distribution models, the frequency is taken as the product of n = x[i,OptionalDistributionParameterColumn] and x[i,FrequencyColumn]. In all cases these values default to 1. Means are computed as
    \bar{x} = \frac{\sum_i f_i x_i}{\sum_i f_i}
  3. If init = 0, initial estimates of the coefficients are obtained (based upon the observation intervals) as multiple regression estimates relating transformed observation probabilities to the observation design vector. For example, in the binomial distribution models, \theta for point observations may be estimated as
    \hat{\theta} = x[i,LowerEndpointColumn]/x[i,OptionalDistributionParameterColumn]
    and, when model = 3, the linear relationship is given by
    \ln(\hat{\theta}/(1-\hat{\theta})) \approx x\beta
    while if model = 4,
    \Phi^{-1}(\hat{\theta}) \approx x\beta

    For bounded interval observations, the midpoint of the interval is used for x[i,LowerEndpointColumn]. Right-interval observations are not used in obtaining initial estimates when the distribution has unbounded support (since the midpoint of the interval is not defined). When computing initial estimates, standard modifications are made to prevent illegal operations such as division by zero.

    Regression estimates are obtained at this point, as well as later, by use of linear regression.

  4. Newton-Raphson iteration for the maximum likelihood estimates is implemented via iteratively reweighted least squares. Let
    \Psi(x_i^T\beta)
    denote the log of the probability of the i-th observation for coefficients \beta. In the least-squares model, the weight of the i-th observation is taken as the absolute value of the second derivative of
    \Psi(x_i^T\beta)
    with respect to
    \gamma_i = x_i^T\beta
    (times the frequency of the observation), and the dependent variable is taken as the first derivative of \Psi with respect to \gamma_i, divided by the square root of the weight times the frequency. The Newton step is given by
    \Delta\beta = \left(\sum_i |\Psi''(\gamma_i)|\, x_i x_i^T\right)^{-1} \sum_i \Psi'(\gamma_i)\, x_i
    where all derivatives are evaluated at the current estimate of \gamma, and \beta_{n+1} = \beta_n - \Delta\beta. This step is computed as the estimated regression coefficients in the least-squares model. Step halving is used when necessary to ensure a decrease in the criterion. (A worked example of these derivatives for the logistic model appears after this list.)
  5. Convergence is assumed when the maximum relative change in any coefficient update from one iteration to the next is less than ConvergenceTolerance or when the relative change in the log-likelihood from one iteration to the next is less than ConvergenceTolerance/100. Convergence is also assumed after MaxIterations iterations or when step halving leads to a step size of less than 0.0001 with no increase in the log-likelihood.
  6. For interval observations, the contribution to the log-likelihood is the log of the sum of the probabilities of each possible outcome in the interval. Because the distributions are discrete, the sum may involve many terms. The user should be aware that data with wide intervals can lead to expensive (in terms of computer time) computations.
  7. If InfiniteEstimateMethod is set to 0, then the methods of Clarkson and Jennrich (1991) are used to check for the existence of infinite estimates in
    \eta_i = x_i^T\beta
    As an example of a situation in which infinite estimates can occur, suppose that observation j is right censored with t_j > 15 in a logistic model. If the design matrix x is such that x_{jm} = 1 and x_{im} = 0 for all i \neq j, then the optimal estimate of \beta_m occurs at
    \hat{\beta}_m = \infty
    leading to an infinite estimate of both \beta_m and \eta_j. In CategoricalGenLinModel, such estimates may be "computed." In all models fit by CategoricalGenLinModel, infinite estimates can only occur when the optimal estimated probability associated with the left- or right-censored observation is 1. If InfiniteEstimateMethod is set to 0, left- or right-censored observations that have an estimated probability greater than 0.995 at some point during the iterations are excluded from the log-likelihood, and the iterations proceed with a log-likelihood based upon the remaining observations. This allows convergence of the algorithm when the maximum relative change in the estimated coefficients is small and also allows for the determination of observations with infinite
    \eta_i = x_i^T\beta
    At convergence, linear programming is used to ensure that the eliminated observations have infinite \eta_i. If some (or all) of the removed observations should not have been removed (because their estimated \eta_i must be finite), then the iterations are restarted with a log-likelihood based upon the finite \eta_i observations. See Clarkson and Jennrich (1991) for more details.

    When InfiniteEstimateMethod is set to 1, no observations are eliminated during the iterations. In this case, when infinite estimates occur, some (or all) of the coefficient estimates \hat{\beta} will become large, and it is likely that the Hessian will become (numerically) singular prior to convergence.

    When infinite estimates for the \hat{\eta}_i are detected, linear regression (see Chapter 2, Regression) is used at the convergence of the algorithm to obtain unique estimates \hat{\beta}. This is accomplished by regressing the optimal \hat{\eta}_i for the observations with finite \eta_i against x\beta, yielding a unique \hat{\beta} (by setting coefficients \hat{\beta} that are linearly related to previous coefficients in the model to zero). All of the final statistics relating to \hat{\beta} are based upon these estimates.

  8. Residuals are computed according to methods discussed by Pregibon (1981). Let \ell_i(\gamma_i) denote the log-likelihood of the i-th observation evaluated at \gamma_i. Then the standardized residual is computed as
    r_i = \frac{\ell_i'(\hat{\gamma}_i)}{\sqrt{\ell_i''(\hat{\gamma}_i)}}
    where \hat{\gamma}_i is the value of \gamma_i evaluated at the optimal \hat{\beta}, and the derivatives here (and only here) are with respect to \gamma rather than with respect to \beta. The denominator of this expression is used as the "standard error of the residual," while the numerator is the "raw" residual.

    Following Cook and Weisberg (1982), the influence of the i-th observation is taken to be
    \ell_i'(\hat{\gamma}_i)^T\, \ell''(\hat{\gamma})^{-1}\, \ell_i'(\hat{\gamma}_i)
    This quantity is a one-step approximation to the change in the estimates when the i-th observation is deleted. Here, the partial derivatives are with respect to \beta.
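
To make the iteration in step 4 concrete, consider (as an added illustration, not part of the original documentation) the logistic model (model = 3) with w = 0 and a point observation of y successes in N trials. With \theta = e^\gamma/(1+e^\gamma), the log-probability and its derivatives with respect to \gamma are

\Psi(\gamma) = \ln\binom{N}{y} + y\gamma - N\ln(1+e^{\gamma}), \qquad \Psi'(\gamma) = y - N\theta, \qquad \Psi''(\gamma) = -N\theta(1-\theta)

so the weight of the observation in the least-squares model is |\Psi''(\gamma)| = N\theta(1-\theta), and the working dependent variable is \Psi'(\gamma)/\sqrt{N\theta(1-\theta)} before the frequency adjustments described above.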

Programming Notes

  1. Classification variables are specified via ClassificationVariableColumn. Indicator or dummy variables are created for the classification variables.
  2. To enhance precision, "centering" of covariates is performed if ModelIntercept is set to 1 and (number of observations) - (number of rows in x missing one or more values) > 1. In doing so, the sample means of the design variables are subtracted from each observation prior to its inclusion in the model. On convergence, the intercept, its variance, and its covariance with the remaining estimates are transformed to the uncentered estimate values. (A sketch of this back-transformation appears after this list.)
  3. Two methods for specifying a binomial distribution model are possible. In the first method, x[i,FrequencyColumn] contains the frequency of the observation while x[i,LowerEndpointColumn] is 0 or 1 depending upon whether the observation is a success or failure. In this case, N = x[i,OptionalDistributionParameterColumn] is always 1. The model is treated as repeated Bernoulli trials, and interval observations are not possible.

     A second method for specifying binomial models is to use x[i,LowerEndpointColumn] to represent the number of successes in the x[i,OptionalDistributionParameterColumn] trials. In this case, x[i,FrequencyColumn] will usually be 1, but it may be greater than 1, in which case interval observations are possible.
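
The uncentering transformation in note 2 can be sketched as follows (an added illustration; the documentation does not spell out the exact formulas). With centered covariates the linear response is

\eta = \beta_0^c + (x-\bar{x})^T\beta = (\beta_0^c - \bar{x}^T\beta) + x^T\beta

so the reported (uncentered) intercept is \beta_0 = \beta_0^c - \bar{x}^T\beta, and its variance and covariances follow from this linear combination of the centered estimates.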

Note that the Solve method must be called before using any property as a right operand; otherwise, the value is null.
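
A minimal usage sketch in C# follows. It is illustrative only: it assumes the constructor takes the data matrix and a model number (0 through 5, per the model table above) and that SetEffects takes an array of column indices and an array of variable counts per effect; the data values and column layout are hypothetical, using the second (grouped) binomial specification from Programming Note 3.

using System;
using Imsl.Stat;

public class CategoricalGenLinModelExample
{
    public static void Main(string[] args)
    {
        // Hypothetical dose-response data (values invented for illustration):
        // column 0 = covariate (dose), column 1 = number of trials N,
        // column 2 = number of successes.
        double[,] x = {
            { 0.0, 20.0,  2.0 },
            { 1.0, 20.0,  7.0 },
            { 2.0, 20.0, 12.0 },
            { 3.0, 20.0, 18.0 }
        };

        // Model 3 is the logistic model (see the model table above).
        CategoricalGenLinModel glm = new CategoricalGenLinModel(x, 3);

        // Grouped binomial specification (Programming Note 3, second method):
        // successes come from column 2, the trial count N from column 1.
        glm.LowerEndpointColumn = 2;
        glm.OptionalDistributionParameterColumn = 1;
        glm.InfiniteEstimateMethod = 0; // exclude potentially infinite estimates
        glm.ModelIntercept = 1;         // fit an intercept

        // One effect consisting of the single covariate in column 0
        // (assumed SetEffects(columnIndices, varsPerEffect) layout).
        glm.SetEffects(new int[] { 0 }, new int[] { 1 });

        // Solve must be called before reading any result property.
        double[,] estimates = glm.Solve();

        Console.WriteLine("Optimized criterion: " + glm.OptimizedCriterion);
        Console.WriteLine("Rows of parameter statistics: " + estimates.GetLength(0));
    }
}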

See Also