LinearRegression Class

Fits a multiple linear regression model with or without an intercept.

Inheritance Hierarchy

System.Object
Imsl.Stat.LinearRegression

Namespace: Imsl.Stat
Assembly: ImslCS (in ImslCS.dll) Version: 6.5.2.0

Syntax

C#

[SerializableAttribute]
public class LinearRegression

Visual Basic

<SerializableAttribute>
Public Class LinearRegression

Visual C++

[SerializableAttribute]
public ref class LinearRegression

F#

[<SerializableAttribute>]
type LinearRegression =  class end

The LinearRegression type exposes the following members.

Constructors

	Name	Description
	LinearRegression	Constructs a new linear regression object.

Methods

	Name	Description
	Equals	Determines whether the specified object is equal to the current object. (Inherited from Object.)
	Finalize	Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
	GetCaseStatistics(Double[], Double)	Returns the case statistics for an observation.
	GetCaseStatistics(Double[], Double, Double)	Returns the case statistics for an observation and a weight.
	GetCaseStatistics(Double[], Double, Int32)	Returns the case statistics for an observation and future response count for the desired prediction interval.
	GetCaseStatistics(Double[], Double, Double, Int32)	Returns the case statistics for an observation, weight, and future response count for the desired prediction interval.
	GetCoefficients	Returns the regression coefficients.
	GetHashCode	Serves as a hash function for a particular type. (Inherited from Object.)
	GetR	Returns a copy of the R matrix.
	GetRank	Returns the rank of the matrix.
	GetType	Gets the Type of the current instance. (Inherited from Object.)
	MemberwiseClone	Creates a shallow copy of the current Object. (Inherited from Object.)
	ToString	Returns a string that represents the current object. (Inherited from Object.)
	Update(Double[,], Double[])	Updates the regression object with a new set of observations.
	Update(Double[], Double)	Updates the regression object with a new observation.
	Update(Double[,], Double[], Double[])	Updates the regression object with a new set of observations and weights.
	Update(Double[], Double, Double)	Updates the regression object with a new observation and weight.

Properties

	Name	Description
	ANOVA	Returns an analysis of variance table and related statistics.
	CoefficientTTests	Returns statistics relating to the regression coefficients.
	HasIntercept	Indicates whether the regression model includes an intercept.
	Rank	Returns the rank of the matrix.

Remarks

Fits a multiple linear regression model with or without an intercept. If the constructor argument hasIntercept is true, the multiple linear regression model is

$y_i=\beta_0+\beta_1 x_{i1}+\beta_2 x_{i2}+\,\ldots+\beta _k x_{ik}+\varepsilon _i\,\,\,\,\, i=1,\,2,\,\ldots,\,n$

where the observed values of the $y_i$'s constitute the responses or values of the dependent variable, the $x_{i1}$'s, $x_{i2}$'s, $\ldots$, $x_{ik}$'s are the settings of the independent variables, $\beta_0,\beta_1,\ldots,\beta_k$ are the regression coefficients, and the $\varepsilon_i$'s are independently distributed normal errors, each with mean zero and variance $\sigma^2/w_i$, where $w_i$ is the weight of the $i$-th observation. If hasIntercept is false, $\beta_0$ is not included in the model.

LinearRegression computes estimates of the regression coefficients by minimizing the weighted sum of squares of the deviations of the observed responses $y_i$ from the fitted responses $\hat y_i$ over the $n$ observations. This minimum sum of squares (the error sum of squares) is reported in the ANOVA output and is denoted by

${\rm SSE}=\sum\limits_{i=1}^n w_i(y_i-\hat y_i)^2$

In addition, the total sum of squares is output in the ANOVA table. When hasIntercept is true, the total sum of squares is the sum of squares of the deviations of $y_i$ from its mean $\bar y$, the so-called corrected total sum of squares; it is denoted by

${\rm SST}=\sum\limits_{i=1}^n w_i(y_i-\bar y)^2$

When hasIntercept is false, the total sum of squares is the sum of squares of $y_i$, the so-called uncorrected total sum of squares; it is denoted by

${\rm SST}=\sum\limits_{i=1}^n w_i y_i^2$
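The two totals differ only in whether the mean is removed before squaring. The following is a minimal standalone Python sketch and does not call the Imsl.Stat API. The function name `sums_of_squares` is hypothetical, $\bar y$ is assumed to be the weighted mean, and the weights are assumed to enter the uncorrected total as well (with unit weights this makes no difference).

```python
def sums_of_squares(y, y_hat, w, has_intercept):
    """Return (SSE, SST) for responses y, fitted values y_hat, and weights w."""
    sse = sum(wi * (yi - fi) ** 2 for yi, fi, wi in zip(y, y_hat, w))
    if has_intercept:
        # Corrected total: deviations from the (weighted) mean response.
        y_bar = sum(wi * yi for yi, wi in zip(y, w)) / sum(w)
        sst = sum(wi * (yi - y_bar) ** 2 for yi, wi in zip(y, w))
    else:
        # Uncorrected total: raw squares of the responses.
        sst = sum(wi * yi ** 2 for yi, wi in zip(y, w))
    return sse, sst


y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]  # illustrative fitted values, not library output
w = [1.0, 1.0, 1.0, 1.0]
sse, sst_corrected = sums_of_squares(y, y_hat, w, has_intercept=True)
_, sst_uncorrected = sums_of_squares(y, y_hat, w, has_intercept=False)
```

For these illustrative numbers the error sum of squares is 0.10, the corrected total is 5, and the uncorrected total is 30.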

See Draper and Smith (1981) for a good general treatment of the multiple linear regression model, its analysis, and many examples.

In order to compute a least-squares solution, LinearRegression performs an orthogonal reduction of the matrix of regressors to upper triangular form. Givens rotations are used to reduce the matrix. This method has the advantage that the loss of accuracy resulting from forming the crossproduct matrix used in the normal equations is avoided, while not requiring the storage of the full matrix of regressors. The method is described by Lawson and Hanson, pages 207-212.
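To illustrate the idea of reducing the regressors one observation at a time, here is a minimal standalone Python sketch of a row-wise Givens update. It is not the library's implementation and does not call the Imsl.Stat API. The names `rotate_in`, `back_substitute`, and `fit` are hypothetical, and full column rank is assumed. Only a $p \times p$ triangle and a length-$p$ right-hand side are stored, never the full matrix of regressors.

```python
import math


def rotate_in(R, z, x, y, w=1.0):
    """Fold one weighted observation into the triangular system R * beta = z."""
    row = [math.sqrt(w) * v for v in x]
    rhs = math.sqrt(w) * y
    p = len(row)
    for j in range(p):
        if row[j] == 0.0:
            continue  # nothing to eliminate in this column
        # Givens rotation that zeroes row[j] against the diagonal R[j][j].
        h = math.hypot(R[j][j], row[j])
        c, s = R[j][j] / h, row[j] / h
        for k in range(j, p):
            rjk, xk = R[j][k], row[k]
            R[j][k] = c * rjk + s * xk
            row[k] = -s * rjk + c * xk
        z[j], rhs = c * z[j] + s * rhs, -s * z[j] + c * rhs
    return rhs * rhs  # this observation's contribution to SSE


def back_substitute(R, z):
    """Solve R * beta = z for upper triangular R (full rank assumed)."""
    p = len(z)
    beta = [0.0] * p
    for j in reversed(range(p)):
        tail = sum(R[j][k] * beta[k] for k in range(j + 1, p))
        beta[j] = (z[j] - tail) / R[j][j]
    return beta


def fit(X, y, has_intercept=True):
    """Least-squares fit that stores only a p-by-p triangle."""
    p = len(X[0]) + (1 if has_intercept else 0)
    R = [[0.0] * p for _ in range(p)]
    z = [0.0] * p
    sse = 0.0
    for x, yi in zip(X, y):
        full = ([1.0] if has_intercept else []) + list(x)
        sse += rotate_in(R, z, full, yi)
    return back_substitute(R, z), sse, R


beta, sse, R = fit([[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 2.0, 4.0])
```

For this four-point example the sketch yields an intercept of 1.3, a slope of 0.8, and an error sum of squares of 1.8, up to rounding. Because each rotation is orthogonal, the cross-product matrix is never formed.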

From a general linear model fitted using the $w_i$'s as the weights, the inner class LinearRegression.CaseStatistics can also compute predicted values, confidence intervals, and diagnostics for detecting outliers and cases that greatly influence the fitted regression. Let $x_i$ be a column vector containing the elements of the $i$-th row of $X$. Let $W={\rm diag}(w_1,w_2,\ldots,w_n)$. The leverage is defined as

$h_i=[x_i^T(X^TWX)^-x_i]w_i$

(In the case of linear equality restrictions on $\beta$, the leverage is defined in terms of the reduced model.) Put $D={\rm diag}(d_1,d_2,\ldots,d_p)$, where $p$ is the number of columns of $R$, with $d_j=1$ if the $j$-th diagonal element of $R$ is positive and 0 otherwise. The leverage is computed as $h_i=(a^TDa)w_i$, where $a$ is a solution to $R^Ta=x_i$. The estimated variance of $\hat{y_i}=x_i^T\hat{\beta}$ is given by $h_is^2/w_i$, where $s^2={\rm SSE}/{\rm DFE}$. The computation of the remainder of the case statistics follows easily from their definitions.
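The leverage computation from the triangular factor can be sketched in a few lines of standalone Python; this is an illustration, not the library's code. The function name `leverage` is hypothetical. Columns whose diagonal element is not positive are skipped, which is one assumed way to realize the 0/1 diagonal mask described above.

```python
import math


def leverage(R, x, w=1.0):
    """Leverage h = w * a^T D a, where a solves R^T a = x."""
    p = len(x)
    a = [0.0] * p
    for j in range(p):
        if R[j][j] <= 0.0:
            continue  # d_j = 0: this column contributes nothing
        # Forward substitution on the lower triangular system R^T a = x.
        a[j] = (x[j] - sum(R[k][j] * a[k] for k in range(j))) / R[j][j]
    return w * sum(aj * aj for aj in a)


# Triangular factor of the design matrix with rows (1, 0), (1, 1), (1, 2), (1, 3).
R = [[2.0, 3.0], [0.0, math.sqrt(5.0)]]
h = [leverage(R, [1.0, float(t)]) for t in range(4)]
```

With unit weights and a full-rank design the leverages sum to the number of fitted coefficients (here 0.7 + 0.3 + 0.3 + 0.7 = 2), which is a convenient sanity check.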

Let $e_i$ denote the residual $y_i-\hat{y_i}$ for the $i$-th case. The estimated variance of $e_i$ is $(1-h_i)s^2/w_i$, where $s^2$ is the residual mean square from the fitted regression. The $i$-th standardized residual (also called the internally studentized residual) is by definition

$r_i=e_i\sqrt{\frac{{w_i}}{{ s^2(1-h_i)}}}$

and $r_i$ follows an approximate standard normal distribution in large samples.

The $i$-th jackknife residual or deleted residual involves the difference between $y_i$ and its predicted value based on the data set in which the $i$-th case is deleted. This difference equals $e_i/(1-h_i)$. The jackknife residual is obtained by standardizing this difference. The residual mean square for the regression in which the $i$-th case is deleted is

$s_i^2={\frac{{(n-r)s^2-w_ie_i^2/(1-h_i)}}{{n-r-1}}}$

The jackknife residual is defined to be

$t_i= e_i\sqrt{\frac{{w_i}}{{s_i^2(1-h_i)}}}$

and $t_i$ follows a $t$ distribution with $n-r-1$ degrees of freedom.
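As a worked illustration of the two residual definitions, the following standalone Python sketch evaluates them for a single case. The function names are hypothetical and nothing here calls the Imsl.Stat API. The symbol $r$ is taken to be the rank of the regressor matrix, which is an assumption, since the text does not define it explicitly.

```python
import math


def standardized_residual(e, h, w, s2):
    """Internally studentized residual r_i."""
    return e * math.sqrt(w / (s2 * (1.0 - h)))


def deleted_mean_square(e, h, w, s2, n, r):
    """Residual mean square s_i^2 with the i-th case removed."""
    return ((n - r) * s2 - w * e * e / (1.0 - h)) / (n - r - 1)


def jackknife_residual(e, h, w, s2, n, r):
    """Jackknife (deleted) residual t_i."""
    s2_i = deleted_mean_square(e, h, w, s2, n, r)
    return e * math.sqrt(w / (s2_i * (1.0 - h)))


# One case of a four-point straight-line fit: e = 0.9, h = 0.3, s^2 = 0.9.
r_i = standardized_residual(0.9, 0.3, 1.0, 0.9)
t_i = jackknife_residual(0.9, 0.3, 1.0, 0.9, n=4, r=2)
```

For these inputs the standardized residual is about 1.134 and the jackknife residual about 1.342. The jackknife value is larger because deleting the case lowers the residual mean square.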

Cook's distance for the $i$-th case is a measure of how much an individual case affects the estimated regression coefficients. It is given by

$D_i={\frac{{w_i h_i e_i^2}}{{rs^2(1-h_i)^2}}}$

Weisberg (1985) states that if $D_i$ exceeds the 50th percentile of the $F(r,n-r)$ distribution, it should be considered large. (This value is about 1. This statistic does not have an $F$ distribution.)
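The formula for Cook's distance can be evaluated directly once residuals and leverages are known. The sketch below is standalone Python with a hypothetical function name and does not call the Imsl.Stat API. It uses the rule of thumb "about 1" quoted above as the cutoff rather than an exact percentile, and takes $r$ to be the rank of the regressor matrix (an assumption).

```python
def cooks_distance(e, h, w, s2, r):
    """Cook's distance D_i from residual e, leverage h, weight w, and mean square s2."""
    return w * h * e * e / (r * s2 * (1.0 - h) ** 2)


# Residuals and leverages of a four-point straight-line fit with s^2 = 0.9, r = 2.
residuals = [-0.3, 0.9, -0.9, 0.3]
leverages = [0.7, 0.3, 0.3, 0.7]
d = [cooks_distance(e, h, 1.0, 0.9, 2) for e, h in zip(residuals, leverages)]
large = [i for i, di in enumerate(d) if di > 1.0]  # rough cutoff of about 1
```

In this small example no case comes close to the cutoff: the end points, which have high leverage but small residuals, score about 0.39, and the middle points about 0.28.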

DFFITS, like Cook's distance, is also a measure of influence. For the $i$-th case, DFFITS is computed by the formula

$DFFITS_i=e_i\sqrt{\frac{{w_i h_i}}{{s_i^2(1- h_i)^2}}}$

Hoaglin and Welsch (1978) suggest that $|DFFITS_i|$ greater than $2\sqrt{r/n}$ is large.
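A short standalone Python sketch of the DFFITS formula and the Hoaglin and Welsch cutoff follows. The function names are hypothetical, the absolute value is used when comparing against the cutoff, and $r$ is assumed to be the rank of the regressor matrix. The deleted-case mean square $s_i^2$ is recomputed inline so that the block is self-contained.

```python
import math


def dffits(e, h, w, s2, n, r):
    """DFFITS for one case, using the deleted-case residual mean square."""
    s2_i = ((n - r) * s2 - w * e * e / (1.0 - h)) / (n - r - 1)
    return e * math.sqrt(w * h / (s2_i * (1.0 - h) ** 2))


def is_large(value, n, r):
    """Apply the 2 * sqrt(r / n) rule of thumb to the absolute value."""
    return abs(value) > 2.0 * math.sqrt(r / n)


# Two cases of a four-point straight-line fit with s^2 = 0.9, n = 4, r = 2.
end_point = dffits(-0.3, 0.7, 1.0, 0.9, n=4, r=2)
mid_point = dffits(0.9, 0.3, 1.0, 0.9, n=4, r=2)
flags = [is_large(v, 4, 2) for v in (end_point, mid_point)]
```

Here the cutoff is $2\sqrt{2/4}\approx 1.41$, and neither value (about -0.68 and 0.88) exceeds it in magnitude.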

Often predicted values and confidence intervals are desired for combinations of settings of the effect variables not used in computing the regression fit. This can be accomplished using a single data matrix by including these settings of the variables as part of the data matrix and by setting the response equal to Double.NaN. LinearRegression will omit the case when performing the fit and a predicted value and confidence interval for the missing response will be computed from the given settings of the effect variables.
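The bookkeeping behind this convention can be sketched in standalone Python. This only illustrates the idea of excluding NaN responses from the fit while still predicting at their settings; it does not reproduce the library's behaviour or its interval computation. The names `split_cases` and `predict` are hypothetical, and the coefficients are supplied by hand.

```python
import math


def split_cases(X, y):
    """Separate cases used for fitting from cases that only need a prediction."""
    fit_cases = [(x, yi) for x, yi in zip(X, y) if not math.isnan(yi)]
    predict_only = [x for x, yi in zip(X, y) if math.isnan(yi)]
    return fit_cases, predict_only


def predict(beta, x, has_intercept=True):
    """Evaluate the fitted linear model at settings x."""
    terms = ([1.0] if has_intercept else []) + list(x)
    return sum(b * t for b, t in zip(beta, terms))


X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [1.0, 3.0, 2.0, 4.0, float("nan")]  # last response is missing on purpose
fit_cases, predict_only = split_cases(X, y)
beta = [1.3, 0.8]  # least-squares coefficients for the four complete cases
y_new = predict(beta, predict_only[0])
```

The case with the missing response contributes nothing to the coefficients, yet a predicted value (4.5 here) is still available at its settings.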

Reference

Imsl.Stat Namespace

Other Resources

Example1

Example2