How Regression Is Applied in Contemporary Computing

Extensions

Numerous extensions of linear regression have been developed, which allow some or all of the assumptions underlying the basic model to be relaxed.


Simple and multiple linear regression

Figure: Example of simple linear regression, which has one independent variable.

The very simplest case of a single scalar predictor variable x and a single scalar response variable y is known as simple linear regression. The extension to multiple and/or vector-valued predictor variables (denoted with a capital X) is known as multiple linear regression, also known as multivariable linear regression (not to be confused with multivariate linear regression).

Multiple linear regression is a generalization of simple linear regression to the case of more than one independent variable, and a special case of general linear models, restricted to one dependent variable. The basic model for multiple linear regression is

Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip} + \epsilon_i

for each observation i=1,\ldots ,n.

In the formula above we consider n observations of one dependent variable and p independent variables. Thus, Y_i is the ith observation of the dependent variable, X_{ij} is the ith observation of the jth independent variable, for j = 1, 2, ..., p. The values β_j represent parameters to be estimated, and ε_i is the ith independent, identically distributed normal error.
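
As a minimal sketch of how such a model can be fitted (the synthetic data and the use of NumPy are assumptions for illustration, not part of the text), the parameters β_0, ..., β_p can be estimated by ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic data: n observations of p independent variables
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.5, -1.0, 3.0])            # [beta_0, beta_1, ..., beta_p]
y = beta_true[0] + X @ beta_true[1:] + rng.normal(scale=0.5, size=n)

# Prepend a column of ones so the intercept beta_0 is estimated with the slopes
X_design = np.column_stack([np.ones(n), X])

# Ordinary least squares estimate of the parameter vector
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta_hat)                                        # close to beta_true
```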

In the more general multivariate linear regression, there is one equation of the above form for each of m > 1 dependent variables that share the same set of explanatory variables and hence are estimated simultaneously with each other:

Y_{ij} = \beta_{0j} + \beta_{1j} X_{i1} + \beta_{2j} X_{i2} + \ldots + \beta_{pj} X_{ip} + \epsilon_{ij}

for all observations indexed as i = 1, ... , n and for all dependent variables indexed as j = 1, ... , m.

Nearly all real-world regression models involve multiple predictors, and basic descriptions of linear regression are often phrased in terms of the multiple regression model. Note, however, that in these cases the response variable y is still a scalar. Another term, multivariate linear regression, refers to cases where y is a vector, i.e., the same as general linear regression.


General linear models

The general linear model considers the situation when the response variable is not a scalar (for each observation) but a vector, y_i. Conditional linearity of  E(\mathbf {y} \mid \mathbf {x} _{i})=\mathbf {x} _{i}^{\mathsf {T}}B is still assumed, with a matrix B replacing the vector β of the classical linear regression model. Multivariate analogues of ordinary least squares (OLS) and generalized least squares (GLS) have been developed. "General linear models" are also called "multivariate linear models". These are not the same as multivariable linear models (also called "multiple linear models").
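
A minimal sketch of this multivariate case (all data below is assumed/synthetic): stacking the m responses into an n × m matrix Y, the OLS estimate of the coefficient matrix B comes from a single matrix least-squares solve, which is equivalent to running m separate regressions on the same design matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 150, 2, 3                                           # n observations, p predictors, m responses

X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])    # design matrix with intercept
B_true = rng.normal(size=(p + 1, m))                          # coefficient matrix B
Y = X @ B_true + rng.normal(scale=0.3, size=(n, m))           # vector-valued responses

# Multivariate OLS: one solve for all m response columns at once
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.round(B_hat - B_true, 2))                            # estimation error is small
```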


Heteroscedastic models

Various models have been created that allow for heteroscedasticity, i.e. the errors for different response variables may have different variances. For example, weighted least squares is a method for estimating linear regression models when the response variables may have different error variances, possibly with correlated errors. (See also weighted linear least squares and generalized least squares.) Heteroscedasticity-consistent standard errors provide an improved method for use with uncorrelated but potentially heteroscedastic errors.
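
A minimal sketch of weighted least squares (the data-generating process below is an assumption for illustration): when the error variances are known up to a constant, rescaling each observation by the inverse error standard deviation and then running OLS gives the WLS estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(0, 10, size=n)
sigma = 0.2 + 0.3 * x                                   # error standard deviation grows with x
y = 1.0 + 2.0 * x + rng.normal(scale=sigma)             # heteroscedastic errors

X = np.column_stack([np.ones(n), x])
w = 1.0 / sigma**2                                      # weights proportional to inverse variances

# WLS solves (X' W X) beta = X' W y; equivalently, OLS after scaling rows by sqrt(w_i)
sw = np.sqrt(w)
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
print(beta_wls)                                         # close to [1.0, 2.0]
```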

Generalized linear models

Generalized linear models (GLMs) are a framework for modeling response variables that are bounded or discrete. This is used, for example:

  • when modeling positive quantities (e.g. prices or populations) that vary over a large scale and are better described using a skewed distribution such as the log-normal distribution or Poisson distribution (although GLMs are not used for log-normal data; instead, the response variable is simply transformed using the logarithm function);
  • when modeling categorical data, such as the choice of a given candidate in an election (which is better described using a Bernoulli distribution/binomial distribution for binary choices, or a categorical distribution/multinomial distribution for multi-way choices), where there are a fixed number of choices that cannot be meaningfully ordered;
  • when modeling ordinal data, e.g. ratings on a scale from 0 to 5, where the different outcomes can be ordered but where the quantity itself may not have any absolute meaning (e.g. a rating of 4 may not be "twice as good" in any objective sense as a rating of 2, but simply indicates that it is better than 2 or 3 but not as good as 5).

Generalized linear models allow for an arbitrary link function, g, that relates the mean of the response variable(s) to the predictors: E(Y)=g^{-1}(XB). The link function is often related to the distribution of the response, and in particular it typically has the effect of transforming between the (-\infty ,\infty ) range of the linear predictor and the range of the response variable.
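
For instance, logistic regression uses the logit link, whose inverse maps the linear predictor onto (0, 1). A hedged sketch using the statsmodels GLM interface (the data is synthetic and assumed; other libraries would work equally well):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
eta = -0.5 + 1.5 * x                                    # linear predictor XB
p = 1.0 / (1.0 + np.exp(-eta))                          # inverse logit link: E(Y) = g^{-1}(XB)
y = rng.binomial(1, p)                                  # binary response

X = sm.add_constant(x)                                  # add the intercept column
model = sm.GLM(y, X, family=sm.families.Binomial())     # Binomial family, default logit link
result = model.fit()
print(result.params)                                    # roughly [-0.5, 1.5]
```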

Some common examples of GLMs are:

  • Poisson regression for count data.
  • Logistic regression and probit regression for binary data.
  • Multinomial logistic regression and multinomial probit regression for categorical data.
  • Ordered logit and ordered probit regression for ordinal data.

Single index models allow some degree of nonlinearity in the relationship between x and y, while preserving the central role of the linear predictor \beta^{\intercal}x as in the classical linear regression model. Under certain conditions, simply applying OLS to data from a single-index model will consistently estimate \beta up to a proportionality constant.
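
A small simulation sketch of that last claim (the data-generating process, and the Gaussian-predictor setting, which is one known sufficient condition, are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 20_000, 3
beta = np.array([1.0, 2.0, -1.0])

X = rng.normal(size=(n, p))                              # Gaussian predictors
index = X @ beta
y = np.exp(index / 4.0) + rng.normal(scale=0.1, size=n)  # nonlinear single-index response

X_design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
slopes = coef[1:]
print(slopes / slopes[0])                                # approximately beta / beta[0] = [1, 2, -1]
```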


Hierarchical linear models

Hierarchical linear models (or multilevel regression) organize the data into a hierarchy of regressions, for example where A is regressed on B, and B is regressed on C. This approach is often used where the variables of interest have a natural hierarchical structure, such as in educational statistics, where students are nested in classrooms, classrooms are nested in schools, and schools are nested in some administrative grouping, such as a school district. The response variable might be a measure of student achievement such as a test score, and different covariates would be collected at the classroom, school, and school district levels.
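
A hedged sketch of a two-level model of this kind, with random intercepts for schools (the variable names, the data, and the use of statsmodels are assumptions for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_schools, students_per_school = 30, 40

school = np.repeat(np.arange(n_schools), students_per_school)
school_effect = rng.normal(scale=2.0, size=n_schools)[school]   # random intercept per school
hours = rng.uniform(0, 10, size=school.size)                    # student-level covariate
score = 50 + 3.0 * hours + school_effect + rng.normal(scale=4.0, size=school.size)

df = pd.DataFrame({"score": score, "hours": hours, "school": school})

# Students nested in schools: fixed effect for hours, random intercept per school
model = smf.mixedlm("score ~ hours", data=df, groups=df["school"])
result = model.fit()
print(result.params)        # fixed effects close to [50, 3.0], plus the group variance
```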


Errors-in-variables

Errors-in-variables models (or "measurement error models") extend the traditional linear regression model to allow the predictor variables X to be observed with error. This error causes standard estimators of β to become biased. Generally, the form of bias is an attenuation, meaning that the effects are biased toward zero.
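
A small simulation sketch of the attenuation (all numbers below are assumptions for illustration): with equal variances for the true predictor and the measurement error, the naive slope is biased toward zero by roughly the factor Var(x) / (Var(x) + Var(u)).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
beta = 2.0

x_true = rng.normal(scale=1.0, size=n)                 # latent predictor, variance 1
y = beta * x_true + rng.normal(scale=0.5, size=n)
x_obs = x_true + rng.normal(scale=1.0, size=n)         # observed with measurement error, variance 1

# Naive OLS slope using the error-prone predictor
slope_naive = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
print(slope_naive)                                      # about beta * 1 / (1 + 1) = 1.0
```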


Group effects

In a multiple linear regression model

y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon,

parameter \beta _{j} of predictor variable x_{j} represents the individual effect of x_{j}. It has an interpretation as the expected change in the response variable y when x_{j} increases by one unit with other predictor variables held constant. When x_{j} is strongly correlated with other predictor variables, it is improbable that x_{j} can increase by one unit with other variables held constant. In this case, the interpretation of \beta _{j} becomes problematic as it is based on an improbable condition, and the effect of x_{j} cannot be evaluated in isolation.

For a group of predictor variables, say, \{x_{1},x_{2},\dots ,x_{q}\}, a group effect  \xi (\mathbf {w} ) is defined as a linear combination of their parameters

\xi(\mathbf{w}) = w_1 \beta_1 + w_2 \beta_2 + \dots + w_q \beta_q,

where \mathbf{w} = (w_1, w_2, \dots, w_q)^{\intercal} is a weight vector satisfying \sum_{j=1}^{q} |w_j| = 1. Because of this constraint on the w_j, \xi(\mathbf{w}) is also referred to as a normalized group effect. A group effect \xi(\mathbf{w}) has an interpretation as the expected change in y when the variables in the group x_1, x_2, \dots, x_q change by the amounts w_1, w_2, \dots, w_q, respectively, at the same time, with variables not in the group held constant. It generalizes the individual effect of a variable to a group of variables in that (i) if q = 1, then the group effect reduces to an individual effect, and (ii) if w_i = 1 and w_j = 0 for j \neq i, then the group effect also reduces to an individual effect. A group effect \xi(\mathbf{w}) is said to be meaningful if the underlying simultaneous changes of the q variables by (w_1, w_2, \dots, w_q)^{\intercal} are probable.
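
For instance, with q = 2 and the weight vector \mathbf{w} = (1/2, 1/2)^{\intercal}, the group effect

\xi(\mathbf{w}) = \tfrac{1}{2}\beta_1 + \tfrac{1}{2}\beta_2

is the expected change in y when x_1 and x_2 each increase by half a unit at the same time, with the other predictor variables held constant.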

Group effects provide a means to study the collective impact of strongly correlated predictor variables in linear regression models. Individual effects of such variables are not well-defined as their parameters do not have good interpretations. Furthermore, when the sample size is not large, none of their parameters can be accurately estimated by the least squares regression due to the multicollinearity problem. Nevertheless, there are meaningful group effects that have good interpretations and can be accurately estimated by the least squares regression. A simple way to identify these meaningful group effects is to use an all positive correlations (APC) arrangement of the strongly correlated variables under which pairwise correlations among these variables are all positive, and standardize all p predictor variables in the model so that they all have mean zero and length one. To illustrate this, suppose that \{x_{1},x_{2},\dots ,x_{q}\} is a group of strongly correlated variables in an APC arrangement and that they are not strongly correlated with predictor variables outside the group. Let y' be the centred y and x_{j}' be the standardized x_{j}. Then, the standardized linear regression model is

y' = \beta_1' x_1' + \cdots + \beta_p' x_p' + \varepsilon.

Parameters \beta _{j} in the original model, including \beta _{0}, are simple functions of \beta _{j}' in the standardized model. The standardization of variables does not change their correlations, so \{x_{1}',x_{2}',\dots ,x_{q}'\} is a group of strongly correlated variables in an APC arrangement and they are not strongly correlated with other predictor variables in the standardized model. A group effect of \{x_{1}',x_{2}',\dots ,x_{q}'\} is

\xi'(\mathbf{w}) = w_1 \beta_1' + w_2 \beta_2' + \dots + w_q \beta_q',

and its minimum-variance unbiased linear estimator is

\hat{\xi}'(\mathbf{w}) = w_1 \hat{\beta}_1' + w_2 \hat{\beta}_2' + \dots + w_q \hat{\beta}_q',

where {\hat {\beta }}_{j}' is the least squares estimator of  \beta _{j}'. In particular, the average group effect of the q standardized variables is

\xi_A = \frac{1}{q} (\beta_1' + \beta_2' + \dots + \beta_q'),

which has an interpretation as the expected change in y' when all x_j' in the strongly correlated group increase by (1/q)th of a unit at the same time, with variables outside the group held constant. With strong positive correlations and in standardized units, variables in the group are approximately equal, so they are likely to increase at the same time and by similar amounts. Thus, the average group effect \xi_A is a meaningful effect. It can be accurately estimated by its minimum-variance unbiased linear estimator \hat{\xi}_A = \frac{1}{q}(\hat{\beta}_1' + \hat{\beta}_2' + \dots + \hat{\beta}_q'), even when individually none of the \beta_j' can be accurately estimated by \hat{\beta}_j'.
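
A hedged simulation sketch of this contrast (the data-generating process is an assumption for illustration): with two strongly positively correlated standardized predictors and a small sample, the individual coefficient estimates are unstable, while the average group effect is estimated accurately.

```python
import numpy as np

rng = np.random.default_rng(7)
n, rho = 50, 0.99                                       # small sample, strong positive correlation

cov = np.array([[1.0, rho], [rho, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], cov, size=n)

# Standardize: mean zero and unit length, as in the standardized model above
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)

beta = np.array([1.0, 1.0])
y = X @ beta + rng.normal(scale=0.1, size=n)
y = y - y.mean()                                        # centred response y'

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
xi_A_hat = beta_hat.mean()                              # average group effect estimator, q = 2

print(beta_hat)     # individual estimates vary widely under multicollinearity
print(xi_A_hat)     # close to (beta_1 + beta_2) / 2 = 1.0 and much more stable
```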

Not all group effects are meaningful or can be accurately estimated. For example, \beta_1' is a special group effect with weights w_1 = 1 and w_j = 0 for j \neq 1, but it cannot be accurately estimated by \hat{\beta}_1'. It is also not a meaningful effect. In general, for a group of q strongly correlated predictor variables in an APC arrangement in the standardized model, group effects whose weight vectors \mathbf{w} are at or near the centre of the simplex \sum_{j=1}^{q} w_j = 1 (w_j \geq 0) are meaningful and can be accurately estimated by their minimum-variance unbiased linear estimators. Effects with weight vectors far away from the centre are not meaningful as such weight vectors represent simultaneous changes of the variables that violate the strong positive correlations of the standardized variables in an APC arrangement. As such, they are not probable. These effects also cannot be accurately estimated.

Applications of the group effects include (1) estimation and inference for meaningful group effects on the response variable, (2) testing for "group significance" of the q variables via testing H_{0}:\xi _{A}=0 versus H_{1}:\xi _{A}\neq 0, and (3) characterizing the region of the predictor variable space over which predictions by the least squares estimated model are accurate.

A group effect of the original variables \{x_{1},x_{2},\dots ,x_{q}\} can be expressed as a constant times a group effect of the standardized variables \{x_{1}',x_{2}',\dots ,x_{q}'\}. The former is meaningful when the latter is. Thus meaningful group effects of the original variables can be found through meaningful group effects of the standardized variables.


Others

In Dempster–Shafer theory, or a linear belief function in particular, a linear regression model may be represented as a partially swept matrix, which can be combined with similar matrices representing observations and other assumed normal distributions and state equations. The combination of swept or unswept matrices provides an alternative method for estimating linear regression models.
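
As a hedged illustration of the matrix mechanics only (not the belief-function calculus itself, and with all data assumed), the classical sweep operator applied to the cross-product matrix [[X'X, X'y], [y'X, y'y]] produces the least squares coefficients and the residual sum of squares:

```python
import numpy as np

def sweep(A, k):
    """Sweep the symmetric matrix A on its k-th diagonal element (textbook sweep operator)."""
    A = A.copy()
    d = A[k, k]
    row = A[k, :].copy()
    A -= np.outer(row, row) / d          # a_ij - a_ik * a_kj / a_kk
    A[k, :] = row / d
    A[:, k] = row / d
    A[k, k] = -1.0 / d
    return A

rng = np.random.default_rng(8)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

# Augmented cross-product matrix [[X'X, X'y], [y'X, y'y]]
Xty = X.T @ y
A = np.block([[X.T @ X, Xty[:, None]],
              [Xty[None, :], np.array([[y @ y]])]])

# Sweeping the X'X block leaves the OLS coefficients in the last column
# and the residual sum of squares in the bottom-right corner.
for k in range(p + 1):
    A = sweep(A, k)

print(A[:-1, -1])    # OLS estimate of the coefficients
print(A[-1, -1])     # residual sum of squares
```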