Standard Error of the Estimate
Learning Objectives
- Make judgments about the size of the standard error of the estimate from a scatter plot
- Compute the standard error of the estimate based on errors of prediction
- Compute the standard error using Pearson's correlation
- Estimate the standard error of the estimate based on a sample
Figure 1 shows two regression examples. You can see that in Graph A, the points are closer to the line than they are in Graph B. Therefore, the predictions in Graph A are more accurate than in Graph B.

Figure 1. Regressions differing in accuracy of prediction.
The standard error of the estimate is a measure of the accuracy of predictions. Recall that the regression line is the line that minimizes the sum of squared deviations of prediction (also called the sum of squares error). The standard error of the estimate is closely related to this quantity and is defined below:
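$$\sigma_{est} = \sqrt{\frac{\sum (Y - Y')^2}{N}}$$

where $\sigma_{est}$ is the standard error of the estimate, $Y$ is an actual score, $Y'$ is a predicted score, and $N$ is the number of pairs of scores. The numerator is the sum of squared differences between the actual scores and the predicted scores.

Note the similarity of the formula for $\sigma_{est}$ to the formula for $\sigma$, the population standard deviation: $\sigma_{est}$ is the standard deviation of the errors of prediction (each $Y - Y'$ is an error of prediction).

Assume the data in Table 1 are the data from a population of five X, Y pairs.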
Table 1. Example data.
| | X | Y | Y' | Y-Y' | (Y-Y')² |
|---|---|---|---|---|---|
| | 1.00 | 1.00 | 1.210 | -0.210 | 0.044 |
| | 2.00 | 2.00 | 1.635 | 0.365 | 0.133 |
| | 3.00 | 1.30 | 2.060 | -0.760 | 0.578 |
| | 4.00 | 3.75 | 2.485 | 1.265 | 1.600 |
| | 5.00 | 2.25 | 2.910 | -0.660 | 0.436 |
| Sum | 15.00 | 10.30 | 10.30 | 0.000 | 2.791 |
The last column shows that the sum of the squared errors of prediction is 2.791. Therefore, the standard error of the estimate is

$$\sigma_{est} = \sqrt{\frac{2.791}{5}} = 0.747$$
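The computation above can be reproduced in a few lines of R. This is a minimal sketch that treats the five pairs in Table 1 as a population; the variable names (`predicted`, `errors`, `sse`, `sigma_est`) are illustrative choices, not part of the original text.

```r
# Data from Table 1, treated as a population of five X, Y pairs
x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, 1.3, 3.75, 2.25)

fit <- lm(y ~ x)             # least-squares regression line
predicted <- fitted(fit)     # the Y' column of Table 1
errors <- y - predicted      # the Y - Y' column (errors of prediction)
sse <- sum(errors^2)         # sum of squared errors of prediction

# Population standard error of the estimate: divide by N
sigma_est <- sqrt(sse / length(y))

round(sse, 3)        # 2.791
round(sigma_est, 3)  # 0.747
```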
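There is a version of the formula for the standard error in terms of Pearson's correlation:

$$\sigma_{est} = \sqrt{\frac{(1 - \rho^2)\,SSY}{N}}$$

where $\rho$ is the population value of Pearson's correlation and $SSY$ is the sum of squared deviations of $Y$ from its mean:

$$SSY = \sum (Y - \mu_Y)^2$$

For the data in Table 1, $\mu_Y = 2.06$, $SSY = 4.597$, and $\rho = 0.6268$. Therefore,

$$\sigma_{est} = \sqrt{\frac{(1 - 0.6268^2)(4.597)}{5}} = 0.747$$

which is the same value computed previously.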
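As a check on the correlation-based version, the sketch below recomputes the standard error of the estimate from Pearson's correlation and the sum of squared deviations of Y, again treating the Table 1 data as a population. The names `rho`, `ss_y`, and `sigma_est` are illustrative.

```r
# Data from Table 1, treated as a population
x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, 1.3, 3.75, 2.25)
n <- length(y)

rho <- cor(x, y)               # Pearson's correlation between X and Y
ss_y <- sum((y - mean(y))^2)   # sum of squared deviations of Y from its mean

# Standard error of the estimate in terms of the correlation (denominator N)
sigma_est <- sqrt((1 - rho^2) * ss_y / n)

round(rho, 4)        # 0.6268
round(ss_y, 3)       # 4.597
round(sigma_est, 3)  # 0.747
```

The result agrees with the value obtained directly from the squared errors of prediction, because (1 - rho^2) times the sum of squares of Y equals the sum of squares error.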
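Similar formulas are used when the standard error of the estimate is computed from a sample rather than a population. The only difference is that the denominator is $N - 2$ rather than $N$. The reason $N - 2$ is used rather than $N - 1$ is that two parameters (the slope and the intercept) were estimated in order to estimate the sum of squares. Formulas for a sample comparable to the ones for a population are shown below.

$$s_{est} = \sqrt{\frac{\sum (Y - Y')^2}{N - 2}}$$

$$s_{est} = \sqrt{\frac{(1 - r^2)\,SSY}{N - 2}}$$

Here $r$ is the sample value of Pearson's correlation and $SSY = \sum (Y - \bar{Y})^2$ is the sum of squared deviations of $Y$ from the sample mean.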
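To illustrate the sample versions, the sketch below treats the same five pairs as a sample and divides by N - 2 instead of N. It computes the estimate both from the errors of prediction and from Pearson's correlation; the names `s_est_errors` and `s_est_r` are illustrative.

```r
# The same five pairs, now treated as a sample
x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, 1.3, 3.75, 2.25)
n <- length(y)

fit <- lm(y ~ x)
sse <- sum(residuals(fit)^2)   # sum of squared errors of prediction

# Version 1: from the errors of prediction, denominator N - 2
s_est_errors <- sqrt(sse / (n - 2))

# Version 2: from Pearson's correlation, denominator N - 2
r <- cor(x, y)
ss_y <- sum((y - mean(y))^2)
s_est_r <- sqrt((1 - r^2) * ss_y / (n - 2))

round(s_est_errors, 4)  # 0.9645
round(s_est_r, 4)       # 0.9645
```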
R code
```r
# Data from Table 1
x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, 1.3, 3.75, 2.25)

# Fit the regression of y on x and print the summary
summary(lm(y ~ x))
```

Output:

```
Call:
lm(formula = y ~ x)

Residuals:
     1      2      3      4      5
-0.210  0.365 -0.760  1.265 -0.660

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.785      1.012   0.776    0.494
x              0.425      0.305   1.393    0.258

Residual standard error: 0.9645 on 3 degrees of freedom
Multiple R-squared:  0.3929,    Adjusted R-squared:  0.1906
F-statistic: 1.942 on 1 and 3 DF,  p-value: 0.2578
```
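The "Residual standard error" reported by `summary()` is the sample estimate, computed with N - 2 in the denominator. The sketch below pulls that value out of the fitted model and rescales it to the population version with N in the denominator. The names `s_est`, `resid_df`, and `sigma_est` are illustrative.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, 1.3, 3.75, 2.25)
fit <- lm(y ~ x)

s_est <- summary(fit)$sigma    # "Residual standard error" from the summary
resid_df <- df.residual(fit)   # residual degrees of freedom, N - 2
n <- length(y)

# Rescale from the N - 2 denominator to the N denominator
sigma_est <- s_est * sqrt(resid_df / n)

round(s_est, 4)      # 0.9645
resid_df             # 3
round(sigma_est, 3)  # 0.747
```

Reading the value from `summary(fit)$sigma` avoids retyping it from the printed output, and the rescaling shows why the sample estimate (0.9645) is larger than the population value (0.747) for the same data.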
Source: David M. Lane, https://onlinestatbook.com/2/regression/accuracy.html This work is in the Public Domain.