MA121 Study Guide


Unit 6: Linear Regression

6a. Discuss and apply basic ideas of linear regression and correlation

  • What is the correlation coefficient and what does it tell us?
  • How is the correlation related to the slope of a regression line? Do they tell us roughly the same thing?

The correlation coefficient (r) measures the strength and direction of the linear relationship between two variables, x and y. It is a number between −1 and 1, inclusive.

  • 1 means there is a perfect positive correlation. The scatter plot slopes upward in a straight line.
  • −1 means perfect negative correlation. The scatter plot slopes downward in a straight line.
  • 0 means there is no linear correlation, as if each x value were paired with a completely random y value.

In this way, correlation is related to the slope of the regression line: they always have the same sign. However, if the points lie on a straight line sloping upward, the correlation is 1 regardless of how steep the line is. Remember, the slope of a line can be any real number, whereas the correlation is always between −1 and +1.
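
To make the difference between correlation and slope concrete, here is a minimal Python sketch. The data sets and the helper name correlation are made up for illustration; they do not come from the course materials. Three perfectly straight lines with very different slopes all give a correlation of +1 or −1.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: three sets of points that each lie exactly on a line.
xs = [1, 2, 3, 4, 5]
shallow = [0.5 * x + 1 for x in xs]   # slope 0.5
steep = [10 * x - 3 for x in xs]      # slope 10
falling = [-2 * x + 7 for x in xs]    # slope -2

r_shallow = correlation(xs, shallow)
r_steep = correlation(xs, steep)
r_falling = correlation(xs, falling)
print(r_shallow, r_steep, r_falling)
```

The two rising lines both give r = 1 even though one is twenty times steeper than the other, and the falling line gives r = −1. The sign of r matches the sign of the slope, but its size says nothing about steepness.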

Review this material in:


6b. Identify the assumptions that inferential statistics in regression are based on

  • Why do we call the regression line the "least squares" regression line?
  • What conditions must be true of a sample of points to make the correlation or regression line statistically significant?

We determine whether a slope or correlation is statistically significant in much the same way as we did for a sample mean or proportion in Units 4 and 5.

To find confidence intervals, we use the concepts and formulas in the chapter Statistical Inferences about the Slope. We conduct hypothesis testing for the slope in the same way as for any other statistic.

Remember that it is always best to find and interpret the correlation coefficient first. Correlation does not prove a causal relationship between the two variables, but if the correlation is very weak (close to 0), it is unlikely that the regression line will be of any use.

As long as the x values are not all identical (which would make the line vertical), you will always get a solution for the least-squares regression line, whether or not that line is meaningful. Think of the phrase, "garbage in, garbage out". If the slope is not significant, then the regression line is of little use. You can also review the resource Testing the Significance of the Correlation Coefficient to learn how to do a hypothesis test for a correlation coefficient. There is a hypothesis test for just about everything in statistics!

The idea behind the regression line formula is as follows: draw a vertical segment from every point on the scatter plot to a candidate line, and make each segment one side of a square. The line that gives the lowest total area (that is, the lowest sum of squares) is considered the best fit. This is why we call it the "least squares" regression line.
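
The sum-of-squares idea can be sketched in Python as follows. The five data points and the function names are hypothetical, chosen only for illustration. The slope and intercept come from the standard least-squares formulas, and the total area of the squares is then compared with the total for a few other candidate lines.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_of_squares(xs, ys, slope, intercept):
    """Total area of the squares built on the vertical distances to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical sample of five points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
best = sum_of_squares(xs, ys, slope, intercept)

# A few other candidate lines, written as (slope, intercept).
candidates = [(0.7, 2.2), (0.5, 2.5), (0.6, 2.0), (1.0, 1.0)]
others = [sum_of_squares(xs, ys, m, b) for m, b in candidates]

print(slope, intercept, best)
print(others)
```

For this sample the fitted line is roughly y = 0.6x + 2.2, and every other candidate line produces a larger sum of squares.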

Review this material in:


6c. Compute the standard error of a slope

  • What does the standard error of a slope tell you?
  • How is the standard error computed?

The standard error for a slope tells you basically the same thing that any other standard error tells you. Recall from Unit 3 that the standard error is the standard deviation of a sampling distribution. For example, the standard error for the mean is the standard deviation of a set of sample means. Because it measures how much the statistic varies from sample to sample, it indicates how reliable an estimate from any one sample is. A low standard error produces a narrower confidence interval and makes it more likely that we will reject an incorrect null hypothesis.

Carry this logic forward to the interpretation of a slope. The standard error may not tell you much by itself, but it is a component of statistical inference involving the slope of a regression line. Its formula is more complex than the ones for means and proportions; you can find it here: Regression Slope Test.
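
As a rough sketch, the calculation can be written in Python using the standard textbook formula for simple linear regression: the residual standard deviation (with n − 2 degrees of freedom) divided by the square root of the sum of squared deviations of x. The data and the function name slope_standard_error are hypothetical; check the formula against the resource above before relying on it.

```python
import math

def slope_standard_error(xs, ys):
    """Return (slope, standard error of the slope) for simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # The residual standard deviation uses n - 2 degrees of freedom.
    s = math.sqrt(sse / (n - 2))
    return slope, s / math.sqrt(sxx)

# Hypothetical sample of five points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, se = slope_standard_error(xs, ys)
print(round(slope, 3), round(se, 3))
```

For this small sample the slope is about 0.6 and its standard error is about 0.283. The standard error is nearly half the size of the slope itself, so the estimate is not very precise.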

Review this material in Statistical Inferences about the Slope.


6d. Test a slope for significance

  • How would we test a slope for significance? How does this relate to hypothesis testing?

As stated above, hypothesis testing works for the slope or correlation of a regression line in the same general way that it works for the mean and proportion: you have a null hypothesis of no linear relationship (r = 0, or equivalently a slope of 0), and an alternative that is almost always two-tailed (r ≠ 0). You can use the formulas in the resources below to find the t-statistic and then use the same methods (as we used in Units 4 and 5) to find the p-value: the combined area of the right and left tails cut off by the positive and negative values of that t-statistic.
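
Here is a minimal Python sketch of the test, assuming the standard formula t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. Python's standard library has no t-distribution, so instead of computing a p-value the sketch compares |t| with a critical value read from a t-table (3.182 for a two-tailed test at α = 0.05 with 3 degrees of freedom). This leads to the same decision as checking whether the p-value is below 0.05. The data and function names are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic_from_r(r, n):
    """Test statistic for H0: r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical sample of five points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
t = t_statistic_from_r(r, len(xs))

# Assumed table value: two-tailed, alpha = 0.05, df = n - 2 = 3.
T_CRITICAL = 3.182
significant = abs(t) > T_CRITICAL
print(round(r, 3), round(t, 3), significant)
```

For this sample r is about 0.775 and t is about 2.121, which is smaller than 3.182, so we fail to reject the null hypothesis. Even a fairly strong sample correlation may not be significant when there are only five points.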

Review this material in Statistical Inferences about the Slope.


6e. Construct a confidence interval on a slope

  • What should the confidence interval for a slope look like if the slope is significant? Is this similar to the significance test?

Remember that the confidence interval gives a range of values that most likely contains the true parameter. For a slope to be significant, we want a confidence interval that does NOT include 0. If the confidence interval is, say, [−0.8, 2.1], then the true slope could be positive, negative, or zero, so we would conclude that the slope we found is not significant. This leads to the same decision as a two-tailed significance test at the matching level (for example, a 95% confidence interval and α = 0.05).
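
A confidence interval for a slope has the usual form: estimate ± (critical t-value) × (standard error). The Python sketch below assumes the standard formula for the standard error of a slope and takes the critical value as an input read from a t-table (3.182 for 95% confidence with n − 2 = 3 degrees of freedom). The data and function name are hypothetical.

```python
import math

def slope_confidence_interval(xs, ys, t_critical):
    """Return (lower, upper) bounds of a confidence interval for the slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Standard error of the slope, using n - 2 degrees of freedom.
    se = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    margin = t_critical * se
    return slope - margin, slope + margin

# Hypothetical sample of five points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Assumed table value: 95% confidence, df = 3.
lower, upper = slope_confidence_interval(xs, ys, t_critical=3.182)
contains_zero = lower <= 0 <= upper
print(round(lower, 2), round(upper, 2), contains_zero)
```

For this sample the interval runs from about −0.30 to about 1.50. Because it contains 0, the slope is not significant at the 5% level.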

Review this material in Statistical Inferences about the Slope.


6f. Calculate and interpret the coefficient of determination and the correlation coefficient

  • What is the coefficient of determination and how is it calculated?
  • How is the correlation coefficient calculated and how is it related to the coefficient of determination?

The coefficient of determination, simply put, is the square of the correlation coefficient, and it can be calculated that way. There is also a formula in your text to calculate this value directly, in case the correlation coefficient has not already been calculated. In effect, the coefficient tells us the proportion of the variation in the variable y that is explained by the variable(s) x. So if the correlation is 0.8, then the coefficient of determination is 0.64, telling us that roughly 64% of the variation in the dependent variable (y) is explained by the independent variable (x).
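
The two routes to the coefficient of determination can be compared in a short Python sketch. It assumes the direct formula is the usual one, 1 − (sum of squared residuals) / (total sum of squares of y); your text may present it in a different but equivalent form. The data and function name are hypothetical.

```python
import math

def r_squared_two_ways(xs, ys):
    """Return (r squared, 1 - SSE/SST) for simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)   # total variation in y (SST)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Route 1: square the correlation coefficient.
    r = sxy / math.sqrt(sxx * syy)

    # Route 2: share of the variation in y NOT left over in the residuals.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return r ** 2, 1 - sse / syy

# Hypothetical sample of five points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

from_r, from_residuals = r_squared_two_ways(xs, ys)
print(round(from_r, 3), round(from_residuals, 3))
```

Both routes give about 0.6 for this sample, meaning roughly 60% of the variation in y is explained by x.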

Review this material in:


Unit 6 Vocabulary

  • Coefficient of determination
  • Correlation coefficient
  • Least-squares regression
  • Scatter plot
  • Slope
  • Standard error
  • Y-intercept