### Unit 6: Linear Regression

In this unit, we will discuss situations in which the mean of a population, treated as a variable, depends on the value of another variable. One of the main reasons for conducting such analyses is to understand how two variables are related to each other, and the most common type of relationship is linear. You may want to know what happens to one variable when you increase or decrease the other, answering questions such as, "Does one variable increase as the other increases, or does it decrease?" For example, you may want to determine how the mean reaction time of rats depends on the amount of a drug in the bloodstream.

In this unit, you will also learn to measure the degree of a relationship between two or more variables. Both correlation and regression are tools for comparing variables. Correlation quantifies the strength of the linear relationship between two variables and describes existing data. Regression, on the other hand, models the linear relationship between an independent variable and a dependent variable, and can be used to predict the value of the dependent variable when the value of the independent variable is known.

**Completing this unit should take you approximately 4 hours.**

Upon successful completion of this unit, you will be able to:

- discuss and apply basic ideas of linear regression and correlation;
- identify the assumptions that inferential statistics in regression are based on;
- compute the standard error of a slope;
- test a slope for significance;
- construct a confidence interval on a slope; and
- calculate and interpret the coefficient of determination and the correlation coefficient.

### 6.1: The Regression Model

### 6.1.1: Scatter Plot of Two Variables and Regression Line

This section defines simple linear regression, uses scatter plots to reveal linear patterns, and discusses prediction error. It also explains how to compute the regression line by minimizing the sum of squared errors.

Read these sections on linear regression. Linear regression, the simplest form of regression, is used to obtain a linear relationship between two variables.
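As a minimal sketch of the least squares idea, the following Python computes a regression line and its squared prediction errors. The data values and the helper name `fit_line` are made up for illustration and are not taken from the readings.

```python
# Illustrative data (assumed for this sketch, not from the readings).
x = [1, 2, 3, 4, 5]
y = [1.00, 2.00, 1.30, 3.75, 2.25]


def fit_line(x, y):
    """Return (slope, intercept) of the least squares line y' = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(x, y)
predictions = [slope * xi + intercept for xi in x]
errors = [yi - pi for yi, pi in zip(y, predictions)]  # errors of prediction
sse = sum(e ** 2 for e in errors)                      # quantity being minimized

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, SSE = {sse:.3f}")
```

For this data the slope is about 0.425 and the intercept about 0.785. Any other line through the same points would give a larger sum of squared errors.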

### 6.1.2: Correlation Coefficient

Read these sections on correlation. You will learn the interpretation and calculation of the correlation coefficient, how to test its significance, and the relation between correlation and causation.

Read this discussion on linear correlation. You will learn what the linear correlation coefficient is, how to compute it, and what it tells us about the relationship between two variables x and y.
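The calculation of the linear correlation coefficient can be sketched as follows. The data and the function name `pearson_r` are assumptions made for illustration; the formula is the usual one, the sum of cross-products divided by the square root of the product of the two sums of squared deviations.

```python
import math

# Illustrative data (assumed for this sketch, not from the readings).
x = [1, 2, 3, 4, 5]
y = [1.00, 2.00, 1.30, 3.75, 2.25]


def pearson_r(x, y):
    """Return the linear (Pearson) correlation coefficient of x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

The result always lies between -1 and 1. Here it is about 0.627, a moderate positive linear relationship. A value near zero would indicate little linear relationship, although a strong nonlinear one could still exist.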

### 6.1.3: Sums of Squares

This section discusses sums of squares, including how to partition the total sum of squares into the sum of squares predicted and the sum of squares error.
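The partition can be checked numerically. The sketch below uses made-up data, and the variable names `ss_total`, `ss_predicted`, and `ss_error` are labels chosen here rather than notation from the readings. It also computes the coefficient of determination as the proportion of the total sum of squares that is explained by the regression.

```python
# Illustrative data (assumed for this sketch, not from the readings).
x = [1, 2, 3, 4, 5]
y = [1.00, 2.00, 1.30, 3.75, 2.25]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares slope and intercept.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [slope * xi + intercept for xi in x]

# Total variation in y, variation explained by the line, and leftover error.
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_predicted = sum((pi - mean_y) ** 2 for pi in predicted)
ss_error = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))

# Coefficient of determination: proportion of variation explained.
r_squared = ss_predicted / ss_total

print(f"SS total     = {ss_total:.4f}")
print(f"SS predicted = {ss_predicted:.4f}")
print(f"SS error     = {ss_error:.4f}")
print(f"r squared    = {r_squared:.4f}")
```

With a least squares fit, the sum of squares predicted and the sum of squares error add up to the total, so `r_squared` is the share of the variation in y accounted for by x.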

Watch these videos, which discuss the regression line.

### 6.2: Fitting the Model

### 6.2.1: Standard Errors of the Least Squares Estimates

This section discusses how to compute the standard error of the estimate from the errors of prediction, and how to estimate it from a sample.
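A minimal sketch of both calculations follows, assuming made-up data. It treats the data first as a whole population (dividing the sum of squared errors by N) and then as a sample (dividing by N - 2, because two parameters, the slope and the intercept, are estimated). The variable names are chosen for this example only.

```python
import math

# Illustrative data (assumed for this sketch, not from the readings).
x = [1, 2, 3, 4, 5]
y = [1.00, 2.00, 1.30, 3.75, 2.25]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares fit.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Sum of squared errors of prediction.
sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))

# Standard error of the estimate when the data are the whole population.
se_population = math.sqrt(sse / n)
# Estimate from a sample: N - 2 degrees of freedom.
se_sample = math.sqrt(sse / (n - 2))

print(f"population form: {se_population:.3f}")
print(f"sample form:     {se_sample:.3f}")
```

The sample form is always the larger of the two, and the difference shrinks as N grows.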

### 6.2.2: Statistical Inference for the Slope and Correlation

This section starts with the assumptions about the errors that are necessary for statistical inference. It then gives an example of a significance test for the slope, explains how to construct confidence intervals for the slope, and closes with a significance test for the correlation.

This section further details two types of inferences on the slope parameter, considering both confidence intervals and hypothesis testing.
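The two kinds of inference on the slope can be sketched in a few lines. The data are made up, and the critical value `T_CRIT = 3.182` is the two-tailed 95% value for 3 degrees of freedom read from a standard t table; it is hard-coded here as an assumption so that the example needs only the standard library.

```python
import math

# Illustrative data (assumed for this sketch, not from the readings).
x = [1, 2, 3, 4, 5]
y = [1.00, 2.00, 1.30, 3.75, 2.25]

n = len(x)
df = n - 2
mean_x = sum(x) / n
mean_y = sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Standard error of the estimate (sample form) and of the slope.
sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
se_estimate = math.sqrt(sse / df)
se_slope = se_estimate / math.sqrt(sxx)

# Test statistic for the null hypothesis that the population slope is 0.
t_slope = slope / se_slope

# 95% confidence interval on the slope; 3.182 is the table value for df = 3.
T_CRIT = 3.182
ci_lower = slope - T_CRIT * se_slope
ci_upper = slope + T_CRIT * se_slope

# The test of the correlation gives the same t in simple linear regression.
r = sxy / math.sqrt(sxx * syy)
t_corr = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

print(f"slope = {slope:.3f}, SE = {se_slope:.3f}, t = {t_slope:.3f}")
print(f"95% CI: ({ci_lower:.3f}, {ci_upper:.3f})")
print(f"t from correlation = {t_corr:.3f}")
```

Here the t statistic (about 1.39) is smaller than the critical value and the confidence interval contains zero, so with only five observations the slope is not significantly different from zero at the 0.05 level. Both views lead to the same conclusion.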

### 6.2.3: Influential Observations

This section discusses the notion of influence and describes what makes a point influential. It introduces the concepts of leverage and distance, which are useful to detect influential observations.
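To make the idea concrete, the sketch below computes the leverage of each point in a small made-up data set, using the standard simple-regression formula h = 1/N + (x - mean of x)² / (sum of squared deviations of x). As a direct stand-in for influence, it also refits the line with each point left out and reports how much the slope changes. This leave-one-out comparison is an illustration chosen here, not the distance measure described in the reading.

```python
# Illustrative data (assumed): the last point has an extreme x value
# and does not follow the trend of the other four points.
x = [1, 2, 3, 4, 10]
y = [2.0, 2.9, 4.1, 5.0, 4.0]


def slope_of(xs, ys):
    """Least squares slope for the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    return sxy / sxx


n = len(x)
mean_x = sum(x) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)

# Leverage depends only on how far each x lies from the mean of x.
leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]

# Change in slope when each observation is removed in turn.
full_slope = slope_of(x, y)
slope_changes = []
for i in range(n):
    xs = x[:i] + x[i + 1:]
    ys = y[:i] + y[i + 1:]
    slope_changes.append(slope_of(xs, ys) - full_slope)

for i in range(n):
    print(f"x = {x[i]:>2}: leverage = {leverage[i]:.2f}, "
          f"slope change if removed = {slope_changes[i]:+.3f}")
```

The point at x = 10 has by far the highest leverage, and removing it changes the slope far more than removing any other point. A point with high leverage that also lies far from the pattern of the remaining data is the typical influential observation.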

This section explains linear regression, from presenting the data to using scatter plots to identify a linear pattern. It then fits a linear model using least squares estimation and addresses statistical inference on the correlation coefficient and the slope parameter.

### 6.3: ANOVA

This optional subunit will teach you about "Analysis of Variance" (abbreviated ANOVA), which is used for hypothesis tests involving more than two averages. ANOVA is about examining the amount of variability in the y variable and trying to see where that variability is coming from. You will study the simplest form of ANOVA, called single factor or one-way ANOVA. Finally, you will briefly study the F distribution, used for ANOVA, and the test of two variances.
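As a rough sketch of where the variability "comes from" in one-way ANOVA, the following splits the total variation of three made-up groups into a between-groups part and a within-groups part and forms the F statistic from their mean squares. The group labels and values are assumptions chosen to keep the arithmetic simple.

```python
# Illustrative data (assumed): three groups of three observations each.
groups = {
    "A": [4, 5, 6],
    "B": [6, 7, 8],
    "C": [8, 9, 10],
}

all_values = [v for g in groups.values() for v in g]
grand_mean = sum(all_values) / len(all_values)

# Between-groups sum of squares: how far each group mean is from the grand mean.
ss_between = sum(
    len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
)
# Within-groups sum of squares: spread of observations around their own group mean.
ss_within = sum(
    (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"SS between = {ss_between}, SS within = {ss_within}")
print(f"F({df_between}, {df_within}) = {f_stat}")
```

A large F means the group means differ by more than the variation within the groups would suggest. The statistic is compared with the F distribution on (2, 6) degrees of freedom in this example.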

Watch these videos, which discuss each of the steps in ANOVA. While these videos are optional, studying ANOVA may help you if you are interested in taking the credit-aligned exam that is linked with this course.

Read this chapter and complete the questions at the end of each section. While these sections are optional, studying ANOVA may help you if you are interested in taking the Saylor Direct Credit exam for this course.

### Unit 6 Assessment

Take this assessment to see how well you understood this unit.

- This assessment **does not count towards your grade**. It is just for practice!
- You will see the correct answers when you submit your answers. Use this to help you study for the final exam!
- You can take this assessment as many times as you want, whenever you want.