### Unit 6: Correlation and Regression

If two variables move in the same direction, does that mean that one causes the other? How should we analyze their correlation?

Regression is an analysis of the relationship of one variable to another. A regression might identify, for example, the relationship between car speed and the number of fatal accidents. In this example, speed and number of accidents are the two variables. The number of accidents is said to be the dependent variable, because it depends on the speed; speed is the independent variable. While regressions can be calculated manually, doing so for a large data set can take a long time, so they are usually calculated with software.
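For a single independent variable, the least-squares regression line has a standard textbook form. The symbols below are assumptions introduced for illustration and are not taken from the course materials: $x$ is the independent variable (speed), $y$ is the dependent variable (number of accidents), and $\bar{x}$ and $\bar{y}$ are their sample means.

$$
\hat{y} = a + b\,x, \qquad
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
a = \bar{y} - b\,\bar{x}
$$

Here $b$ is the slope (the regression coefficient) and $a$ is the intercept.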

Regressions allow us not only to determine whether a relationship exists but also to describe it. The regression coefficient measures how much the dependent variable changes, on average, for each one-unit change in an independent variable, while the correlation coefficient and r-squared measure how closely the data follow that relationship. If speed explains relatively little of the variation in fatal accidents, then speed may not be the major factor. Perhaps the major factor is the time of day, whether it rained, or whether alcohol was involved. With multiple regression, a number of independent variables can be tested against the dependent variable at the same time. The regression coefficients then help indicate which variables have the strongest relationship with the dependent variable.

In business, you will frequently use regression to predict future events. Though not an exact science, regression can be used to make reasonably reliable predictions if the relevant variables are identified. For example, first responders could use regression outputs to predict the number of fatal accidents in a given shift based on average travel speed, time of day, weather, and any other factors deemed significant. This unit will also stress the importance of determining the factors that most likely contribute to a dependent variable.
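The multiple regression idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the course materials: the function names are hypothetical, and the data are made up, constructed from the known equation accidents = -10 + 0.4 × speed + 2 × rain so that the fit can be checked by eye.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(rows, ys):
    """Return [intercept, b1, b2, ...] that minimize the squared error."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    # Normal equations: (X^T X) coefficients = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up illustrative data, NOT real accident statistics.
# Each row: (average speed in mph, 1 if it rained else 0)
rows = [(50, 0), (60, 1), (55, 0), (70, 1), (65, 0), (45, 1)]
accidents = [10, 16, 12, 20, 16, 10]

intercept, b_speed, b_rain = fit_multiple_regression(rows, accidents)
print(f"intercept = {intercept:.2f}, speed = {b_speed:.2f}, rain = {b_rain:.2f}")
```

Note that raw coefficients are expressed in different units (accidents per mph versus accidents per rainy shift), so comparing their sizes directly can mislead unless the variables are put on comparable scales first.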

Regression is widely used in finance. Investors often want to know the relationship between a stock's performance and the overall performance of the market. By regressing a stock's period returns on the market's returns, investors obtain the regression coefficient. This coefficient is known as the stock's beta and is covered extensively in BUS202: Principles of Finance.
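As a rough sketch of the beta calculation, the slope from regressing stock returns on market returns equals their covariance divided by the variance of the market returns. The function name and the return figures below are hypothetical; the stock series is deliberately constructed to move 1.5 times as much as the market.

```python
def beta(stock_returns, market_returns):
    """Slope of the regression of stock returns on market returns."""
    if len(stock_returns) != len(market_returns) or len(market_returns) < 2:
        raise ValueError("need two equal-length series with at least two periods")
    n = len(market_returns)
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    # Covariance and variance share the same divisor, so it cancels out.
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var


# Made-up period returns for illustration only (as decimals, 0.01 = 1%).
market = [0.01, -0.02, 0.03, 0.00, 0.02]
stock = [0.015, -0.03, 0.045, 0.0, 0.03]  # 1.5 times the market, by construction

stock_beta = beta(stock, market)
print(f"beta = {stock_beta:.2f}")
```

A beta above 1 suggests the stock has tended to move more than the market over the periods sampled; a beta below 1 suggests it has moved less.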

**Completing this unit should take you approximately 5 hours.**

Upon successful completion of this unit, you will be able to:

- identify the dependent and independent variables in the linear regression model;
- calculate the equation of the regression line, and plot it;
- describe the importance of the correlation coefficient and r-squared, and apply these concepts;
- define outlier, identify examples of outliers, and describe what an outlier can do to summaries of data;
- estimate a regression line and identify the effect of the independent variable on the dependent variable; and
- draw a scatter plot, explain how to use a spreadsheet to draw a scatter plot, find the equation of the least-squares line, and draw the line.


### 6.1: Working with More Than One Variable

Read this chapter to learn how to use graphs, such as scatterplots, to analyze the relationship between two variables. Two variables are positively or negatively related when pairs of observations show a consistent pattern. For example, when individuals' incomes rise, so does their consumption of goods and services; thus, income and consumption are considered to be positively related. By contrast, as a person's income rises, the number of bus rides that person takes falls; thus, income and bus riding are negatively related.
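The sign of the Pearson correlation coefficient captures the positive and negative relationships described above. The sketch below uses a hypothetical helper, `pearson_r`, and made-up figures chosen only to mirror the income examples; it is not drawn from the chapter.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative figures, not real data.
income = [20, 30, 40, 50, 60]         # annual income, thousands of dollars
consumption = [18, 26, 33, 41, 48]    # annual spending, thousands of dollars
bus_rides = [200, 150, 120, 60, 30]   # bus rides per year

r_consumption = pearson_r(income, consumption)
r_bus = pearson_r(income, bus_rides)
print(f"income vs consumption: r = {r_consumption:.3f}")  # positive
print(f"income vs bus rides:   r = {r_bus:.3f}")          # negative
```

A value of r near +1 or -1 indicates that the points lie close to a straight line; the sign gives the direction of the relationship.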

Watch the first lecture from 0:50:00 to the end. In it, Professor Stark differentiates between univariate and multivariate data, covers different data types, explains how to plot and interpret the correlation between variables, and works through some examples. Then, watch the second lecture until 0:38:00, in which he works through some additional examples.

### 6.2: Correlation and Association

### 6.3: Regression

Watch this lecture, which discusses how to interpret and understand a linear regression and how regression equations enable you to make predictions.
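To illustrate how a regression equation enables prediction, the following sketch fits a least-squares line, computes r-squared, and predicts a value inside the observed range. The function names, the variables (hours studied and exam score), and the numbers are all hypothetical and are not taken from the lecture.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in y that the fitted line accounts for."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up illustrative data.
hours = [1, 2, 3, 4, 5]        # independent variable: hours studied
score = [52, 55, 61, 64, 68]   # dependent variable: exam score

slope, intercept = fit_line(hours, score)
r2 = r_squared(hours, score, slope, intercept)

# Predict within the observed range of hours (1 to 5).
predicted = intercept + slope * 3.5

print(f"score = {intercept:.1f} + {slope:.1f} * hours")
print(f"r-squared = {r2:.3f}")
print(f"predicted score for 3.5 hours = {predicted:.2f}")
```

Predictions are generally more trustworthy inside the range of the observed data than far outside it, where the linear pattern may no longer hold.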

### 6.4: Spreadsheet Activity for Unit 6

Read this chapter, which discusses linear regressions and best fit lines.

For this activity, you will review how a spreadsheet can be used to plot data, determine the slope and intercept of the regression line, and draw the regression line. The instructions for creating the scatter graph and regression line are in Section 4.25. However, for this activity, we are solving the problem presented in Section 4.4. The supporting spreadsheet files (links above to both Excel and Open Office versions) include a tab titled "Starter File", which contains everything you need to get started on the activity. Once you have worked through the activity, you can click on the "Solution File" tab to see how your finished spreadsheet should look.

### Unit 6 Problem Set and Assessment

Solve these problems, then check your answers against the given solutions.

Take this assessment to see how well you understood this unit.

- This assessment **does not count towards your grade**. It is just for practice!
- You will see the correct answers when you submit your answers. Use this to help you study for the final exam!
- You can take this assessment as many times as you want, whenever you want.
