Linear Regression and Correlation

Read this chapter to learn how to use graphs, such as scatterplots, to analyze the relationship between two variables. Two variables are positively or negatively related when pairs of data consistently show the same pattern. For example, when individuals' incomes rise, so does their consumption of goods and services; thus, income and consumption are considered to be positively related. By contrast, as a person's income rises, the number of bus rides that person takes tends to fall; thus, income and bus riding are negatively related.

Linear Equations

Linear regression for two variables is based on a linear equation with one independent variable. The equation has the form:

y=a+bx

where a and b are constant numbers.

The variable x is the independent variable, and y is the dependent variable. Another way to think about this equation is as a statement of cause and effect: the x variable is the cause, and the y variable is the hypothesized effect. Typically, you choose a value to substitute for the independent variable and then solve for the dependent variable.


Example 13.1

The following examples are linear equations.

y=3+2x
y=–0.01+1.2x

The graph of a linear equation of the form y= a + bx is a straight line. Any line that is not vertical can be described by this equation.
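
As a quick check of the two equations above, the following Python sketch evaluates each one at a few values of x. The helper name linear and the chosen x values are introduced here for illustration only; they are not part of the original example.

```python
def linear(a, b, x):
    """Return y = a + b*x for intercept a and slope b."""
    return a + b * x

# The two equations from Example 13.1, written as (a, b) pairs.
equations = [(3, 2), (-0.01, 1.2)]

for a, b in equations:
    # Evaluate at x = 0, 1, 2; equal steps in x produce equal steps in y,
    # which is why the graph is a straight line.
    ys = [linear(a, b, x) for x in (0, 1, 2)]
    print(f"y = {a} + {b}x ->", ys)
```

Values for the second equation may show small floating-point rounding when printed.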


Example 13.2

Graph the equation y= –1 + 2x.

Figure 13.3
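
One way to draw this graph by hand is to compute a few points and connect them. The sketch below does that in Python; the function name y_value and the choice of x = 0, 1, 2 are assumptions made for illustration.

```python
def y_value(x):
    """Evaluate y = -1 + 2x from Example 13.2."""
    return -1 + 2 * x

# Two points determine a line; a third point is a useful check.
points = [(x, y_value(x)) for x in (0, 1, 2)]
print(points)  # [(0, -1), (1, 1), (2, 3)]
```

Plotting these points and drawing the line through them gives the graph of the equation.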

Try It 13.2

Is the following an example of a linear equation? Why or why not?

Figure 13.4 A graph of an equation. The x-axis is labeled in intervals of 2 from 0 to 14; the y-axis is labeled in intervals of 2.

Example 13.3

Aaron's Word Processing Service (AWPS) does word processing. The rate for services is $32 per hour plus a $31.50 one-time charge. The total cost to a customer depends on the number of hours it takes to complete the job.

Problem
Find the equation that expresses the total cost in terms of the number of hours required to complete the job.

Solution 1

Let x= the number of hours it takes to get the job done.
Let y= the total cost to the customer.

The $31.50 is a fixed cost. If it takes x hours to complete the job, then (32)(x) is the cost of the word processing only. The total cost is: y= 31.50 + 32x
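
The cost equation can be sketched as a small Python function. The constant and function names, and the 3-hour job used to try it out, are hypothetical and chosen only for illustration.

```python
FIXED_CHARGE = 31.50  # one-time charge, in dollars
HOURLY_RATE = 32.00   # rate per hour, in dollars

def total_cost(hours):
    """Total cost y = 31.50 + 32x for a job that takes `hours` hours."""
    return FIXED_CHARGE + HOURLY_RATE * hours

# A hypothetical 3-hour job (the job length is an assumption).
print(total_cost(3))  # 127.5
```

Setting hours to 0 returns the fixed charge alone, which matches the role of the constant term in the equation.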


Slope and Y-Intercept of a Linear Equation

For the linear equation y= a + bx, b is the slope and a is the y-intercept. Recall from algebra that the slope is a number that describes the steepness of a line, and the y-intercept is the y coordinate of the point (0, a) where the line crosses the y-axis. From calculus, the slope is the first derivative of the function. For a linear function, the slope is dy / dx = b, which we can read as "the change in y (dy) that results from a change in x (dx) is b * dx".
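
To illustrate that the slope of a line is the same everywhere, the following Python sketch computes the change in y divided by the change in x between two pairs of points on y= 3 + 2x. The function names and the sample points are assumptions introduced for this illustration.

```python
def slope_between(p, q):
    """Slope (change in y over change in x) between points p and q."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    return (y2 - y1) / (x2 - x1)

def line(x, a=3, b=2):
    """Evaluate y = a + b*x; the defaults give y = 3 + 2x."""
    return a + b * x

# Any two distinct points on the line give the same slope, b = 2.
print(slope_between((0, line(0)), (1, line(1))))      # 2.0
print(slope_between((-4, line(-4)), (10, line(10))))  # 2.0

# The y-intercept is the value of y at x = 0.
print(line(0))  # 3
```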

Figure 13.5 Three possible graphs of y= a + bx. (a) If b > 0, the line slopes upward to the right. (b) If b = 0, the line is horizontal. (c) If b < 0, the line slopes downward to the right.
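
The three cases in the caption depend only on the sign of b. A minimal Python sketch of that rule follows; the function name describe_slope and the sample slope values are hypothetical.

```python
def describe_slope(b):
    """Describe the direction of the line y = a + bx from the sign of b."""
    if b > 0:
        return "slopes upward to the right"
    if b < 0:
        return "slopes downward to the right"
    return "is horizontal"

# Hypothetical slope values chosen for illustration.
for b in (2, 0, -1.5):
    print(f"b = {b}: the line {describe_slope(b)}")
```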


Example 13.4

Svetlana tutors to make extra money for college. For each tutoring session, she charges a one-time fee of $25 plus $15 per hour of tutoring. A linear equation that expresses the total amount of money Svetlana earns for each session she tutors is y= 25 + 15x.

Problem
What are the independent and dependent variables? What is the y-intercept and what is the slope? Interpret them using complete sentences.

Solution 1

The independent variable (x) is the number of hours Svetlana tutors each session. The dependent variable (y) is the amount, in dollars, Svetlana earns for each session.

The y-intercept is 25 (a = 25). At the start of the tutoring session, Svetlana charges a one-time fee of $25 (this is when x= 0). The slope is 15 (b = 15). For each session, Svetlana earns $15 for each hour she tutors.
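
The interpretation above can be checked numerically. In this Python sketch the constant and function names are assumptions made for illustration: earnings at x= 0 equal the y-intercept, and each additional hour adds the slope.

```python
SESSION_FEE = 25  # y-intercept a: one-time fee, in dollars
HOURLY_RATE = 15  # slope b: dollars earned per hour of tutoring

def earnings(hours):
    """Earnings per session, y = 25 + 15x, for `hours` hours of tutoring."""
    return SESSION_FEE + HOURLY_RATE * hours

print(earnings(0))                # 25, the y-intercept
print(earnings(2) - earnings(1))  # 15, the slope
```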