Linear Regression

At various points throughout the course, it will be necessary to build a statistical model that can be used to estimate or forecast a value based on a set of given values. When one has data where a dependent variable is assumed to depend upon a set of independent variables, linear regression is often applied as the first step in data analysis. This is because parameters for the linear model are very tractable and easily interpreted. You will be implementing this model in various ways using Python.

Linear Equations

Linear regression for two variables is based on a linear equation with one independent variable. The equation has the form

y=a+bx

where a and b are constant numbers.

The variable x is the independent variable; y is the dependent variable. Typically, you choose a value to substitute for the independent variable and then solve for the dependent variable.


Example 12.1

The following examples are linear equations.

y=3+2x
y=–0.01+1.2x

Try It 12.1

Is the following an example of a linear equation?
y = –0.125 – 3.5x

The graph of a linear equation of the form y = a + bx is a straight line. Any line that is not vertical can be described by this equation.


Example 12.2

Graph the equation y = –1 + 2x.

Graph of the equation y = -1 + 2x. This is a straight line that crosses the y-axis at -1 and is sloped up and to the right, r

Figure 12.2

Try It 12.2

Is the following an example of a linear equation? Why or why not?

This is a graph of an equation. The x-axis is labeled in intervals of 2 from 0 - 14; the y-axis is labeled in intervals of 2

Figure 12.3


Example 12.3

Aaron's Word Processing Service does word processing. The rate for services is $32 per hour plus a $31.50 one-time charge. The total cost to a customer depends on the number of hours it takes to complete the job.

Find the equation that expresses the total cost in terms of the number of hours required to complete the job.

Solution

Let x = the number of hours it takes to get the job done.
Let y = the total cost to the customer.

The $31.50 is a fixed cost. If it takes x hours to complete the job, then (32)(x) is the cost of the word processing only. The total cost is y = 31.50 + 32x.


Try It 12.3

Emma's Extreme Sports hires hang-gliding instructors and pays them a fee of $50 per class, as well as $20 per student in the class. The total cost Emma pays depends on the number of students in a class. Find the equation that expresses the total cost in terms of the number of students in a class.


Slope and y-interceptof a Linear Equation

For the linear equation y = a + bx, b = \text{slope} and a = \text{y-inttercept}. From algebra, recall that the slope is a number that describes the steepness of a line; the y-intercept is the y-coordinate of the point (0, a), where the line crosses the y-axis.

Please note that in previous courses you learned y=mx+b
was the slope-intercept form of the equation, where m represented the slope and b represented the y-intercept. In this text, the form y=a+bx is used, where a is the y-intercept and b is the slope. The key is remembering the coefficient of x is the slope, and the constant number is the y-intercept.

Three possible graphs of the equation y = a + bx. For the first graph, (a), b > 0 and so the line slopes upward to the right.


Figure 12.4 Three possible graphs of y = a + bx. (a) If b > 0, the line slopes upward to the right. (b) If b = 0, the line is horizontal. (c) If b < 0, the line slopes downward to the right.


Example 12.4

Svetlana tutors to make extra money for college. For each tutoring session, she charges a one-time fee of $25 plus $15 per hour of tutoring. A linear equation that expresses the total amount of money Svetlana earns for each session she tutors is y = 25 + 15x.

What are the independent and dependent variables? What is the y-intercept, and what is the slope? Interpret them using complete sentences.

Solution

The independent variable (x) is the number of hours Svetlana tutors each session. The dependent variable (y) is the amount, in dollars, Svetlana earns for each session.

The y-intercept is 25 (a = 25). At the start of the tutoring session, Svetlana charges a one-time fee of $25 (this is when x = 0). The slope is 15 (b = 15). For each session, Svetlana earns $15 for each hour she tutors.


Try It 12.4

Ethan repairs household appliances such as dishwashers and refrigerators. For each visit, he charges $25 plus $20 per hour of work. A linear equation that expresses the total amount of money Ethan earns per visit is y = 25 + 20x.

What are the independent and dependent variables? What is the y-intercept, and what is the slope? Interpret them using complete sentences.


Source: OpenStax, https://openstax.org/books/statistics/pages/12-introduction
Creative Commons License This work is licensed under a Creative Commons Attribution 4.0 License.