Linear Regression

Site: Saylor Academy
Course: MA121: Introduction to Statistics
Book: Linear Regression


Read these sections on linear regression. Linear regression, the simplest form of regression, is used to obtain a linear relationship between two variables.

Linear Regression and Correlation

Learning Outcomes

By the end of this chapter, the student should be able to:

  1. Discuss basic ideas of linear regression and correlation.
  2. Create and interpret a line of best fit.
  3. Calculate and interpret the correlation coefficient.
  4. Calculate and interpret outliers.


Professionals often want to know how two or more numeric variables are related. For example, is there a relationship between the grade on the second math exam a student takes and the grade on the final exam? If there is a relationship, what is it and how strong is the relationship?

In another example, your income may be determined by your education, your profession, your years of experience, and your ability. The amount you pay a repair person for labor is often determined by an initial amount plus an hourly fee. These are all examples in which regression can be used.

The type of data described in the examples is bivariate data - "bi" for two variables. In reality, statisticians use multivariate data, meaning many variables.

In this chapter, you will study the simplest form of regression, "linear regression" with one independent variable (x). This involves data that fits a line in two dimensions. You will also study correlation, which measures how strong the relationship is.

Source: Barbara Illowsky, Ph.D., Susan Dean
This work is licensed under a Creative Commons Attribution 3.0 License.

Linear Equations

Linear regression for two variables is based on a linear equation with one independent variable. It has the form:

y = a + bx

where a and b are constant numbers.

x is the independent variable, and y is the dependent variable. Typically, you choose a value to substitute for the independent variable and then solve for the dependent variable.

The following examples are linear equations.

y = 3 + 2x

y = −0.01 + 1.2x

The graph of a linear equation of the form y = a + bx is a straight line. Any line that is not vertical can be described by this equation.

Figure 1. Graph of the equation y = −1 + 2x.
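
As a minimal sketch of substituting values for x and solving for y, the Python snippet below evaluates the line from Figure 1, y = −1 + 2x, at a few x-values. The helper name linear and the chosen x-values are illustrative assumptions, not part of the original text.

```python
def linear(a, b, x):
    """Return y = a + b*x for intercept a and slope b (hypothetical helper)."""
    return a + b * x


# Line from Figure 1: y = -1 + 2x, so a = -1 and b = 2.
a, b = -1, 2

# Illustrative x-values chosen for this sketch.
points = [(x, linear(a, b, x)) for x in [-1, 0, 1, 2]]

for x, y in points:
    print(f"x = {x:>2}  ->  y = {y:>2}")
```

Each one-unit increase in x raises y by 2, and x = 0 gives y = −1, which is where the line crosses the y-axis.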

Linear equations of this form occur in applications of life sciences, social sciences, psychology, business, economics, physical sciences, mathematics, and other areas.

Aaron's Word Processing Service (AWPS) does word processing. Its rate is $32 per hour plus a $31.50 one-time charge. The total cost to a customer depends on the number of hours it takes to do the word processing job.

Find the equation that expresses the total cost in terms of the number of hours required to finish the word processing job.


Let x = the number of hours it takes to get the job done.

Let y = the total cost to the customer.

The $31.50 is a fixed cost. If it takes x hours to complete the job, then (32)(x) is the cost of the word processing only. The total cost is:

y = 31.50 + 32x
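
The cost rule described above (a fixed $31.50 charge plus $32 for each hour) can be sketched in Python as follows. The names FIXED_CHARGE, HOURLY_RATE, and total_cost, and the sample job lengths, are assumptions made for illustration.

```python
FIXED_CHARGE = 31.50  # one-time charge, in dollars
HOURLY_RATE = 32.00   # rate, in dollars per hour


def total_cost(hours):
    """Total cost y for a job that takes `hours` hours (x)."""
    return FIXED_CHARGE + HOURLY_RATE * hours


# Sample job lengths in hours, chosen only for this sketch.
for hours in [0, 1, 2.5]:
    print(f"x = {hours} -> y = ${total_cost(hours):.2f}")
```

For example, a job of 2.5 hours costs 31.50 + 32(2.5) = $111.50 under this rule.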


Slope and Y-Intercept of a Linear Equation

For the linear equation y = a + bx, b = slope and a = y-intercept.

From algebra, recall that the slope is a number that describes the steepness of a line, and the y-intercept is the y-coordinate of the point (0, a) where the line crosses the y-axis.

1a. If b > 0, the line slopes upward to the right.

1b. If b = 0, the line is horizontal.

1c. If b < 0, the line slopes downward to the right.

Figure 1. Three possible graphs of y = a + bx.
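
The three cases for the sign of the slope can be written as a small Python function. This is only a sketch; the function name describe_slope and the sample slope values are hypothetical.

```python
def describe_slope(b):
    """Describe the direction of the line y = a + bx from the sign of b."""
    if b > 0:
        return "slopes upward to the right"
    if b < 0:
        return "slopes downward to the right"
    return "is horizontal"


# Sample slopes chosen only for illustration.
for b in [2, 0, -0.5]:
    print(f"b = {b}: the line {describe_slope(b)}")
```

The value of a does not appear in the function because the y-intercept moves the line up or down without changing its direction.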

Svetlana tutors to make extra money for college. For each tutoring session, she charges a one-time fee of $25 plus $15 per hour of tutoring. A linear equation that expresses the total amount of money Svetlana earns for each session she tutors is y = 25 + 15x.

  • What are the independent and dependent variables? What is the y-intercept and what is the slope? Interpret them using complete sentences.


The independent variable (x) is the number of hours Svetlana tutors each session. The dependent variable (y) is the amount, in dollars, Svetlana earns for each session.

The y-intercept is 25 (a = 25). At the start of the tutoring session, Svetlana charges a one-time fee of $25 (this is when x = 0). The slope is 15 (b = 15). For each session, Svetlana earns $15 for each hour she tutors.
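
One way to check this interpretation is to evaluate y = 25 + 15x directly. In the Python sketch below, the function name earnings and the session lengths used are assumptions for illustration.

```python
def earnings(hours):
    """Dollars Svetlana earns for one session lasting `hours` hours."""
    return 25 + 15 * hours


# The y-intercept is the value of y when x = 0 (the one-time fee).
y_intercept = earnings(0)

# The slope is the change in y for each additional hour of tutoring.
slope = earnings(3) - earnings(2)

print(f"y-intercept: {y_intercept}, slope: {slope}")
```

The y-intercept comes out to 25 and the slope to 15, matching a = 25 and b = 15 in the equation.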