Pearson's r

Site: Saylor Academy
Course: MA121: Introduction to Statistics
Book: Pearson's r
Date: Thursday, April 25, 2024, 10:41 AM

Description

This section introduces Pearson's correlation and explains what the typical values represent. It then elaborates on the properties of r, particularly that it is invariant under linear transformation. Finally, it introduces several formulas we can use to compute Pearson's correlation.

Values of the Pearson Correlation

Learning Objectives

  1. Describe what Pearson's correlation measures
  2. Give the symbols for Pearson's correlation in the sample and in the population
  3. State the possible range for Pearson's correlation
  4. Identify a perfect linear relationship

The Pearson product-moment correlation coefficient is a measure of the strength of the linear relationship between two variables. It is referred to as Pearson's correlation or simply as the correlation coefficient. If the relationship between the variables is not linear, then the correlation coefficient does not adequately represent the strength of the relationship between the variables.
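To illustrate the last point, here is a minimal sketch in Python. The data are hypothetical (not from this section), and the helper `pearson_r` is our own name; it uses the deviation-score formula presented later under "Computing Pearson's r". Y is completely determined by X, yet the relationship is curved rather than linear.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviation scores: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [v - mean_x for v in xs]
    dev_y = [v - mean_y for v in ys]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical data: Y = X squared, a perfect but non-linear relationship.
X = [-2, -1, 0, 1, 2]
Y = [v ** 2 for v in X]  # 4, 1, 0, 1, 4

r_curved = pearson_r(X, Y)
print(r_curved)  # 0.0
```

Even though Y can be predicted exactly from X, r is 0 here, so r alone would badly understate the strength of this relationship.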

The symbol for Pearson's correlation is "\rho" when it is measured in the population and "r" when it is measured in a sample. Because we will be dealing almost exclusively with samples, we will use r to represent Pearson's correlation unless otherwise noted.

Pearson's r can range from -1 to 1. An r of -1 indicates a perfect negative linear relationship between variables, an r of 0 indicates no linear relationship between variables, and an r of 1 indicates a perfect positive linear relationship between variables. Figure 1 shows a scatter plot for which r=1.
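The two extreme values can be checked with a short sketch. The data below are hypothetical, chosen so that every point falls exactly on a line; `pearson_r` is our own helper name and uses the deviation-score formula presented later under "Computing Pearson's r".

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviation scores: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [v - mean_x for v in xs]
    dev_y = [v - mean_y for v in ys]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical data lying exactly on a straight line.
X = [1, 2, 3, 4, 5]
Y_up = [2 * v + 1 for v in X]     # rising line
Y_down = [10 - 3 * v for v in X]  # falling line

r_up = pearson_r(X, Y_up)
r_down = pearson_r(X, Y_down)
print(round(r_up, 6), round(r_down, 6))  # 1.0 -1.0
```

The slope of the line does not matter; only its direction determines whether r is 1 or -1.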

Figure 1. A perfect positive linear relationship, \mathrm{r = 1}.


Figure 2. A perfect negative linear relationship, \mathrm{r = -1}.

 

Figure 3. A scatter plot for which \mathrm{r = 0}. Notice that there is no relationship between \mathrm{X} and \mathrm{Y}.


With real data, you would not expect to get values of \mathrm{r} of exactly \mathrm{-1, \, 0,} or \mathrm{1}. The data for spousal ages shown in Figure 4 and described in the introductory section has an r of 0.97.

Figure 4. Scatter plot of spousal ages, \mathrm{r = 0.97}.

 

Figure 5. Scatter plot of Grip Strength and Arm Strength, \mathrm{r = 0.63}.


The correlation between grip strength and arm strength depicted in Figure 5 (also described in the introductory section) is \mathrm{0.63}.



Source: David M. Lane, https://onlinestatbook.com/2/describing_bivariate_data/pearson.html
This work is in the Public Domain.

Questions

Question 1 out of 2.

The scatter plot below represents


  • a positive association
  • a negative association
  • no association


Question 2 out of 2.

The scatter plot below represents


  • a positive association
  • a negative association
  • no association

Answers


  1. As \mathrm{X} increases, \mathrm{Y} tends to increase, so it is a positive association.

  2. As \mathrm{X} increases,  \mathrm{Y} tends to decrease, so it is a negative association.

Properties of Pearson's r

Learning Objectives

  1. State the range of values for Pearson's correlation
  2. State the values that represent perfect linear relationships
  3. State the relationship between the correlation of Y with X and the correlation of X with Y
  4. State the effect of linear transformations on Pearson's correlation

A basic property of Pearson's r is that its possible range is from -1 to 1.  A correlation of \mathrm{-1} means a perfect negative linear relationship, a correlation of 0 means no linear relationship, and a correlation of \mathrm{1} means a perfect positive linear relationship.

Pearson's correlation is symmetric in the sense that the correlation of X with Y is the same as the correlation of Y with X. For example, the correlation of Weight with Height is the same as the correlation of Height with Weight.

A critical property of Pearson's r is that it is unaffected by linear transformations. This means that multiplying a variable by a positive constant and/or adding a constant does not change the correlation of that variable with other variables. (Multiplying by a negative constant reverses the sign of r but leaves its magnitude unchanged.) For instance, the correlation of Weight and Height does not depend on whether Height is measured in inches, feet, or even miles. Similarly, adding five points to every student's test score would not change the correlation of the test score with other variables such as GPA.
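These properties can be checked numerically. The sketch below uses made-up heights and weights (hypothetical values, not data from this course), and `pearson_r` is our own helper built on the deviation-score formula presented later under "Computing Pearson's r".

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviation scores: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [v - mean_x for v in xs]
    dev_y = [v - mean_y for v in ys]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical heights (inches) and weights (pounds), made up for illustration.
height_in = [60, 62, 65, 68, 72]
weight_lb = [115, 130, 128, 160, 175]

r_hw = pearson_r(height_in, weight_lb)
r_wh = pearson_r(weight_lb, height_in)                    # variables swapped
r_feet = pearson_r([h / 12 for h in height_in], weight_lb)  # inches -> feet
r_shift = pearson_r(height_in, [w + 5 for w in weight_lb])  # add a constant
r_flip = pearson_r([-h for h in height_in], weight_lb)      # negative multiplier

print(math.isclose(r_hw, r_wh))     # True: symmetric
print(math.isclose(r_hw, r_feet))   # True: change of units
print(math.isclose(r_hw, r_shift))  # True: adding a constant
print(math.isclose(r_flip, -r_hw))  # True: only the sign reverses
```

The comparisons use `math.isclose` rather than `==` because floating-point rounding can make the results differ in the last digit.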

Questions

Question 1 out of 4.

The correlation between temperature and number of ice cream cones bought is the same whether the temperature is measured in Celsius or Fahrenheit.

  • True
  • False


Question 2 out of 4.

The correlation between two sets of numbers is the same as the correlation between the log of those two sets of numbers.

  • True
  • False


Question 3 out of 4.

Which of the following is not a possible value for Pearson's correlation?

  • -1.5
  • -1
  • 0
  • .99


Question 4 out of 4.

Which is higher, the correlation between height and weight or the correlation between weight and height?

  • weight and height
  • They are about the same.
  • They are exactly the same.
  • height and weight

Answers


  1. It will be the same because that is a linear transformation.

  2. It won't be the same because a log transformation is not a linear transformation.

  3. \mathrm{-1.5}
    Pearson's correlation can be any value between -1 and 1 inclusive.

  4. Correlations are symmetric so they are exactly the same.
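Answer 2 can be illustrated with a small sketch. The numbers are hypothetical and were picked so that the logged values fall exactly on a line while the raw values do not; `pearson_r` is our own helper using the deviation-score formula from the next section.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviation scores: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [v - mean_x for v in xs]
    dev_y = [v - mean_y for v in ys]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical data: powers of 10 paired with powers of 2.
X = [1, 10, 100, 1000]
Y = [2, 4, 8, 16]

r_raw = pearson_r(X, Y)
r_log = pearson_r([math.log(v) for v in X], [math.log(v) for v in Y])

print(round(r_raw, 3))  # 0.947
print(round(r_log, 3))  # 1.0
```

Taking logs changes the spacing between scores unevenly, which is why the correlation changes; a linear transformation would have left it untouched.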

Computing Pearson's r

Learning Objectives

  1. Define X and x
  2. State why \sum x y=0 when there is no relationship
  3. Calculate r

There are several formulas that can be used to compute Pearson's correlation. Some formulas make more conceptual sense whereas others are easier to actually compute. We are going to begin with a formula that makes more conceptual sense.

We are going to compute the correlation between the variables X and Y shown in Table 1. We begin by computing the mean for X and subtracting this mean from all values of X. The new variable is called "x". The variable "y" is computed similarly. The variables x and y are said to be deviation scores because each score is a deviation from the mean. Notice that the means of x and y are both 0. Next we create a new column by multiplying x and y.

Before proceeding with the calculations, let's consider why the sum of the x y column reveals the relationship between X and Y. If there were no relationship between X and Y, then positive values of x would be just as likely to be paired with negative values of y as with positive values. This would make negative values of x y as likely as positive values and the sum would be small. On the other hand, consider Table 1 in which high values of X are associated with high values of Y and low values of X are associated with low values of Y. You can see that positive values of x are associated with positive values of y and negative values of x are associated with negative values of y. In all cases, the product of x and y is positive, resulting in a high total for the x y column. Finally, if there were a negative relationship then positive values of x would be associated with negative values of y and negative values of x would be associated with positive values of y. This would lead to negative values for x y.

Table 1. Calculation of r.

          X    Y    x    y   xy  x^2  y^2
          1    4   -3   -5   15    9   25
          3    6   -1   -3    3    1    9
          5   10    1    1    1    1    1
          5   12    1    3    3    1    9
          6   13    2    4    8    4   16
  Total  20   45    0    0   30   16   60
  Mean    4    9    0    0    6
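The columns of Table 1 can be reproduced with a few lines of Python. This is only a sketch of the steps described above; the variable names are our own.

```python
# Raw scores from Table 1.
X = [1, 3, 5, 5, 6]
Y = [4, 6, 10, 12, 13]

mean_x = sum(X) / len(X)  # 4.0
mean_y = sum(Y) / len(Y)  # 9.0

# Deviation scores: each raw score minus its variable's mean.
x = [v - mean_x for v in X]
y = [v - mean_y for v in Y]

# Remaining columns of the table.
xy = [a * b for a, b in zip(x, y)]
x2 = [a * a for a in x]
y2 = [b * b for b in y]

print(sum(x), sum(y))             # 0.0 0.0
print(sum(xy), sum(x2), sum(y2))  # 30.0 16.0 60.0
```

The deviation scores always sum to 0, which is a handy check that the means were computed correctly.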

Pearson's r is designed so that the correlation between height and weight is the same whether height is measured in inches or in feet. To achieve this property, Pearson's correlation is computed by dividing the sum of the x y column \left(\sum x y\right) by the square root of the product of the sum of the x^{2} column \left(\sum x^{2}\right) and the sum of the y^{2} column \left(\sum y^{2}\right). The resulting formula is:

r=\frac{\sum x y}{\sqrt{\sum x^{2} \sum y^{2}}}

and therefore

\mathrm{r}=\frac{30}{\sqrt{(16)(60)}}=\frac{30}{\sqrt{960}}=\frac{30}{30.984}=0.968
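The same calculation can be sketched in Python. The second half rescales X by 1/12, as if X had been recorded in inches and converted to feet; that interpretation is an assumption made only for illustration, since Table 1 gives no units. The helper name `deviation_sums` is our own.

```python
import math

X = [1, 3, 5, 5, 6]
Y = [4, 6, 10, 12, 13]

def deviation_sums(xs, ys):
    """Return (sum of xy, sum of x^2, sum of y^2) for deviation scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [v - mean_x for v in xs]
    dev_y = [v - mean_y for v in ys]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy, sum_x2, sum_y2

sum_xy, sum_x2, sum_y2 = deviation_sums(X, Y)
r = sum_xy / math.sqrt(sum_x2 * sum_y2)
print(round(r, 3))  # 0.968

# Rescaling X shrinks the numerator and the denominator by the same factor.
X_scaled = [v / 12 for v in X]
s_xy, s_x2, s_y2 = deviation_sums(X_scaled, Y)
r_scaled = s_xy / math.sqrt(s_x2 * s_y2)
print(math.isclose(r, r_scaled))  # True
```

Because the sum of x y and the square root of the sum of x^2 are both divided by 12, the factor cancels and r is unchanged.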

An alternative computational formula that avoids the step of computing deviation scores is:

\mathrm{r}=\frac{\sum \mathrm{XY}-\frac{\sum \mathrm{X} \sum \mathrm{Y}}{\mathrm{N}}}{\sqrt{\left(\sum \mathrm{X}^{2}-\frac{\left(\sum \mathrm{X}\right)^{2}}{\mathrm{N}}\right)} \sqrt{\left(\sum \mathrm{Y}^{2}-\frac{\left(\sum \mathrm{Y}\right)^{2}}{\mathrm{N}}\right)}}
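As a check, the computational formula can be applied to the raw scores in Table 1. The sketch below uses our own variable names and works only with sums of raw scores, never with deviation scores.

```python
import math

# Raw scores from Table 1.
X = [1, 3, 5, 5, 6]
Y = [4, 6, 10, 12, 13]
N = len(X)

sum_X = sum(X)                             # 20
sum_Y = sum(Y)                             # 45
sum_XY = sum(a * b for a, b in zip(X, Y))  # 210
sum_X2 = sum(a * a for a in X)             # 96
sum_Y2 = sum(b * b for b in Y)             # 465

numerator = sum_XY - sum_X * sum_Y / N     # 210 - 180 = 30
denominator = (math.sqrt(sum_X2 - sum_X ** 2 / N)     # sqrt(96 - 80) = sqrt(16)
               * math.sqrt(sum_Y2 - sum_Y ** 2 / N))  # sqrt(465 - 405) = sqrt(60)

r = numerator / denominator
print(round(r, 3))  # 0.968
```

The numerator equals the sum of the x y column (30), and the two terms under the square roots equal the sums of the x^2 and y^2 columns (16 and 60), so both formulas give the same r.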

Questions

Question 1 out of 4.

What is the correlation between the two variables \mathrm{X} and \mathrm{Y} listed below? (We suggest you use a stat program or Analysis Lab).

_________


   X   Y
   8  10
  10   9
  10  11
  11  11
  12   8
  12  10
  15  14
   5   8
  11  11
   9   9
  11  12
  10  13
   7  12
   8   7
   6   9
  15  12
   9  10
  10  11
   9  11
   7   5
   8   7
   8  10
   8   6
   6   9
  10   9

 

Question 2 out of 4.

What deviation score on \mathrm{X} corresponds to the raw score of \mathrm{6}?

_________


  X   Y
  2   4
  4   3
  6   5

  

Question 3 out of 4.

What is the sum of \mathrm{xy}?

_________


  X   Y
  2   4
  4   3
  6   5

  

Question 4 out of 4.

What is the effect on the correlation of adding \mathrm{12} to every score on one variable?

  • The correlation may go up or down, it depends on the data.
  • The correlation will increase.
  • The correlation will not change.

Answers


  1. Compute the correlation of the two variables. \mathrm{0.4686}

  2. The mean of \mathrm{X} is \mathrm{4}. The deviation score is \mathrm{6-4=2}.

  3. Small letters refer to deviation scores. Multiply the deviation score for each \mathrm{x} value by the corresponding deviation score for each \mathrm{y} value. Then add these values together. \mathrm{(-2)(0) + (0)(-1) + (2)(1) = 2}

  4. The correlation will not change. Since the scores are converted to deviation scores, adding \mathrm{12} will have no effect.
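Answers 2 through 4 can be verified with a short sketch using the three-row table from Questions 2 and 3. The variable names and the helper `pearson_r` are our own.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviation scores: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [v - mean_x for v in xs]
    dev_y = [v - mean_y for v in ys]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Data from Questions 2 and 3.
X = [2, 4, 6]
Y = [4, 3, 5]

mean_x = sum(X) / len(X)  # 4.0
mean_y = sum(Y) / len(Y)  # 4.0
x = [v - mean_x for v in X]
y = [v - mean_y for v in Y]

dev_for_6 = x[X.index(6)]
sum_xy = sum(a * b for a, b in zip(x, y))
print(dev_for_6)  # 2.0
print(sum_xy)     # 2.0

# Question 4: add 12 to every score on one variable.
r_before = pearson_r(X, Y)
r_after = pearson_r([v + 12 for v in X], Y)
print(r_before, math.isclose(r_before, r_after))  # 0.5 True
```

Adding 12 raises the mean of X by 12 as well, so every deviation score, and therefore r, stays the same.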