## The Linear Correlation Coefficient

Read this discussion on linear correlation. You will learn what the linear correlation coefficient is, how to compute it, and what it tells us about the relationship between two variables $x$ and $y$.

### The Linear Correlation Coefficient

#### Learning Objective

1. To learn what the linear correlation coefficient is, how to compute it, and what it tells us about the relationship between two variables $x$ and $y$.

Figure 10.3 "Linear Relationships of Varying Strengths" illustrates linear relationships of varying strengths between two variables $x$ and $y$. It is visually apparent that in panel (a) $x$ could serve as a useful predictor of $y$; in panel (b) it would be less useful; and in panel (c) the linear relationship is so weak as to be practically nonexistent. The linear correlation coefficient is a number computed directly from the data that measures the strength of the linear relationship between $x$ and $y$.

Figure 10.3 Linear Relationships of Varying Strengths

#### Definition

The linear correlation coefficient for a collection of $n$ pairs $(x,y)$ of numbers in a sample is the number $r$ given by the formula

$r = \dfrac{SS_{xy}}{\sqrt{SS_{xx} \cdot SS_{yy}}}$

where

$SS_{xx}=\Sigma x^2-\dfrac{1}{n}(\Sigma x)^2$, $SS_{xy}=\Sigma xy-\dfrac{1}{n}(\Sigma x)(\Sigma y)$, $SS_{yy}=\Sigma y^2-\dfrac{1}{n}(\Sigma y)^2$
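The definition translates directly into code. The following is a minimal Python sketch, not a library routine: the function name `linear_correlation` is our own, and the choice to raise an error when $SS_{xx}$ or $SS_{yy}$ is zero (where the formula would divide by zero) is an assumption of this sketch. The data in the usage line are made up for illustration.

```python
import math


def linear_correlation(xs, ys):
    """Sample linear correlation coefficient r, using the SS formulas above."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # The three sums of squares from the definition.
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    # If all x (or all y) values are equal the denominator is 0 and r is undefined.
    if ss_xx == 0 or ss_yy == 0:
        raise ValueError("r is undefined when every x or every y value is the same")
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# Illustrative, made-up data.
print(round(linear_correlation([1, 2, 3, 4], [2, 4, 5, 8]), 3))  # 0.981
```

The one-pass sums mirror the textbook formula; for data with very large values and a small spread, computing deviations from the means first is numerically safer.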

The linear correlation coefficient has the following properties, illustrated in Figure 10.4 "Linear Correlation Coefficient R":

1. The value of $r$ lies between −1 and 1, inclusive.

2. The sign of $r$ indicates the direction of the linear relationship between $x$ and $y$:
   1. If $r < 0$, then $y$ tends to decrease as $x$ increases.
   2. If $r > 0$, then $y$ tends to increase as $x$ increases.

3. The size of $|r|$ indicates the strength of the linear relationship between $x$ and $y$:
   1. If $|r|$ is near 1 (that is, if $r$ is near either 1 or −1), then the linear relationship between $x$ and $y$ is strong.
   2. If $|r|$ is near 0 (that is, if $r$ is near 0 and of either sign), then the linear relationship between $x$ and $y$ is weak.
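These properties can be checked on small examples. In this Python sketch the helper `linear_correlation` is our own (it repeats the SS formulas from the definition), and all three data sets are made up: an exact increasing line, an exact decreasing line, and an increasing trend with some scatter.

```python
import math


def linear_correlation(xs, ys):
    # r = SS_xy / sqrt(SS_xx * SS_yy), as in the definition.
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    ss_xx = sum(x * x for x in xs) - sx * sx / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    ss_yy = sum(y * y for y in ys) - sy * sy / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)


xs = [1, 2, 3, 4, 5, 6]
rising = [2 * x + 1 for x in xs]       # exact increasing line
falling = [10 - 3 * x for x in xs]     # exact decreasing line
noisy_rising = [3, 4, 8, 7, 11, 12]    # increasing trend with scatter (made up)

for label, ys in [("rising", rising), ("falling", falling), ("noisy rising", noisy_rising)]:
    r = linear_correlation(xs, ys)
    # Property 1: -1 <= r <= 1 (tiny tolerance for floating-point rounding).
    assert -1 - 1e-12 <= r <= 1 + 1e-12
    print(f"{label:>12}: r = {r:+.3f}")
# Prints r = +1.000, r = -1.000 and r = +0.960, in that order.
```

The exact lines give $r = \pm 1$ with the sign matching the direction of the line, while the scattered data give a positive $r$ that is close to, but below, 1.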

Figure 10.4 Linear Correlation Coefficient R

Pay particular attention to panel (f) in Figure 10.4 "Linear Correlation Coefficient R". It shows a perfectly deterministic relationship between $x$ and $y$, yet $r=0$ because the relationship is not linear. (In this particular case the points lie on the top half of a circle.)
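The situation in panel (f) can be mimicked numerically. In this Python sketch the five sample points are our own choice, not the data behind the figure: they lie on the upper half of the unit circle and are placed symmetrically about $x = 0$. Here $y$ is completely determined by $x$, but by symmetry $\Sigma x = 0$ and $\Sigma xy = 0$, so $SS_{xy} = 0$ and therefore $r = 0$.

```python
import math


def linear_correlation(xs, ys):
    # r = SS_xy / sqrt(SS_xx * SS_yy), as in the definition.
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    ss_xx = sum(x * x for x in xs) - sx * sx / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    ss_yy = sum(y * y for y in ys) - sy * sy / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# Points on the top half of the unit circle, symmetric about x = 0.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [math.sqrt(1 - x * x) for x in xs]  # y is an exact function of x

r = linear_correlation(xs, ys)
print(f"r = {r:.3f}")  # r = 0.000
```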

#### Example 1

Compute the linear correlation coefficient for the height and weight pairs plotted in Figure 10.2 "Plot of Height and Weight Pairs".

##### Solution:

Even for small data sets like this one, the computations are too long to do completely by hand. In actual practice the data are entered into a calculator or computer and a statistics program is used. To clarify the meaning of the formulas, we display the data and related quantities in tabular form. For each $(x,y)$ pair we compute three numbers, $x^2$, $xy$, and $y^2$, as shown in the table. The last line of the table gives the sum of each column, and these sums are all that is needed to compute $SS_{xx}$, $SS_{xy}$, and $SS_{yy}$.

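| | $x$ | $y$ | $x^2$ | $xy$ | $y^2$ |
| --- | ---: | ---: | ---: | ---: | ---: |
| | 68 | 151 | 4624 | 10268 | 22801 |
| | 69 | 146 | 4761 | 10074 | 21316 |
| | 70 | 157 | 4900 | 10990 | 24649 |
| | 70 | 164 | 4900 | 11480 | 26896 |
| | 71 | 171 | 5041 | 12141 | 29241 |
| | 72 | 160 | 5184 | 11520 | 25600 |
| | 72 | 163 | 5184 | 11736 | 26569 |
| | 72 | 180 | 5184 | 12960 | 32400 |
| | 73 | 170 | 5329 | 12410 | 28900 |
| | 73 | 175 | 5329 | 12775 | 30625 |
| | 74 | 178 | 5476 | 13172 | 31684 |
| | 75 | 188 | 5625 | 14100 | 35344 |
| $\Sigma$ | 859 | 2003 | 61537 | 143626 | 336025 |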
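The table can be reproduced with a few lines of Python. This sketch uses the twelve height and weight pairs listed in the table (the variable names `heights`, `weights`, `rows`, and `totals` are our own), builds each row's $x^2$, $xy$, and $y^2$, and sums every column to obtain the last line.

```python
# Height (x) and weight (y) pairs from the table.
heights = [68, 69, 70, 70, 71, 72, 72, 72, 73, 73, 74, 75]
weights = [151, 146, 157, 164, 171, 160, 163, 180, 170, 175, 178, 188]

# One row per pair: x, y, x^2, xy, y^2.
rows = [(x, y, x * x, x * y, y * y) for x, y in zip(heights, weights)]
for row in rows:
    print(*row)

# Column sums: the last line of the table.
totals = [sum(column) for column in zip(*rows)]
print("sum", *totals)  # sum 859 2003 61537 143626 336025
```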

$S S_{x x}=\Sigma x^{2}-\frac{1}{n}(\Sigma x)^{2}=61537-\frac{1}{12}(859)^{2}=46.91 \overline{6}$

$S S_{x y}=\Sigma x y -\frac{1}{n}(\Sigma x)(\Sigma y)=143626-\frac{1}{12}(859)(2003)=244.58 \overline{3}$

$S S_{y y}=\Sigma y^{2}-\frac{1}{n}(\Sigma y)^{2}=336025-\frac{1}{12}(2003)^{2}=1690.91 \overline{6}$

so that

$r=\frac{S S_{x y}}{\sqrt{S S_{x x} \cdot S S_{y y}}}=\frac{244.58 \overline{3}}{\sqrt{(46.91 \overline{6})(1690.91 \overline{6})}}=0.868$

The number $r=0.868$ quantifies what is visually apparent from Figure 10.2 "Plot of Height and Weight Pairs": weight tends to increase linearly with height ($r$ is positive), and although the relationship is not perfect, it is reasonably strong ($r$ is near 1).
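As a check on the arithmetic, the following Python sketch recomputes $SS_{xx}$, $SS_{xy}$, $SS_{yy}$, and $r$ from the column sums in the table. It uses `fractions.Fraction` from the standard library so that repeating decimals such as $46.91\overline{6}$ are carried exactly until the final square root; the variable names are our own.

```python
import math
from fractions import Fraction

# Column sums from the last line of the table (n = 12 pairs).
n = 12
sum_x, sum_y = 859, 2003
sum_x2, sum_xy, sum_y2 = 61537, 143626, 336025

# Exact sums of squares; Fraction keeps the repeating decimals exact.
ss_xx = sum_x2 - Fraction(sum_x ** 2, n)
ss_xy = sum_xy - Fraction(sum_x * sum_y, n)
ss_yy = sum_y2 - Fraction(sum_y ** 2, n)
print(ss_xx, ss_xy, ss_yy)  # 563/12 2935/12 20291/12

# Convert to float only for the square root.
r = float(ss_xy) / math.sqrt(float(ss_xx * ss_yy))
print(f"r = {r:.3f}")  # r = 0.868
```

Because all three sums of squares share the denominator 12, it cancels in the ratio, leaving $r = 2935/\sqrt{563 \cdot 20291}$. In Python 3.10 and later, `statistics.correlation` applied to the raw heights and weights offers an independent cross-check.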