Measures of Central Location

This section elaborates on mean, median, and mode at the population level and sample level. This section also contains many interesting examples of range, variance, and standard deviation. Complete the exercises and check your answers.

Measures of Variability

The Variance and the Standard Deviation

The other two measures of variability that we will consider are more elaborate and also depend on whether the data set is just a sample drawn from a much larger population or is the whole population itself (that is, a census).


Definition

The sample variance of a set of n sample data is the number s^{2} defined by the formula

s^{2}=\frac{\Sigma(x-\bar{x})^{2}}{n-1}

which by algebra is equivalent to the formula

s^{2}=\frac{\Sigma x^{2}-\frac{1}{n}(\Sigma x)^{2}}{n-1}

The sample standard deviation of a set of n sample data is the square root of the sample variance, hence is the number s given by the formulas

s=\sqrt{\frac{\Sigma(x-\bar{x})^{2}}{n-1}}=\sqrt{\frac{\Sigma x^{2}-\frac{1}{n}(\Sigma x)^{2}}{n-1}}

Although the first formula in each case looks less complicated than the second, the latter is easier to use in hand computations, and is called a shortcut formula.
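
The algebra behind the equivalence is short. As a sketch, it uses only the definition \bar{x}=\frac{1}{n} \Sigma x and the fact that \bar{x} is a constant inside the sum:

\begin{aligned} \Sigma(x-\bar{x})^{2} &=\Sigma x^{2}-2 \bar{x} \Sigma x+n \bar{x}^{2} \\ &=\Sigma x^{2}-\frac{2}{n}(\Sigma x)^{2}+\frac{1}{n}(\Sigma x)^{2} \\ &=\Sigma x^{2}-\frac{1}{n}(\Sigma x)^{2} \end{aligned}

Dividing both sides by n-1 turns the defining formula for s^{2} into the shortcut formula.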


EXAMPLE 11

Find the sample variance and the sample standard deviation of Data Set II in Table 2.1 "Two Data Sets".


Solution:

To use the defining formula (the first formula) in the definition we first compute for each observation x its deviation x-\bar{x} from the sample mean. Since the mean of the data is \bar{x}=40, we obtain the ten numbers displayed in the second line of the supplied table.

 \begin{array}{c|cccccccccc} x & 46 & 37 & 40 & 33 & 42 & 36 & 40 & 47 & 34 & 45 \\ \hline x-\bar{x} & 6 & -3 & 0 & -7 & 2 & -4 & 0 & 7 & -6 & 5 \end{array}

Then

\Sigma(x-\bar{x})^{2}=6^{2}+(-3)^{2}+0^{2}+(-7)^{2}+2^{2}+(-4)^{2}+0^{2}+7^{2}+(-6)^{2}+5^{2}=224

so

s^{2}=\frac{\Sigma(x-\bar{x})^{2}}{n-1}=\frac{224}{9}=24.\overline{8}

and

s=\sqrt{24.\overline{8}} \approx 4.99

The student is encouraged to compute the ten deviations for Data Set I and verify that their squares add up to 20, so that the sample variance and standard deviation of Data Set I are the much smaller numbers s^{2}=20/9=2.\overline{2} and s=\sqrt{20/9} \approx 1.49.
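
The computation in Example 11 can also be checked by machine. The following is a minimal Python sketch of the defining formula applied to Data Set II; the function name sample_variance is only an illustrative choice, not notation used in this text.

```python
import math

# Data Set II, as listed in the table of Example 11.
data = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]


def sample_variance(xs):
    """Sample variance by the defining formula: sum of squared deviations over n - 1."""
    n = len(xs)
    mean = sum(xs) / n                                   # x-bar
    squared_deviations = [(x - mean) ** 2 for x in xs]   # (x - x-bar)^2 for each x
    return sum(squared_deviations) / (n - 1)


s2 = sample_variance(data)
s = math.sqrt(s2)
print(round(s2, 4), round(s, 2))  # prints 24.8889 4.99
```

Replacing the list with the ten values of Data Set I from Table 2.1 is one way to check the figures quoted for that data set.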


EXAMPLE 12

Find the sample variance and the sample standard deviation of the ten GPAs in Note 2.12 "Example 3" in Section 2.2 "Measures of Central Location".


Solution:

Since

\Sigma x=1.90+3.00+2.53+3.71+2.12+1.76+2.71+1.39+4.00+3.33=26.45

and

\begin{aligned} \Sigma x^{2} &=1.90^{2}+3.00^{2}+2.53^{2}+3.71^{2}+2.12^{2}+1.76^{2} \\ &\quad+2.71^{2}+1.39^{2}+4.00^{2}+3.33^{2} \\ &=76.7321 \end{aligned}

the shortcut formula gives

s^{2}=\frac{\Sigma x^{2}-\frac{1}{n}(\Sigma x)^{2}}{n-1}=\frac{76.7321-\frac{(26.45)^{2}}{10}}{10-1}=\frac{6.77185}{9}=.75242\overline{7}

and

s=\sqrt{.75242\overline{7}} \approx .867
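
As a check on Example 12, here is a short Python sketch of the shortcut formula applied to the ten GPAs. The function name sample_variance_shortcut is an assumption made for illustration. The comparison against the standard library's statistics.variance, which also divides by n-1, is included only as a sanity check.

```python
import math
import statistics

# The ten GPAs summed in Example 12.
gpas = [1.90, 3.00, 2.53, 3.71, 2.12, 1.76, 2.71, 1.39, 4.00, 3.33]


def sample_variance_shortcut(xs):
    """Sample variance by the shortcut formula: (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)                    # Sigma x
    sum_x2 = sum(x * x for x in xs)    # Sigma x^2
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


s2 = sample_variance_shortcut(gpas)
s = math.sqrt(s2)
print(round(s2, 5), round(s, 3))  # prints 0.75243 0.867

# Sanity check: the library routine agrees up to floating-point rounding.
print(math.isclose(s2, statistics.variance(gpas)))  # prints True
```

The shortcut formula saves work by hand, but in floating-point arithmetic it subtracts two numbers of similar size and can lose precision when the data are large relative to their spread. For data like these GPAs the effect is negligible.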

The sample variance has different units from the data. For example, if the units in the data set were inches, the new units would be inches squared, or square inches. It is thus primarily of theoretical importance and will not be considered further in this text, except in passing.

If the data set comprises the whole population, then the population standard deviation, denoted \sigma (the lower case Greek letter sigma), and its square, the population variance \sigma^{2}, are defined as follows.


Definition

The population variance and population standard deviation of a set of N population data are the numbers \sigma^{2} and \sigma defined by the formulas

\sigma^{2}=\frac{\Sigma(x-\mu)^{2}}{N} and \sigma=\sqrt{\frac{\Sigma(x-\mu)^{2}}{N}}

Note that the denominator in the fraction is the full number of observations, not that number reduced by one, as is the case with the sample standard deviation. Since most data sets are samples, we will always work with the sample standard deviation and variance.
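
The effect of the two denominators can be seen with Python's standard statistics module, where variance and stdev divide by n-1 while pvariance and pstdev divide by N. In the sketch below, Data Set II is treated as a whole population purely for illustration; the text itself treats it as a sample.

```python
import math
import statistics

# Data Set II from Example 11, here ALSO treated as a population (illustrative assumption).
data = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]

sample_var = statistics.variance(data)        # sum of squared deviations / (n - 1)
population_var = statistics.pvariance(data)   # sum of squared deviations / N

sample_sd = statistics.stdev(data)            # square root of the sample variance
population_sd = statistics.pstdev(data)       # square root of the population variance

print(round(sample_var, 2), round(population_var, 2))  # prints 24.89 22.4
print(round(sample_sd, 2), round(population_sd, 2))    # prints 4.99 4.73
```

Both pairs come from the same sum of squared deviations, 224. Only the divisor changes, so the population figures are always slightly smaller than the sample figures for the same data.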

Finally, in many real-life situations the most important statistical issues have to do with comparing the means and standard deviations of two data sets. Figure 2.11 "Difference between Two Data Sets" illustrates how a difference in the sample mean, the sample standard deviation, or both is reflected in the appearance of the data set, as shown by the curves derived from the relative frequency histograms built using the data.

Figure 2.11 Difference between Two Data Sets