This section elaborates on the mean, median, and mode at the population and sample levels, and works through examples of the range, variance, and standard deviation. Complete the exercises and check your answers.
Measures of Variability
The Variance and the Standard Deviation
The other two measures of variability that we will consider are more elaborate and also depend on whether the data set is just a sample drawn from a much larger population or is the whole population itself (that is, a census).
Definition
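The sample variance of a set of $n$ sample data is the number $s^2$ defined by the formula
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
which by algebra is equivalent to the formula
$$s^2 = \frac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n - 1}$$
The sample standard deviation of a set of $n$ sample data is the square root of the sample variance, hence is the number $s$ given by the formulas
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n - 1}}$$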
Although the first formula in each case looks less complicated than the second, the latter is easier to use in hand computations, and is called a shortcut formula.
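The algebra behind the equivalence of the two formulas can be sketched as follows. The only fact used is that the sample mean is $\bar{x} = \frac{1}{n}\sum x$, where $n$ is the number of observations; expanding the square in the numerator of the defining formula gives the numerator of the shortcut formula.

$$
\begin{aligned}
\sum (x - \bar{x})^2 &= \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 \\
&= \sum x^2 - \frac{2}{n}\left(\sum x\right)^2 + \frac{1}{n}\left(\sum x\right)^2 \\
&= \sum x^2 - \frac{1}{n}\left(\sum x\right)^2
\end{aligned}
$$

Dividing both sides by $n - 1$ shows that the two expressions for the sample variance agree.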
EXAMPLE 11
Find the sample variance and the sample standard deviation of Data Set II in Table 2.1 "Two Data Sets".
Solution:
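To use the defining formula (the first formula) in the definition we first compute for each observation $x$ its deviation $x - \bar{x}$ from the sample mean $\bar{x}$. Subtracting the mean of the data from each observation, we obtain the ten numbers displayed in the second line of the supplied table.
Then, summing the squares of these ten deviations to obtain $\sum (x - \bar{x})^2$ and using $n = 10$, we have
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum (x - \bar{x})^2}{9}$$
and
$$s = \sqrt{s^2}$$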
The student is encouraged to compute the ten deviations for Data Set I and verify that their squares add up to 20, so that the sample variance and standard deviation of Data Set I are the much smaller numbers $s^2 = 20/9 = 2.\bar{2}$ and $s = \sqrt{20/9} \approx 1.49$.
EXAMPLE 12
Find the sample variance and the sample standard deviation of the ten GPAs in Note 2.12 "Example 3" in Section 2.2 "Measures of Central Location".
Solution:
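Since the sums $\sum x$ and $\sum x^2$ can be computed directly from the ten GPAs, the shortcut formula with $n = 10$ gives
$$s^2 = \frac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n - 1} = \frac{\sum x^2 - \frac{1}{10}\left(\sum x\right)^2}{9}$$
and
$$s = \sqrt{s^2}$$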
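The two ways of computing the sample variance, the defining formula and the shortcut formula, can be compared with a short Python sketch. The function names and the six-value sample below are hypothetical, chosen only for illustration; they are not one of the data sets used in the text.

```python
import math

def sample_variance_defining(data):
    """Sample variance via the defining formula: sum((x - mean)^2) / (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_variance_shortcut(data):
    """Sample variance via the shortcut formula: (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Hypothetical sample, used only for illustration (not a data set from the text).
sample = [2, 4, 4, 5, 7, 8]

s2_defining = sample_variance_defining(sample)
s2_shortcut = sample_variance_shortcut(sample)

# The sample standard deviation is the square root of the sample variance.
s = math.sqrt(s2_defining)

print(s2_defining, s2_shortcut, s)
```

For this sample the mean is 5 and the squared deviations add up to 24, so both functions return 24/5 = 4.8. The shortcut version needs only the two running sums of the values and of their squares, which is why it is convenient in hand computations.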
The sample variance has different units from the data. For example, if the units in the data set were inches, the new units would be inches squared, or square inches. It is thus primarily of theoretical importance and will not be considered further in this text, except in passing.
If the data set comprises the whole population, then the population standard deviation, denoted $\sigma$ (the lower case Greek letter sigma), and its square, the population variance $\sigma^2$, are defined as follows.
Definition
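The population variance and population standard deviation of a set of $N$ population data are the numbers $\sigma^2$ and $\sigma$ defined by the formulas
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} \quad \text{and} \quad \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
where $\mu$ is the population mean.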
Note that the denominator in the fraction is the full number of observations, not that number reduced by one, as is the case with the sample standard deviation. Since most data sets are samples, we will always work with the sample standard deviation and variance.
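The effect of the two denominators can be seen in a small Python sketch using the standard library's `statistics` module, where `variance` and `stdev` divide by one less than the number of observations, while `pvariance` and `pstdev` divide by the full number of observations. The data values are hypothetical and serve only to illustrate the difference.

```python
import statistics

# Hypothetical data, used only to illustrate the two denominators.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(data)

# Treating the data as a sample: denominator is n - 1.
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

# Treating the same data as a whole population: denominator is n.
population_var = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

print(sample_var, sample_sd)
print(population_var, population_sd)

# The two variances differ only by the factor (n - 1) / n.
rescaled = sample_var * (n - 1) / n
print(rescaled)
```

The sum of squared deviations here is 24, so the sample variance is 24/5 = 4.8 and the population variance is 24/6 = 4. The gap between the two shrinks as the number of observations grows.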
Finally, in many real-life situations the most important statistical issues have to do with comparing the means and standard deviations of two data sets. Figure 2.11 "Difference between Two Data Sets" illustrates how a difference in one or both of the sample mean and the sample standard deviation is reflected in the appearance of the data set, as shown by the curves derived from the relative frequency histograms built using the data.
Figure 2.11 Difference between Two Data Sets