The Mean, Standard Deviation, and Sampling Distribution of the Sample Mean

This section gives several concrete examples of calculating the exact distributions of the sample mean. The corresponding means and standard deviations are computed for demonstration based on these distributions. Next, it discusses sampling distributions of sample means when the sample size is large. It also considers the case when the population is normal. Finally, it uses the central limit theorem for large sample approximations.

Learning Objectives

1. To become familiar with the concept of the probability distribution of the sample mean.
2. To understand the meaning of the formulas for the mean and standard deviation of the sample mean.

Suppose we wish to estimate the mean $\mu$ of a population. In actual practice we would typically take just one sample. Imagine, however, that we take sample after sample, all of the same size $n$, and compute the sample mean $\bar{x}$ of each one. We will likely get a different value of $\bar{x}$ each time. The sample mean is therefore a random variable: it varies from sample to sample in a way that cannot be predicted with certainty. We will write $\bar{X}$ when the sample mean is thought of as a random variable, and write $\bar{x}$ for the values that it takes. The random variable $\bar{X}$ has a mean, denoted $\mu_{\bar{X}}$, and a standard deviation, denoted $\sigma_{\bar{X}}$. Here is an example with such a small population and small sample size that we can actually write down every single sample.

Example 1

A rowing team consists of four rowers who weigh $152,156,160$, and $164$ pounds. Find all possible random samples with replacement of size two and compute the sample mean for each one. Use them to find the probability distribution, the mean, and the standard deviation of the sample mean $\bar{X}$.

Solution

The following table shows all possible samples with replacement of size two, along with the mean of each:

$\begin{array}{|c|c|c|c|c|c|c|c|}\hline \text { Sample } & \text { Mean } & \text { Sample } & \text { Mean } & \text { Sample } & \text { Mean } & \text { Sample } & \text { Mean } \\\hline 152,152 & 152 & 156,152 & 154 & 160,152 & 156 & 164,152 & 158 \\\hline 152,156 & 154 & 156,156 & 156 & 160,156 & 158 & 164,156 & 160 \\\hline 152,160 & 156 & 156,160 & 158 & 160,160 & 160 & 164,160 & 162 \\\hline 152,164 & 158 & 156,164 & 160 & 160,164 & 162 & 164,164 & 164 \\\hline\end{array}$

The table shows that there are seven possible values of the sample mean $\bar{X}$. The value $\bar{x}=152$ happens only one way (the rower weighing $152$ pounds must be selected both times), as does the value $\bar{x}=164$, but the other values happen more than one way and hence are more likely to be observed than $152$ and $164$ are. Since the $16$ samples are equally likely, we obtain the probability distribution of the sample mean just by counting:

$\begin{array}{c|ccccccc}\bar{x} & 152 & 154 & 156 & 158 & 160 & 162 & 164 \\\hline P(\bar{x}) & \frac{1}{16} & \frac{2}{16} & \frac{3}{16} & \frac{4}{16} & \frac{3}{16} & \frac{2}{16} & \frac{1}{16}\end{array}$
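The counting argument can also be checked by machine. The following is a minimal Python sketch, not part of the original example: it enumerates every ordered sample of size two drawn with replacement and tallies the sample means. The variable names (`weights`, `samples`, `counts`, `distribution`) are illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

weights = [152, 156, 160, 164]  # the population from Example 1
n = 2                           # sample size

# Every ordered sample of size n drawn with replacement (4**2 = 16 samples).
samples = list(product(weights, repeat=n))

# Tally how many samples produce each value of the sample mean.
counts = Counter(Fraction(sum(s), n) for s in samples)

# The samples are equally likely, so P(x_bar) = count / number of samples.
distribution = {
    mean: Fraction(count, len(samples))
    for mean, count in sorted(counts.items())
}

for mean in distribution:
    # Print the unreduced count so the output matches the table's k/16 form.
    print(f"x_bar = {float(mean):.0f}: P = {counts[mean]}/{len(samples)}")
```

Exact fractions are used instead of floating-point numbers so that the probabilities can be compared directly with the entries of the table.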

Now we apply the formulas from Section 4.2.2 "The Mean and Standard Deviation of a Discrete Random Variable" in Chapter 4 "Discrete Random Variables" for the mean and standard deviation of a discrete random variable to $\bar{X}$. For $\mu_{\bar{X}}$ we obtain:

$\begin{aligned}\mu_{\bar{X}} &=\Sigma \bar{x} P(\bar{x}) \\&=152\left(\frac{1}{16}\right)+154\left(\frac{2}{16}\right)+156\left(\frac{3}{16}\right)+158\left(\frac{4}{16}\right)+160\left(\frac{3}{16}\right)+162\left(\frac{2}{16}\right)+164\left(\frac{1}{16}\right) \\&=158\end{aligned}$

For $\sigma_{\bar{X}}$ we first compute $\Sigma \bar{x}^{2} P(\bar{x})$:

$152^{2}\left(\frac{1}{16}\right)+154^{2}\left(\frac{2}{16}\right)+156^{2}\left(\frac{3}{16}\right)+158^{2}\left(\frac{4}{16}\right)+160^{2}\left(\frac{3}{16}\right)+162^{2}\left(\frac{2}{16}\right)+164^{2}\left(\frac{1}{16}\right)$

which is $24,974$, so that

$\sigma_{\bar{X}}=\sqrt{\Sigma \bar{x}^{2} P(\bar{x})-\mu_{\bar{X}}^{2}}=\sqrt{24,974-158^{2}}=\sqrt{10}$

The mean and standard deviation of the population $\{152,156,160,164\}$ in the example are $\mu=158$ and $\sigma=\sqrt{20}$. The mean of the sample mean $\bar{X}$ that we have just computed is exactly the mean of the population. The standard deviation of the sample mean $\bar{X}$ that we have just computed is the standard deviation of the population divided by the square root of the sample size: $\sqrt{10}=\sqrt{20} / \sqrt{2}$. These relationships are not coincidences, but are illustrations of the following formulas.
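The two relationships can be verified numerically. The sketch below is an illustration added here, with variable names chosen for this purpose only: it computes the population mean and standard deviation (dividing by the population size, not by one less), then the mean and standard deviation of the $16$ equally likely sample means, and compares them.

```python
import math
from itertools import product

weights = [152, 156, 160, 164]  # the population from Example 1
n = 2                           # sample size

# Population mean and standard deviation (divide by N, not N - 1).
mu = sum(weights) / len(weights)
sigma = math.sqrt(sum((w - mu) ** 2 for w in weights) / len(weights))

# Sample means of all 16 equally likely samples drawn with replacement.
means = [sum(s) / n for s in product(weights, repeat=n)]

# Mean and standard deviation of the sample mean, treating each of the
# 16 samples as having probability 1/16.
mu_xbar = sum(means) / len(means)
sigma_xbar = math.sqrt(sum((m - mu_xbar) ** 2 for m in means) / len(means))

print("population:  mu =", mu, " sigma =", sigma)
print("sample mean: mu =", mu_xbar, " sigma =", sigma_xbar)
print("sigma / sqrt(n) =", sigma / math.sqrt(n))
```

Because the last two values are floating-point numbers, they should agree up to rounding error rather than digit for digit.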

Suppose random samples of size $n$ are drawn from a population with mean $\mu$ and standard deviation $\sigma$. The mean $\mu_{\bar{X}}$ and standard deviation $\sigma_{\bar{X}}$ of the sample mean $\bar{X}$ satisfy

$\mu_{\bar{X}}=\mu \quad \text { and } \quad \sigma_{\bar{X}}=\frac{\sigma}{\sqrt{n}}$

The first formula says that if we could take every possible sample from the population and compute the corresponding sample mean, then those numbers would center at the number we wish to estimate, the population mean $\mu$.

The second formula says that averages computed from samples vary less than individual measurements on the population do, and quantifies the relationship.
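When the sample size is too large to list every sample, the two formulas can still be illustrated by simulation. The following sketch is only an illustration under stated assumptions: it reuses the rowing-team population, draws samples with replacement, and uses an arbitrarily chosen sample size (`n = 25`), number of repetitions (`reps = 20_000`), and random seed, none of which come from the text.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, fixed only so that runs are repeatable

population = [152, 156, 160, 164]
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

n = 25         # illustrative sample size (assumption)
reps = 20_000  # number of simulated samples (assumption)

# Draw many samples of size n with replacement and record each sample mean.
sample_means = [
    statistics.fmean(random.choices(population, k=n))
    for _ in range(reps)
]

print("mean of sample means:", statistics.fmean(sample_means))
print("population mean mu:  ", mu)
print("SD of sample means:  ", statistics.pstdev(sample_means))
print("sigma / sqrt(n):     ", sigma / n ** 0.5)
```

The simulated values should land close to $\mu$ and $\sigma / \sqrt{n}$ but will not match them exactly, since only a finite number of samples is drawn.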

Example 2

The mean and standard deviation of the tax value of all vehicles registered in a certain state are $\mu=\$13,525$ and $\sigma=\$4,180$. Suppose random samples of size $100$ are drawn from the population of vehicles. What are the mean $\mu_{\bar{X}}$ and standard deviation $\sigma_{\bar{X}}$ of the sample mean $\bar{X}$?

Solution

Since $n=100$, the formulas yield

$\mu_{\bar{X}}=\mu=\$13,525 \quad \text { and } \quad \sigma_{\bar{X}}=\frac{\sigma}{\sqrt{n}}=\frac{\$4,180}{\sqrt{100}}=\$418$
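The same arithmetic can be wrapped in a small helper for reuse with other values of $\mu$, $\sigma$, and $n$. The function name `sample_mean_parameters` is hypothetical, introduced here only for illustration.

```python
import math


def sample_mean_parameters(mu, sigma, n):
    """Return (mean, standard deviation) of the sample mean for samples of size n.

    Hypothetical helper: applies mu_xbar = mu and sigma_xbar = sigma / sqrt(n).
    """
    return mu, sigma / math.sqrt(n)


# Values from Example 2: tax values in dollars, samples of size 100.
mu_xbar, sigma_xbar = sample_mean_parameters(13_525, 4_180, 100)
print(mu_xbar, sigma_xbar)
```

Calling the helper with a larger `n` shows how the standard deviation of the sample mean shrinks as the sample size grows, while the mean stays fixed at $\mu$.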