The Sampling Distribution of a Sample Mean

Site: Saylor Academy
Course: MA121: Introduction to Statistics
Book: The Sampling Distribution of a Sample Mean
Date: Tuesday, September 10, 2024, 4:30 AM

Description

First, this section discusses the mean and variance of the sampling distribution of the mean. It also shows how the central limit theorem can be used to approximate the corresponding sampling distributions. It then covers the properties of the sampling distribution of the difference between means, giving formulas for its mean and variance. Finally, using the central limit theorem, it shows how to compute the probability that a difference between means exceeds a specified value.

Sampling Distribution of the Mean

Learning Objectives

  1. State the mean and variance of the sampling distribution of the mean
  2. Compute the standard error of the mean
  3. State the central limit theorem

The sampling distribution of the mean was defined in the section introducing sampling distributions. This section reviews some important properties of the sampling distribution of the mean introduced in the demonstrations in this chapter.


Mean

The mean of the sampling distribution of the mean is the mean of the population from which the scores were sampled. Therefore, if a population has a mean \mu, then the mean of the sampling distribution of the mean is also \mu. The symbol \mu_{\mathrm{M}} is used to refer to the mean of the sampling distribution of the mean. Therefore, the formula for the mean of the sampling distribution of the mean can be written as:

\mu_{M}=\mu


Variance

The variance of the sampling distribution of the mean is computed as follows:

\sigma_{M}^{2}=\frac{\sigma^{2}}{N}

That is, the variance of the sampling distribution of the mean is the population variance divided by \mathrm{N}, the sample size (the number of scores used to compute a mean). Thus, the larger the sample size, the smaller the variance of the sampling distribution of the mean.

(optional) This expression can be derived very easily from the variance sum law. Let's begin by computing the variance of the sampling distribution of the sum of three numbers sampled from a population with variance \sigma^{2}. The variance of the sum would be \sigma^{2}+\sigma^{2}+\sigma^{2}. For N numbers, the variance would be N \sigma^{2}. Since the mean is 1 / N times the sum, the variance of the sampling distribution of the mean would be 1 / \mathrm{N}^{2} times the variance of the sum, which equals \sigma^{2} / N.
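
Written out step by step, the derivation above looks as follows. This assumes the N scores are sampled independently, which is the condition under which the variance sum law applies; the symbol \sigma_{\text{sum}}^{2} for the variance of the sum is introduced here only for illustration.

\sigma_{\text{sum}}^{2}=\underbrace{\sigma^{2}+\sigma^{2}+\cdots+\sigma^{2}}_{N \text{ terms}}=N \sigma^{2}

\sigma_{M}^{2}=\frac{1}{N^{2}} \sigma_{\text{sum}}^{2}=\frac{N \sigma^{2}}{N^{2}}=\frac{\sigma^{2}}{N}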

The standard error of the mean is the standard deviation of the sampling distribution of the mean. It is therefore the square root of the variance of the sampling distribution of the mean and can be written as:

\sigma_{M}=\frac{\sigma}{\sqrt{N}}

The standard error is represented by a \sigma because it is a standard deviation. The subscript (M) indicates that the standard error in question is the standard error of the mean.
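
The two formulas above translate directly into code. The following is a minimal sketch; the function names variance_of_mean and standard_error and the example values (a population standard deviation of 10 and a sample size of 25) are illustrative assumptions, not part of the original text.

```python
import math


def variance_of_mean(sigma: float, n: int) -> float:
    """Variance of the sampling distribution of the mean: sigma^2 / N."""
    return sigma ** 2 / n


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(N)."""
    return sigma / math.sqrt(n)


# Hypothetical example: population SD of 10, samples of size 25.
print(variance_of_mean(10, 25))  # 4.0
print(standard_error(10, 25))    # 2.0
```

Note that the standard error is the square root of the variance of the sampling distribution of the mean, so the second value is the square root of the first.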


Central Limit Theorem

The central limit theorem states that:

Given a population with a finite mean \mu and a finite non-zero variance \sigma^{2}, the sampling distribution of the mean approaches a normal distribution with a mean of \mu and a variance of \sigma^{2} / N as N, the sample size, increases.

The expressions for the mean and variance of the sampling distribution of the mean are not new or remarkable. What is remarkable is that regardless of the shape of the parent population, the sampling distribution of the mean approaches a normal distribution as \mathrm{N} increases. If you have used the "Central Limit Theorem Demo," you have already seen this for yourself. As a reminder, Figure 1 shows the results of the simulation for \mathrm{N}=2 and \mathrm{N}=10. The parent population was a uniform distribution. You can see that the distribution for \mathrm{N}=2 is far from a normal distribution. Nonetheless, it does show that the scores are denser in the middle than in the tails. For \mathrm{N}=10 the distribution is quite close to a normal distribution. Notice that the means of the two distributions are the same, but that the spread of the distribution for N=10 is smaller.


Figure 1. A simulation of a sampling distribution. The parent population is uniform. The blue line under "16" indicates that 16 is the mean. The red line extends from one standard deviation below the mean to one standard deviation above it.
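
The pattern in Figure 1 can be reproduced approximately with a short simulation. The sketch below is not the code behind the demo. It assumes a continuous uniform parent population on the interval from 0 to 32 (chosen so that the mean is 16, as in the figure) and 20,000 simulated samples per sample size; the function name simulate_sample_means and the random seed are illustrative choices.

```python
import math
import random
import statistics


def simulate_sample_means(n, reps=20000, low=0.0, high=32.0, seed=0):
    """Draw `reps` samples of size `n` from Uniform(low, high); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.uniform(low, high) for _ in range(n))
        for _ in range(reps)
    ]


# Standard deviation of a continuous uniform distribution on [0, 32] (assumed parent).
sigma = (32.0 - 0.0) / math.sqrt(12)

for n in (2, 10):
    means = simulate_sample_means(n)
    print(
        f"N={n}: mean of sample means = {statistics.fmean(means):.2f}, "
        f"SD of sample means = {statistics.stdev(means):.2f}, "
        f"sigma/sqrt(N) = {sigma / math.sqrt(n):.2f}"
    )
```

For both sample sizes the mean of the simulated sample means should be close to 16, while the standard deviation of the sample means should be close to \sigma / \sqrt{N} and therefore smaller for N=10 than for N=2.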

Figure 2 shows how closely the sampling distribution of the mean approximates a normal distribution even when the parent population is very non-normal. If you look closely you can see that the sampling distributions do have a slight positive skew. The larger the sample size, the closer the sampling distribution of the mean would be to a normal distribution.


Figure 2. A simulation of a sampling distribution. The parent population is very non-normal.


Source: David M. Lane, https://onlinestatbook.com/2/sampling_distributions/samp_dist_mean.html
This work is in the Public Domain.

Questions

Question 1 out of 5.
The population has a mean of 14 and a standard deviation of 3. The sample size of your sampling distribution is N=10. What is the mean of the sampling distribution of the mean?


Question 2 out of 5.
The population has a mean of 30 and a standard deviation of 6. The sample size of your sampling distribution is N=9. What is the variance of the sampling distribution of the mean?


Question 3 out of 5.
The population has a mean of 120 and a standard deviation of 12. The sample size of your sampling distribution is N=16. What is the standard error of the mean?


Question 4 out of 5.
The sampling distribution of the mean, with N=30, of a moderately negatively skewed distribution is:

Positively skewed

Negatively skewed

About normal


Question 5 out of 5.
The entire student body of 225 students took a test. These test scores have a mean of 75, a standard deviation of 10, and are slightly positively skewed. If you randomly chose 25 of these test scores and calculated the mean over and over again, what could be the mean, standard deviation, and skew of this distribution?

Mean = 75, SD = 10, Skew = 1.2

Mean = 75, SD = 0.67, Skew = 0.8

Mean = 80, SD = 2, Skew = -1.2

Mean = 75, SD = 2, Skew = about 0

Mean = 75, SD = 0.67, Skew = about 0

Answers

  1. The mean of the sampling distribution of the mean is the mean of the population from which the scores were sampled, in this case 14.

  2. The variance of the sampling distribution of the mean is the population variance divided by N. The population SD is 6, so the population variance is 36. 36/9 = 4

  3. The standard error is the standard deviation of the population divided by the square root of N. In this case, 12/4 = 3

  4. According to the central limit theorem, regardless of the shape of the parent population, the sampling distribution of the mean approaches a normal distribution as N increases. In this case, a sample size of 30 is sufficiently large to cause the sampling distribution of the mean to look about normal.

  5. Mean = 75, SD = 2, Skew = about 0: This problem is asking about the sampling distribution of the mean: Mean = 75, SD = 10/sqrt(25) = 10/5 = 2, Skew = about 0 because the central limit theorem states that the sampling distribution of the mean would be about normal with a large enough N.

Sampling Distribution of Difference Between Means

Learning Objectives

  1. State the mean and variance of the sampling distribution of the difference between means
  2. Compute the standard error of the difference between means
  3. Compute the probability of a difference between means being above a specified value

Statistical analyses are very often concerned with the difference between means. A typical example is an experiment designed to compare the mean of a control group with the mean of an experimental group. Inferential statistics used in the analysis of this type of experiment depend on the sampling distribution of the difference between means.

The sampling distribution of the difference between means can be thought of as the distribution that would result if we repeated the following three steps over and over again: (1) sample n_{1} scores from Population 1 and n_{2} scores from Population 2,(2) compute the means of the two samples ( M_{1} and M_{2} ), and (3) compute the difference between means, M_{1}-M_{2}. The distribution of the differences between means is the sampling distribution of the difference between means.
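
The three steps above can be sketched as a simulation. This is only an illustration: the population parameters (means of 50 and 45, standard deviations of 10 and 8), the sample sizes (20 and 16), the assumption of normally distributed populations, and the function name simulate_differences are all hypothetical choices, not values from the text.

```python
import random
import statistics


def simulate_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps=10000, seed=0):
    """Repeat the three steps `reps` times and return the list of M1 - M2 values."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Step 1 and 2: sample n1 and n2 scores and compute the two sample means.
        m1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        m2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        # Step 3: compute the difference between the means.
        diffs.append(m1 - m2)
    return diffs


# Hypothetical populations: mu1=50, sigma1=10, n1=20; mu2=45, sigma2=8, n2=16.
diffs = simulate_differences(50, 10, 20, 45, 8, 16)
print(statistics.fmean(diffs))     # should be close to 50 - 45 = 5
print(statistics.variance(diffs))  # should be close to 10**2/20 + 8**2/16 = 9
```

With enough repetitions, the simulated mean should land near the difference between the population means, and the simulated variance near the sum of the variances of the two sampling distributions of the mean, in line with the formulas given in this section.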

As you might expect, the mean of the sampling distribution of the difference between means is:

\mu_{M_{1}-M_{2}}=\mu_{1}-\mu_{2}


which says that the mean of the distribution of differences between sample means is equal to the difference between population means. For example, say that the mean test score of all 12-year-olds in a population is 34 and the mean of 10-year-olds is 25. If numerous samples were taken from each age group and the mean difference computed each time, the mean of these numerous differences between sample means would be 34-25=9.

From the variance sum law, we know that:

\sigma_{M_{1}-M_{2}}^{2}=\sigma_{M_{1}}^{2}+\sigma_{M_{2}}^{2}

which says that the variance of the sampling distribution of the difference between means is equal to the variance of the sampling distribution of the mean for Population 1 plus the variance of the sampling distribution of the mean for Population 2. Recall the formula for the variance of the sampling distribution of the mean:

\sigma_{M}^{2}=\frac{\sigma^{2}}{N}

Since we have two populations and two sample sizes, we need to distinguish between the two variances and sample sizes. We do this by using the subscripts 1 and 2. Using this convention, we can write the formula for the variance of the sampling distribution of the difference between means as:

\sigma_{M_{1}-M_{2}}^{2}=\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}

Since the standard error of a sampling distribution is the standard deviation of the sampling distribution, the standard error of the difference between means is:

\sigma_{M_{1}-M_{2}}=\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}

Just to review the notation, the symbol on the left contains a sigma (\sigma), which means it is a standard deviation. The subscripts M_{1}-M_{2} indicate that it is the standard deviation of the sampling distribution of \mathrm{M}_{1}-\mathrm{M}_{2}.

Now let's look at an application of this formula. Assume there are two species of green beings on Mars. The mean height of Species 1 is 32 while the mean height of Species 2 is 22. The variances of the two species are 60 and 70, respectively, and the heights of both species are normally distributed. You randomly sample 10 members of Species 1 and 14 members of Species 2. What is the probability that the mean of the 10 members of Species 1 will exceed the mean of the 14 members of Species 2 by 5 or more? Without doing any calculations, you probably know that the probability is pretty high since the difference in population means is 10. But what exactly is the probability?

First, let's determine the sampling distribution of the difference between means. Using the formulas above, the mean is

\mu_{M_{1}-M_{2}}=32-22=10

The standard error is:

\sigma_{M_{1}-M_{2}}=\sqrt{\frac{60}{10}+\frac{70}{14}}=3.317

The sampling distribution is shown in Figure 1. Notice that it is normally distributed with a mean of 10 and a standard deviation of 3.317. The area above 5 is shaded blue.


Figure 1. The sampling distribution of the difference between means.

The last step is to determine the area that is shaded blue. Using either a Z table or the normal calculator, the area can be determined to be 0.934. Thus the probability that the mean of the sample from Species 1 will exceed the mean of the sample from Species 2 by 5 or more is 0.934.
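
As an alternative to a Z table, the same area can be computed with Python's standard library. The sketch below assumes, as the example does, that the difference between means is normally distributed; the function name prob_difference_at_least is a hypothetical helper, not something from the text.

```python
import math
from statistics import NormalDist


def prob_difference_at_least(mu1, var1, n1, mu2, var2, n2, threshold):
    """P(M1 - M2 >= threshold), assuming M1 - M2 is normally distributed."""
    mean_diff = mu1 - mu2
    se_diff = math.sqrt(var1 / n1 + var2 / n2)
    return 1 - NormalDist(mean_diff, se_diff).cdf(threshold)


# Values from the Mars example: means 32 and 22, variances 60 and 70,
# sample sizes 10 and 14, and a threshold of 5.
se = math.sqrt(60 / 10 + 70 / 14)
p = prob_difference_at_least(32, 60, 10, 22, 70, 14, threshold=5)
print(round(se, 3))  # 3.317
print(round(p, 3))   # 0.934
```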

As shown below, the formula for the standard error of the difference between means is much simpler if the sample sizes and the population variances are equal. When the variances and sample sizes are the same, there is no need to use the subscripts 1 and 2 to differentiate these terms.

\sigma_{M_{1}-M_{2}}=\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}=\sqrt{\frac{\sigma^{2}}{n}+\frac{\sigma^{2}}{n}}=\sqrt{\frac{2 \sigma^{2}}{n}}

This simplified version of the formula can be used for the following problem: The mean height of 15-year-old boys (in \mathrm{cm}) is 175 and the variance is 64. For girls, the mean is 165 and the variance is 64. If eight boys and eight girls were sampled, what is the probability that the mean height of the sample of girls would be higher than the mean height of the sample of boys? In other words, what is the probability that the mean height of girls minus the mean height of boys is greater than 0?

As before, the problem can be solved in terms of the sampling distribution of the difference between means (girls - boys). The mean of the distribution is 165-175=-10. The standard deviation of the distribution is:

\sigma_{M_{1}-M_{2}}=\sqrt{\frac{2 \sigma^{2}}{n}}=\sqrt{\frac{(2)(64)}{8}}=4

A graph of the distribution is shown in Figure 2. The mean height for girls is unlikely to be higher than the mean height for boys, since in the population boys are quite a bit taller. Nonetheless, it is not inconceivable that the girls' mean could be higher than the boys' mean.


Figure 2. Sampling distribution of the difference between mean heights.

A difference between means of 0 or higher is a difference of 10 / 4=2.5 standard deviations above the mean of -10. The probability of a score 2.5 or more standard deviations above the mean is 0.0062.
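
The same calculation can be checked numerically by standardizing the threshold of 0 and looking up the upper-tail area of the standard normal distribution. This is a minimal sketch using only the values given in the example; the variable names are illustrative.

```python
import math
from statistics import NormalDist

sigma_sq = 64          # common population variance
n = 8                  # common sample size
mean_diff = 165 - 175  # girls - boys

# Simplified standard error for equal variances and sample sizes.
se_diff = math.sqrt(2 * sigma_sq / n)

# Number of standard deviations that 0 lies above the mean of -10.
z = (0 - mean_diff) / se_diff

# Area of the standard normal distribution above z.
p = 1 - NormalDist().cdf(z)

print(se_diff)      # 4.0
print(z)            # 2.5
print(round(p, 4))  # 0.0062
```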

Questions

Question 1 out of 4.
Population 1 has a mean of 20 and a variance of 100. Population 2 has a mean of 15 and a variance of 64. You sample 20 scores from Pop 1 and 16 scores from Pop 2. What is the mean of the sampling distribution of the difference between means (Pop 1 - Pop 2)?


Question 2 out of 4.
Population 1 has a mean of 20 and a variance of 100. Population 2 has a mean of 15 and a variance of 64. You sample 20 scores from Pop 1 and 16 scores from Pop 2. What is the variance of the sampling distribution of the difference between means (Pop 1 - Pop 2)?

Question 3 out of 4.
The mean height of 15-year-old boys is 175 cm and the variance is 64. For girls, the mean is 165 and the variance is 64. If 8 boys and 8 girls were sampled, what is the probability that the mean height of the sample of boys would be at least 6 cm higher than the mean height of the sample of girls?


Question 4 out of 4.
The mean time to complete a task is 727 milliseconds for 3rd graders and 532 milliseconds for 5th graders. The variances of the two grades are 12,000 for 3rd graders and 10,000 for 5th graders. The times for both grades are normally distributed. You randomly sample 12 3rd graders and 14 5th graders. What is the probability that the mean time of the 3rd graders will exceed the mean time of the 5th graders by 150 msec or more?

Answers

  1. The mean of the distribution of the difference between sample means is equal to the difference between population means. 20 - 15 = 5

  2. The variance of the sampling distribution of the difference between means is equal to the variance of the sampling distribution of the mean for Pop 1 plus the variance of the sampling distribution of the mean for Pop 2. 100/20 + 64/16 = 5 + 4 = 9

  3. Mean = 10, SD = 4. Plug these into the normal calculator and find the area above 6. You get 0.841. A similar question using this data appears in the text.

  4. Mean = 727 - 532 = 195, Var = 12,000/12 + 10,000/14 = 1,714.3, SD = sqrt(1,714.3) = 41.404. Use the normal calculator to calculate the area above 150 for a distribution with this mean and SD. You get 0.8614.
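
Where no normal calculator is at hand, answers 3 and 4 can be checked with a few lines of Python. This is a sketch rather than the course's own tool; it uses only the numbers stated in the questions, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Question 3: boys - girls, equal variances (64) and sample sizes (8).
q3_mean = 175 - 165
q3_sd = math.sqrt(64 / 8 + 64 / 8)
q3 = 1 - NormalDist(q3_mean, q3_sd).cdf(6)

# Question 4: 3rd graders - 5th graders.
q4_mean = 727 - 532
q4_var = 12000 / 12 + 10000 / 14
q4_sd = math.sqrt(q4_var)
q4 = 1 - NormalDist(q4_mean, q4_sd).cdf(150)

print(round(q3, 3))      # 0.841
print(round(q4_var, 1))  # 1714.3
print(round(q4_sd, 3))   # 41.404
print(round(q4, 3))      # 0.861
```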