The Central Limit Theorem


Description

Read this chapter, which covers some of the most important concepts used in statistics: the central limit theorem and the normal distribution. Be sure to attempt the practice problems and homework at the end of the chapter, which will give you a chance to check your understanding of these concepts.

Introduction


Figure 7.1 If you want to figure out the distribution of the change people carry in their pockets, using the Central Limit Theorem and assuming your sample is large enough, you will find that the distribution is the normal probability density function.

Why are we so concerned with means? Two reasons are that they give us a middle ground for comparison and that they are easy to calculate. In this chapter, you will study means and the Central Limit Theorem.

The Central Limit Theorem is one of the most powerful and useful ideas in all of statistics. The Central Limit Theorem is a theorem, which means that it is NOT a theory or just somebody's idea of the way things work. As a theorem it ranks with the Pythagorean Theorem, or the theorem that tells us that the angles of a triangle must sum to 180 degrees. These are facts about the way the world works, rigorously demonstrated with mathematical precision and logic. As we will see, this powerful theorem determines just what we can, and cannot, say in inferential statistics. The Central Limit Theorem is concerned with drawing finite samples of size n from a population with a known mean, μ, and a known standard deviation, σ. The conclusion is that if we collect samples of size n with a "large enough n," calculate each sample's mean, and create a histogram (distribution) of those means, then the resulting distribution will tend to be approximately normal.

The astounding result is that it does not matter what the distribution of the original population is; you do not even need to know it. The important fact is that the distribution of sample means tends to follow the normal distribution.

The size of the sample, n, that is required to be "large enough" depends on the original population from which the samples are drawn (the sample size should be at least 30, or the data should come from a normal distribution). If the original population is far from normal, then more observations are needed before the sample means approximate a normal distribution. Sampling is done randomly and with replacement in the theoretical model.
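
The sampling procedure described here is easy to simulate. Below is a minimal sketch in Python; the exponential population (deliberately non-normal), its mean of 10, the 1,000 samples, and n = 30 are all illustrative assumptions, not part of the theorem.

import random
import statistics

# Illustrative assumptions: an exponential population (far from normal)
# with mean mu = 10, sampled 1,000 times with n = 30 observations each.
random.seed(1)
mu = 10
n = 30

sample_means = [
    statistics.mean(random.expovariate(1 / mu) for _ in range(n))
    for _ in range(1000)
]

# A histogram of these means should look approximately normal, centered
# on mu with spread close to sigma / sqrt(n); for an exponential
# population, sigma equals mu.
print(statistics.mean(sample_means))   # close to 10
print(statistics.stdev(sample_means))  # close to 10 / sqrt(30), about 1.83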

Source: OpenStax, https://openstax.org/books/introductory-business-statistics/pages/7-introduction
This work is licensed under a Creative Commons Attribution 4.0 License.

The Central Limit Theorem for Sample Means

The sampling distribution is a theoretical distribution. It is created by taking many samples of size n from a population. Each sample mean is then treated like a single observation of this new distribution, the sampling distribution. The genius of thinking this way is that it recognizes that when we sample we are creating an observation and that observation must come from some particular distribution. The Central Limit Theorem answers the question: from what distribution did a sample mean come? If this is discovered, then we can treat a sample mean just like any other observation and calculate probabilities about what values it might take on. We have effectively moved from the world of statistics where we know only what we have from the sample, to the world of probability where we know the distribution from which the sample mean came and the parameters of that distribution.

The reasons that one samples a population are obvious. The time and expense of checking every invoice to determine its validity, or every shipment to see if it contains all the items, may well exceed the cost of errors in billing or shipping. For some products, sampling would require destroying them; this is called destructive sampling. One such example is measuring the ability of a metal to withstand saltwater corrosion for parts on ocean-going vessels.

Sampling thus raises an important question: just which sample was drawn? Even if the sample were randomly drawn, there are theoretically an almost infinite number of samples. With just 100 items, there are more than 75 million unique samples of size five that can be drawn. If six are in the sample, the number of possible samples increases to just more than one billion. Of the 75 million possible samples, then, which one did you get? If there is variation in the items to be sampled, there will be variation in the samples. One could draw an "unlucky" sample and make very wrong conclusions concerning the population. This recognition that any sample we draw is really only one from a distribution of samples provides us with what is probably the single most important theorem in statistics: the Central Limit Theorem. Without the Central Limit Theorem it would be impossible to proceed to inferential statistics from simple probability theory. In its most basic form, the Central Limit Theorem states that, regardless of the underlying probability density function of the population data, the theoretical distribution of the means of samples from the population will be normally distributed. In essence, this says that the mean of a sample should be treated like an observation drawn from a normal distribution. The Central Limit Theorem only holds if the sample size is "large enough," which has been shown to be only 30 observations or more.
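
The sample counts quoted above come from the combinations formula, C(N, n) = N! / (n!(N−n)!). A quick check with Python's standard library:

import math

# Number of distinct samples of size n that can be drawn, without
# replacement, from a population of 100 items.
print(math.comb(100, 5))  # 75,287,520: more than 75 million
print(math.comb(100, 6))  # 1,192,052,400: just more than one billion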

Figure 7.2 graphically displays this very important proposition.

Figure 7.2

Notice that the horizontal axis in the top panel is labeled X. These are the individual observations of the population. This is the unknown distribution of the population values. The graph is purposefully drawn all squiggly to show that it does not matter just how oddball it really is. Remember, we will never know what this distribution looks like, or its mean or standard deviation, for that matter.

The horizontal axis in the bottom panel is labeled \overline X's. This is the theoretical distribution called the sampling distribution of the means. Each observation on this distribution is a sample mean. All these sample means were calculated from individual samples with the same sample size. The theoretical sampling distribution contains all of the sample mean values from all the possible samples that could have been taken from the population. Of course, no one would ever actually take all of these samples, but if they did this is how they would look. And the Central Limit Theorem says that they will be normally distributed.

The Central Limit Theorem goes even further and tells us the mean and standard deviation of this theoretical distribution.

Parameter | Population distribution | Sample | Sampling distribution of \overline X's
Mean | μ | \overline X | μ_{\overline x} = E(\overline X) = μ
Standard deviation | σ | s | σ_{\overline x} = \dfrac{σ}{\sqrt{n}}

Table 7.1

The practical significance of the Central Limit Theorem is that now we can compute probabilities for drawing a sample mean, \overline X, in just the same way as we did for drawing specific observations, X's, when we knew the population mean and standard deviation and that the population data were normally distributed. The standardizing formula has to be amended to recognize that the mean and standard deviation of the sampling distribution (the latter sometimes called the standard error of the mean) are different from those of the population distribution, but otherwise nothing has changed. The new standardizing formula is

Z=\dfrac{\overline X −μ_{\overline X}}{σ_{\overline X}}=\dfrac{\overline X −μ}{\dfrac{σ}{\sqrt n}}

Notice that μ_{\overline X} in the first formula has been changed to simply µ in the second version. The reason is that mathematically it can be shown that the expected value of \overline X is equal to µ. This was stated in Table 7.1 above. Mathematically, the E(x) symbol is read as "the expected value of x". This formula will be used in the next unit to provide estimates of the unknown population parameter μ.
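
As a sketch of how the new standardizing formula is used (the population values here are invented purely for illustration): suppose a population has µ = 50 and σ = 12, and we draw a sample of n = 36 whose mean is 53.

import math
from statistics import NormalDist

# Hypothetical values, for illustration only.
mu, sigma, n = 50, 12, 36
x_bar = 53  # the observed sample mean

# Standard error of the mean and the standardized value.
std_err = sigma / math.sqrt(n)  # 12 / 6 = 2
z = (x_bar - mu) / std_err      # (53 - 50) / 2 = 1.5

# Probability of drawing a sample mean of 53 or less.
print(NormalDist().cdf(z))  # about 0.9332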

Using the Central Limit Theorem

Examples of the Central Limit Theorem

Law of Large Numbers

The law of large numbers says that if you take samples of larger and larger size from any population, then the mean of the sample, \overline x, tends to get closer and closer to the true population mean, μ. From the Central Limit Theorem, we know that as n gets larger and larger, the sample means follow a normal distribution. The larger n gets, the smaller the standard deviation of the sampling distribution gets. (Remember that the standard deviation for the sampling distribution of \overline X is \dfrac{σ}{\sqrt n}.) This means that the sample mean \overline x must be closer to the population mean μ as n increases. We can say that μ is the value that the sample means approach as n gets larger. The Central Limit Theorem illustrates the law of large numbers.
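
The law of large numbers can be watched in action. A minimal sketch, assuming a uniform population on [0, 100] (so µ = 50) and a few illustrative sample sizes:

import random
import statistics

random.seed(2)
for n in (10, 100, 1_000, 10_000):
    x_bar = statistics.mean(random.uniform(0, 100) for _ in range(n))
    print(n, round(x_bar, 2))  # the sample mean drifts toward 50 as n grows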

This concept is so important, and plays such a critical role in what follows, that it deserves to be developed further. Indeed, there are two critical issues that flow from the Central Limit Theorem and the application of the Law of Large Numbers to it. These are:

  1. The probability density function of the sampling distribution of means is normally distributed regardless of the underlying distribution of the population observations and
  2. The standard deviation of the sampling distribution decreases as the size of the samples that were used to calculate the means for the sampling distribution increases.

Taking these in order: it would seem counterintuitive that the population may have any distribution and yet the distribution of means coming from it would be normally distributed. With the use of computers, experiments can be simulated that show the process by which the sampling distribution changes as the sample size is increased. These simulations show visually the results of the mathematical proof of the Central Limit Theorem.

Here are three examples of very different population distributions and the evolution of the sampling distribution to a normal distribution as the sample size increases. The top panel in each case shows the histogram for the original data. The three lower panels show the histograms of 1,000 sample means for samples of sizes n = 10, n = 25, and n = 50. As the sample size increases, and the number of samples taken remains constant, the distribution of the 1,000 sample means becomes closer to the smooth line that represents the normal distribution.
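
A simulation in the spirit of the figures that follow, sketched under illustrative assumptions (an exponential population, whose skewness is 2, and 1,000 samples at each size): the skewness of the sample means should fall toward 0, the skewness of a normal distribution, as n goes from 10 to 25 to 50.

import random
import statistics

def skewness(data):
    # Sample skewness: the third central moment divided by the cube
    # of the standard deviation.
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    return sum((x - m) ** 3 for x in data) / (len(data) * s ** 3)

random.seed(3)
for n in (10, 25, 50):
    means = [
        statistics.mean(random.expovariate(1.0) for _ in range(n))
        for _ in range(1000)
    ]
    print(n, round(skewness(means), 2))  # shrinks toward 0 as n grows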

Figure 7.3 is for a normal distribution of individual observations, and we would expect the sampling distribution to converge on the normal quickly. The results show this, and even at a very small sample size the distribution is close to the normal distribution.

Figure 7.3

Figure 7.4 is a uniform distribution which, a bit amazingly, quickly approaches the normal distribution even with samples of only 10.

Figure 7.4

Figure 7.5 is a skewed distribution. This last one could be an exponential, geometric, or binomial with a small probability of success creating the skew in the distribution. For skewed distributions our intuition would say that this will take larger sample sizes to move to a normal distribution and indeed that is what we observe from the simulation. Nevertheless, at a sample size of 50, not considered a very large sample, the distribution of sample means has very decidedly gained the shape of the normal distribution.

Figure 7.5

The Central Limit Theorem provides more than the proof that the sampling distribution of means is normally distributed. It also provides us with the mean and standard deviation of this distribution. Further, as discussed above, the expected value of the mean, μ_{\overline x}, is equal to the mean of the population of the original data, which is what we are interested in estimating from the sample we took. We have already inserted this conclusion of the Central Limit Theorem into the formula we use for standardizing from the sampling distribution to the standard normal distribution. And finally, the Central Limit Theorem has also provided the standard deviation of the sampling distribution, σ_{\overline x}=\dfrac{σ}{\sqrt n}, and this is critical for calculating probabilities of values of the new random variable, \overline x.

Figure 7.6 shows a sampling distribution. The mean has been marked on the horizontal axis of the \overline x's, and the standard deviation has been written to the right, above the distribution. Notice that the standard deviation of the sampling distribution is the original standard deviation of the population divided by the square root of the sample size. We have already seen that as the sample size increases the sampling distribution becomes closer and closer to the normal distribution. As this happens, the standard deviation of the sampling distribution changes in another way: the standard deviation decreases as n increases. At very large n, the standard deviation of the sampling distribution becomes very small, and at infinity it collapses on top of the population mean. This is what it means to say that the expected value of \overline x is the population mean, µ.

Figure 7.6

At non-extreme values of n, this relationship between the standard deviation of the sampling distribution and the sample size plays a very important part in our ability to estimate the parameters we are interested in.

Figure 7.7 shows three sampling distributions. The only change made is the sample size used to get the sample means for each distribution. As the sample size increases from n = 10 to n = 30 to n = 50, the standard deviations of the respective sampling distributions decrease, because the sample size is in the denominator of the standard deviation of the sampling distribution.

Figure 7.7
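
With an assumed population standard deviation of σ = 20 (a made-up value), the shrinking standard errors behind Figure 7.7 are easy to tabulate:

import math

sigma = 20  # hypothetical population standard deviation
for n in (10, 30, 50):
    print(n, round(sigma / math.sqrt(n), 2))
# prints: 10 6.32, then 30 3.65, then 50 2.83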

The implications of this are very important. Figure 7.8 shows the effect of the sample size on the confidence we will have in our estimates. These are two sampling distributions from the same population. One sampling distribution was created with samples of size 10 and the other with samples of size 50. All other things constant, the sampling distribution with sample size 50 has a smaller standard deviation, which causes the graph to be higher and narrower. The important effect of this is that for the same probability of one standard deviation from the mean, this distribution covers much less of a range of possible values than the other distribution. One standard deviation is marked on the \overline X axis for each distribution, shown by the two arrows that are plus or minus one standard deviation for each distribution. For the same probability of being within one standard deviation of the mean, the sampling distribution with the smaller sample size spans a much greater range of possible values. A simple question is: would you rather have a sample mean from the narrow, tight distribution, or from the flat, wide distribution, as the estimate of the population mean? Your answer tells us why people intuitively will always choose data from a large sample rather than a small sample. The sample mean they are getting is coming from a more compact distribution. This concept will be the foundation for what will be called level of confidence in the next unit.

Figure 7.8

The Central Limit Theorem for Proportions

The Central Limit Theorem tells us that the point estimate for the sample mean, \overline x, comes from a normal distribution of \overline x's. This theoretical distribution is called the sampling distribution of \overline x's. We now investigate the sampling distribution for another important parameter we wish to estimate: p, the parameter of the binomial probability density function.

If the random variable is discrete, such as for categorical data, then the parameter we wish to estimate is the population proportion. This is, of course, the probability of drawing a success in any one random draw. Unlike the case just discussed for a continuous random variable where we did not know the population distribution of X's, here we actually know the underlying probability density function for these data; it is the binomial. The random variable is X = the number of successes and the parameter we wish to know is p, the probability of drawing a success, which is of course the proportion of successes in the population. The question at issue is: from what distribution was the sample proportion, p'=\dfrac{x}{n}, drawn? The sample size is n and X is the number of successes found in that sample. This is a parallel question that was just answered by the Central Limit Theorem: from what distribution was the sample mean, \overline x, drawn? We saw that once we knew that the distribution was the Normal distribution then we were able to create confidence intervals for the population parameter, µ. We will also use this same information to test hypotheses about the population mean later. We wish now to be able to develop confidence intervals for the population parameter "p" from the binomial probability density function.

In order to find the distribution from which sample proportions come, we need to develop the sampling distribution of sample proportions just as we did for sample means. So again imagine that we randomly sample, say, 50 people and ask them if they support the new school bond issue. From this we find a sample proportion, p', and graph it on the axis of p's. We do this again and again until we have the theoretical distribution of p's. Some sample proportions will show high favorability toward the bond issue and others will show low favorability, because random sampling will reflect the variation of views within the population. What we have done can be seen in Figure 7.9. The top panel is the population distribution of probabilities for each possible value of the random variable X. While we do not know what the specific distribution looks like because we do not know p, the population parameter, we do know that it must look something like this. In reality, we do not know either the mean or the standard deviation of this population distribution, the same difficulty we faced when analyzing the X's previously.

Figure 7.9

Figure 7.9 places the mean on the distribution of population probabilities as µ = np, but of course we do not actually know the population mean because we do not know the population probability of success, p. Below the distribution of the population values is the sampling distribution of p's. Again the Central Limit Theorem tells us that this distribution is normally distributed, just like the case of the sampling distribution for \overline x's. This sampling distribution also has a mean, the mean of the p's, and a standard deviation, σ_{p'}.
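
This repeated-polling thought experiment is simple to simulate. The sketch below assumes, purely for illustration, that the true favorability is p = 0.55 and repeats the 50-person poll 1,000 times; the mean and standard deviation of the resulting p's match the values the Central Limit Theorem gives below.

import math
import random
import statistics

# Illustrative assumptions: true proportion p = 0.55, polls of n = 50.
random.seed(4)
p, n = 0.55, 50

p_primes = [
    sum(random.random() < p for _ in range(n)) / n
    for _ in range(1000)
]

print(statistics.mean(p_primes))   # close to p = 0.55
print(statistics.stdev(p_primes))  # close to sqrt(p(1 - p) / n)
print(math.sqrt(p * (1 - p) / n))  # about 0.0704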

Importantly, in the case of the analysis of the distribution of sample means, the Central Limit Theorem told us the expected value of the mean of the sample means in the sampling distribution, and the standard deviation of the sampling distribution. Again the Central Limit Theorem provides this information for the sampling distribution for proportions. The answers are:

  1. The expected value of the mean of the sampling distribution of sample proportions, µ_{p'}, is the population proportion, p.
  2. The standard deviation of the sampling distribution of sample proportions, σ_{p'}, is the population standard deviation divided by the square root of the sample size, n.

Both these conclusions are the same as we found for the sampling distribution for sample means. However in this case, because the mean and standard deviation of the binomial distribution both rely upon p, the formula for the standard deviation of the sampling distribution requires algebraic manipulation to be useful. We will take that up in the next chapter. The proof of these important conclusions from the Central Limit Theorem is provided below.

E(p')=E(\dfrac{x}{n})=(\dfrac{1}{n})E(x)=(\dfrac{1}{n})np=p

(The expected value of X, E(x), is simply the mean of the binomial distribution which we know to be np).

σ_{p'}^2=Var(p')=Var(\dfrac{x}{n})=\dfrac{1}{n^2}(Var(x))=\dfrac{1}{n^2}(np(1−p))=\dfrac{p(1−p)}{n}

The standard deviation of the sampling distribution for proportions is thus:

σ_{p'}=\sqrt{\dfrac{p(1−p)}{n}}

Parameter | Population distribution | Sample | Sampling distribution of p's
Mean | µ = np | p' = \dfrac{x}{n} | p' and E(p') = p
Standard deviation | σ = \sqrt{npq} | | σ_{p'} = \sqrt{\dfrac{p(1−p)}{n}}

Table 7.2

Table 7.2 summarizes these results and shows the relationship between the population, sample, and sampling distribution. Notice the parallel between this table and Table 7.1 for the case where the random variable is continuous and we were developing the sampling distribution for means.

Reviewing the formula for the standard deviation of the sampling distribution for proportions, we see that as n increases the standard deviation decreases. This is the same observation we made for the standard deviation of the sampling distribution for means. Again, as the sample size increases, the point estimate for either µ or p comes from a narrower and narrower distribution. We concluded that with a given level of probability, the range from which the point estimate comes is smaller as the sample size, n, increases. Figure 7.8 shows this result for the case of sample means. Simply substitute p' for \overline x and we can see the impact of the sample size on the estimate of the sample proportion.
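
For example, with an assumed p = 0.5 (the value that maximizes the spread), the standard deviation of the sampling distribution of proportions falls as n rises, halving each time the sample size quadruples:

import math

p = 0.5  # hypothetical population proportion
for n in (25, 100, 400, 1600):
    print(n, math.sqrt(p * (1 - p) / n))
# prints: 25 0.1, then 100 0.05, then 400 0.025, then 1600 0.0125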

Finite Population Correction Factor

We saw that the sample size has an important effect on the variance and thus the standard deviation of the sampling distribution. Also of interest is the proportion of the total population that has been sampled. We have assumed that the population is extremely large and that we have sampled a small part of it. As the population becomes smaller and we sample a larger number of observations, the sample observations are not independent of each other. To correct for the impact of this, the Finite Population Correction Factor can be used to adjust the variance of the sampling distribution. It is appropriate when more than 5% of a population of known size is being sampled. The issue arises for both the sampling distribution of the means and the sampling distribution of proportions. The Finite Population Correction Factor applied to the standard error of the mean, as it appears in the standardizing formula, is:

Z=\dfrac{\overline x−µ}{\dfrac{σ}{\sqrt n} ⋅ \sqrt{\dfrac{N−n}{N−1}}}

and applied to the standard deviation of proportions it is:

σ_{p'}=\sqrt{\dfrac{p(1−p)}{n}} \times \sqrt{\dfrac{N−n}{N−1}}

The following examples show how to apply the factor; sampling variances are adjusted using the formulas above.


Example 7.1

It is learned that the population of White German Shepherds in the USA is 4,000 dogs, and the mean weight for German Shepherds is 75.45 pounds. It is also learned that the population standard deviation is 10.37 pounds.

Problem
If the sample size is 100 dogs, then find the probability that a sample will have a mean that differs from the true population mean by less than 2 pounds.

Solution 1
N=4000, n=100, σ=10.37, µ=75.45, (\overline x−µ)=±2

Z=\dfrac{\overline x−µ}{\dfrac{σ}{\sqrt n} ⋅ \sqrt{\dfrac{N−n}{N−1}}}=\dfrac{±2}{\dfrac{10.37}{\sqrt{100}}  ⋅ \sqrt{\dfrac{4000−100}{4000−1}}} = ±1.95

P(−1.95 < Z < 1.95) = 2(0.4744) = 0.9488

Note that "differs by less" references the area on both sides of the mean within 2 pounds right or left.


Example 7.2

When a customer places an order with Rudy's On-Line Office Supplies, a computerized accounting information system (AIS) automatically checks to see if the customer has exceeded his or her credit limit. Past records indicate that the probability of customers exceeding their credit limit is 0.06.

Problem
Suppose that on a given day, 3,000 orders are placed in total. If we randomly select 360 orders, what is the probability that between 10 and 20 customers will exceed their credit limit?

Solution 1
N=3000, n=360, p=0.06

σ_{p'}=\sqrt{\dfrac{p(1−p)}{n}} × \sqrt{\dfrac{N−n}{N−1}}=\sqrt{\dfrac{0.06(1−0.06)}{360}} × \sqrt{\dfrac{3000−360}{3000−1}}=0.0117

p_1=\dfrac{10}{360}=0.0278, p_2=\dfrac{20}{360}=0.0556

Z=\dfrac{p'−p}{\sqrt{\dfrac{p(1−p)}{n}} ⋅ \sqrt{\dfrac{N−n}{N−1}}} = \dfrac{0.0278−0.06}{0.011744}=−2.74

Z=\dfrac{p'−p}{\sqrt{\dfrac{p(1−p)}{n}} ⋅ \sqrt{\dfrac{N−n}{N−1}}}=\dfrac{0.0556−0.06}{0.011744}=−0.38

P(\dfrac{0.0278−0.06}{0.011744} < Z < \dfrac{0.0556−0.06}{0.011744})=P(−2.74 < Z < −0.38)=0.4969−0.1480=0.3489

(Here 0.4969 and 0.1480 are the table areas between 0 and 2.74 and between 0 and 0.38, respectively; by symmetry, their difference is the area between −2.74 and −0.38.)
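
As with Example 7.1, the calculation can be verified with the exact normal CDF in place of a printed table (which moves the last digit slightly):

import math
from statistics import NormalDist

N, n, p = 3000, 360, 0.06

# Standard deviation of the sample proportion with the finite
# population correction.
sigma_p = math.sqrt(p * (1 - p) / n) * math.sqrt((N - n) / (N - 1))
print(round(sigma_p, 6))  # 0.011744

z1 = (10 / 360 - p) / sigma_p
z2 = (20 / 360 - p) / sigma_p
print(round(z1, 2), round(z2, 2))  # -2.74 -0.38

prob = NormalDist().cdf(z2) - NormalDist().cdf(z1)
print(round(prob, 4))  # 0.3495 (the table-based answer is 0.3489)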