More on Normal Distributions

Site: Saylor Academy
Course: MA121: Introduction to Statistics
Book: More on Normal Distributions
Date: Friday, March 29, 2024, 3:23 AM

Description

First, this section covers the history of the normal distribution, the central limit theorem, and the relation of normal distributions to errors of measurement. Then, it discusses how to compute the area under the normal curve. It then moves on to the standard normal distribution, the area under the standard normal curve, and how to translate from a non-standard normal distribution to the standard normal. Finally, it addresses how to compute (cumulative) binomial probabilities using normal approximations.

In the chapter on probability, we saw that the binomial distribution could be used to solve problems such as "If a fair coin is flipped 100 times, what is the probability of getting 60 or more heads?" The probability of exactly x heads out of N flips is computed using the formula:

P(x)=\frac{N !}{x !(N-x) !} \pi^{x}(1-\pi)^{N-x}

where x is the number of heads (60), N is the number of flips (100), and π is the probability of a head (0.5). Therefore, to solve this problem, you compute the probability of 60 heads, then the probability of 61 heads, 62 heads, etc., and add up all these probabilities. Imagine how long it must have taken to compute binomial probabilities before the advent of calculators and computers.
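
The summation described above is tedious by hand but short in code. The following is a minimal sketch in Python (3.8 or later, standard library only); the function names `binomial_pmf` and `prob_at_least` are made up for this illustration and are not part of the original text.

```python
from math import comb

def binomial_pmf(x: int, n: int, pi: float) -> float:
    """Probability of exactly x heads in n flips when P(head) = pi."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

def prob_at_least(k: int, n: int, pi: float) -> float:
    """Probability of k or more heads: add up P(k), P(k+1), ..., P(n)."""
    return sum(binomial_pmf(x, n, pi) for x in range(k, n + 1))

# The example from the text: 60 or more heads in 100 flips of a fair coin.
p_60_or_more = prob_at_least(60, 100, 0.5)
print(f"P(60 or more heads) = {p_60_or_more:.4f}")
```

The result is a little under 0.03, obtained by adding 41 separate binomial probabilities.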

Abraham de Moivre, an 18th century statistician and consultant to gamblers, was often called upon to make these lengthy computations. de Moivre noted that when the number of events (coin flips) increased, the shape of the binomial distribution approached a very smooth curve. Binomial distributions for 2, 4, and 12 flips are shown in Figure 1.


Figure 1. Examples of binomial distributions. The heights of the blue bars represent the probabilities.

de Moivre reasoned that if he could find a mathematical expression for this curve, he would be able to solve problems such as finding the probability of 60 or more heads out of 100 coin flips much more easily. This is exactly what he did, and the curve he discovered is now called the "normal curve".


Figure 2. The normal approximation to the binomial distribution for 12 coin flips. The smooth curve is the normal distribution. Note how well it approximates the binomial probabilities represented by the heights of the blue lines.

The importance of the normal curve stems primarily from the fact that the distributions of many natural phenomena are at least approximately normally distributed. One of the first applications of the normal distribution was to the analysis of errors of measurement made in astronomical observations, errors that occurred because of imperfect instruments and imperfect observers. Galileo in the 17th century noted that these errors were symmetric and that small errors occurred more frequently than large errors. This led to several hypothesized distributions of errors, but it was not until the early 19th century that it was discovered that these errors followed a normal distribution. Independently, the mathematicians Adrain in 1808 and Gauss in 1809 developed the formula for the normal distribution and showed that errors were fit well by this distribution.

This same distribution had been discovered by Laplace in 1778 when he derived the extremely important central limit theorem, the topic of a later section of this chapter. Laplace showed that even if a distribution is not normally distributed, the means of repeated samples from the distribution would be very nearly normally distributed, and that the larger the sample size, the closer the distribution of means would be to a normal distribution.

Most statistical procedures for testing differences between means assume normal distributions. Because the distribution of means is very close to normal, these tests work well even if the original distribution is only roughly normal.

Quételet was the first to apply the normal distribution to human characteristics. He noted that characteristics such as height, weight, and strength were normally distributed.



Source: David M. Lane, https://onlinestatbook.com/2/normal_distribution/history_normal.html
This work is in the Public Domain.

 

 

Question 1 out of 3.

Who was the 18th century statistician and consultant to gamblers that discovered the normal curve?

 de Moivre

 Galileo

 Adrain

 Gauss

Question 2 out of 3.

Why was the normal curve an important development?

 It has a relatively simple formula.

 Many natural phenomena are at least approximately normally distributed.

 Many inferential statistics can only be computed with a normal distribution.

Question 3 out of 3.

Who is responsible for the central limit theorem?

 Gauss

 Laplace

 Newton

 Adrain


  1. de Moivre reasoned that if he could find a mathematical expression for the smooth curve that came about when the number of binomial events (coin flips) increased, he would be able to solve problems such as finding the probability of 60 or more heads out of 100 coin flips much more easily. The curve he discovered is now called the "normal curve".

  2. The normal curve is important primarily because the distributions of many natural phenomena are at least approximately normal. For example, one of the first applications of the normal distribution was to the analysis of errors of measurement made in astronomical observations. Galileo in the 17th century noted that these errors were symmetric and that small errors occurred more frequently than large errors.

  3. Laplace derived the central limit theorem. Adrain and Gauss developed the formula for the normal distribution.

Learning Objectives

  1. State the proportion of a normal distribution within 1 and within 2 standard deviations of the mean
  2. Use the calculator "Calculate Area for a given X"
  3. Use the calculator "Calculate X for a given Area"

Areas under portions of a normal distribution can be computed by using calculus. Since this is a non-mathematical treatment of statistics, we will rely on computer programs and tables to determine these areas. Figure 1 shows a normal distribution with a mean of 50 and a standard deviation of 10. The shaded area between 40 and 60 contains 68% of the distribution.


Figure 1. Normal distribution with a mean of 50 and standard deviation of 10. 68% of the area is within one standard deviation (10) of the mean (50).

Figure 2 shows a normal distribution with a mean of 100 and a standard deviation of 20. As in Figure 1, 68% of the distribution is within one standard deviation of the mean.


Figure 2. Normal distribution with a mean of 100 and standard deviation of 20. 68% of the area is within one standard deviation (20) of the mean (100).

The normal distributions shown in Figures 1 and 2 are specific examples of the general rule that 68% of the area of any normal distribution is within one standard deviation of the mean.

Figure 3 shows a normal distribution with a mean of 75 and a standard deviation of 10. The shaded area contains 95% of the area and extends from 55.4 to 94.6. For all normal distributions, 95% of the area is within 1.96 standard deviations of the mean. For quick approximations, it is sometimes useful to round off and use 2 rather than 1.96 as the number of standard deviations you need to extend from the mean so as to include 95% of the area.


Figure 3. A normal distribution with a mean of 75 and a standard deviation of 10. 95% of the area is within 1.96 standard deviations of the mean.
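
The 68% and 95% figures can be checked numerically for the three distributions in Figures 1 to 3. This is a rough sketch that assumes Python's standard-library `statistics.NormalDist` (Python 3.8 or later); the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

def area_within(dist: NormalDist, k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of its mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

# (mean, standard deviation) pairs taken from Figures 1, 2, and 3.
for mean, sd in [(50, 10), (100, 20), (75, 10)]:
    d = NormalDist(mean, sd)
    print(
        f"mean={mean:>3} sd={sd:>2}  "
        f"within 1 SD: {area_within(d, 1):.4f}  "
        f"within 1.96 SD: {area_within(d, 1.96):.4f}  "
        f"within 2 SD: {area_within(d, 2):.4f}"
    )
```

The proportions do not depend on the mean or standard deviation chosen, which is the general rule stated above. Using 2 instead of 1.96 gives a slightly larger area than 95%.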

The normal calculator can be used to calculate areas under the normal distribution. For example, you can use it to find the proportion of a normal distribution with a mean of 90 and a standard deviation of 12 that is above 110. Set the mean to 90 and the standard deviation to 12. Then enter "110" in the box to the right of the radio button "Above". At the bottom of the display you will see that the shaded area is 0.0478. See if you can use the calculator to find that the area between 115 and 120 is 0.0124.


Figure 4. Display from calculator showing the area above 110.
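
If the online calculator is not available, the same areas can be obtained from a cumulative distribution function. The sketch below uses Python's `statistics.NormalDist` as a stand-in for the calculator; the variable names are illustrative only.

```python
from statistics import NormalDist

# Normal distribution with a mean of 90 and a standard deviation of 12.
scores = NormalDist(mu=90, sigma=12)

# Area above 110 is everything that is not below 110.
area_above_110 = 1 - scores.cdf(110)

# Area between two values is the difference of the two areas below them.
area_115_to_120 = scores.cdf(120) - scores.cdf(115)

print(f"Area above 110:           {area_above_110:.4f}")
print(f"Area between 115 and 120: {area_115_to_120:.4f}")
```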

Say you wanted to find the score corresponding to the 75th percentile of a normal distribution with a mean of 90 and a standard deviation of 12. Using the inverse normal calculator, you enter the parameters as shown in Figure 5 and find that the area below 98.09 is 0.75.


Figure 5. Display from normal calculator showing that the 75th percentile is 98.09.
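
The inverse calculation (from an area to a score) corresponds to the inverse of the cumulative distribution function. As a hedged illustration, Python's `statistics.NormalDist.inv_cdf` plays the role of the inverse normal calculator here.

```python
from statistics import NormalDist

scores = NormalDist(mu=90, sigma=12)

# Score with 75% of the distribution below it (the 75th percentile).
p75 = scores.inv_cdf(0.75)
print(f"75th percentile: {p75:.2f}")

# Going back the other way recovers the area.
print(f"Area below that score: {scores.cdf(p75):.2f}")
```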

 

 

Question 1 out of 6.
A distribution has a mean of 40 and a standard deviation of 5. 68% of the distribution can be found between what two numbers?

 30 and 50

 0 and 45

 0 and 68

 35 and 45

Question 2 out of 6.
A distribution has a mean of 20 and a standard deviation of 3. Approximately 95% of the distribution can be found between what two numbers?

 17 and 23

 14 and 26

 10 and 30

 0 and 23

Question 3 out of 6.

A normal distribution has a mean of 5 and a standard deviation of 2. What proportion of the distribution is above 3?

Question 4 out of 6.

A normal distribution has a mean of 120 and a variance of 100. 35% of the area is below what number?

Question 5 out of 6.

A normal distribution of test scores has a mean of 38 and a standard deviation of 6. Everyone scoring at or above the 80th percentile gets placed in an advanced class. What is the cutoff score to get into the class?

Question 6 out of 6.

A normal distribution of test scores has a mean of 38 and a standard deviation of 6. What percent of the students scored between 30 and 45?


  1. 68% of the distribution is within one standard deviation of the mean. 40 + 5 = 45, 40 - 5 = 35

  2. 95% of the distribution is within 1.96 standard deviations of the mean. You can round 1.96 to 2 to approximate. 20 - 2(3) = 14, 20 + 2(3) = 26

  3. Use the "Calculate Area for a given X" calculator and enter Mean = 5, SD = 2, Above 3. You will get 0.8413.

  4. Var = 100, so SD = 10. Use the "Calculate X for a given Area" calculator and enter Mean = 120, SD = 10, Shaded area = .35. Click below, and you will get 116.15.

  5. Use the "Calculate X for a given Area" calculator and enter Mean = 38, SD = 6, Shaded area = .80. Click below, and you will get 43.05, meaning a score of 43.

  6. Use the "Calculate Area for a given X" calculator and enter Mean = 38, SD = 6, Between 30 and 45. You will get 0.787, meaning 78.7%.

Learning Objectives

  1. State the mean and standard deviation of the standard normal distribution
  2. Use a Z table
  3. Use the normal calculator
  4. Transform raw data to Z scores

As discussed in the introductory section, normal distributions do not necessarily have the same means and standard deviations. A normal distribution with a mean of 0 and a standard deviation of 1 is called a standard normal distribution.

Areas of the normal distribution are often represented by tables of the standard normal distribution. A portion of a table of the standard normal distribution is shown in Table 1.

Table 1. A portion of a table of the standard normal distribution.

Z        Area below
-2.50    0.0062
-2.49    0.0064
-2.48    0.0066
-2.47    0.0068
-2.46    0.0069
-2.45    0.0071
-2.44    0.0073
-2.43    0.0075
-2.42    0.0078
-2.41    0.0080
-2.40    0.0082
-2.39    0.0084
-2.38    0.0087
-2.37    0.0089
-2.36    0.0091
-2.35    0.0094
-2.34    0.0096
-2.33    0.0099
-2.32    0.0102

The first column titled "Z" contains values of the standard normal distribution; the second column contains the area below Z. Since the distribution has a mean of 0 and a standard deviation of 1, the Z column is equal to the number of standard deviations below (or above) the mean. For example, a Z of -2.5 represents a value 2.5 standard deviations below the mean. The area below Z is 0.0062.
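
The entries in such a table are values of the standard normal cumulative distribution function rounded to four decimal places. The following sketch regenerates the rows of Table 1 under that assumption; `z_table_rows` is a hypothetical helper, not something defined in the original text.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def z_table_rows(start: float, stop: float, step: float = 0.01):
    """Return (Z, area below Z) pairs from start to stop, inclusive."""
    n_steps = round((stop - start) / step)
    rows = []
    for i in range(n_steps + 1):
        z = round(start + i * step, 2)  # avoid floating-point drift in Z
        rows.append((z, round(standard_normal.cdf(z), 4)))
    return rows

for z, area in z_table_rows(-2.50, -2.32):
    print(f"{z:6.2f}  {area:.4f}")
```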

The same information can be obtained using the following Java applet. Figure 1 shows how it can be used to compute the area below a value of -2.5 on the standard normal distribution. Note that the mean is set to 0 and the standard deviation is set to 1.


Figure 1. An example from the applet.


Calculate Areas

A value from any normal distribution can be transformed into its corresponding value on a standard normal distribution using the following formula:

\begin{align*}
Z=(X-\mu) / \sigma
\end{align*}

where Z is the value on the standard normal distribution, X is the value on the original distribution, \mu is the mean of the original distribution, and \sigma is the standard deviation of the original distribution.

As a simple application, what portion of a normal distribution with a mean of 50 and a standard deviation of 10 is below 26? Applying the formula, we obtain

\begin{align*}
Z=(26-50) / 10=-2.4
\end{align*}

From Table 1, we can see that 0.0082 of the distribution is below -2.4. There is no need to transform to Z if you use the applet as shown in Figure 2.


Figure 2. Area below 26 in a normal distribution with a mean of 50 and a standard deviation of 10.
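
Both routes to the answer (transforming to Z first, or working with the original distribution directly) can be compared in code. This is a small sketch using Python's standard library; `z_score` is an illustrative name.

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Route 1: transform 26 to a Z score, then use the standard normal distribution.
z = z_score(26, mu=50, sigma=10)
area_via_z = NormalDist().cdf(z)  # NormalDist() defaults to mean 0, SD 1

# Route 2: use the original distribution directly, with no transformation.
area_direct = NormalDist(mu=50, sigma=10).cdf(26)

print(f"Z = {z}")
print(f"Area below (via Z):   {area_via_z:.4f}")
print(f"Area below (direct):  {area_direct:.4f}")
```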

If all the values in a distribution are transformed to Z scores, then the distribution will have a mean of 0 and a standard deviation of 1. This process of transforming a distribution to one with a mean of 0 and a standard deviation of 1 is called standardizing the distribution.
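
Standardizing can be illustrated on a small data set. The numbers below are made up for the example, and the sketch assumes the population standard deviation (`statistics.pstdev`) is the one intended; with the sample standard deviation the standardized values would have a sample standard deviation of 1 instead.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert each value to a Z score using the mean and population SD of the values."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]

raw = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5 and SD 2
z_scores = standardize(raw)

print("Z scores:", z_scores)
print("Mean of Z scores:", mean(z_scores))
print("SD of Z scores:  ", pstdev(z_scores))
```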

 

 

Question 1 out of 4.
A standard normal distribution has:

 a mean of 1 and a standard deviation of 1

 a mean of 0 and a standard deviation of 1

 a mean larger than its standard deviation

 all scores within one standard deviation of the mean

Question 2 out of 4.
A number 1.5 standard deviations below the mean has a z score of:

 1.5

 -1.5

 3

 more information is needed

Question 3 out of 4.
A distribution has a mean of 16 and a standard deviation of 6. What is the Z score that corresponds with 25?

Question 4 out of 4.
A distribution has a mean of 18 and a standard deviation of 5. Use the table presented in this section to determine the proportion of the scores (area) below 6.



  1. The standard normal distribution is defined as a normal distribution with a mean of 0 and a standard deviation of 1.

  2. Z is equal to the number of standard deviations below or above the mean. Numbers below the mean have negative Z scores.

  3. 25 is 1.5 SDs above the mean: Z = (X - M)/SD = (25 - 16)/6 = 1.5

  4. Z = (X - M)/SD = (6 - 18)/5 = -2.40, Look at the table to see that the area below -2.40 is .0082. (This answer can also be found using the Java applet instead of the table.)

Learning Objectives

  1. State the relationship between the normal distribution and the binomial distribution
  2. Use the normal distribution to approximate the binomial distribution
  3. State when the approximation is adequate

In the section on the history of the normal distribution, we saw that the normal distribution can be used to approximate the binomial distribution. This section shows how to compute these approximations.

Let's begin with an example. Assume you have a fair coin and wish to know the probability that you would get 8 heads out of 10 flips. The binomial distribution has a mean of \mu=N \pi=(10)(0.5)=5 and a variance of \sigma^{2}=N \pi(1-\pi)=(10)(0.5)(0.5)=2.5. The standard deviation is therefore \sqrt{2.5}=1.5811. A total of 8 heads is (8-5) / 1.5811=1.897 standard deviations above the mean of the distribution. The question then is, "What is the probability of getting a value exactly 1.897 standard deviations above the mean?" You may be surprised to learn that the answer is 0: the probability of any one specific point is 0. The problem is that the binomial distribution is a discrete probability distribution, whereas the normal distribution is a continuous distribution.

The solution is to round off and consider any value from 7.5 to 8.5 to represent an outcome of 8 heads. Using this approach, we figure out the area under a normal curve from 7.5 to 8.5. The area in green in Figure 1 is an approximation of the probability of obtaining 8 heads.


Figure 1. Approximating the probability of 8 heads with the normal distribution.

The solution is therefore to compute this area. First we compute the area below 8.5 and then subtract the area below 7.5.

The results of using the normal area calculator to find the area below 8.5 are shown in Figure 2. The results for 7.5 are shown in Figure 3.


Figure 2. Area below 8.5.


Figure 3. Area below 7.5.

The difference between the areas is 0.044, which is the approximation of the binomial probability. For these parameters, the approximation is very accurate. The demonstration in the next section allows you to explore its accuracy with different parameters.
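
The calculation can be sketched in code and compared with the exact binomial probability. This is an illustration only, using Python's standard library; the function names `normal_approx_exactly` and `binomial_exact` are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx_exactly(k: int, n: int, pi: float) -> float:
    """Normal approximation to P(exactly k successes): area from k - 0.5 to k + 0.5."""
    mu = n * pi                        # mean of the binomial distribution
    sigma = sqrt(n * pi * (1 - pi))    # standard deviation of the binomial distribution
    approx = NormalDist(mu, sigma)
    return approx.cdf(k + 0.5) - approx.cdf(k - 0.5)

def binomial_exact(k: int, n: int, pi: float) -> float:
    """Exact binomial probability of k successes in n trials."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

approx_8 = normal_approx_exactly(8, 10, 0.5)
exact_8 = binomial_exact(8, 10, 0.5)
print(f"Normal approximation: {approx_8:.4f}")
print(f"Exact binomial:       {exact_8:.4f}")
```

For 8 heads in 10 flips, the two values differ by less than 0.001.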

If you did not have the normal area calculator, you could find the solution using a table of the standard normal distribution (a Z table) as follows:

1. Find a Z score for 8.5 using the formula Z=(8.5-5) / 1.5811=2.21.

2. Find the area below a Z of 2.21, which is 0.987.

3. Find a Z score for 7.5 using the formula Z=(7.5-5) / 1.5811=1.58.

4. Find the area below a Z of 1.58, which is 0.943.

5. Subtract the value in step 4 from the value in step 2 to get 0.044.

The same logic applies when calculating the probability of a range of outcomes. For example, to calculate the probability of 8 to 10 heads, calculate the area from 7.5 to 10.5.
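
For a range of outcomes, the lower limit is moved down by 0.5 and the upper limit is moved up by 0.5. The sketch below applies this to 8 to 10 heads in 10 flips and compares it with the exact sum of binomial probabilities; the function names are made up for the example.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx_range(lo: int, hi: int, n: int, pi: float) -> float:
    """Normal approximation to P(lo <= successes <= hi): area from lo - 0.5 to hi + 0.5."""
    mu = n * pi
    sigma = sqrt(n * pi * (1 - pi))
    approx = NormalDist(mu, sigma)
    return approx.cdf(hi + 0.5) - approx.cdf(lo - 0.5)

def binomial_range(lo: int, hi: int, n: int, pi: float) -> float:
    """Exact probability of between lo and hi successes, inclusive."""
    return sum(comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(lo, hi + 1))

approx_8_to_10 = normal_approx_range(8, 10, 10, 0.5)
exact_8_to_10 = binomial_range(8, 10, 10, 0.5)
print(f"Normal approximation: {approx_8_to_10:.4f}")
print(f"Exact binomial:       {exact_8_to_10:.4f}")
```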

The accuracy of the approximation depends on the values of N and π. A rule of thumb is that the approximation is good if Nπ and N(1 - π) are both greater than 10.
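
The rule of thumb is easy to express as a check. This is a minimal sketch; the function name `approximation_is_adequate` and the `threshold` parameter are assumptions made for illustration.

```python
def approximation_is_adequate(n: int, pi: float, threshold: float = 10) -> bool:
    """Rule of thumb: both N*pi and N*(1 - pi) should be greater than the threshold."""
    return n * pi > threshold and n * (1 - pi) > threshold

# (N, pi) pairs chosen only to illustrate the rule.
for n, pi in [(10, 0.5), (100, 0.5), (100, 0.05)]:
    verdict = "adequate" if approximation_is_adequate(n, pi) else "not adequate by the rule"
    print(f"N={n:>3}, pi={pi:<4}: N*pi={n * pi:g}, N*(1-pi)={n * (1 - pi):g} -> {verdict}")
```

The 10-flip example used in this section gives Nπ = 5, so it does not meet the rule of thumb even though the approximation for 8 heads was close. The rule is a general guideline rather than a guarantee in either direction.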

 

 

Question 1 out of 5.
Suppose you have a normal distribution with a mean of 6 and a standard deviation of 1. What is the probability of getting a Z score of exactly 1.2?

Question 2 out of 5.
You decide to use the normal distribution to approximate the binomial distribution. You want to know the probability of getting exactly 6 tails out of 10 flips. First you find the mean and SD of the normal distribution, and then you compute the area:

 at exactly 6

 from 5.5 to 6.5

 from 0 to 6

 from 6 to 10

Question 3 out of 5.
You decide to use the normal distribution to approximate the binomial distribution. You want to know the probability of getting from 7 to 13 heads out of 20 flips. You compute the area:

 from 7.5 to 13.5

 from 7.5 to 12.5

 from 7 to 13

 from 6.5 to 13

 from 6.5 to 13.5

Question 4 out of 5.
The normal approximation to the binomial is most accurate for which of the following probabilities?

 .2

 .5

 .8

Question 5 out of 5.
The normal approximation to the binomial is most accurate for which of the following sample sizes?

 4

 8

 12


  1. Because the normal distribution is continuous, the probability of any one specific point is 0.

  2. Because the normal distribution is continuous, the probability of any one specific point is 0. The solution is to round off and consider any value from 5.5 to 6.5 to represent an outcome of 6 tails. Using this approach, we figure out the area under a normal curve from 5.5 to 6.5.

  3. In order to include 7 heads, you need to start a little below it (6.5), and to include 13 heads, you need to go a little past it (13.5). So, you calculate the area from 6.5 to 13.5.

  4. It is most accurate for p=.5 because that makes the binomial distribution symmetric and closer to a normal distribution.

  5. The binomial distribution approaches a normal distribution as the sample size increases. Therefore the approximation is best when the sample size is highest.