BUS204 Study Guide

Unit 3: The Normal Distribution

3a. Define and apply the properties of the normal distribution while understanding real-world implications and applications of the normal distribution

  • What are the properties of a normal distribution?
  • Why do you think these distributions have the name "normal"?
  • We've stated before that the probability of any continuous random variable EQUALING x is effectively 0. Knowing the properties of the normal distribution, why does this have to be the case?

Every data point in a distribution has a corresponding Z-score, which gives the number of standard deviations the point lies above (positive) or below (negative) the mean.
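As a minimal sketch of that definition, a Z-score can be computed by subtracting the mean from the data point and dividing by the standard deviation. The function name z_score and the exam figures below are illustrative assumptions, not values from the course.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev


# Hypothetical exam with a mean of 70 and a standard deviation of 10.
above = z_score(85, mean=70, std_dev=10)  # 1.5 standard deviations above the mean
below = z_score(60, mean=70, std_dev=10)  # 1 standard deviation below the mean
print(above, below)
```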

Note first that the normal distribution is not a single distribution, but a family of distributions that share the properties listed below. Every combination of mean and standard deviation defines its own distribution.

A normal distribution is a continuous distribution with the following properties:

  • The distribution is symmetric and the histogram is bell-shaped.
  • The mean, median, and mode are all equal and at the center of the distribution and bell curve.
  • The location, or center, is defined by the mean, and the spread or variation is defined by the standard deviation. Each normal distribution is defined by its mean  \mu and standard deviation  \sigma .


The data is roughly distributed as follows:

  • 68% of all the data are within one standard deviation of the mean or have a Z-score between -1 and +1.
  • 95% of all the data are within two standard deviations of the mean or have a Z-score between -2 and +2.
  • 99.7% of all the data are within three standard deviations of the mean or have a Z-score between -3 and +3.

These guidelines are known as the empirical rule.
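The three percentages can be checked numerically. The sketch below uses NormalDist from Python's standard statistics module with a mean of 0 and a standard deviation of 1, so that values are already Z-scores. The helper name share_within is an assumption made for illustration.

```python
from statistics import NormalDist

# A normal distribution with mean 0 and standard deviation 1,
# so each value is its own Z-score.
z_dist = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return z_dist.cdf(k) - z_dist.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The printed shares are roughly 0.6827, 0.9545, and 0.9973, which the empirical rule rounds to 68%, 95%, and 99.7%.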

By definition, the total area under a normal curve is 1. The probability that a normal random variable  x is between two values is the area under the normal distribution curve between those two values. This gives an intuitive explanation of why the probability of  x having any single value has to be 0: the region above a single point has no width, and therefore no area.
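In symbols, the statements above can be written roughly as follows. Here  f(x) stands for the height of the normal curve at  x , and  a and  b are the two values of interest; these labels are assumed here for illustration.

```latex
\int_{-\infty}^{\infty} f(x)\,dx = 1,
\qquad
P(a \le x \le b) = \int_{a}^{b} f(x)\,dx,
\qquad
P(x = a) = \int_{a}^{a} f(x)\,dx = 0
```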

Many real-life populations are approximately normally distributed: heights of people, for example, or heights of other animals within their own species. In a large enough class, exam scores can be expected to follow a roughly normal distribution. Even some non-scientific data tends toward normality.


As with any real-life case, a perfectly normal distribution is rare.

To review, see Qualitative Sense of Normal Distributions.

 

3b. Use the normal distribution to estimate the probability of an event occurring

  • What is the difference between a normal distribution and the standard normal distribution?
  • Why does the original distribution you want to find probabilities for first need to be converted to the standard normal before using a table to look up values?
  • Most standard normal distribution tables give the area to the left of a particular Z-score. How would you find the area to the right of that Z-score? What about between two Z-scores if you are only given an area to the left of each one?

Finding the probability that a normal random variable  x is between two values (remember that a property of continuous distributions is that the probability of  x having any single value is effectively 0) is done by finding the area under the normal distribution curve between those two values. 

Finding the area is done using a computer or tables. In order to use the tables, the probability distribution being considered must be converted to the standard normal distribution.

The standard normal distribution (now we can say "the") is the normal distribution with mean  \mu=0 and standard deviation  \sigma=1 .

Convert the interval of values you are interested in to an interval between two Z-scores, and then use a table of standard normal values, a calculator, or an app to find the area between them, which is the probability.
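The conversion can be sketched in code, with NormalDist from Python's standard statistics module standing in for the table. The function name prob_between and the height figures are hypothetical. The same area-to-the-left values also give the area to the right of a Z-score (1 minus the area to the left) and the area between two Z-scores (the difference of the two areas to the left).

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def prob_between(a, b, mean, std_dev):
    """P(a < x < b) for a normal variable, found via Z-scores."""
    z_a = (a - mean) / std_dev          # Z-score of the lower value
    z_b = (b - mean) / std_dev          # Z-score of the upper value
    left_of_a = std_normal.cdf(z_a)     # area to the left of z_a
    left_of_b = std_normal.cdf(z_b)     # area to the left of z_b
    return left_of_b - left_of_a        # area between the two Z-scores


# Hypothetical heights with a mean of 170 and a standard deviation of 10.
# 160 and 185 convert to Z-scores of -1 and 1.5.
p_between = prob_between(160, 185, mean=170, std_dev=10)

# Area to the right of Z = 1.5 is 1 minus the area to its left.
p_right = 1 - std_normal.cdf(1.5)

print(round(p_between, 4), round(p_right, 4))
```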

This process can be run in reverse to find the value at a given percentile of a distribution (the value below which that percent of values fall). Find the Z-score that has the required area to its left or right, then convert that Z-score back to a value in the original distribution.
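A minimal sketch of the reverse lookup follows, using the inv_cdf method of NormalDist in place of reading the table backwards. The function name value_at_percentile and the exam figures are assumptions made for illustration.

```python
from statistics import NormalDist


def value_at_percentile(p, mean, std_dev):
    """Value below which a share p (0 < p < 1) of a normal distribution falls."""
    z = NormalDist().inv_cdf(p)   # Z-score with area p to its left
    return mean + z * std_dev     # convert the Z-score back to original units


# Hypothetical exam with a mean of 70 and a standard deviation of 10:
# the score at the 90th percentile.
cutoff = value_at_percentile(0.90, mean=70, std_dev=10)
print(round(cutoff, 1))
```

For an area to the right instead, pass 1 minus that area as p.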

To review, see The Central Limit Theorem.

 

3c. Explain how the normal distribution relates to the central limit theorem

  • What is the difference between the following problems: "Find the probability that  x falls between  a and  b " and "Find the probability that the mean of a sample taken from that distribution falls between  a and  b "?
  • If a population distribution is not normal, yet you want to find the probability involving the mean of a sample taken from that distribution, what must be true about your sample?

The sampling distribution of the mean is the distribution of the means of samples drawn from a larger population, and it has its own mean and standard deviation. If you repeatedly sample n items from a population and record the sample means, those sample means will themselves be (at least approximately) normally distributed, with a mean equal to the population mean and a standard deviation (also called the standard error) equal to the population standard deviation divided by the square root of n, provided that one of the following two conditions holds:

  • The underlying population distribution is normal
  • The sample size is sufficiently large ("large" usually means greater than 30, but this can be flexible if the underlying distribution is close to normal).
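The second condition can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is exponential with rate 1 (strongly skewed, with mean 1 and standard deviation 1), each sample has 40 items, and 5000 samples are drawn. None of these choices comes from the course.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)       # fixed seed so the run is repeatable

n = 40               # items in each sample (assumed value)
num_samples = 5000   # number of samples drawn (assumed value)

# Skewed, non-normal population: exponential with mean 1 and standard deviation 1.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

# The sample means should center on the population mean (1) and have a
# standard deviation close to the population standard deviation over sqrt(n).
print("mean of sample means:", round(mean(sample_means), 3))
print("standard deviation of sample means:", round(stdev(sample_means), 3))
print("population standard deviation / sqrt(n):", round(1 / sqrt(n), 3))
```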

The rule above is referred to as the central limit theorem. If you want to find the probability that the mean of a sample falls between two values, you convert those values to Z-scores, but make sure you replace the standard deviation  \sigma with the standard error  \frac{\sigma}{\sqrt{n}} .

 \mu_{\bar{x}} = \mu

 \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
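The difference between a probability about a single value and a probability about a sample mean can be sketched as follows. The function name prob_sample_mean_between and the population figures (mean 50, standard deviation 12, sample size 36) are hypothetical, and the single-value calculation additionally assumes the population itself is normal.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def prob_sample_mean_between(a, b, pop_mean, pop_std_dev, n):
    """P(a < sample mean < b), using the standard error in place of sigma."""
    standard_error = pop_std_dev / sqrt(n)
    z_a = (a - pop_mean) / standard_error
    z_b = (b - pop_mean) / standard_error
    return std_normal.cdf(z_b) - std_normal.cdf(z_a)


# Probability that ONE value falls between 46 and 54 (Z-scores use sigma = 12).
population = NormalDist(mu=50, sigma=12)
p_single = population.cdf(54) - population.cdf(46)

# Probability that the MEAN of 36 values falls between 46 and 54.
# The standard error is 12 / sqrt(36) = 2, so the Z-scores are -2 and +2.
p_mean = prob_sample_mean_between(46, 54, pop_mean=50, pop_std_dev=12, n=36)

print(round(p_single, 4), round(p_mean, 4))
```

The sample mean is far more likely to land in the interval than a single value is, because the standard error is smaller than the standard deviation.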

To review, see The Central Limit Theorem and Sampling Distribution of the Sample Mean.

 

Unit 3 Vocabulary

This vocabulary list includes terms that might help you with the review items above and some terms you should be familiar with to be successful in completing the final exam for the course. 

Try to think of the reason why each term is included.

  • central limit theorem
  • empirical rule
  • mean
  • normal distribution
  • sampling distribution of the mean
  • standard deviation
  • standard error
  • standard normal distribution