MA121 Study Guide


Unit 2: Elements of Probability and Random Variables

2a. Apply simple principles of probability, and use common terminology of probability

  • Define probability. Explain how to compute probability.
  • Explain the concept of equally likely outcomes.
  • Define a random variable. How is it different from variables you may have encountered in Algebra? Name the types of random variables.

Think about probability as the chance that an outcome or event will occur. The probability of something occurring is a number ranging from zero (zero percent, or no chance) to one (100 percent, or certain).

We always express probability as a decimal for use in calculations. In other words, we write 55 percent as p = 0.55. If an experiment has several equally likely outcomes, then the total number of possible outcomes is the denominator and the number of "successful" outcomes is the numerator.

For example, the probability of rolling greater than four on a six-sided die is 2/6, because rolling a five or a six are the two successes out of six equally likely outcomes. We cannot apply this reasoning directly to the sum of two dice: totals of 2 through 12 make eleven possible events, but not all of them are equally likely. There is only one way {1,1} to roll a 2, but six ways {1,6}, {2,5}, {3,4}, {4,3}, {5,2}, {6,1} to roll a 7.

There are many variations of the probability formula for different situations and distributions, but we generally calculate probability as the number of "favorable" outcomes (that is, outcomes you are looking for) divided by the total number of possible outcomes.
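
As a quick illustration of "favorable over total" counting, here is a minimal sketch in plain Python (not part of the course materials) that enumerates the two-dice example above. Each of the 36 ordered (die1, die2) pairs is equally likely, even though the eleven possible totals are not:

```python
# A minimal sketch: count favorable outcomes over total outcomes.
# Each of the 36 ordered pairs (d1, d2) is equally likely, even though
# the eleven possible totals 2 through 12 are not.
from fractions import Fraction

outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
favorable = [pair for pair in outcomes if sum(pair) == 7]

print(len(favorable), "ways out of", len(outcomes))              # 6 ways out of 36
print("p(sum = 7) =", Fraction(len(favorable), len(outcomes)))   # 1/6
```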

A random variable differs from the variables you have seen in algebra because, in those courses, the value of x is fixed and we solve for its value by following certain steps.

For example, for the equation 2x = 4, x always equals two; it is just a matter of finding it. The study of probability introduces the concept of random variables: their value results from a probability experiment.

If x equals the number of times a coin lands on heads when you flip a coin five times, x will have a value from 0 to 5. We do not know the specific value until we have tossed the coin five times.

A random variable can be discrete (it takes values from a specific, countable set) or continuous (it can take any value within an interval, so there are infinitely many possibilities). In our coin toss example above, x is a discrete random variable, since x can only have six possible values (0 to 5).

A random variable with a very large number of possible values, such as a team's score during a basketball game, is usually treated as continuous. Just as we discussed above, there are too many possible values to make a frequency table for each possible point value.

Review this material in Remarks on the Concept of "Probability" and Random Variables.


2b. Calculate conditional probability, and determine whether two events are mutually exclusive and whether two events are independent

  • Define an outcome and an event. How are the two related?
  • What does it mean for two events to be dependent or independent?
  • What does it mean for two events to be mutually exclusive?
  • Define conditional probability.
  • Define a compound event.

An outcome is the result of a single trial of an experiment, such as flipping a coin, measuring someone's height, or asking someone to name their favorite baseball team. An event is a collection of one or more outcomes.

Two events are independent when the occurrence of one event or action does not affect the probability that the other occurs.

  • An example of independent events is: A = roll a five on a die, and B = flip heads on a coin. The two events do not have a causal relationship.
  • An example of dependent events is: A = the temperature is below freezing, and B = it snows. In this case, event B is more likely to happen if A occurs than if it does not.

Compound events are combinations of events; we can calculate the probabilities for them as well. Common conjunctions are "AND", "OR", and "GIVEN" (conditional probability). In our above example, "Below freezing AND snowfall" is an example of the compound event (A and B).

Mutually-exclusive events (also called disjoint events) are events that cannot occur at the same time.

  • An example of mutually-exclusive events is: A = roll greater than four on a six-sided die, and B = roll less than three on a six-sided die. These events cannot occur at the same time.

Symbolically, we describe this as  P(A \cap B) = 0

We pronounce the notation  P(A | B)  as "the probability of A given B", which means we are looking for the probability that A occurs given that B occurs, or has occurred. We call this situation conditional probability.
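
A useful formula that follows from this definition (a standard result, not written out in the text above) is

 P(A | B) = \frac{P(A \cap B)}{P(B)} 

so, in the freezing-and-snow example, the probability of snow given freezing temperatures is the probability of both occurring divided by the probability of freezing temperatures.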

Review this material in Remarks on the Concept of "Probability" and Basic Concepts.


2c. Calculate probabilities using the addition rules and multiplication rules

  • Define the general addition rule of probability. How does it relate to whether events are mutually exclusive, or not?
  • Define the multiplication rule of probability. How does it relate to whether events are independent, or not?
  • Define the special addition rule, general multiplication rule, and special multiplication rule.

We generally associate the general addition rule of probability with "or" compound events. We associate the multiplication rule with "and" compound events.

The special addition rule p(A or B) = p(A) + p(B) holds if A and B are mutually exclusive (cannot both occur).

If A and B are not mutually exclusive, we use the general addition rule: p(A or B) = p(A) + p(B) − p(A & B).

The special multiplication rule is p(A & B) = p(A) × p(B), and it holds if A and B are independent.

If they are dependent events, that's when conditional probability comes in and we use the general multiplication rule: p(A & B) = p(A) × p(B | A).
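
As a short worked example of both pairs of rules (the playing-card and coin numbers are illustrative, not from the original text): the probability of drawing a heart or a king from a standard 52-card deck, and the probability of flipping heads twice in two independent coin flips, are

 p(\text{heart or king}) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52} \approx 0.31 

 p(\text{heads and heads}) = \frac{1}{2} \times \frac{1}{2} = 0.25 

The first uses the general addition rule (the king of hearts belongs to both events, so we subtract it once); the second uses the special multiplication rule, since the two flips are independent.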

Review this material in:


2d. Construct and interpret Venn diagrams

  • Define a union or an intersection of events.
  • Explain how to use Venn diagrams to illustrate outcomes and events.

The probability of a union of events  p(A\cup B)  is the probability that either A or B (or both) occurs.

The probability of an intersection of events  p(A\cap B)  is the probability that A and B both occur. We can represent these on a Venn diagram, where events are circles and outcomes are points inside each circle. A union is pictured with both circles shaded in, whereas an intersection is represented by shading only the area the circles have in common.

Review this material in Probability with Playing Cards and Venn Diagrams and Addition Rule for Probability.


2e. Apply useful counting rules in the context of combinatorial probability

  • Define and explain the difference between a combination and a permutation.

Combinations and permutations both count the number of possible ways that x out of a possible n outcomes can occur. The difference is that order does not matter in a combination, but order does matter in a permutation.

For example, there are ten ways to choose x = 2 out of the first n = 5 letters of the alphabet: AB, AC, AD, AE, BC, BD, BE, CD, CE, DE. If order does not matter, ten combinations are possible. You could reverse the order of any of the letters and it would not matter.

There would be 20 permutations if AB and BA were considered different. Without listing them all here, you can intuitively see this because you have ten combinations and each combination can be in two different orders (AB or BA), so 10 × 2 = 20 permutations.
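
The standard counting formulas behind these numbers (not stated explicitly in the text) are

 \text{combinations} = \frac{n!}{x!(n-x)!} = \frac{5!}{2!\,3!} = 10 \qquad \text{permutations} = \frac{n!}{(n-x)!} = \frac{5!}{3!} = 20 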

Review this material in Permutations and Combinations.


2f. Identify and use common discrete probability distribution functions

  • Define a probability distribution. How does a probability distribution relate to the frequency tables we reviewed in Unit 1?
  • What is the difference between a discrete and continuous probability distribution?

A probability distribution consists of each possible value (or interval of values) of a random variable, and the probability that the variable will take on that value.

Probability distributions have many implications for decision making. When you interpret data, you will need to know which probability distribution the value you are trying to estimate follows. Probability distributions are related to frequency tables in that the probabilities are equivalent to what we called the relative frequency distribution in Unit 1.

If you have a list of data, you can get the probabilities on the right side of the table by dividing the frequency by the total number of data points.

  • For example, if the frequency of x = 3 is 7 in 20 die rolls, then the probability of rolling a three is 7/20 = 0.35, and so 0.35 would go across from value 3 in the table.

The difference between discrete and continuous probability distributions is analogous to the difference between discrete and continuous variables.

A discrete distribution (like a die roll) has a specific, countable set of possible values for the random variable x, while a continuous distribution has infinitely many possible values for x. For a continuous distribution, as with a grouped frequency distribution, the values of x must be grouped into intervals. The same rules apply as for relative frequency histograms: all intervals must be of equal width, non-overlapping, and all-inclusive.

Review this material in:


2g. Calculate and interpret expected values

  • What is the expected value of a distribution and how is it related to the mean of a set of data?

The expected value of a distribution is another name for the mean of the distribution.

This means that if you take a large set of numbers which follows the original distribution, the arithmetic mean (sum divided by n) of those numbers should roughly equal the expected value of the distribution.

We calculate the expected value of a distribution by multiplying each value of x by its probability (using the midpoint of each interval if the distribution is continuous) and then summing those products.
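
Written as a formula (standard notation, consistent with the description above), the expected value of a discrete random variable is

 E(x) = \sum x \cdot p(x) 

For a fair six-sided die, for example, E(x) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5, even though 3.5 is not itself a possible roll.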

Review this material in Probability Distributions for Discrete Random Variables.


2h. Identify the binomial probability distribution, and apply it appropriately

  • What is a binomial experiment?
  • What is a binomial distribution? What characteristics do a set of events have to have to follow a binomial distribution?

A binomial experiment is a random experiment that has exactly two possible outcomes (quantitative or qualitative) for x:

Flipping a coin: x = {heads, tails}

Answer to a true-false question: x = {true, false}

Free throw in basketball: x = {made, not made}

A binomial distribution is the distribution of the discrete random variable x, where x represents the number of successes out of n possible events. Note that we mean success in a generic sense: success may not be something positive. "Success" means the characteristic you are looking for or researching, regardless of whether you consider the outcome to be good or bad.

In our basketball free-throw example, if a player takes 10 free-throw shots, they will be successful zero to 10 times. The possible values for x are {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.

These values and their associated probabilities comprise a binomial probability distribution, which must satisfy three criteria:

  1. All experiments are binomial.
  2. All experiments will succeed or fail independently of each other (that is, all free throws are independent events).
  3. All experiments have an equal probability of success.

The binomial distribution is a family of distributions, where each specific distribution is defined by two parameters: n = the number of experiments, and p = the probability of success per experiment.

For our basketball example, if the player hits their free throws 75% of the time, we would define this distribution as binomial with n = 10 and p = 0.75. There are an infinite number of possible binomial distributions, with each combination of n & p making a unique distribution. 

TIP: Any specific distribution within a family of distributions is defined by its parameter(s). For example, binomial distributions have parameters n & p.

We can calculate the expected value of this distribution by multiplying n and p, or n × p.
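
The probability of exactly k successes out of n trials follows the standard binomial formula (not written out in the text above):

 p(x = k) = \binom{n}{k} p^k (1-p)^{n-k} 

For the free-throw example with n = 10 and p = 0.75, the probability of making exactly 8 of 10 shots is  p(x = 8) = \binom{10}{8}(0.75)^8(0.25)^2 \approx 0.28 , and the expected value is n × p = 10 × 0.75 = 7.5 made shots.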

Review this material in The Binomial Distribution and Binomial Distribution.


2i. Identify the Poisson probability distribution, and apply it appropriately

  • Define a Poisson distribution. When can you use it to estimate a binomial distribution? 
  • Describe a second general use for the Poisson distribution, in addition to approximating a binomial distribution.
  • A binomial distribution with n = 20 and p = 0.4 gives the same lambda (or mean), and the same Poisson distribution as a binomial distribution with n = 2000 and p = 0.004. How can the same Poisson distribution be "equivalent" to multiple binomial distributions?

The Poisson probability family of distributions (pronounced pwah-SOHN) has two main applications:

  1. The Poisson distribution is related to the binomial distribution because you can use it to approximate the binomial distribution when n is a large value and p is a small value. The Poisson distribution is easier to calculate because it has only one parameter (represented by the Greek letter lambda, λ), which is also its expected value. When approximating a binomial distribution, the calculation of λ is n × p, just as for the binomial distribution's expected value.

  2. Statisticians refer to the Poisson distribution as the distribution of rare events. Suppose 1,000 cars drive on a section of road every day, and 1.2 of them get into an accident on average. Since the mean is so small compared to n, we can model this using a Poisson distribution with  \lambda =1.2 . We can use the formula to find the probability of x = 1 accident, x = 2 accidents, and so on.

    A minor difference between the Poisson distribution and the binomial distribution is that the binomial distribution is a discrete distribution where all possible values of x lie between zero and n. The Poisson distribution is also discrete, in that x can only take whole-number values, but in theory x can be ANY whole number from zero to infinity.

    For our car example, the probability of x being greater than, say, five is so remote that it is effectively zero, but theoretically we could still calculate the probability p(x = 150).

    How would you calculate lambda? λ = the expected number of occurrences during the fixed time period of interest. So if a toll booth averages 45 cars per hour and your fixed time period is 10 minutes, then λ is the expected number of cars in 10 minutes: 45 × (10/60) = 7.5. Note that although these distributions are discrete, the expected values of the binomial and Poisson distributions do not have to be whole numbers.
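
    The Poisson formula referred to above is the standard one (not written out in the text):

     p(x = k) = \frac{e^{-\lambda}\lambda^{k}}{k!} 

    For the accident example with  \lambda = 1.2 , the probability of exactly two accidents in a day is  p(x = 2) = \frac{e^{-1.2}(1.2)^{2}}{2!} \approx 0.22 .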

Finally, both binomial distributions in the third question above have the same lambda: λ = 20 × 0.4 = 2000 × 0.004 = 8. The Poisson distribution with λ = 8 therefore gives the same answer for p(x = 7) when approximating either binomial distribution, even though the two exact binomial probabilities differ. The Poisson distribution is a more accurate approximation for the second distribution than for the first: the larger n gets (and the smaller p gets), the better the Poisson distribution predicts the binomial distribution.
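
If you want to check this numerically, the following minimal sketch (it assumes the outside library scipy, which is not part of the course materials) compares the two exact binomial probabilities with the single Poisson approximation at λ = 8:

```python
# A minimal sketch, assuming scipy is installed, comparing the exact binomial
# probabilities p(x = 7) with the shared Poisson approximation (lambda = 8).
from scipy.stats import binom, poisson

print(binom.pmf(7, n=20, p=0.4))      # exact, roughly 0.166
print(binom.pmf(7, n=2000, p=0.004))  # exact, roughly 0.140
print(poisson.pmf(7, mu=8))           # approximation, roughly 0.140
```

The approximation nearly matches the second binomial distribution but noticeably misses the first, which is the point of the comparison.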

Review Poisson distribution in Poisson Distribution and Poisson Process 1 | Probability and Statistics | Khan Academy.


2j. Identify and use continuous probability density functions

  • What is the difference, in general, between computing probabilities from discrete vs. continuous distributions? 
  • Explain why one rule of continuous probability distributions is that there is zero probability that the random variable x equals any particular value.

In a discrete distribution, we can calculate the probability p(x = a) that x equals a particular value from a formula, depending on the distribution. We can also find the probability over a range of values, such as p(1 ≤ x ≤ 5), by computing p(x = 1) through p(x = 5) and adding the results.

The difference with continuous distributions is that the probability p(x = a) that x equals any one particular value is effectively zero. Instead, we compute p(a < x < b) by finding the area under the density curve between x = a and x = b; the probability depends on this area, not on the height of the graph at a single point.

This is why statisticians often call discrete distributions probability distribution functions and continuous distributions probability density functions. We use the term "density" because probabilities are based on the area between two numbers, not the height of the graph. A probability density function, by definition, has a total area of 1 (a pure, unitless number) under the entire curve.

We can explain this rationale in two ways. The most obvious reason is that, since we calculate the probability that x is in a range of values by computing the area under the curve, the event x = a corresponds to a single vertical line, which has no area. The second reason is more conceptual. A continuous distribution has infinitely many possible values of x: if we are talking about a uniform distribution between x = 0 and x = 1, an infinite number of numbers exist between those two values. Based on the definition of probability, since there are infinitely many possible outcomes, the probability of any single one, 1/∞, tends to 0.
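
As a concrete check, take the uniform distribution on the interval from 0 to 1 mentioned above, whose density curve has height 1 everywhere on that interval:

 p(0.2 < x < 0.5) = (0.5 - 0.2) \times 1 = 0.3 \qquad p(x = 0.5) = 0 

The first probability is the area of a rectangle of width 0.3 and height 1; the second is the "area" of a single vertical line, which is zero.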

Review this material in Continuous Random Variables.


2k. Identify the normal probability distribution, and apply it appropriately

  • Describe the characteristics of a normal distribution. What sets it apart from any other bell-shaped distribution?
  • What does it mean to be symmetric? Describe a uniform distribution.
  • In general, how do we calculate probabilities based on a normal distribution?
  • Define a standard normal distribution and what sets it apart from other normal distributions.

As we have discussed, probability distributions can be discrete or continuous. A continuous distribution is symmetric if its density curve is symmetric around the median, which in that case is also the mean.

A uniform distribution is symmetric with a flat density curve. Another classic example of a symmetric distribution is the discrete distribution where x = the sum of two six-sided dice. There is only one way each to roll a 2 or a 12, but the median x = 7 can occur six ways: {1, 6}, {2, 5}, {3, 4}, {4, 3}, {5, 2}, and {6, 1}. This distribution has higher probabilities toward the mean/median and lower probabilities toward the edges.

A bell-shaped distribution has a bell-shaped density curve, like the dice distribution except continuous and graphically represented by a smooth curve.

Further up the hierarchy, we have the normal distributions, which have all the characteristics above plus a few additional tell-tale characteristics that we refer to as the empirical rule:

  1. The probability of x having a value between 1 standard deviation below and 1 standard deviation above the mean is about 68 percent.
  2. The probability of x being between −2 and +2 standard deviations is about 95 percent.
  3. The probability of x being between −3 and +3 standard deviations is about 99.7 percent.

There is an important reason why we PLURALIZE "normal distributions" above. As we said earlier, we define distributions by their parameters, such as n & p for binomial distributions. A given combination of mean  \mu and standard deviation  \sigma makes a particular normal distribution.

Finally, the standard normal distribution is a normal distribution with mean = 0 and standard deviation = 1. We will need to convert to the standard normal distribution (often referred to as the Z distribution) to calculate probabilities involving any normal distribution.

To find the probability of x being between a and b in a normal distribution, we must take the following steps:

  1. Convert the endpoint(s) into Z scores using the formula  Z=\frac{x-\mu}{\sigma}
  2. Use technology or a Z distribution table to look up the area to the left of b and the area to the left of a and subtract the two values.
  3. For p(x < a) convert a into a Z score and find the area left of that value.
  4. For p(x > b) convert b into a Z score, find the area left of that value and then subtract that number from 1.
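
As an illustration of these steps, here is a minimal sketch (it assumes the outside library scipy for the Z-table lookup; the values μ = 100 and σ = 15 are made-up example numbers):

```python
# A minimal sketch, assuming scipy is installed: p(85 < x < 115) for a normal
# distribution with mean 100 and standard deviation 15.
from scipy.stats import norm

mu, sigma = 100, 15
a, b = 85, 115

z_a = (a - mu) / sigma   # step 1: Z score for a, here -1.0
z_b = (b - mu) / sigma   # step 1: Z score for b, here +1.0

# Step 2: area to the left of b minus area to the left of a.
prob = norm.cdf(z_b) - norm.cdf(z_a)
print(prob)  # about 0.68, matching the empirical rule for +/- 1 standard deviation
```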

In summary, the hierarchy is:

  1. Probability distribution
  2. Continuous probability distribution
  3. Symmetric (though discrete distributions can also be symmetric)
  4. Normal
  5. Standard normal

Review this material in:


Unit 2 Vocabulary

  • Addition (general and special) rules of probability
  • Binomial distribution
  • Binomial experiment
  • Combination
  • Compound event
  • Conditional probability
  • Continuous probability distribution and continuous random variable
  • Dependent and independent events
  • Discrete distribution and discrete random variable
  • Empirical rule
  • Event
  • Expected value of a distribution
  • Intersection
  • Multiplication (general and special) rules of probability
  • Mutually exclusive events
  • Normal distribution
  • Outcome
  • Parameters
  • Permutation
  • Poisson distribution
  • Probability density function
  • Probability distribution
  • Probability distribution function
  • Relative frequency distribution
  • Standard normal distribution
  • Symmetric distribution
  • Uniform distribution
  • Union
  • Venn diagram