Discrete Random Variables

Site: Saylor Academy
Course: BUS204: Business Statistics
Book: Discrete Random Variables

Description

Read this chapter, which covers the basic rules of probability and the ways that randomness affects how probabilities are distributed. Be sure to attempt the practice problems and homework at the end of the chapter.

Introduction

Figure 4.1 You can use probability and discrete random variables to calculate the likelihood of lightning striking the ground five times during a half-hour thunderstorm.

A student takes a ten-question, true-false quiz. Because the student had such a busy schedule, he or she could not study and guesses randomly at each answer. What is the probability of the student passing the test with at least a 70%?

Small companies might be interested in the number of long-distance phone calls their employees make during the peak time of the day. Suppose the historical average is 20 calls. What is the probability that the employees make more than 20 long-distance phone calls during the peak time?

These two examples illustrate two different types of probability problems involving discrete random variables. Recall that discrete data are data that you can count, that is, the random variable can only take on whole number values. A random variable describes the outcomes of a statistical experiment in words. The values of a random variable can vary with each repetition of an experiment, often called a trial.


Random Variable Notation

The upper case letter X denotes a random variable. Lower case letters like x or y denote the value of a random variable. If X is a random variable, then X is written in words, and x is given as a number.

For example, let X = the number of heads you get when you toss three fair coins. The sample space for the toss of three fair coins is TTT; THH; HTH; HHT; HTT; THT; TTH; HHH. Then, x = 0, 1, 2, 3. X is in words and x is a number. Notice that for this example, the x values are countable outcomes. Because you can count the possible values as whole numbers that X can take on and the outcomes are random (the x values 0, 1, 2, 3), X is a discrete random variable.
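To make the counting concrete, here is a minimal Python sketch (an illustration, not part of the original text) that enumerates the eight outcomes and records x, the number of heads, for each:

```python
# A minimal sketch: enumerate the sample space for tossing three fair coins
# and record x, the value of X = the number of heads, for each outcome.
from itertools import product

sample_space = list(product("HT", repeat=3))   # 8 equally likely outcomes
for outcome in sample_space:
    x = outcome.count("H")                     # value of the random variable
    print("".join(outcome), x)

# Since the outcomes are equally likely, P(X = 2) is a simple count divided by 8.
p_two_heads = sum(1 for o in sample_space if o.count("H") == 2) / len(sample_space)
print(p_two_heads)  # 0.375
```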


Probability Density Functions (PDF) for a Random Variable

A probability density function or probability distribution function has two characteristics:

  1. Each probability is between zero and one, inclusive.
  2. The sum of the probabilities is one.

A probability density function is a mathematical formula that calculates probabilities for specific types of events, what we have been calling experiments. There is a sort of magic to a probability density function (Pdf) partially because the same formula often describes very different types of events. For example, the binomial Pdf will calculate probabilities for flipping coins, yes/no questions on an exam, opinions of voters in an up or down opinion poll, indeed any binary event. Other probability density functions will provide probabilities for the time until a part will fail, when a customer will arrive at the turnpike booth, the number of telephone calls arriving at a central switchboard, the growth rate of a bacterium, and on and on. There are whole families of probability density functions that are used in a wide variety of applications, including medicine, business and finance, physics and engineering, among others.

For our needs here we will concentrate on only a few probability density functions as we develop the tools of inferential statistics.


Counting Formulas and the Combinatorial Formula

To repeat, the probability of event A , P(A), is simply the number of ways the experiment will result in A, relative to the total number of possible outcomes of the experiment.

As an equation this is:

P(A)=\dfrac{\text{number of ways to get A}}{\text{Total number of possible outcomes}}

When we looked at the sample space for flipping 3 coins we could easily write the full sample space and thus could easily count the number of events that met our desired result, e.g. x = 1 , where X is the random variable defined as the number of heads.

As the number of items in the sample space grows, as with a full deck of 52 cards, writing out the entire sample space becomes impractical.

We see that probabilities are nothing more than counting the events in each group we are interested in and dividing by the number of elements in the universe, or sample space. This is easy enough if we are counting sophomores in a Stat class, but in more complicated cases listing all the possible outcomes could take a lifetime. There are, for example, 36 possible outcomes from throwing just two six-sided dice where the random variable is the sum of the number of spots on the up-facing sides. If there were four dice then the total number of possible outcomes would become 1,296. There are more than 2.5 MILLION possible 5 card poker hands in a standard deck of 52 cards. Obviously, keeping track of all these possibilities and counting them to get at a single probability would be tedious at best.

An alternative to listing the complete sample space and counting the elements we are interested in is to skip the listing step altogether: simply figure out how many elements the sample space contains and do the appropriate division. If we are after a probability we really do not need to see each and every element in the sample space; we only need to know how many elements are there. Counting formulas were invented to do just this. They tell us the number of unordered subsets of a certain size that can be created from a set of unique elements. By unordered it is meant that, for example, when dealing cards, it does not matter if you got {ace, ace, ace, ace, king} or {king, ace, ace, ace, ace} or {ace, king, ace, ace, ace} and so on. Each of these subsets is the same because each has 4 aces and one king.


Combinatorial Formula

\left(\begin{array}{l}n \\x\end{array}\right)={ }_n C_x=\frac{n !}{x !(n-x) !}

This is the formula that tells the number of unique unordered subsets of size x that can be created from n unique elements. The formula is read "n combinatorial x". Sometimes it is read as "n choose x". The exclamation point "!" is called a factorial and tells us to take all the numbers from 1 through the number before the ! and multiply them together thus 4! is 1·2·3·4=24. By definition 0! = 1. The formula is called the Combinatorial Formula. It is also called the Binomial Coefficient, for reasons that will be clear shortly. While this mathematical concept was understood long before 1653, Blaise Pascal is given major credit for his proof that he published in that year. Further, he developed a generalized method of calculating the values for combinatorials known to us as the Pascal Triangle. Pascal was one of the geniuses of an era of extraordinary intellectual advancement which included the work of Galileo, Rene Descartes, Isaac Newton, William Shakespeare and the refinement of the scientific method, the very rationale for the topic of this text.

Let's find the hard way the total number of combinations of the four aces in a deck of cards if we were going to take them two at a time. The sample space would be:

S = {(Spade, Heart), (Spade, Diamond), (Spade, Club), (Diamond, Club), (Heart, Diamond), (Heart, Club)}

There are 6 combinations; formally, six unique unordered subsets of size 2 that can be created from 4 unique elements. To use the combinatorial formula we would solve the formula as follows:

\left(\begin{array}{l}4 \\2\end{array}\right)=\frac{4 !}{(4-2) ! 2 !}=\frac{4 \cdot 3 \cdot 2 \cdot 1}{2 \cdot 1 \cdot 2 \cdot 1}=6

If we wanted to know the number of unique 5 card poker hands that could be created from a 52 card deck we simply compute

 \left(\begin{array}{l} 52 \\ 5\end{array}\right)

where 52 is the total number of unique elements from which we are drawing and 5 is the size group we are putting them into.
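If you want to check these counts numerically, a brief Python sketch (illustrative only, not part of the chapter's required tools) evaluates the combinatorial formula directly:

```python
# A minimal sketch: evaluate nCx = n! / (x!(n - x)!) with Python's math module.
import math

print(math.comb(4, 2))    # 6 ways to choose 2 aces from the 4 in the deck
print(math.comb(52, 5))   # 2598960 possible 5-card poker hands

# The same value computed straight from the factorial definition:
n, x = 52, 5
print(math.factorial(n) // (math.factorial(x) * math.factorial(n - x)))  # 2598960
```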

With the combinatorial formula we can count the number of elements in a sample space without having to write each one of them down, truly a lifetime's work for just the number of 5 card hands from a deck of 52 cards. We can now apply this tool to a very important probability density function, the hypergeometric distribution.

Remember, a probability density function computes probabilities for us. We simply put the appropriate numbers in the formula and we get the probability of specific events. However, for these formulas to work they must be applied only to cases for which they were designed.


Source: OpenStax, https://openstax.org/books/introductory-business-statistics/pages/4-introduction
Creative Commons License This work is licensed under a Creative Commons Attribution 4.0 License.

Hypergeometric Distribution

The simplest probability density function is the hypergeometric. This is the most basic one because it is created by combining our knowledge of probabilities from Venn diagrams, the addition and multiplication rules, and the combinatorial counting formula.

To find the number of ways to get 2 aces from the four in the deck we computed:

\left(\begin{array}{l}4 \\2\end{array}\right)=\frac{4 !}{2 !(4-2) !}=6

And if we did not care what else we had in our hand for the other three cards we would compute:

\left(\begin{array}{l}48 \\3\end{array}\right)=\frac{48 !}{3 !(45) ! }=17,296

Putting this together, we can compute the probability of getting exactly two aces in a 5 card poker hand as:

\dfrac{\left(\begin{array}{l}4 \\2\end{array}\right)\left(\begin{array}{l}48 \\3\end{array}\right)}{\left(\begin{array}{l}52 \\5\end{array}\right)}=0.0399

This solution is really just the probability distribution known as the Hypergeometric. The generalized formula is:

 h(x)=\dfrac{\left(\begin{array}{l} A\\ x \end{array}\right)\left(\begin{array}{l} N - A \\ n - x \end{array}\right)}{\left(\begin{array}{l} N \\ n \end{array}\right)}

where x = the number we are interested in coming from the group with A objects.

h(x) is the probability of x successes, in n attempts, when A successes (aces in this case) are in a population that contains N elements. The hypergeometric distribution is an example of a discrete probability distribution because there is no possibility of partial success, that is, there can be no poker hands with 2 1/2 aces. Said another way, a discrete random variable has to be a whole, or counting, number only. This probability distribution works in cases where the probability of a success changes with each draw. Another way of saying this is that the events are NOT independent. In using a deck of cards, we are sampling WITHOUT replacement. If we put each card back after it was drawn, then the hypergeometric distribution would be an inappropriate Pdf.
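As a numerical check on the poker example above, here is a small Python sketch (illustrative only; the helper name hypergeometric_pmf is ours) that evaluates h(x) directly:

```python
# A minimal sketch of h(x) = C(A, x) * C(N - A, n - x) / C(N, n),
# applied to the 2-aces-in-a-5-card-hand example (N = 52, A = 4, n = 5, x = 2).
import math

def hypergeometric_pmf(x, N, A, n):
    """Probability of x successes in n draws, without replacement,
    from a population of N items that contains A successes."""
    return math.comb(A, x) * math.comb(N - A, n - x) / math.comb(N, n)

print(round(hypergeometric_pmf(x=2, N=52, A=4, n=5), 4))  # 0.0399
```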

For the hypergeometric to work,

  1. the population must be divisible into two and only two independent subsets (aces and non-aces in our example). The random variable X = the number of items from the group of interest.
  2. the experiment must have changing probabilities of success with each trial (the fact that cards are not replaced after the draw in our example makes this true in this case). Another way to say this is that you sample without replacement and therefore each pick is not independent.
  3. the random variable must be discrete, rather than continuous.


Example 4.1

Problem
A candy dish contains 30 jelly beans and 20 gumdrops. Ten candies are picked at random. What is the probability that 5 of the 10 are gumdrops?

The two groups are jelly beans and gumdrops. Since the probability question asks for the probability of picking gumdrops, the group of interest (the first group, A in the formula) is gumdrops. The size of the group of interest is 20. The size of the second group (jelly beans) is 30. The size of the sample is 10 (jelly beans or gumdrops). Let X = the number of gumdrops in the sample of 10. X takes on the values x = 0, 1, 2, ..., 10.

a. What is the probability statement written mathematically?
b. What is the hypergeometric probability density function written out to solve this problem?
c. What is the answer to the question "What is the probability of drawing 5 gumdrops in 10 picks from the dish?"

Solution 1

a. P(x=5)
b. P(x=5)=\dfrac{\left(\begin{array}{l}20 \\5\end{array}\right)\left(\begin{array}{l}30 \\5\end{array}\right)}{\left(\begin{array}{l}50 \\10\end{array}\right)}
c. P(x=5)=0.215
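To see where the 0.215 comes from, the counting can be checked with a short Python computation (an illustration, not part of the exercise):

```python
# A minimal sketch checking Example 4.1: 20 gumdrops, 30 jelly beans,
# 10 candies drawn, exactly 5 gumdrops.
import math

p = math.comb(20, 5) * math.comb(30, 5) / math.comb(50, 10)
print(round(p, 3))  # 0.215
```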


Try It 4.1

A bag contains letter tiles. Forty-four of the tiles are vowels, and 56 are consonants. Seven tiles are picked at random. You want to know the probability that four of the seven tiles are vowels. What is the group of interest, the size of the group of interest, and the size of the sample?

Binomial Distribution

A more valuable probability density function with many applications is the binomial distribution. This distribution will compute probabilities for any binomial process. A binomial process, often called a Bernoulli process after the first person to fully develop its properties, is any case where there are only two possible outcomes in any one trial, called successes and failures. The name reflects this binary structure: every outcome is reduced to one of two possibilities, just as the binary number system, the basis for computer technology and CD music recordings, reduces all numbers to 1's and 0's.


Binomial Formula

b(x)= \left( \begin{array}{l} n\\ x \end{array} \right) p^xq^{n−x}

where b(x) is the probability of x successes in n trials when the probability of a success in ANY ONE TRIAL is p. And of course q = (1 − p) is the probability of a failure in any one trial.

We can see now why the combinatorial formula is also called the binomial coefficient: it reappears here in the binomial probability function. For the binomial formula to work, the probability of a success in any one trial must be the same from trial to trial, or in other words, the outcomes of each trial must be independent. Flipping a coin is a binomial process because the probability of getting a head in one flip does not depend upon what has happened in PREVIOUS flips. (At this point it should be noted that using p for the parameter of the binomial distribution is a violation of the rule that population parameters are designated with Greek letters. In many textbooks θ (pronounced theta) is used instead of p, and this is how it should be.)

Just like a set of data, a probability density function has a mean and a standard deviation that describes the data set. For the binomial distribution these are given by the formulas:

μ=np
σ=\sqrt{npq}

Notice that p is the only parameter in these equations. The binomial distribution is thus seen as coming from the one-parameter family of probability distributions. In short, we know all there is to know about the binomial once we know p, the probability of a success in any one trial.

In probability theory, under certain circumstances, one probability distribution can be used to approximate another. We say that one is the limiting distribution of the other. If a small number is to be drawn from a large population, even if there is no replacement, we can still use the binomial even though this is not strictly a binomial process. If there is no replacement, the independence rule of the binomial is violated. Nevertheless, we can use the binomial to approximate a probability that is really a hypergeometric distribution if we are drawing fewer than 10 percent of the population, i.e. n is less than 10 percent of N in the formula for the hypergeometric function. The rationale for this argument is that when drawing a small percentage of the population we do not alter the probability of a success from draw to draw in any meaningful way. Imagine drawing not from one deck of 52 cards but from 6 decks of cards. Drawing, say, an ace does not change the conditional probability of what happens on a second draw in the same way it would if there were only 4 aces rather than the 24 aces now available to draw from. This ability to use one probability distribution to estimate others will become very valuable to us later.

There are four characteristics of a binomial experiment.

  1. There are a fixed number of trials. Think of trials as repetitions of an experiment. The letter n denotes the number of trials.
  2. The random variable, x, number of successes, is discrete.
  3.     There are only two possible outcomes, called "success" and "failure," for each trial. The letter p denotes the probability of a success on any one trial, and q denotes the probability of a failure on any one trial. p + q = 1.
  4. The n trials are independent and are repeated using identical conditions. Think of this as drawing WITH replacement. Because the n trials are independent, the outcome of one trial does not help in predicting the outcome of another trial. Another way of saying this is that for each individual trial, the probability, p, of a success and probability, q, of a failure remain the same. For example, randomly guessing at a true-false statistics question has only two outcomes. If a success is guessing correctly, then a failure is guessing incorrectly. Suppose Joe always guesses correctly on any statistics true-false question with a probability p = 0.6. Then, q = 0.4. This means that for every true-false statistics question Joe answers, his probability of success (p = 0.6) and his probability of failure (q = 0.4) remain the same.

The outcomes of a binomial experiment fit a binomial probability distribution. The random variable X = the number of successes obtained in the n independent trials.

The mean, μ, and variance, σ^2, for the binomial probability distribution are μ = np and σ^2 = npq. The standard deviation, σ, is then σ = \sqrt{npq}.

Any experiment that has characteristics three and four and where n = 1 is called a Bernoulli Trial (named after Jacob Bernoulli who, in the late 1600s, studied them extensively). A binomial experiment takes place when the number of successes is counted in one or more Bernoulli Trials.


Example 4.2

Suppose you play a game that you can only either win or lose. The probability that you win any game is 55%, and the probability that you lose is 45%. Each game you play is independent. If you play the game 20 times, write the function that describes the probability that you win 15 of the 20 times. Here, if you define X as the number of wins, then X takes on the values 0, 1, 2, 3, ..., 20. The probability of a success is p = 0.55. The probability of a failure is q = 0.45. The number of trials is n = 20. The probability question can be stated mathematically as P(x = 15).
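As a quick check of this setup, a short Python sketch (illustrative; the helper name binomial_pmf is ours) evaluates the binomial formula for Example 4.2, along with the mean and standard deviation from the formulas above:

```python
# A minimal sketch of b(x) = C(n, x) * p**x * q**(n - x) applied to Example 4.2:
# n = 20 games, p = 0.55, x = 15 wins.
import math

def binomial_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

print(round(binomial_pmf(15, 20, 0.55), 4))              # roughly 0.0365

# The mean and standard deviation, mu = np and sigma = sqrt(npq):
n, p = 20, 0.55
print(round(n * p, 2), round(math.sqrt(n * p * (1 - p)), 4))  # 11.0 and roughly 2.2249
```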


Try It 4.2

A trainer is teaching a dolphin to do tricks. The probability that the dolphin successfully performs the trick is 35%, and the probability that the dolphin does not successfully perform the trick is 65%. Out of 20 attempts, you want to find the probability that the dolphin succeeds 12 times. Find the P(X=12) using the binomial Pdf.


Example 4.3

Problem
A fair coin is flipped 15 times. Each flip is independent. What is the probability of getting more than ten heads? Let X = the number of heads in 15 flips of the fair coin. X takes on the values 0, 1, 2, 3, ..., 15. Since the coin is fair, p = 0.5 and q = 0.5. The number of trials is n = 15. State the probability question mathematically.

Solution 1
P(x > 10)


Example 4.4

Approximately 70% of statistics students do their homework in time for it to be collected and graded. Each student does homework independently. In a statistics class of 50 students, what is the probability that at least 40 will do their homework on time? Students are selected randomly.

Problem
a. This is a binomial problem because there is only a success or a __________, there are a fixed number of trials, and the probability of a success is 0.70 for each trial.

Solution 1
a. failure

Problem
b. If we are interested in the number of students who do their homework on time, then how do we define X?

Solution 2
b. X = the number of statistics students who do their homework on time

Problem
c. What values does x take on?

Solution 3
c. 0, 1, 2, …, 50

Problem
d. What is a "failure," in words?

Solution 4
d. Failure is defined as a student who does not complete his or her homework on time.
The probability of a success is p = 0.70. The number of trials is n = 50.

Problem
e. If p + q = 1, then what is q?

Solution 5
e. q = 0.30

Problem
f. The words "at least" translate as what kind of inequality for the probability question P(x ____ 40).

Solution 6
f. greater than or equal to (≥)
The probability question is P(x ≥ 40).


Try It 4.4

Sixty-five percent of people pass the state driver's exam on the first try. A group of 50 individuals who have taken the driver's exam is randomly selected. Give two reasons why this is a binomial problem.


Try It 4.4

During the 2013 regular NBA season, DeAndre Jordan of the Los Angeles Clippers had the highest field goal completion rate in the league. DeAndre scored with 61.3% of his shots. Suppose you choose a random sample of 80 shots made by DeAndre during the 2013 season. Let X = the number of shots that scored points.

  1. What is the probability distribution for X?
  2. Using the formulas, calculate the (i) mean and (ii) standard deviation of X.
  3. Find the probability that DeAndre scored with 60 of these shots.
  4. Find the probability that DeAndre scored with more than 50 of these shots.

Geometric Distribution

The geometric probability density function builds upon what we have learned from the binomial distribution. In this case the experiment continues until either a success or a failure occurs rather than for a set number of trials. The main characteristics of a geometric experiment are as follows.

  1. There are one or more Bernoulli trials with all failures except the last one, which is a success. In other words, you keep repeating what you are doing until the first success. Then you stop. For example, you throw a dart at a bullseye until you hit the bullseye. The first time you hit the bullseye is a "success" so you stop throwing the dart. It might take six tries until you hit the bullseye. You can think of the trials as failure, failure, failure, failure, failure, success, STOP.
  2. In theory, the number of trials could go on forever.
  3. The probability, p, of a success and the probability, q, of a failure is the same for each trial. p + q = 1 and q = 1 − p. For example, the probability of rolling a three when you throw one fair die is \dfrac{1}{6}. This is true no matter how many times you roll the die. Suppose you want to know the probability of getting the first three on the fifth roll. On rolls one through four, you do not get a face with a three. The probability for each of the rolls is q = \dfrac{5}{6}, the probability of a failure. The probability of getting a three on the fifth roll is (\dfrac{5}{6})(\dfrac{5}{6})(\dfrac{5}{6})(\dfrac{5}{6})(\dfrac{1}{6}) = 0.0804
  4. X = the number of independent trials until the first success.


Example 4.5

You play a game of chance that you can either win or lose (there are no other possibilities) until you lose. Your probability of losing is p = 0.57. What is the probability that it takes five games until you lose? Let X = the number of games you play until you lose (includes the losing game). Then X takes on the values 1, 2, 3, ... (could go on indefinitely). The probability question is P(x = 5).


Try It 4.5

You throw darts at a board until you hit the center area. Your probability of hitting the center area is p = 0.17. You want to find the probability that it takes eight throws until you hit the center. What values does X take on?


Example 4.6

A safety engineer feels that 35% of all industrial accidents in her plant are caused by failure of employees to follow instructions. She decides to look at the accident reports (selected randomly and replaced in the pile after reading) until she finds one that shows an accident caused by failure of employees to follow instructions. On average, how many reports would the safety engineer expect to look at until she finds a report showing an accident caused by employee failure to follow instructions? What is the probability that the safety engineer will have to examine at least three reports until she finds a report showing an accident caused by employee failure to follow instructions?

Let X = the number of accidents the safety engineer must examine until she finds a report showing an accident caused by employee failure to follow instructions. X takes on the values 1, 2, 3, .... The first question asks you to find the expected value or the mean. The second question asks you to find P(x ≥ 3). ("At least" translates to a "greater than or equal to" symbol).


Try It 4.6

An instructor feels that 15% of students get below a C on their final exam. She decides to look at final exams (selected randomly and replaced in the pile after reading) until she finds one that shows a grade below a C. We want to know the probability that the instructor will have to examine at least ten exams until she finds one with a grade below a C. What is the probability question stated mathematically?


Example 4.7

Suppose that you are looking for a student at your college who lives within five miles of you. You know that 55% of the 25,000 students do live within five miles of you. You randomly contact students from the college until one says he or she lives within five miles of you. What is the probability that you need to contact four people?

This is a geometric problem because you may have a number of failures before you have the one success you desire. Also, the probability of a success stays approximately the same each time you ask a student if he or she lives within five miles of you. There is no definite number of trials (number of times you ask a student).

Problem
a. Let X = the number of ____________ you must ask ____________ one says yes.

Solution 1
a. Let X = the number of students you must ask until one says yes.

Problem
b. What values does X take on?

Solution 2
b. 1, 2, 3, …, (total number of students)

Problem
c. What are p and q?

Solution 3
c. p = 0.55; q = 0.45

Problem
d. The probability question is P(_______).

Solution 4
d. P(x = 4)


Notation for the Geometric: G = Geometric Probability Distribution Function


X ~ G(p)

Read this as "X is a random variable with a geometric distribution". The parameter is p; p = the probability of a success for each trial.

The Geometric Pdf tells us the probability that the first occurrence of success requires x number of independent trials, each with success probability p. If the probability of success on each trial is p, then the probability that the xth trial (out of x trials) is the first success is:

P(X=x)=(1−p)^{x−1}p

for x = 1, 2, 3, ....
The expected value of X, the mean of this distribution, is 1/p. This tells us how many trials we have to expect until we get the first success, including in the count the trial that results in success. The above form of the geometric distribution is used for modeling the number of trials until the first success. The number of trials includes the one that is a success: x = all trials including the one that is a success. This can be seen in the form of the formula: if X = the number of trials including the success, then the probability of failure, (1 − p), must be raised to the power of the number of failures, that is, x − 1.

By contrast, the following form of the geometric distribution is used for modeling number of failures until the first success:

P(X=x)=(1−p)^xp

for x = 0, 1, 2, 3, ....
In this case the trial that is a success is not counted as a trial in the formula: x = number of failures. The expected value, mean, of this distribution is μ=\dfrac{(1−p)}{p}. This tells us how many failures to expect before we have a success. In either case, the sequence of probabilities is a geometric sequence.
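The two forms can be compared side by side in a short Python sketch (illustrative only; the function names are ours), using the fair-die example from characteristic 3 above:

```python
# A minimal sketch of the two geometric parameterizations described above.
def geometric_trials_pmf(x, p):
    """P(the first success occurs on trial x), for x = 1, 2, 3, ..."""
    return (1 - p)**(x - 1) * p

def geometric_failures_pmf(x, p):
    """P(x failures occur before the first success), for x = 0, 1, 2, ..."""
    return (1 - p)**x * p

p = 1 / 6                                        # rolling a three with a fair die
print(round(geometric_trials_pmf(5, p), 4))      # 0.0804, first three on the fifth roll
print(round(geometric_failures_pmf(4, p), 4))    # 0.0804, same event counted as 4 failures
print(round(1 / p, 2), round((1 - p) / p, 2))    # means: 6.0 trials vs 5.0 failures
```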


Example 4.8

Assume that the probability of a defective computer component is 0.02. Components are randomly selected. Find the probability that the first defect is caused by the seventh component tested. How many components do you expect to test until one is found to be defective?

Let X = the number of computer components tested until the first defect is found.

X takes on the values 1, 2, 3, ... where p = 0.02. X ~ G(0.02)

Find P(x = 7). Answer: P(x = 7) = (1 − 0.02)^{7−1} × 0.02 = 0.0177.

The probability that the seventh component is the first defect is 0.0177.

The graph of X ~ G(0.02) is:

Figure 4.2

The y-axis contains the probability of x, where X = the number of computer components tested. Notice that successive probabilities decline by a common ratio. A sequence in which each term is the previous term multiplied by the same ratio is called a geometric progression, and thus the name for this probability density function.

The number of components that you would expect to test until you find the first defective component is the mean, μ = 50.

The formula for the mean for the random variable defined as the number of trials until the first success is μ =\dfrac{1}{p} = \dfrac{1}{0.02} = 50

See Example 4.9 for an example where the geometric random variable is instead defined as the number of failures before the first success. The expected value for that version of the distribution is different from this one.

The formula for the variance is σ^2 = (\dfrac{1}{p})(\dfrac{1}{p}−1) = (\dfrac{1}{0.02})(\dfrac{1}{0.02}−1) = 2,450

The standard deviation is σ = \sqrt{(\dfrac{1}{p})(\dfrac{1}{p}−1)} =\sqrt{ (\dfrac{1}{0.02})(\dfrac{1}{0.02}−1)} = 49.5


Example 4.9

Problem
The lifetime risk of developing pancreatic cancer is about one in 78 (1.28%). Let X = the number of people you ask before one says he or she has pancreatic cancer. The random variable X in this case includes only the number of trials that were failures and does not count the trial that was a success in finding a person who had the disease. The appropriate formula for this random variable is the second one presented above. Then X is a discrete random variable with a geometric distribution: X ~ G(\dfrac{1}{78}) or X ~ G(0.0128).

  1. What is the probability that you ask 9 people before one says he or she has pancreatic cancer? This is asking, what is the probability that you ask 9 people unsuccessfully and the tenth person is a success?
  2. What is the probability that you must ask 20 people?
  3. Find the (i) mean and (ii) standard deviation of X.

Solution 1

  1. P(x = 9) = (1 − 0.0128)^9 · 0.0128 = 0.0114
  2. P(x = 20) = (1 − 0.0128)^{19} · 0.0128 = 0.01

    1. Mean = μ = \dfrac{(1−p)}{p}=\dfrac{(1−0.0128)}{0.0128}=77.12
    2. Standard Deviation = σ =\sqrt{\dfrac{1−p}{p^2}} =\sqrt{ \dfrac{1−0.01280}{.0128^2}} ≈ 77.62
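These values can be reproduced in a few lines of Python (an illustration only), using the failures-before-first-success form with p = 0.0128:

```python
# A minimal sketch checking Example 4.9 with p = 0.0128 (failures form).
import math

p = 0.0128
print(round((1 - p)**9 * p, 4))              # 0.0114, part 1
print(round((1 - p) / p, 1))                 # roughly 77.1, the mean
print(round(math.sqrt((1 - p) / p**2), 2))   # roughly 77.62, the standard deviation
```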


Try It 4.9

The literacy rate for a nation measures the proportion of people age 15 and over who can read and write. The literacy rate for women in The United Colonies of Independence is 12%. Let X = the number of women you ask until one says that she is literate.

  1. What is the probability distribution of X?
  2. What is the probability that you ask five women before one says she is literate?
  3. What is the probability that you must ask ten women?


Example 4.10

A baseball player has a batting average of 0.320. This is the general probability that he gets a hit each time he is at bat.

Problem
What is the probability that he gets his first hit in the third trip to bat?

Solution 1
P(x = 3) = (1 − 0.32)^{3−1} × 0.32 = 0.1480
In this case the sequence is failure, failure, success.

Problem
How many trips to bat do you expect the hitter to need before getting a hit?

Solution 2
μ=\dfrac{1}{p}=\dfrac{1}{0.320}=3.125≈3
This is the expected number of trials until the first success and is therefore the mean of this form of the distribution.


Example 4.11

Problem
There is an 80% chance that a Dalmatian dog has 13 black spots. You go to a dog show and count the spots on Dalmatians. What is the probability that you will review the spots on 3 dogs before you find one that has 13 black spots?

Solution 1
P(x = 3) = (1 − 0.80)^3 × 0.80 = 0.0064

Poisson Distribution

Another useful probability distribution is the Poisson distribution, or waiting time distribution. This distribution is used to determine how many checkout clerks are needed to keep the waiting time in line to specified levels, how many telephone lines are needed to keep the system from overloading, and many other practical applications. A modification of the Poisson, the Pascal, invented nearly four centuries ago, is used today by telecommunications companies worldwide for load factors, satellite hookup levels and Internet capacity problems. The distribution gets its name from Simeon Poisson, who presented it in 1837 as an extension of the binomial distribution, which, as we will see, can be estimated with the Poisson.

The main characteristics of a Poisson experiment are as follows.

  1. The Poisson probability distribution gives the probability of a number of events occurring in a fixed interval of time or space if these events happen with a known average rate.
  2. The events occur independently of the time since the last event. For example, a book editor might be interested in the number of words spelled incorrectly in a particular book. It might be that, on the average, there are five words spelled incorrectly in 100 pages. The interval is the 100 pages and it is assumed that there is no relationship between when misspellings occur.
  3. The random variable X = the number of occurrences in the interval of interest.


Example 4.12

Problem
A bank expects to receive six bad checks per day, on average. What is the probability of the bank getting fewer than five bad checks on any given day? Of interest is the number of checks the bank receives in one day, so the time interval of interest is one day. Let X = the number of bad checks the bank receives in one day. If the bank expects to receive six bad checks per day then the average is six checks per day. Write a mathematical statement for the probability question.

Solution 1
P(x < 5)


Example 4.13

You notice that a news reporter says "uh," on average, two times per broadcast. What is the probability that the news reporter says "uh" more than two times per broadcast?

This is a Poisson problem because you are interested in knowing the number of times the news reporter says "uh" during a broadcast.

Problem
a. What is the interval of interest?

Solution 1
a. one broadcast measured in minutes

Problem
b. What is the average number of times the news reporter says "uh" during one broadcast?

Solution 2
b. 2

Problem
c. Let X = ____________. What values does X take on?

Solution 3
c. Let X = the number of times the news reporter says "uh" during one broadcast.
x = 0, 1, 2, 3, ...

Problem
d. The probability question is P(______).

Solution 4
d. P(x > 2)


Notation for the Poisson: P = Poisson Probability Distribution Function


X ~ P(μ)

Read this as "X is a random variable with a Poisson distribution". The parameter is μ (or λ); μ (or λ) = the mean for the interval of interest. The mean is the number of occurrences that occur on average during the interval period.

The formula for computing probabilities that are from a Poisson process is:

P(x)=\dfrac{μ^xe^{−μ}}{x!}

where P(x) is the probability of x successes, μ is the expected number of successes based upon historical data, e is the base of the natural logarithm and is approximately equal to 2.718, and x is the number of successes per unit, usually per unit of time.

In order to use the Poisson distribution, certain assumptions must hold. These are: the average rate of success, μ, is unchanged within the interval; there cannot be simultaneous successes within the interval; and finally, successes in separate intervals are independent, the same independence assumption as for the binomial distribution.

In a way, the Poisson distribution can be thought of as a clever way to convert a continuous random variable, usually time, into a discrete random variable by breaking up time into discrete independent intervals. This way of thinking about the Poisson helps us understand why it can be used to estimate the probability for the discrete random variable from the binomial distribution. The Poisson is asking for the probability of a number of successes during a period of time while the binomial is asking for the probability of a certain number of successes for a given number of trials.
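To see the formula in action, a short Python sketch (illustrative; the helper name poisson_pmf is ours) reproduces the answer to Example 4.13, where a reporter says "uh" an average of μ = 2 times per broadcast:

```python
# A minimal sketch of P(x) = mu**x * e**(-mu) / x!, applied to Example 4.13
# (mu = 2 "uh"s per broadcast; what is P(x > 2)?).
import math

def poisson_pmf(x, mu):
    return mu**x * math.exp(-mu) / math.factorial(x)

mu = 2
p_more_than_two = 1 - sum(poisson_pmf(x, mu) for x in range(3))  # 1 - P(0) - P(1) - P(2)
print(round(p_more_than_two, 4))  # approximately 0.3233
```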


Example 4.14

Leah's answering machine receives about six telephone calls between 8 a.m. and 10 a.m. What is the probability that Leah receives more than one call in the next 15 minutes?

Let X = the number of calls Leah receives in 15 minutes. (The interval of interest is 15 minutes or ¼ hour).

x = 0, 1, 2, 3, ...

If Leah receives, on the average, six telephone calls in two hours, and there are eight 15 minute intervals in two hours, then Leah receives

(1/8)(6) = 0.75 calls in 15 minutes, on average. So, μ = 0.75 for this problem.

X ~ P(0.75)

Find P(x > 1). P(x > 1) = 0.1734

Probability that Leah receives more than one telephone call in the next 15 minutes is about 0.1734.

The graph of X ~ P(0.75) is:

Figure 4.3

The y-axis contains the probability of x where X = the number of calls in 15 minutes.


Example 4.15

According to a survey a university professor gets, on average, 7 emails per day. Let X = the number of emails a professor receives per day. The discrete random variable X takes on the values x = 0, 1, 2 …. The random variable X has a Poisson distribution: X ~ P(7). The mean is 7 emails.

Problem

  • What is the probability that an email user receives exactly 2 emails per day?
  • What is the probability that an email user receives at most 2 emails per day?
  • What is the standard deviation?

Solution 1

  • P(x=2)=\dfrac{μ^xe^{-μ}}{x!}=\dfrac{7^2e^{−7}}{2!}=0.022
  • P(x≤2)=\dfrac{7^0e^{−7}}{0!}+\dfrac{7^1e^{−7}}{1!}+\dfrac{7^2e^{−7}}{2!}=0.0296
  • Standard Deviation = σ=\sqrt{μ} =\sqrt{7} ≈2.65


Example 4.16

Text message users receive or send an average of 41.5 text messages per day.
Problem

  1. How many text messages does a text message user receive or send per hour?
  2. What is the probability that a text message user receives or sends two messages per hour?
  3. What is the probability that a text message user receives or sends more than two messages per hour?

Solution 1

  1. Let X = the number of texts that a user sends or receives in one hour. The average number of texts received per hour is \dfrac{41.5}{24} ≈ 1.7292.
  2. P(x=2)=\dfrac{μ^xe^{-μ}}{x!}=\dfrac{1.729^2e^{−1.729}}{2!}=0.265
  3. P(x>2)=1−P(x≤2)=1−[\dfrac{1.729^0e^{−1.729}}{0!}+\dfrac{1.729^1e^{−1.729}}{1!}+\dfrac{1.729^2e^{−1.729}}{2!}]=0.250


Example 4.17

Problem
On May 13, 2013, starting at 4:30 PM, the probability of low seismic activity for the next 48 hours in Alaska was reported as about 1.02%. Use this information for the next 200 days to find the probability that there will be low seismic activity in ten of the next 200 days. Use both the binomial and Poisson distributions to calculate the probabilities. Are they close?

Solution 1
Let X = the number of days with low seismic activity.

Using the binomial distribution:

P(x=10)=\dfrac{200!}{10!(200−10)!} \times .0102^{10} \times .9898^{190}=0.000039

Using the Poisson distribution:

Calculate μ = np = 200(0.0102) ≈ 2.04

P(x=10)=\dfrac{μ^xe^{-μ}}{x!}=\dfrac{2.04^{10}e^{−2.04}}{10!}=0.000045

We expect the approximation to be good because n is large (greater than 20) and p is small (less than 0.05). The results are close - both probabilities reported are almost 0.


Estimating the Binomial Distribution with the Poisson Distribution

We found before that the binomial distribution provided an approximation for the hypergeometric distribution. Now we find that the Poisson distribution can provide an approximation for the binomial. We say that the binomial distribution approaches the Poisson. The binomial distribution approaches the Poisson distribution as n gets larger and p gets smaller such that np remains a constant value. There are several rules of thumb for when one can say they will use a Poisson to estimate a binomial. One suggests that np, the mean of the binomial, should be less than 25. Another author suggests that it should be less than 7. And another, noting that the mean and the variance of the Poisson are equal, suggests that np and npq, the mean and variance of the binomial, should be greater than 5. There is no one broadly accepted rule of thumb for when one can use the Poisson to estimate the binomial.

As we move through these probability distributions we are getting to more sophisticated distributions that, in a sense, contain the less sophisticated distributions within them. This proposition has been proven by mathematicians. This gets us to the highest level of sophistication in the next probability distribution which can be used as an approximation to all of those that we have discussed so far. This is the normal distribution.


Example 4.18

A survey of 500 seniors in the Price Business School yields the following information. 75% go straight to work after graduation. 15% go on to work on their MBA. 9% stay to get a minor in another program. 1% go on to get a Master's in Finance.

Problem
What is the probability that more than 2 seniors go to graduate school for their Master's in finance?

Solution 1
This is clearly a binomial probability distribution problem. The choices are binary when we define the results as "Graduate School in Finance" versus "all other options". The random variable is discrete, and the events are, we could assume, independent. Solving as a binomial problem, we have:

Binomial Solution

n⋅p=500⋅0.01=5=µ

P(0)=\dfrac{500!}{0!(500−0)!}0.01^0(1−0.01)^{500−0}=0.00657

P(1)=\dfrac{500!}{1!(500−1)!}0.01^1(1−0.01)^{500−1}=0.03318

P(2)=\dfrac{500!}{2!(500−2)!}0.01^2(1−0.01)^{500−2}=0.08363

Adding all 3 together = 0.12339

1−0.12339=0.87661

Poisson approximation

n⋅p=500⋅0.01=5=μ

n⋅p⋅(1−p)=500⋅0.01⋅(0.99)≈5=σ^2=μ

P(x)=\dfrac{e^{−np}(np)^x}{x!}

P(0)=\dfrac{e^{−5}⋅5^0}{0!} \qquad P(1)=\dfrac{e^{−5}⋅5^1}{1!} \qquad P(2)=\dfrac{e^{−5}⋅5^2}{2!}

0.0067+0.0337+0.0842=0.1247

1−0.1247=0.8753

An approximation that is off by about one one-thousandth is certainly an acceptable approximation.
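The same comparison can be reproduced in a few lines of Python (an illustration under the example's values of n = 500 and p = 0.01):

```python
# A minimal sketch comparing the exact binomial tail with its Poisson
# approximation for Example 4.18: P(more than 2 of 500 seniors, p = 0.01).
import math

n, p = 500, 0.01
mu = n * p  # 5.0

binom_le2 = sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(3))
poisson_le2 = sum(mu**x * math.exp(-mu) / math.factorial(x) for x in range(3))

print(round(1 - binom_le2, 4))    # approximately 0.8766
print(round(1 - poisson_le2, 4))  # approximately 0.8753
```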