Applying Bayes' Theorem in Deduction

Forms

Events

Simple form

For events A and B, provided that P(B) ≠ 0,

P(A|B)={\frac {P(B|A)P(A)}{P(B)}}.

In many applications, for instance in Bayesian inference, the event B is fixed in the discussion, and we wish to consider the impact of its having been observed on our belief in various possible events A. In such a situation the denominator of the last expression, the probability of the given evidence B, is fixed; what we want to vary is A. Bayes' theorem then shows that the posterior probabilities are proportional to the numerator, so the last equation becomes:

P(A|B)\propto P(A)\cdot P(B|A).

In words, the posterior is proportional to the prior times the likelihood.

If events A1, A2, ..., are mutually exclusive and exhaustive, i.e., one of them is certain to occur but no two can occur together, we can determine the proportionality constant by using the fact that their probabilities must add up to one. For instance, for a given event A, the event A itself and its complement ¬A are exclusive and exhaustive. Denoting the constant of proportionality by c we have

P(A|B)=c\cdot P(A)\cdot P(B|A){\text{ and }}P(\neg A|B)=c\cdot P(\neg A)\cdot P(B|\neg A).

Adding these two formulas we deduce that

1=c\cdot (P(B|A)\cdot P(A)+P(B|\neg A)\cdot P(\neg A)),

or

c={\frac {1}{P(B|A)\cdot P(A)+P(B|\neg A)\cdot P(\neg A)}}={\frac {1}{P(B)}}.
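
To make the normalization argument concrete, here is a minimal Python sketch (the interpretation of the events and the numerical values of P(A), P(B|A) and P(B|¬A) are illustrative assumptions, not taken from the text): it forms the unnormalized products, recovers the constant c, and checks that c = 1/P(B).

```python
# Minimal sketch of the normalization argument above (illustrative numbers).
# Assumed reading: A = "hypothesis is true", B = "evidence is observed".
p_A      = 0.02          # prior P(A)
p_not_A  = 1.0 - p_A     # P(¬A)
p_B_A    = 0.95          # likelihood P(B|A)
p_B_notA = 0.10          # likelihood P(B|¬A)

# Unnormalized posteriors: P(A|B) ∝ P(A)·P(B|A) and P(¬A|B) ∝ P(¬A)·P(B|¬A)
u_A    = p_A * p_B_A
u_notA = p_not_A * p_B_notA

# The proportionality constant c makes the two posteriors sum to one,
# and equals 1/P(B) by the law of total probability.
p_B = u_A + u_notA
c = 1.0 / p_B

posterior_A    = c * u_A
posterior_notA = c * u_notA

print(f"P(B)       = {p_B:.4f}")
print(f"c = 1/P(B) = {c:.4f}")
print(f"P(A|B)     = {posterior_A:.4f}")
print(f"P(¬A|B)    = {posterior_notA:.4f}")
assert abs(posterior_A + posterior_notA - 1.0) < 1e-12
```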


Alternative form

Contingency table

Proposition \ Background   B                      ¬B (not B)             Total
A                          P(B|A)·P(A)            P(¬B|A)·P(A)           P(A)
                           = P(A|B)·P(B)          = P(A|¬B)·P(¬B)
¬A (not A)                 P(B|¬A)·P(¬A)          P(¬B|¬A)·P(¬A)         P(¬A) = 1−P(A)
                           = P(¬A|B)·P(B)         = P(¬A|¬B)·P(¬B)
Total                      P(B)                   P(¬B) = 1−P(B)         1
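
The identities in the table can be verified numerically. The following sketch assumes illustrative values for P(A), P(B|A) and P(B|¬A), builds the four joint probabilities from the row conditionals, and checks that they match the column-based factorizations and the stated margins.

```python
# Check the contingency-table identities with assumed inputs.
p_A, p_B_given_A, p_B_given_notA = 0.3, 0.7, 0.2   # illustrative values

p_notA = 1 - p_A
p_B = p_B_given_A * p_A + p_B_given_notA * p_notA   # column margin P(B)
p_notB = 1 - p_B

# Joint probabilities from the row conditionals, e.g. P(B ∩ A) = P(B|A)·P(A)
joint = {
    ("A", "B"):   p_B_given_A * p_A,
    ("A", "¬B"):  (1 - p_B_given_A) * p_A,
    ("¬A", "B"):  p_B_given_notA * p_notA,
    ("¬A", "¬B"): (1 - p_B_given_notA) * p_notA,
}

# The same cells from the column conditionals, e.g. P(B ∩ A) = P(A|B)·P(B)
p_A_given_B    = joint[("A", "B")] / p_B
p_A_given_notB = joint[("A", "¬B")] / p_notB
assert abs(p_A_given_B * p_B - joint[("A", "B")]) < 1e-12
assert abs(p_A_given_notB * p_notB - joint[("A", "¬B")]) < 1e-12

# Row and column totals match the margins of the table.
assert abs(joint[("A", "B")] + joint[("A", "¬B")] - p_A) < 1e-12
assert abs(joint[("A", "B")] + joint[("¬A", "B")] - p_B) < 1e-12
assert abs(sum(joint.values()) - 1.0) < 1e-12
```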

Another form of Bayes' theorem for two competing statements or hypotheses is:

P(A|B)={\frac {P(B|A)P(A)}{P(B|A)P(A)+P(B|\neg A)P(\neg A)}}.

For an epistemological interpretation, with proposition A and evidence or background B:

  • P(A) is the prior probability, the initial degree of belief in A.
  • P(\neg A) is the corresponding initial degree of belief in not-A, i.e. that A is false, where P(\neg A)=1-P(A).
  • P(B|A) is the conditional probability or likelihood, the degree of belief in B given that proposition A is true.
  • P(B|\neg A) is the conditional probability or likelihood, the degree of belief in B given that proposition A is false.
  • P(A|B) is the posterior probability, the probability of A after taking into account B.


Extended form

Often, for some partition {Aj} of the sample space, the event space is given in terms of P(Aj) and P(B | Aj). It is then useful to compute P(B) using the law of total probability:

P(B)=\sum _{j}P(B\cap A_{j}),

or, using the multiplication rule for conditional probability,

P(B)={\sum _{j}P(B|A_{j})P(A_{j})},

\Rightarrow P(A_{i}|B)={\frac {P(B|A_{i})P(A_{i})}{\sum \limits _{j}P(B|A_{j})P(A_{j})}}.

In the special case where A is a binary variable:

P(A|B)={\frac {P(B|A)P(A)}{P(B|A)P(A)+P(B|\neg A)P(\neg A)}}.
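
As a numerical illustration of the extended form, this sketch assumes a three-part partition with made-up priors and likelihoods (a hypothetical "which factory produced a defective item" setting), computes P(B) by the law of total probability, and then the posterior P(A_i|B) for each part.

```python
# Extended form of Bayes' theorem over a partition {A_j} (illustrative numbers).
# Assumed example: an item comes from one of three factories A1, A2, A3,
# and B is the event "item is defective".
prior      = {"A1": 0.5, "A2": 0.3, "A3": 0.2}     # P(A_j), sums to 1
likelihood = {"A1": 0.01, "A2": 0.02, "A3": 0.05}  # P(B|A_j)

# Law of total probability: P(B) = sum_j P(B|A_j) P(A_j)
p_B = sum(likelihood[j] * prior[j] for j in prior)

# Posterior for each part of the partition
posterior = {j: likelihood[j] * prior[j] / p_B for j in prior}

print(f"P(B) = {p_B:.4f}")
for j, p in posterior.items():
    print(f"P({j}|B) = {p:.4f}")
assert abs(sum(posterior.values()) - 1.0) < 1e-12
```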


Random variables

Figure 5: Bayes' theorem applied to an event space generated by continuous random variables X and Y with known probability distributions. There exists an instance of Bayes' theorem for each point in the domain. In practice, these instances might be parametrized by writing the specified probability densities as a function of x and y.

Consider a sample space Ω generated by two random variables X and Y with known probability distributions. In principle, Bayes' theorem applies to the events A = {X = x} and B = {Y = y}.

P(X{=}x|Y{=}y)={\frac {P(Y{=}y|X{=}x)P(X{=}x)}{P(Y{=}y)}}

However, these terms become 0 at points where either variable has a finite probability density, since the probability that a continuous variable takes any single value is zero. To remain useful, Bayes' theorem must be formulated in terms of the relevant densities (see Derivation).


Simple form

If X is continuous and Y is discrete,

f_{X|Y{=}y}(x)={\frac {P(Y{=}y|X{=}x)f_{X}(x)}{P(Y{=}y)}}

where each f is a density function.

If X is discrete and Y is continuous,

P(X{=}x|Y{=}y)={\frac {f_{Y|X{=}x}(y)P(X{=}x)}{f_{Y}(y)}}.

If both X and Y are continuous,

f_{X|Y{=}y}(x)={\frac {f_{Y|X{=}x}(y)f_{X}(x)}{f_{Y}(y)}}.
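
A small sketch of the mixed discrete/continuous case (the two-valued hypothesis X, the normal observation model for Y, and all parameter values are assumptions made for this example): the posterior P(X = x | Y = y) is obtained by replacing the probability of the observed Y value with the conditional density f_{Y|X=x}(y).

```python
import math

# Assumed example: X is a discrete "true mean" with two possible values,
# Y is a continuous noisy measurement with Y | X=x ~ Normal(x, sigma^2).
def normal_pdf(y, mean, sigma):
    return math.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

prior = {0.0: 0.5, 1.0: 0.5}    # P(X = x)
sigma = 0.8
y_obs = 0.9                     # observed value of Y

# P(X=x | Y=y) = f_{Y|X=x}(y) P(X=x) / f_Y(y),
# where f_Y(y) = sum_x f_{Y|X=x}(y) P(X=x)
f_Y = sum(normal_pdf(y_obs, x, sigma) * p for x, p in prior.items())
posterior = {x: normal_pdf(y_obs, x, sigma) * p / f_Y for x, p in prior.items()}

for x, p in posterior.items():
    print(f"P(X={x} | Y={y_obs}) = {p:.4f}")
```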


Extended form

Figure 6: A way to conceptualize event spaces generated by continuous random variables X and Y

A continuous event space is often conceptualized in terms of the numerator terms. It is then useful to eliminate the denominator using the law of total probability. For fY(y), this becomes an integral:

f_{Y}(y)=\int _{-\infty }^{\infty }f_{Y|X=\xi }(y)f_{X}(\xi )\,d\xi .
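
The integral can be approximated numerically. The sketch below assumes a simple model, X ~ Normal(0, 1) and Y | X = x ~ Normal(x, 1), evaluates the integrand on a grid, and compares the result with the known closed-form marginal, Y ~ Normal(0, 2).

```python
import math

def normal_pdf(t, mean, sigma):
    return math.exp(-0.5 * ((t - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Assumed model: X ~ Normal(0, 1) and Y | X = x ~ Normal(x, 1),
# so the exact marginal is Y ~ Normal(0, 2), which serves as a check.
y = 0.7

# Approximate f_Y(y) = ∫ f_{Y|X=ξ}(y) f_X(ξ) dξ with a midpoint Riemann sum.
lo, hi, n = -10.0, 10.0, 20000
dx = (hi - lo) / n
f_Y = sum(normal_pdf(y, xi, 1.0) * normal_pdf(xi, 0.0, 1.0) * dx
          for xi in (lo + (k + 0.5) * dx for k in range(n)))

exact = normal_pdf(y, 0.0, math.sqrt(2.0))
print(f"numerical f_Y({y}) = {f_Y:.6f}")
print(f"exact     f_Y({y}) = {exact:.6f}")
```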


Bayes' rule in odds form

Bayes' theorem in odds form is:

O(A_{1}:A_{2}\vert B)=O(A_{1}:A_{2})\cdot \Lambda (A_{1}:A_{2}\vert B)

where

\Lambda (A_{1}:A_{2}\vert B)={\frac {P(B\vert A_{1})}{P(B\vert A_{2})}}

is called the Bayes factor or likelihood ratio. The odds between two events are simply the ratio of the probabilities of the two events. Thus

O(A_{1}:A_{2})={\frac {P(A_{1})}{P(A_{2})}},

O(A_{1}:A_{2}\vert B)={\frac {P(A_{1}\vert B)}{P(A_{2}\vert B)}}.

So the rule says that the posterior odds are the prior odds times the Bayes factor; in other words, the posterior is proportional to the prior times the likelihood.

In the special case that A_{1}=A and A_{2}=\neg A, one writes O(A)=O(A:\neg A)=P(A)/(1-P(A)), and uses a similar abbreviation for the Bayes factor and for the conditional odds. The odds on A is by definition the odds for and against A. Bayes' rule can then be written in the abbreviated form

O(A\vert B)=O(A)\cdot \Lambda (A\vert B),

or, in words, the posterior odds on A equals the prior odds on A times the likelihood ratio for A given information B. In short, posterior odds equals prior odds times likelihood ratio.

For example, if a medical test has a sensitivity of 90% and a specificity of 91%, then the positive Bayes factor is \Lambda _{+}=P({\text{True Positive}})/P({\text{False Positive}})=90\%/(100\%-91\%)=10. Now, if the prevalence of this disease is 9.09%, and if we take that as the prior probability, then the prior odds is about 1:10. So after receiving a positive test result, the posterior odds of actually having the disease becomes 1:1, which means that the posterior probability of having the disease is 50%. If a second test is performed in serial testing, and that also turns out to be positive, then the posterior odds of actually having the disease becomes 10:1, which means a posterior probability of about 90.91%. The corresponding Bayes factor for a negative result is 91%/(100%-90%)=9.1 against having the disease, so if the second test instead turns out to be negative, then the posterior odds of actually having the disease is 1:9.1, which means a posterior probability of about 9.9%.

The example above can also be understood with more solid numbers: Assume the patient taking the test is from a group of 1000 people, where 91 of them actually have the disease (prevalence of 9.1%). If all these 1000 people take the medical test, 82 of those with the disease will get a true positive result (sensitivity of 90.1%), 9 of those with the disease will get a false negative result (false negative rate of 9.9%), 827 of those without the disease will get a true negative result (specificity of 91.0%), and 82 of those without the disease will get a false positive result (false positive rate of 9.0%). Before taking any test, the patient's odds for having the disease is 91:909. After receiving a positive result, the patient's odds for having the disease is

{\frac {91}{909}}\times {\frac {90.1\%}{9.0\%}}={\frac {91\times 90.1\%}{909\times 9.0\%}}\approx 1:1

which is consistent with the fact that there are 82 true positives and 82 false positives in the group of 1000 people.
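
The serial-testing arithmetic above can be reproduced with a short Python sketch; the helper function and variable names are ad hoc for this illustration, and the inputs are the sensitivity, specificity and prevalence quoted in the example.

```python
# Reproduce the serial-testing example using Bayes' rule in odds form.
sensitivity = 0.90    # P(positive | disease)
specificity = 0.91    # P(negative | no disease)
prevalence  = 1 / 11  # ≈ 9.09%, so the prior odds are 1:10

def update_odds(odds, bayes_factor):
    """Posterior odds = prior odds × Bayes factor (likelihood ratio)."""
    return odds * bayes_factor

def to_prob(odds):
    """Convert odds in favour into a probability."""
    return odds / (1 + odds)

lr_positive = sensitivity / (1 - specificity)   # ≈ 10
lr_negative = (1 - sensitivity) / specificity   # ≈ 1/9.1

prior_odds = prevalence / (1 - prevalence)      # 1:10 -> 0.1

odds_after_pos     = update_odds(prior_odds, lr_positive)      # ≈ 1    (1:1, 50%)
odds_after_two_pos = update_odds(odds_after_pos, lr_positive)  # ≈ 10   (10:1, ≈ 90.91%)
odds_pos_then_neg  = update_odds(odds_after_pos, lr_negative)  # ≈ 1/9.1 (≈ 9.9%)

print(f"after one positive    : P = {to_prob(odds_after_pos):.4f}")
print(f"after two positives   : P = {to_prob(odds_after_two_pos):.4f}")
print(f"positive then negative: P = {to_prob(odds_pos_then_neg):.4f}")
```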