Conditional Probability

Definition

Conditioning on an event

Kolmogorov definition

Given two events A and B from the sigma-field of a probability space, with the unconditional probability of B being greater than zero (i.e., P(B) > 0), the conditional probability of A given B, written P(A\mid B), is the probability of A occurring if B has occurred or is assumed to have occurred. The event B serves as the restricted or reduced sample space of the experiment or random trial. The conditional probability is given by the quotient of the probability of the joint intersection of events A and B, that is, P(A\cap B), the probability that A and B occur together, and the probability of B:

P(A\mid B)={\frac {P(A\cap B)}{P(B)}}.

Illustration of conditional probabilities with an Euler diagram. The unconditional probability P(A) = 0.30 + 0.10 + 0.12 = 0.52. However, the conditional probability P(A|B1) = 1, P(A|B2) = 0.12 ÷ (0.12 + 0.04) = 0.75, and P(A|B3) = 0.

On a tree diagram, branch probabilities are conditional on the event associated with the parent node. (Here, the overbars indicate that the event does not occur).

For a sample space consisting of equally likely outcomes, the probability of the event A is understood as the fraction of the number of outcomes in A to the number of all outcomes in the sample space. Then, this equation is understood as the fraction that the set A\cap B makes of the set B. Note that the above equation is a definition, not just a theoretical result. We denote the quantity {\frac {P(A\cap B)}{P(B)}} as P(A\mid B) and call it the "conditional probability of A given B".
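
As a quick numerical illustration of this quotient (a minimal sketch in Python; the die events below are hypothetical and not part of the original example), conditional probability over equally likely outcomes reduces to counting within the reduced sample space B:

from fractions import Fraction

# Sample space of a fair die; every outcome is equally likely.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}        # hypothetical event "the roll is even"
B = {4, 5, 6}        # hypothetical event "the roll is greater than 3"

P = lambda E: Fraction(len(E), len(omega))    # probability of an event E under equally likely outcomes

# Kolmogorov definition: P(A | B) = P(A ∩ B) / P(B)
p_A_given_B = P(A & B) / P(B)
print(p_A_given_B)   # 2/3: of the three outcomes in the reduced sample space B, two lie in A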


As an axiom of probability

Venn Pie Chart describing conditional probabilities

Some authors, such as de Finetti, prefer to introduce conditional probability as an axiom of probability:

P(A\cap B)=P(A\mid B)P(B).

This equation for a conditional probability, although mathematically equivalent, may be intuitively easier to understand. It can be interpreted as "the probability of B occurring, multiplied by the probability of A occurring provided that B has occurred, is equal to the probability of A and B occurring together, although not necessarily at the same time". Additionally, this may be preferred philosophically; under major probability interpretations, such as the subjective theory, conditional probability is considered a primitive entity. Moreover, this "multiplication rule" can be practically useful in computing the probability of A\cap B and introduces a symmetry with the summation axiom for the Poincaré formula:

P(A\cup B)=P(A)+P(B)-P(A\cap B)

Thus the equations can be combined to obtain new representations of the intersection and the union:

P(A\cap B)=P(A)+P(B)-P(A\cup B)=P(A\mid B)P(B)

P(A\cup B)=P(A)+P(B)-P(A\mid B)P(B)
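
As a consistency check (a small sketch reusing the hypothetical die events from the earlier sketch), the multiplication rule and the combined union formula can be verified exactly with rational arithmetic:

from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A, B = {2, 4, 6}, {4, 5, 6}      # same hypothetical events as before
P = lambda E: Fraction(len(E), len(omega))

p_A_given_B = P(A & B) / P(B)

# Multiplication rule: P(A ∩ B) = P(A | B) P(B)
assert P(A & B) == p_A_given_B * P(B)

# Combined with the Poincaré formula: P(A ∪ B) = P(A) + P(B) - P(A | B) P(B)
assert P(A.union(B)) == P(A) + P(B) - p_A_given_B * P(B)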

As the probability of a conditional event

Conditional probability can be defined as the probability of a conditional event A_{B}. The Goodman–Nguyen–Van Fraassen conditional event can be defined as:

A_{B}=\bigcup _{i\geq 1}(\bigcap _{j<i}{\overline {B}}_{j},A_{i}B_{i}), where A_{i} and B_{i} represent states or elements of A or B.

It can be shown that

P(A_{B})={\frac {P(A\cap B)}{P(B)}}

which meets the Kolmogorov definition of conditional probability.


Conditioning on an event of probability zero

If P(B)=0, then according to the definition, P(A\mid B) is undefined.

The case of greatest interest is that of a random variable Y, conditioned on a continuous random variable X resulting in a particular outcome x. The event B=\{X=x\} has probability zero and, as such, cannot be conditioned on.

Instead of conditioning on X being exactly x, we could condition on it being closer than distance \epsilon away from x. The event B=\{x-\epsilon <X<x+\epsilon \} will generally have nonzero probability and hence, can be conditioned on. We can then take the limit

\lim _{\epsilon \to 0}P(A\mid x-\epsilon <X<x+\epsilon ).

For example, if two continuous random variables X and Y have a joint density f_{X,Y}(x,y), then by L'Hôpital's rule and Leibniz integral rule, upon differentiation with respect to \epsilon:

\lim _{\epsilon \to 0}P(Y\in U\mid x_{0}-\epsilon <X<x_{0}+\epsilon )=\lim _{\epsilon \to 0}{\frac {\int _{x_{0}-\epsilon }^{x_{0}+\epsilon }\int _{U}f_{X,Y}(x,y)\,dy\,dx}{\int _{x_{0}-\epsilon }^{x_{0}+\epsilon }\int _{\mathbb {R} }f_{X,Y}(x,y)\,dy\,dx}}={\frac {\int _{U}f_{X,Y}(x_{0},y)\,dy}{\int _{\mathbb {R} }f_{X,Y}(x_{0},y)\,dy}}.

The resulting limit is the conditional probability distribution of Y given X and exists when the denominator, the probability density f_{X}(x_{0}), is strictly positive.
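
The limiting behaviour can also be seen by simulation. The sketch below assumes, purely for illustration, that (X, Y) is standard bivariate normal with correlation 0.8, x_{0} = 1, and U = (0, \infty); as \epsilon shrinks, the empirical conditional probability approaches the value given by the density ratio at x_{0}:

import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
rho, x0 = 0.8, 1.0                       # assumed correlation and conditioning point
n = 2_000_000

# Draw (X, Y) from a standard bivariate normal with correlation rho.
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

for eps in (0.5, 0.1, 0.02):
    near = np.abs(x - x0) < eps          # the event {x0 - eps < X < x0 + eps}
    print(eps, np.mean(y[near] > 0))     # estimate of P(Y > 0 | x0 - eps < X < x0 + eps)

# The limit computed from the conditional density, which here is Normal(rho*x0, 1 - rho^2):
mu, sigma = rho * x0, sqrt(1 - rho**2)
print(0.5 * (1 + erf(mu / (sigma * sqrt(2)))))   # P(Y > 0 | X = x0), about 0.909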

It is tempting to define the undefined probability P(A\mid X=x) using this limit, but this cannot be done in a consistent manner. In particular, it is possible to find random variables X and W and values x, w such that the events \{X=x\} and \{W=w\} are identical but the resulting limits are not:

\lim _{\epsilon \to 0}P(A\mid x-\epsilon \leq X\leq x+\epsilon )\neq \lim _{\epsilon \to 0}P(A\mid w-\epsilon \leq W\leq w+\epsilon ).

The Borel–Kolmogorov paradox demonstrates this with a geometrical argument.


Conditioning on a discrete random variable

Let X be a discrete random variable and its possible outcomes denoted V. For example, if X represents the value of a rolled die, then V is the set \{1,2,3,4,5,6\}. Let us assume for the sake of presentation that every value in V has a nonzero probability.

For a value x in V and an event A, the conditional probability is given by P(A\mid X=x). Writing

c(x,A)=P(A\mid X=x)

for short, we see that it is a function of two variables, x and A.

For a fixed A, we can form the random variable Y=c(X,A). It takes the value P(A\mid X=x) whenever the value x of X is observed.

The conditional probability of A given X can thus be treated as a random variable Y with outcomes in the interval [0,1]. From the law of total probability, its expected value is equal to the unconditional probability of A.
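
A short sketch of this construction (using a hypothetical experiment with two fair dice, not taken from the text): the function c(x, A) is tabulated for each value x of the first die X, and the expectation of Y = c(X, A) reproduces the unconditional probability P(A), as the law of total probability states.

from fractions import Fraction
from itertools import product

# Hypothetical experiment: roll two fair dice; X is the first die, A = "the sum is at least 9".
omega = list(product(range(1, 7), repeat=2))          # equally likely outcomes (x, z)
P = lambda E: Fraction(len(E), len(omega))
A = {w for w in omega if sum(w) >= 9}

def c(x, A):
    # c(x, A) = P(A | X = x): the Kolmogorov quotient restricted to the slice X = x.
    Bx = {w for w in omega if w[0] == x}
    return P(A & Bx) / P(Bx)

# Y = c(X, A) is a random variable in [0, 1]; its expectation equals P(A).
expected_Y = sum(P({w for w in omega if w[0] == x}) * c(x, A) for x in range(1, 7))
print(expected_Y, P(A))                               # both print 5/18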


Partial conditional probability

The partial conditional probability P(A\mid B_{1}\equiv b_{1},\ldots ,B_{m}\equiv b_{m}) is the probability of event A given that each of the condition events B_{i} has occurred to a degree b_{i} (degree of belief, degree of experience) that might be different from 100%. In frequentist terms, partial conditional probability makes sense if the conditions are tested in experiment repetitions of appropriate length n. Such n-bounded partial conditional probability can be defined as the conditionally expected average occurrence of event A in testbeds of length n that adhere to all of the probability specifications B_{i}\equiv b_{i}, i.e.:

P^{n}(A\mid B_{1}\equiv b_{1},\ldots ,B_{m}\equiv b_{m})=\operatorname {E} ({\overline {A}}^{n}\mid {\overline {B}}_{1}^{n}=b_{1},\ldots ,{\overline {B}}_{m}^{n}=b_{m})

Based on that, partial conditional probability can be defined as

P(A\mid B_{1}\equiv b_{1},\ldots ,B_{m}\equiv b_{m})=\lim _{n\to \infty }P^{n}(A\mid B_{1}\equiv b_{1},\ldots ,B_{m}\equiv b_{m}),

where b_{i}n\in \mathbb {N}, so that each specification B_{i}\equiv b_{i} can be matched exactly by a relative frequency in a testbed of length n.
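
For small n, the n-bounded quantity P^{n} can be computed by brute-force enumeration of testbeds. The sketch below is only an illustration under assumed numbers: a hypothetical four-outcome experiment, a single condition event B\equiv b, and independent identically distributed trials; none of these choices come from the text.

import itertools

p = {0: 0.2, 1: 0.3, 2: 0.4, 3: 0.1}    # hypothetical outcome probabilities for one trial
A = {0, 1}                               # hypothetical event A
B = {1, 2}                               # hypothetical condition event B

def partial_conditional(n, b):
    # E( average occurrence of A | empirical frequency of B equals b ) over testbeds of length n.
    num = den = 0.0
    for seq in itertools.product(p, repeat=n):        # all outcome sequences of length n
        prob = 1.0
        for o in seq:
            prob *= p[o]
        freq_B = sum(o in B for o in seq) / n
        if abs(freq_B - b) < 1e-12:                   # testbed adheres to the specification B ≡ b
            num += prob * sum(o in A for o in seq) / n
            den += prob
    return num / den

print(partial_conditional(n=4, b=0.5))   # b*n must be an integer for the condition to be attainable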

Jeffrey conditionalization is a special case of partial conditional probability, in which the condition events must form a partition:

P(A\mid B_{1}\equiv b_{1},\ldots ,B_{m}\equiv b_{m})=\sum _{i=1}^{m}b_{i}P(A\mid B_{i})
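
A minimal sketch of Jeffrey conditionalization with hypothetical numbers (the partition and all probabilities below are assumed, not taken from the text):

# The partition B1, B2, B3 receives new degrees of belief b, and the old
# conditional probabilities P(A | Bi) are assumed known.
b = [0.5, 0.3, 0.2]                      # must sum to 1, since the Bi form a partition
p_A_given_B = [0.9, 0.4, 0.1]

p_A = sum(bi * pi for bi, pi in zip(b, p_A_given_B))
print(p_A)                               # 0.59 = 0.5*0.9 + 0.3*0.4 + 0.2*0.1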