Bayesian Networks

Definitions and concepts

Several equivalent definitions of a Bayesian network have been offered. For the following, let G = (V,E) be a directed acyclic graph (DAG) and let X = (Xv), v ∈ V be a set of random variables indexed by V.


Factorization definition

X is a Bayesian network with respect to G if its joint probability density function (with respect to a product measure) can be written as a product of the individual density functions, conditional on their parent variables:

p(x)=\prod _{v\in V}p\left(x_{v}\,{\big |}\,x_{\operatorname {pa} (v)}\right)

where pa(v) is the set of parents of v (i.e. those vertices pointing directly to v via a single edge).

For any set of random variables, the probability of any member of a joint distribution can be calculated from conditional probabilities using the chain rule (given a topological ordering of X) as follows:

\operatorname {P} (X_{1}=x_{1},\ldots ,X_{n}=x_{n})=\prod _{v=1}^{n}\operatorname {P} \left(X_{v}=x_{v}\mid X_{v+1}=x_{v+1},\ldots ,X_{n}=x_{n}\right)

Using the definition above, this can be written as:

\operatorname {P} (X_{1}=x_{1},\ldots ,X_{n}=x_{n})=\prod _{v=1}^{n}\operatorname {P} (X_{v}=x_{v}\mid X_{j}=x_{j}{\text{ for each }}X_{j}\,{\text{ that is a parent of }}X_{v}\,)

The difference between the two expressions is the conditional independence of the variables from any of their non-descendants, given the values of their parent variables.


Local Markov property

X is a Bayesian network with respect to G if it satisfies the local Markov property: each variable is conditionally independent of its non-descendants given its parent variables:

X_{v}\perp \!\!\!\perp X_{V\,\smallsetminus \,\operatorname {de} (v)}\mid X_{\operatorname {pa} (v)}\quad {\text{for all }}v\in V

where de(v) is the set of descendants and V \ de(v) is the set of non-descendants of v.

This can be expressed in terms similar to the first definition, as

{\begin{aligned}&\operatorname {P} (X_{v}=x_{v}\mid X_{i}=x_{i}{\text{ for each }}X_{i}{\text{ that is not a descendant of }}X_{v}\,)\\[6pt]={}&P(X_{v}=x_{v}\mid X_{j}=x_{j}{\text{ for each }}X_{j}{\text{ that is a parent of }}X_{v}\,)\end{aligned}}

The set of parents is a subset of the set of non-descendants because the graph is acyclic.


Developing Bayesian networks

Developing a Bayesian network often begins with creating a DAG G such that X satisfies the local Markov property with respect to G. Sometimes this is a causal DAG. The conditional probability distributions of each variable given its parents in G are assessed. In many cases, in particular in the case where the variables are discrete, if the joint distribution of X is the product of these conditional distributions, then X is a Bayesian network with respect to G.


Markov blanket

The Markov blanket of a node is the set of nodes consisting of its parents, its children, and any other parents of its children. The Markov blanket renders the node independent of the rest of the network; the joint distribution of the variables in the Markov blanket of a node is sufficient knowledge for calculating the distribution of the node. X is a Bayesian network with respect to G if every node is conditionally independent of all other nodes in the network, given its Markov blanket.
d-separation

This definition can be made more general by defining the "d"-separation of two nodes, where d stands for directional. We first define the "d"-separation of a trail and then we will define the "d"-separation of two nodes in terms of that.

Let P be a trail from node u to v. A trail is a loop-free, undirected (i.e. all edge directions are ignored) path between two nodes. Then P is said to be d-separated by a set of nodes Z if any of the following conditions holds:

  • P contains (but does not need to be entirely) a directed chain, u\cdots \leftarrow m\leftarrow \cdots v or u\cdots \rightarrow m\rightarrow \cdots v, such that the middle node m is in Z,
  • P contains a fork, u\cdots \leftarrow m\rightarrow \cdots v, such that the middle node m is in Z, or
  • P contains an inverted fork (or collider), u\cdots \rightarrow m\leftarrow \cdots v, such that the middle node m is not in Z and no descendant of m is in Z.

The nodes u and v are d-separated by Z if all trails between them are d-separated. If u and v are not d-separated, they are d-connected.

X is a Bayesian network with respect to G if, for any two nodes u, v:

X_{u}\perp \!\!\!\perp X_{v}\mid X_{Z}

where Z is a set which d-separates u and v. (The Markov blanket is the minimal set of nodes which d-separates node v from all other nodes)


Causal networks

Although Bayesian networks are often used to represent causal relationships, this need not be the case: a directed edge from u to v does not require that Xv be causally dependent on Xu. This is demonstrated by the fact that Bayesian networks on the graphs:

a\rightarrow b\rightarrow c\qquad {\text{and}}\qquad a\leftarrow b\leftarrow c

are equivalent: that is they impose exactly the same conditional independence requirements.

A causal network is a Bayesian network with the requirement that the relationships be causal. The additional semantics of causal networks specify that if a node X is actively caused to be in a given state x (an action written as do(X = x)), then the probability density function changes to that of the network obtained by cutting the links from the parents of X to X, and setting X to the caused value x. Using these semantics, the impact of external interventions from data obtained prior to intervention can be predicted.