Bayesian Networks

Statistical introduction

Given data x and parameter \theta, a simple Bayesian analysis starts with a prior probability (prior) p(\theta ) and likelihood p(x\mid \theta ) to compute a posterior probability p(\theta \mid x)\propto p(x\mid \theta )p(\theta ).

Often the prior on \theta depends in turn on other parameters \varphi that are not mentioned in the likelihood. So, the prior p(\theta ) must be replaced by a likelihood p(\theta \mid \varphi ), and a prior p(\varphi ) on the newly introduced parameters \varphi is required, resulting in a posterior probability

p(\theta ,\varphi \mid x)\propto p(x\mid \theta )p(\theta \mid \varphi )p(\varphi ).

This is the simplest example of a hierarchical Bayes model.
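As a concrete sketch, the joint posterior can be evaluated numerically on a grid. The normal densities chosen below for p(x\mid \theta ), p(\theta \mid \varphi ) and p(\varphi ), and the single observed value, are illustrative assumptions rather than anything fixed by the text above.

import numpy as np
from scipy.stats import norm

x = 1.2                                  # one observed data point (assumed)
theta = np.linspace(-4, 4, 400)          # grid over theta
phi = np.linspace(-4, 4, 400)            # grid over phi
dt, dp = theta[1] - theta[0], phi[1] - phi[0]
T, P = np.meshgrid(theta, phi, indexing="ij")

# Unnormalized joint posterior: p(theta, phi | x)
# proportional to p(x | theta) p(theta | phi) p(phi)
post = norm.pdf(x, T, 1.0) * norm.pdf(T, P, 1.0) * norm.pdf(P, 0.0, 1.0)
post /= post.sum() * dt * dp             # normalize on the grid

# Marginal posterior p(theta | x): integrate phi out
p_theta = post.sum(axis=1) * dp
print("posterior mean of theta:", (theta * p_theta * dt).sum())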

The process may be repeated; for example, the parameters \varphi may depend in turn on additional parameters \psi, which require their own prior, as written out below. Eventually the process must terminate, with priors that do not depend on unmentioned parameters.
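For instance, carrying the hierarchy one level further gives

p(\theta ,\varphi ,\psi \mid x)\propto p(x\mid \theta )p(\theta \mid \varphi )p(\varphi \mid \psi )p(\psi ).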


Introductory examples

Given the measured quantities x_{1},\dots ,x_{n}, each with normally distributed errors of known standard deviation \sigma,

x_{i}\sim N(\theta _{i},\sigma ^{2}).

Suppose we are interested in estimating the \theta _{i}. One approach is to estimate the \theta _{i} by maximum likelihood; since the observations are independent, the likelihood factorizes and the maximum likelihood estimate is simply

{\hat {\theta }}_{i}=x_{i}.
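To spell out the step above, the factorized likelihood is

L(\theta _{1},\dots ,\theta _{n})=\prod _{i=1}^{n}{\frac {1}{{\sqrt {2\pi }}\sigma }}\exp \left(-{\frac {(x_{i}-\theta _{i})^{2}}{2\sigma ^{2}}}\right),

and each factor involves only its own \theta _{i}, so each factor is maximized separately at {\hat {\theta }}_{i}=x_{i}.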

However, if the quantities are related, so that, for example, the individual \theta _{i} have themselves been drawn from an underlying distribution, then this relationship destroys the independence and suggests a more complex model, e.g.,

x_{i}\sim N(\theta _{i},\sigma ^{2}),

\theta _{i}\sim N(\varphi ,\tau ^{2}),

with improper priors \varphi \sim {\text{flat}}, \tau \sim {\text{flat}}\in (0,\infty ). When n\geq 3, this is an identified model (i.e., there exists a unique solution for the model's parameters), and the posterior distributions of the individual \theta _{i} will tend to move, or shrink, away from the maximum likelihood estimates towards their common mean. This shrinkage is typical behavior in hierarchical Bayes models.
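A small simulation makes the shrinkage visible. The Gibbs sampler below uses the standard full conditionals for this two-level normal model (normal for each \theta _{i} and for \varphi, inverse-gamma for \tau ^{2} under the flat prior on \tau ); the data, group count, and seed are illustrative assumptions, not taken from the text.

import numpy as np

rng = np.random.default_rng(0)

# Simulated data: n related quantities drawn around a common mean (assumed).
n, sigma, phi_true, tau_true = 8, 1.0, 5.0, 0.5
theta_true = rng.normal(phi_true, tau_true, n)
x = rng.normal(theta_true, sigma)

phi, tau2 = x.mean(), x.var()            # initial values
draws = []
for _ in range(5000):
    # theta_i | phi, tau^2, x  ~  N(w*x_i + (1-w)*phi, v)
    w = tau2 / (tau2 + sigma**2)
    v = sigma**2 * tau2 / (sigma**2 + tau2)
    theta = rng.normal(w * x + (1 - w) * phi, np.sqrt(v))
    # phi | theta, tau^2  ~  N(mean(theta), tau^2/n)   (flat prior on phi)
    phi = rng.normal(theta.mean(), np.sqrt(tau2 / n))
    # tau^2 | theta, phi  ~  Inv-Gamma((n-1)/2, S/2)   (flat prior on tau)
    S = ((theta - phi) ** 2).sum()
    tau2 = 1.0 / rng.gamma((n - 1) / 2, 2.0 / S)
    draws.append(theta)

post_mean = np.mean(draws[1000:], axis=0)  # discard burn-in
print("MLE (x_i):       ", np.round(x, 2))
print("posterior means: ", np.round(post_mean, 2))
# The posterior means lie between the x_i and their common mean: shrinkage.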


Restrictions on priors

Some care is needed when choosing priors in a hierarchical model, particularly on scale variables at higher levels of the hierarchy, such as the variable \tau in the example. The usual priors, such as the Jeffreys prior, often do not work, because the posterior distribution will not be normalizable, and estimates made by minimizing the expected loss will be inadmissible.
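One commonly used remedy, sketched here as an illustration rather than as a prescription from the text above, is to replace the flat prior on the scale with a proper, weakly informative one, e.g.

\tau \sim {\text{Half-Cauchy}}(0,s)

for some scale s>0; a proper prior guarantees a normalizable posterior, while the heavy Cauchy tail remains relatively noncommittal about large values of \tau.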