Probability Distributions and their Stories

A basic skill in any statistics module is being able to compute with and plot the probability distributions commonly encountered in data science. This tutorial revisits the distributions introduced in this section, now expressed using the scipy.stats module.

Continuous Multivariate distributions

Dirichlet distribution

  • Story. The Dirichlet distribution is a generalization of the Beta distribution. It is a probability distribution describing probabilities of outcomes. Instead of describing the probability of one of two outcomes of a Bernoulli trial, as the Beta distribution does, it describes the probabilities of K−1 of K outcomes; the probability of the remaining outcome is then fixed, because the K probabilities must sum to one. The Beta distribution is the special case of K=2.
  • Parameters. The parameters are α_1, α_2, …, α_K, all strictly positive, defined analogously to α and β of the Beta distribution.
  • Support. The Dirichlet distribution has support on vectors y = (y_1, …, y_K) in which each component y_i lies in the interval [0, 1] and \sum_{i=1}^K y_i = 1.
  • Probability density function.

    \begin{align}
f(\mathbf{y};\boldsymbol{\alpha}) = \frac{1}{B(\boldsymbol{\alpha})}\,\prod_{i=1}^K y_i^{\alpha_i-1}
\end{align}

    where

    \begin{align}B(\boldsymbol{\alpha}) = \frac{\prod_{i=1}^K\Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^K \alpha_i\right)}\end{align}

    is the multivariate Beta function.
  • Related distributions.
    • The special case where K=2 is a Beta distribution with parameters α=α_1 and β=α_2.
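
The density and the Beta special case above can be checked numerically. The following is a minimal sketch, assuming NumPy and SciPy are installed; the parameter values, the evaluation point, and the helper name `dirichlet_logpdf` are illustrative choices, not part of the original tutorial. It evaluates the density directly from the formula, compares it with `scipy.stats.dirichlet`, draws samples, and confirms that the K=2 case matches `scipy.stats.beta`.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

# Illustrative parameters and evaluation point (assumed values).
alpha = np.array([2.0, 3.0, 4.0])
y = np.array([0.2, 0.3, 0.5])  # components lie in [0, 1] and sum to 1


def dirichlet_logpdf(y, alpha):
    """Log density from the formula above (hypothetical helper)."""
    # log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)
    log_B = np.sum(gammaln(alpha)) - gammaln(np.sum(alpha))
    return np.sum((alpha - 1.0) * np.log(y)) - log_B


manual_pdf = np.exp(dirichlet_logpdf(y, alpha))
scipy_pdf = stats.dirichlet.pdf(y, alpha)
print("manual:", manual_pdf, "scipy:", scipy_pdf)

# Each draw is a length-K vector of probabilities summing to one.
rng = np.random.default_rng(0)
samples = stats.dirichlet.rvs(alpha, size=1000, random_state=rng)
print("sample shape:", samples.shape)

# Special case K=2: Dirichlet(a, b) at (x, 1 - x) equals Beta(a, b) at x.
a, b = 2.0, 5.0
x = 0.3
beta_pdf = stats.beta.pdf(x, a, b)
dir_pdf_k2 = stats.dirichlet.pdf([x, 1.0 - x], [a, b])
print("beta:", beta_pdf, "dirichlet K=2:", dir_pdf_k2)
```

Working with the log of the multivariate Beta function through `gammaln` avoids overflow in the Gamma function when the α_i are large; for small parameters, as here, a direct product would work as well.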