Probability Distributions and their Stories

Given any module that deals with statistics, one basic skill you must have is to be able to program and create plots of probability distributions typically encountered in the field of data science. This tutorial should remind you of various distributions introduced in this section, but now they are phrased using the scipy.stats module.

Discrete distributions

Negative Binomial distribution

  • Story. We perform a series of Bernoulli trials. The number of failures, y, before we get α successes is Negative Binomially distributed. An equivalent story is that the sum of α independent and identically Gamma distributed variables is Negative Binomial distributed.
  • Example. Bursty gene expression can give mRNA count distributions that are Negative Binomially distributed. Here, "success" is that a burst in gene expression stops. So, the parameter β is the mean length of a burst in expression. The parameter α is related to the frequency of the bursts. If multiple bursts are possible within the lifetime of mRNA, then α>1. Then, the number of "failures" is the number of mRNA transcripts that are made in the characteristic lifetime of mRNA.
  • Parameters. There are two parameters: α, the desired number of successes, and β, which is the mean of the α identical Gamma distributions that give the Negative Binomial. The probability of success of each Bernoulli trial is given by β/(1+β).
  • Support. The Negative-Binomial distribution is supported on the set of nonnegative integers.
  • Probability mass function.
\begin{align}\\ \phantom{blah}f(y;\alpha,\beta) = \begin{pmatrix}y+\alpha-1 \\\alpha-1\end{pmatrix}\left(\frac{\beta}{1+\beta}\right)^\alpha \left(\frac{1}{1+\beta}\right)^y\\ \phantom{blah}\end{align}

Here, we use a combinatorial notation;

\begin{align}\\ \phantom{blah}\begin{pmatrix}y+\alpha-1 \\\alpha-1\end{pmatrix} = \frac{(y+\alpha-1)!}{(\alpha-1)!\,y!}.\\ \phantom{blah}\end{align}

Generally speaking, α need not be an integer, so we may write the PMF as

\begin{align}\\ \phantom{blah}f(y;\alpha,\beta) = \frac{\Gamma(y+\alpha)}{\Gamma(\alpha) \, y!}\,\left(\frac{\beta}{1+\beta}\right)^\alpha \left(\frac{1}{1+\beta}\right)^y.\end{align}

  • Usage
Package Syntax
NumPy np.random.negative_binomial(alpha, beta/(1+beta))
SciPy scipy.stats.nbinom(alpha, beta/(1+beta))
Stan neg_binomial(alpha, beta)
Stan with(μ,ϕ) parametrization neg_binomial_2(mu, phi)

  • Related distributions.
    • The Geometric distribution is a special case of the Negative Binomial distribution in which α=1 and θ=β/(1+β).
    • The continuous analog of the Negative Binomial distribution is the Gamma distribution.
    • In a certain limit, which is easier implemented using the (μ,ϕ) parametrization below, the Negative Binomial distribution becomes a Poisson distribution.
  • Notes.
    • The Negative Binomial distribution may be parametrized such that the probability mass function is

      \begin{align}\\ \phantom{blah} f(y;\mu,\phi) = \frac{\Gamma(y+\phi)}{\Gamma(\phi) \, y!}\,\left(\frac{\phi}{\mu+\phi}\right)^\phi\left(\frac{\mu}{\mu+\phi}\right)^y. \\ \phantom{blah}\end{align}

      These parameters are related to the parametrization above by ϕ=α and μ=α/β. In the limit of ϕ→∞, which can be taken for the PMF, the Negative Binomial distribution becomes Poisson with parameter μ. This also gives meaning to the parameters μ and ϕ. μ is the mean of the Negative Binomial, and ϕ controls extra width of the distribution beyond Poisson. The smaller ϕ is, the broader the distribution.
    • In Stan, the Negative Binomial distribution using the (μ,ϕ) parametrization is called neg_binomial_2.
    • SciPy and NumPy use yet another parametrization. The PMF for SciPy is

      \begin{align}\\ \phantom{blah} f(y;n, p) = \frac{\Gamma(y+n)}{\Gamma(n) \, y!}\,p^n \left(1-p\right)^y. \\ \phantom{blah}\end{align}

      The parameter p is the probability of success of a Bernoulli trial. The parameters are related to the others we have defined by n=α=ϕ and p=β/(1+β)=ϕ/(μ+ϕ).
params = [dict(name='α', start=1, end=20, value=5, step=1),
          dict(name='β', start=0, end=5, value=1, step=0.01)]
app = distribution_plot_app(x_min=0,
                            x_max=50,
                            scipy_dist=st.nbinom,
                            params=params,
                            transform=lambda alpha, beta: (alpha, beta/(1+beta)),
                            x_axis_label='y',
                            title='Negative Binomial')
bokeh.io.show(app, notebook_url=notebook_url)