Probability Distributions and their Stories

Discrete distributions

Negative Binomial distribution

  • Story. We perform a series of Bernoulli trials. The number of failures, y, before we get α successes is Negative Binomially distributed. An equivalent story is that the sum of α independent and identically Gamma distributed variables is Negative Binomial distributed.
  • Example. Bursty gene expression can give mRNA count distributions that are Negative Binomially distributed. Here, "success" is that a burst in gene expression stops. So, the parameter β is the mean length of a burst in expression. The parameter α is related to the frequency of the bursts. If multiple bursts are possible within the lifetime of mRNA, then α>1. Then, the number of "failures" is the number of mRNA transcripts that are made in the characteristic lifetime of mRNA.
  • Parameters. There are two parameters: α, the desired number of successes, and β, which is the mean of the α identical Gamma distributions that give the Negative Binomial. The probability of success of each Bernoulli trial is given by β/(1+β).
  • Support. The Negative-Binomial distribution is supported on the set of nonnegative integers.
  • Probability mass function.
\begin{align}\\ \phantom{blah}f(y;\alpha,\beta) = \begin{pmatrix}y+\alpha-1 \\\alpha-1\end{pmatrix}\left(\frac{\beta}{1+\beta}\right)^\alpha \left(\frac{1}{1+\beta}\right)^y\\ \phantom{blah}\end{align}

Here, we use a combinatorial notation;

\begin{align}\\ \phantom{blah}\begin{pmatrix}y+\alpha-1 \\\alpha-1\end{pmatrix} = \frac{(y+\alpha-1)!}{(\alpha-1)!\,y!}.\\ \phantom{blah}\end{align}

Generally speaking, α need not be an integer, so we may write the PMF as

\begin{align}\\ \phantom{blah}f(y;\alpha,\beta) = \frac{\Gamma(y+\alpha)}{\Gamma(\alpha) \, y!}\,\left(\frac{\beta}{1+\beta}\right)^\alpha \left(\frac{1}{1+\beta}\right)^y.\end{align}

  • Usage
Package Syntax
NumPy np.random.negative_binomial(alpha, beta/(1+beta))
SciPy scipy.stats.nbinom(alpha, beta/(1+beta))
Stan neg_binomial(alpha, beta)
Stan with(μ,ϕ) parametrization neg_binomial_2(mu, phi)

  • Related distributions.
    • The Geometric distribution is a special case of the Negative Binomial distribution in which α=1 and θ=β/(1+β).
    • The continuous analog of the Negative Binomial distribution is the Gamma distribution.
    • In a certain limit, which is easier implemented using the (μ,ϕ) parametrization below, the Negative Binomial distribution becomes a Poisson distribution.
  • Notes.
    • The Negative Binomial distribution may be parametrized such that the probability mass function is

      \begin{align}\\ \phantom{blah} f(y;\mu,\phi) = \frac{\Gamma(y+\phi)}{\Gamma(\phi) \, y!}\,\left(\frac{\phi}{\mu+\phi}\right)^\phi\left(\frac{\mu}{\mu+\phi}\right)^y. \\ \phantom{blah}\end{align}

      These parameters are related to the parametrization above by ϕ=α and μ=α/β. In the limit of ϕ→∞, which can be taken for the PMF, the Negative Binomial distribution becomes Poisson with parameter μ. This also gives meaning to the parameters μ and ϕ. μ is the mean of the Negative Binomial, and ϕ controls extra width of the distribution beyond Poisson. The smaller ϕ is, the broader the distribution.
    • In Stan, the Negative Binomial distribution using the (μ,ϕ) parametrization is called neg_binomial_2.
    • SciPy and NumPy use yet another parametrization. The PMF for SciPy is

      \begin{align}\\ \phantom{blah} f(y;n, p) = \frac{\Gamma(y+n)}{\Gamma(n) \, y!}\,p^n \left(1-p\right)^y. \\ \phantom{blah}\end{align}

      The parameter p is the probability of success of a Bernoulli trial. The parameters are related to the others we have defined by n=α=ϕ and p=β/(1+β)=ϕ/(μ+ϕ).
params = [dict(name='α', start=1, end=20, value=5, step=1),
          dict(name='β', start=0, end=5, value=1, step=0.01)]
app = distribution_plot_app(x_min=0,
                            x_max=50,
                            scipy_dist=st.nbinom,
                            params=params,
                            transform=lambda alpha, beta: (alpha, beta/(1+beta)),
                            x_axis_label='y',
                            title='Negative Binomial')
bokeh.io.show(app, notebook_url=notebook_url)