In any module that deals with statistics, a basic skill is being able to program and plot the probability distributions commonly encountered in data science. This tutorial revisits the distributions introduced in this section, now expressed using the scipy.stats module.
Discrete distributions
Negative Binomial distribution
- Story. We perform a series of Bernoulli trials. The number of failures, $y$, before we get $\alpha$ successes is Negative Binomially distributed. An equivalent story is that the sum of $\alpha$ independent and identically Geometrically distributed variables is Negative Binomially distributed.
- Example. Bursty gene expression can give mRNA count distributions that are Negative Binomially distributed. Here, "success" is that a burst in gene expression stops. So, the parameter $1/\beta$ is the mean length of a burst in expression. The parameter $\alpha$ is related to the frequency of the bursts. If multiple bursts are possible within the lifetime of mRNA, then $\alpha > 1$. Then, the number of "failures" is the number of mRNA transcripts that are made in the characteristic lifetime of mRNA.
- Parameters. There are two parameters: $\alpha$, the desired number of successes, and $\beta$, which is the inverse of the mean of each of the $\alpha$ identical Geometric distributions whose sum gives the Negative Binomial. The probability of success of each Bernoulli trial is given by $\beta/(1+\beta)$.
- Support. The Negative-Binomial distribution is supported on the set of nonnegative integers.
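- Probability mass function.

  $$f(y;\alpha,\beta) = \binom{y+\alpha-1}{\alpha-1} \left(\frac{\beta}{1+\beta}\right)^{\alpha} \left(\frac{1}{1+\beta}\right)^{y}$$

  Here, we use a combinatorial notation;

  $$\binom{y+\alpha-1}{\alpha-1} = \frac{(y+\alpha-1)!}{(\alpha-1)!\,y!}.$$

  Generally speaking, $\alpha$ need not be an integer, so we may write the PMF as

  $$f(y;\alpha,\beta) = \frac{\Gamma(y+\alpha)}{\Gamma(\alpha)\,y!} \left(\frac{\beta}{1+\beta}\right)^{\alpha} \left(\frac{1}{1+\beta}\right)^{y}.$$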
- Usage
Package | Syntax |
---|---|
NumPy | np.random.negative_binomial(alpha, beta/(1+beta)) |
SciPy | scipy.stats.nbinom(alpha, beta/(1+beta)) |
Stan | neg_binomial(alpha, beta) |
Stan with (μ, ϕ) parametrization | neg_binomial_2(mu, phi) |
- Related distributions.
  - The Geometric distribution is a special case of the Negative Binomial distribution in which $\alpha = 1$ and $\theta = \beta/(1+\beta)$, where $\theta$ is the Geometric distribution's probability of success.
  - The continuous analog of the Negative Binomial distribution is the Gamma distribution.
  - In a certain limit, which is more easily implemented using the $(\mu,\phi)$ parametrization below, the Negative Binomial distribution becomes a Poisson distribution.
- Notes.
  - The Negative Binomial distribution may be parametrized such that the probability mass function is

    $$f(y;\mu,\phi) = \frac{\Gamma(y+\phi)}{\Gamma(\phi)\,y!} \left(\frac{\phi}{\mu+\phi}\right)^{\phi} \left(\frac{\mu}{\mu+\phi}\right)^{y}.$$

    These parameters are related to the parametrization above by $\phi = \alpha$ and $\mu = \alpha/\beta$. In the limit of $\phi \to \infty$, which can be taken for the PMF, the Negative Binomial distribution becomes Poisson with parameter $\mu$. This also gives meaning to the parameters $\mu$ and $\phi$: $\mu$ is the mean of the Negative Binomial, and $\phi$ controls extra width of the distribution beyond Poisson. The smaller $\phi$ is, the broader the distribution.
  - In Stan, the Negative Binomial distribution using the $(\mu,\phi)$ parametrization is called neg_binomial_2.
  - SciPy and NumPy use yet another parametrization. The PMF for SciPy is

    $$f(y;n,p) = \frac{\Gamma(y+n)}{\Gamma(n)\,y!}\,p^{n}\,(1-p)^{y}.$$

    The parameter $p$ is the probability of success of a Bernoulli trial. The parameters are related to the others we have defined by $n = \alpha$ and $p = \beta/(1+\beta)$.
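The relationships between the parametrizations in the notes above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `nbinom_pmf_alpha_beta`, `nbinom_pmf_mu_phi`, and `poisson_pmf` are illustrative and do not come from SciPy, NumPy, or Stan. It assumes the Gamma-function form of the PMF and the conversions μ = α/β and ϕ = α, and it approximates the Poisson limit with a large but finite ϕ.

```python
import math


def nbinom_pmf_alpha_beta(y, alpha, beta):
    """Negative Binomial PMF in the (alpha, beta) parametrization."""
    log_coeff = math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
    log_prob = alpha * math.log(beta / (1 + beta)) - y * math.log(1 + beta)
    return math.exp(log_coeff + log_prob)


def nbinom_pmf_mu_phi(y, mu, phi):
    """Negative Binomial PMF in the (mu, phi) parametrization."""
    log_coeff = math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
    log_prob = phi * math.log(phi / (mu + phi)) + y * math.log(mu / (mu + phi))
    return math.exp(log_coeff + log_prob)


def poisson_pmf(y, mu):
    """Poisson PMF with mean mu."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


alpha, beta = 5.0, 1.0
mu, phi = alpha / beta, alpha  # conversion between the two parametrizations

ys = range(200)  # the tail beyond 200 is negligible for these parameters
pmf_ab = [nbinom_pmf_alpha_beta(y, alpha, beta) for y in ys]
pmf_mp = [nbinom_pmf_mu_phi(y, mu, phi) for y in ys]

total = sum(pmf_ab)                                   # should be close to 1
mean = sum(y * p for y, p in zip(ys, pmf_ab))         # should be close to alpha/beta
max_diff = max(abs(a - b) for a, b in zip(pmf_ab, pmf_mp))

# Approximate the phi -> infinity limit with a large finite phi
poisson_gap = max(
    abs(nbinom_pmf_mu_phi(y, mu, 1e6) - poisson_pmf(y, mu)) for y in range(30)
)

print(f"total={total:.6f}, mean={mean:.6f}")
print(f"max difference between parametrizations: {max_diff:.2e}")
print(f"max difference from Poisson at phi=1e6:  {poisson_gap:.2e}")
```

Working in log space with `math.lgamma` avoids overflow in the Gamma functions for large arguments. With SciPy available, `scipy.stats.nbinom(alpha, beta/(1+beta)).pmf(y)` should give the same values as `nbinom_pmf_alpha_beta`.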
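import bokeh.io
import scipy.stats as st

params = [dict(name='α', start=1, end=20, value=5, step=1),
          dict(name='β', start=0, end=5, value=1, step=0.01)]
app = distribution_plot_app(x_min=0,  # helper assumed to be defined earlier in the notebook
                            x_max=50,
                            scipy_dist=st.nbinom,
                            params=params,
                            # Convert (α, β) to SciPy's (n, p) parametrization
                            transform=lambda alpha, beta: (alpha, beta/(1+beta)),
                            x_axis_label='y',
                            title='Negative Binomial')
bokeh.io.show(app, notebook_url=notebook_url)  # notebook_url is also assumed to be set earlier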
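As a non-interactive complement to the plot, the Bernoulli-trial story from the top of this section can be checked by direct simulation. This is a minimal sketch using only the standard library; the function name `failures_before_successes`, the seed, and the sample size are arbitrary choices for illustration. It assumes the success probability is β/(1+β), so the expected mean is α/β and the expected variance is α(1+β)/β².

```python
import random


def failures_before_successes(alpha, p, rng):
    """Run Bernoulli(p) trials until `alpha` successes; return the number of failures."""
    successes = 0
    failures = 0
    while successes < alpha:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


rng = random.Random(3252)  # fixed seed so the run is reproducible
alpha, beta = 5, 1.0
p = beta / (1 + beta)      # probability of success of each Bernoulli trial

n_samples = 20000
samples = [failures_before_successes(alpha, p, rng) for _ in range(n_samples)]

sample_mean = sum(samples) / n_samples
sample_var = sum((y - sample_mean) ** 2 for y in samples) / (n_samples - 1)

expected_mean = alpha / beta                   # mean of the Negative Binomial
expected_var = alpha * (1 + beta) / beta**2    # variance of the Negative Binomial

print(f"sample mean {sample_mean:.3f} vs expected {expected_mean:.3f}")
print(f"sample variance {sample_var:.3f} vs expected {expected_var:.3f}")
```

The sample statistics should agree with the expected values up to sampling noise; drawing from `np.random.negative_binomial(alpha, beta/(1+beta))` instead should give comparable results.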