Probability Distributions and their Stories
Site: Saylor Academy
Course: CS250: Python for Data Science
Book: Probability Distributions and their Stories
Description
In any module that deals with statistics, one basic skill is the ability to program and plot the probability distributions typically encountered in data science. This tutorial reviews the distributions introduced in this section, now phrased using the scipy.stats module.
Discrete distributions
Here, we will explain a series of discrete distributions.
Source: Justin Bois, http://bois.caltech.edu/dist_stories/t3b_probability_stories.html
This work is licensed under a Creative Commons Attribution 4.0 License.
Bernoulli distribution
- Story. A Bernoulli trial is an experiment that has two outcomes that can be encoded as success (y = 1) or failure (y = 0). The result y of a Bernoulli trial is Bernoulli distributed.
- Example. Check to see if a given bacterium is competent, given that it has probability θ of being competent.
- Parameter. The Bernoulli distribution is parametrized by a single value, θ, the probability that the trial is successful.
- Support. The Bernoulli distribution may be nonzero only for y = 0 and y = 1.
- Probability mass function. f(y; θ) = 1 - θ for y = 0 and θ for y = 1, which may be written compactly as f(y; θ) = θ^y (1 - θ)^(1 - y).
- Usage
Package | Syntax |
---|---|
NumPy | np.random.choice([0, 1], p=[1-theta, theta]) |
SciPy | scipy.stats.bernoulli(theta) |
Stan | bernoulli(theta) |
params = [dict(name='θ', start=0, end=1, value=0.5, step=0.01)]
app = distribution_plot_app(x_min=0,
x_max=1,
scipy_dist=st.bernoulli,
params=params,
x_axis_label='y',
title='Bernoulli')
bokeh.io.show(app, notebook_url=notebook_url)
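The PMF above is simple enough to check by hand. Here is a minimal pure-Python sketch (the function name bernoulli_pmf is ours, not part of any library):

```python
def bernoulli_pmf(y, theta):
    """Bernoulli PMF: P(y = 1) = theta, P(y = 0) = 1 - theta."""
    if y == 1:
        return theta
    if y == 0:
        return 1.0 - theta
    return 0.0  # zero probability outside the support {0, 1}

# The probabilities over the support sum to one
assert abs(bernoulli_pmf(0, 0.3) + bernoulli_pmf(1, 0.3) - 1.0) < 1e-12
```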
Geometric distribution
- Story. We perform a series of Bernoulli trials with probability θ of success until we get a success. The number of failures y before the success is Geometrically distributed.
- Example. Consider actin polymerization. At each time step, an actin monomer may add to the filament ("failure"), or an actin monomer may fall off ("success") with (usually very low) probability θ. The lengths of actin filaments are Geometrically distributed.
- Parameter. The Geometric distribution is parametrized by a single value, θ, the probability that the Bernoulli trial is successful.
- Support. The Geometric distribution, as defined here, has support on the nonnegative integers.
- Probability mass function. f(y; θ) = (1 - θ)^y θ.
- Usage
Package | Syntax |
---|---|
NumPy | np.random.geometric(theta) |
SciPy | scipy.stats.geom(theta, loc=-1) |
Stan | neg_binomial(1, theta/(1-theta)) |
- Related distributions. The Geometric distribution is a special case of the Negative Binomial distribution (see below) with α = 1.
- Notes.
- The Geometric distribution is not implemented in Stan. You can instead use a Negative Binomial distribution, fixing the parameter α to be unity and relating the parameter β of the Negative Binomial distribution to θ as β = θ/(1 - θ).
- The Geometric distribution is defined differently in SciPy. In SciPy's parametrization the Geometric distribution describes the number of successive Bernoulli trials (not just the failures; the success is included) necessary to get a success. To adjust for this, we use the loc=-1 kwarg.
params = [dict(name='θ', start=0, end=1, value=0.5, step=0.01)]
app = distribution_plot_app(x_min=0,
x_max=20,
scipy_dist=st.geom,
params=params,
transform=lambda theta: (theta, -1),
x_axis_label='y',
title='Geometric')
bokeh.io.show(app, notebook_url=notebook_url)
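To make the failures-only convention concrete, here is a pure-Python sketch of the PMF as defined in this section (geometric_pmf is a name we introduce for illustration); recall that SciPy's own convention counts the success as well, hence the loc=-1 shift above:

```python
def geometric_pmf(y, theta):
    """P(y failures before the first success) = (1 - theta)**y * theta."""
    if y < 0:
        return 0.0
    return (1.0 - theta) ** y * theta

# The geometric series of probabilities sums to one
total = sum(geometric_pmf(y, 0.3) for y in range(1000))
assert abs(total - 1.0) < 1e-12
```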
Negative Binomial distribution
- Story. We perform a series of Bernoulli trials. The number of failures, y, before we get α successes is Negative Binomially distributed. An equivalent story is that the sum of α independent and identically Gamma distributed variables is Negative Binomially distributed.
- Example. Bursty gene expression can give mRNA count distributions that are Negative Binomially distributed. Here, "success" is that a burst in gene expression stops. So, the parameter 1/β is the mean length of a burst in expression. The parameter α is related to the frequency of the bursts. If multiple bursts are possible within the lifetime of mRNA, then α > 1. Then, the number of "failures" is the number of mRNA transcripts that are made in the characteristic lifetime of mRNA.
- Parameters. There are two parameters: α, the desired number of successes, and β, which is the mean of the α identical Gamma distributions that give the Negative Binomial. The probability of success of each Bernoulli trial is given by β/(1 + β).
- Support. The Negative Binomial distribution is supported on the set of nonnegative integers.
- Probability mass function. f(y; α, β) = [Γ(y + α) / (Γ(α) y!)] (β/(1 + β))^α (1/(1 + β))^y.
- Usage
Package | Syntax |
---|---|
NumPy | np.random.negative_binomial(alpha, beta/(1+beta)) |
SciPy | scipy.stats.nbinom(alpha, beta/(1+beta)) |
Stan | neg_binomial(alpha, beta) |
Stan with (μ, ϕ) parametrization | neg_binomial_2(mu, phi) |
- Related distributions.
- The Geometric distribution is a special case of the Negative Binomial distribution in which α = 1 and θ = β/(1 + β).
- The continuous analog of the Negative Binomial distribution is the Gamma distribution.
- In a certain limit, which is more easily implemented using the (μ, ϕ) parametrization below, the Negative Binomial distribution becomes a Poisson distribution.
- Notes.
- The Negative Binomial distribution may be parametrized such that the probability mass function is
  f(y; μ, ϕ) = [Γ(y + ϕ) / (Γ(ϕ) y!)] (ϕ/(μ + ϕ))^ϕ (μ/(μ + ϕ))^y.
  These parameters are related to the parametrization above by μ = α/β and ϕ = α. In the limit of ϕ → ∞, which can be taken for the PMF, the Negative Binomial distribution becomes Poisson with parameter μ. This also gives meaning to the parameters μ and ϕ: μ is the mean of the Negative Binomial, and ϕ controls extra width of the distribution beyond Poisson. The smaller ϕ is, the broader the distribution.
- In Stan, the Negative Binomial distribution using the (μ, ϕ) parametrization is called neg_binomial_2.
- SciPy and NumPy use yet another parametrization. The PMF for SciPy is
  f(y; n, p) = [Γ(y + n) / (Γ(n) y!)] p^n (1 - p)^y,
  where the parameter p is the probability of success of a Bernoulli trial. The parameters are related to the others we have defined by n = α and p = β/(1 + β).
params = [dict(name='α', start=1, end=20, value=5, step=1),
dict(name='β', start=0, end=5, value=1, step=0.01)]
app = distribution_plot_app(x_min=0,
x_max=50,
scipy_dist=st.nbinom,
params=params,
transform=lambda alpha, beta: (alpha, beta/(1+beta)),
x_axis_label='y',
title='Negative Binomial')
bokeh.io.show(app, notebook_url=notebook_url)
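As a sanity check on the (α, β) parametrization, here is a pure-Python sketch of the PMF (neg_binom_pmf is our own illustrative name); with α = 1 it should reduce to the Geometric distribution with θ = β/(1 + β):

```python
import math

def neg_binom_pmf(y, alpha, beta):
    """Negative Binomial PMF in the (alpha, beta) parametrization;
    the success probability of each Bernoulli trial is beta / (1 + beta)."""
    if y < 0:
        return 0.0
    log_p = (math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
             + alpha * math.log(beta / (1 + beta))
             + y * math.log(1 / (1 + beta)))
    return math.exp(log_p)

# alpha = 1 recovers the Geometric distribution with theta = beta / (1 + beta)
theta = 2 / 3  # corresponds to beta = 2
assert abs(neg_binom_pmf(3, 1, 2) - (1 - theta) ** 3 * theta) < 1e-12
```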
Binomial distribution
- Story. We perform N Bernoulli trials, each with probability θ of success. The number of successes, n, is Binomially distributed.
- Example. Distribution of plasmids between daughter cells in cell division. Each of the N plasmids has a chance θ of being in daughter cell 1 ("success"). The number of plasmids, n, in daughter cell 1 is Binomially distributed.
- Parameters. There are two parameters: the probability θ of success for each Bernoulli trial, and the number of trials, N.
- Support. The Binomial distribution is supported on the set of nonnegative integers n with 0 ≤ n ≤ N.
- Probability mass function. f(n; N, θ) = [N! / (n! (N - n)!)] θ^n (1 - θ)^(N - n).
- Usage
Package | Syntax |
---|---|
NumPy | np.random.binomial(N, theta) |
SciPy | scipy.stats.binom(N, theta) |
Stan | binomial(N, theta) |
- Related distributions.
- In the limit of N → ∞ and θ → 0 such that the quantity Nθ is fixed, the Binomial distribution becomes a Poisson distribution with parameter λ = Nθ.
- The Binomial distribution is a limit of the Hypergeometric distribution. Considering the Hypergeometric distribution (defined below) and taking the limit of a + b → ∞ such that a/(a + b) is fixed, we get a Binomial distribution with the same number of draws N and with θ = a/(a + b).
params = [dict(name='N', start=1, end=20, value=5, step=1),
dict(name='θ', start=0, end=1, value=0.5, step=0.01)]
app = distribution_plot_app(x_min=0,
x_max=20,
scipy_dist=st.binom,
params=params,
x_axis_label='n',
title='Binomial')
bokeh.io.show(app, notebook_url=notebook_url)
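The PMF is easy to write out with math.comb; this pure-Python sketch (binomial_pmf is our illustrative name) checks that the probabilities sum to one:

```python
import math

def binomial_pmf(n, N, theta):
    """P(n successes in N Bernoulli trials, each with success probability theta)."""
    if n < 0 or n > N:
        return 0.0
    return math.comb(N, n) * theta ** n * (1 - theta) ** (N - n)

# Probabilities over n = 0, ..., N sum to one
assert abs(sum(binomial_pmf(n, 10, 0.3) for n in range(11)) - 1.0) < 1e-12
```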
Poisson distribution
- Story. Rare events occur with a rate λ per unit time. There is no "memory" of previous events; i.e., the rate is independent of time. A process that generates such events is called a Poisson process. The occurrence of a rare event in this context is referred to as an arrival. The number n of arrivals in unit time is Poisson distributed.
- Example. The number of mutations in a strand of DNA per unit length (since mutations are rare) is Poisson distributed.
- Parameter. The single parameter is the rate λ at which the rare events occur.
- Support. The Poisson distribution is supported on the set of nonnegative integers.
- Probability mass function. f(n; λ) = λ^n e^(-λ) / n!.
- Usage
- Related distributions.
- In the limit of N → ∞ and θ → 0 such that the quantity Nθ is fixed, the Binomial distribution becomes a Poisson distribution with parameter λ = Nθ. Thus, for large N and small θ, the Binomial distribution is well approximated by a Poisson distribution with λ = Nθ. Considering the biological example of mutations: there are N bases, each with a probability θ of mutation, so the number of mutations, n, is Binomially distributed. Since θ is small and N is large, n is approximately Poisson distributed.
Package | Syntax |
---|---|
NumPy | np.random.poisson(lam) |
SciPy | scipy.stats.poisson(lam) |
Stan | poisson(lam) |
params = [dict(name='λ', start=1, end=20, value=5, step=0.1)]
app = distribution_plot_app(x_min=0,
x_max=40,
scipy_dist=st.poisson,
params=params,
x_axis_label='n',
title='Poisson')
bokeh.io.show(app, notebook_url=notebook_url)
Hypergeometric distribution
- Story. Consider an urn with a white balls and b black balls. We draw N balls from the urn without replacement. The number n of white balls drawn is Hypergeometrically distributed.
- Example. There are a + b finches on an island, and a of them are tagged (and therefore b of them are untagged). You capture N finches. The number of tagged finches n is Hypergeometrically distributed, as defined below.
- Parameters. There are three parameters: the number of draws N, the number of white balls a, and the number of black balls b.
- Support. The Hypergeometric distribution is supported on the set of integers between max(0, N - b) and min(N, a), inclusive.
- Probability mass function. f(n; N, a, b) = [C(a, n) C(b, N - n)] / C(a + b, N), where C(·, ·) denotes the binomial coefficient.
- Usage
Package | Syntax |
---|---|
NumPy | np.random.hypergeometric(a, b, N) |
SciPy | scipy.stats.hypergeom(a+b, a, N) |
Stan | hypergeometric(N, a, b) |
- Related distributions.
- Notes.
- This distribution is analogous to the Binomial distribution, except that the Binomial distribution describes draws from an urn with replacement. In the analogy, θ = a/(a + b).
- SciPy uses a different parametrization. Let M = a + b be the total number of balls in the urn. Then, noting the order of the parameters, since this is what scipy.stats.hypergeom expects, the call is scipy.stats.hypergeom(M, a, N).
- The random number generator in the numpy.random module has a different parametrization than in the scipy.stats module. The numpy.random.hypergeometric() function uses the same parametrization as Stan, except the parameters are given in the order a, b, N, not N, a, b, as in Stan.
- When using the sliders below, you will only get a plot if N ≤ a + b, because you cannot draw more balls out of the urn than are actually in there.
params = [dict(name='N', start=1, end=20, value=10, step=1),
dict(name='a', start=1, end=20, value=10, step=1),
dict(name='b', start=1, end=20, value=10, step=1)]
app = distribution_plot_app(x_min=0,
x_max=40,
scipy_dist=st.hypergeom,
params=params,
transform=lambda N, a, b: (a+b, a, N),
x_axis_label='n',
title='Hypergeometric')
bokeh.io.show(app, notebook_url=notebook_url)
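The PMF follows directly from counting draws without replacement. A pure-Python sketch (hypergeom_pmf is our own name), using the (N, a, b) parametrization of this section:

```python
import math

def hypergeom_pmf(n, N, a, b):
    """P(n white balls in N draws, without replacement, from an urn
    containing a white and b black balls."""
    if n < max(0, N - b) or n > min(N, a):
        return 0.0
    return math.comb(a, n) * math.comb(b, N - n) / math.comb(a + b, N)

# Probabilities over the support sum to one (Vandermonde's identity)
assert abs(sum(hypergeom_pmf(n, 5, 10, 10) for n in range(6)) - 1.0) < 1e-12
```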
Categorical distribution
- Story. A probability is assigned to each of a set of discrete outcomes.
- Example. A hen will peck at grain A with probability θ1, grain B with probability θ2, and grain C with probability θ3.
- Parameters. The distribution is parametrized by the probabilities assigned to each event. We define θi to be the probability assigned to outcome i. The set of θi's are the parameters, and are constrained by Σi θi = 1.
- Support. If we index the categories with sequential integers from 1 to N, the distribution is supported for integers 1 to N, inclusive.
- Probability mass function. f(y = i; {θi}) = θi.
- Usage (with theta length N)
Package | Syntax |
---|---|
NumPy | np.random.choice(len(theta), p=theta) |
SciPy | scipy.stats.rv_discrete(values=(range(len(theta)), theta)).rvs() |
Stan | categorical(theta) |
- Related distributions.
- Notes.
- This distribution must be manually constructed if you are using the scipy.stats module, using scipy.stats.rv_discrete(). The categories need to be encoded by an index. For interactive plotting purposes, below, we need to specify a custom PMF and CDF.
- To sample out of a Categorical distribution, use numpy.random.choice(), specifying the values of θ using the p kwarg.
def categorical_pmf(x, θ1, θ2, θ3):
thetas = np.array([θ1, θ2, θ3, 1-θ1-θ2-θ3])
if (thetas < 0).any():
return np.array([np.nan]*len(x))
return thetas[x-1]
def categorical_cdf_indiv(x, thetas):
if x < 1:
return 0
elif x >= 4:
return 1
else:
return np.sum(thetas[:int(x)])
def categorical_cdf(x, θ1, θ2, θ3):
thetas = np.array([θ1, θ2, θ3, 1-θ1-θ2-θ3])
if (thetas < 0).any():
return np.array([np.nan]*len(x))
return np.array([categorical_cdf_indiv(x_val, thetas) for x_val in x])
params = [dict(name='θ1', start=0, end=1, value=0.2, step=0.01),
dict(name='θ2', start=0, end=1, value=0.3, step=0.01),
dict(name='θ3', start=0, end=1, value=0.1, step=0.01)]
app = distribution_plot_app(x_min=1,
x_max=4,
custom_pmf=categorical_pmf,
custom_cdf=categorical_cdf,
params=params,
x_axis_label='category',
title='Discrete categorical')
bokeh.io.show(app, notebook_url=notebook_url)
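If you just need samples rather than the interactive plot, the Python standard library suffices; random.choices plays the role of numpy.random.choice with the p kwarg (the probabilities below are arbitrary illustrative values):

```python
import random

theta = [0.2, 0.3, 0.1, 0.4]   # probabilities for categories 1-4
random.seed(42)                # seed for reproducibility
draws = random.choices([1, 2, 3, 4], weights=theta, k=100_000)

# Empirical frequencies should be close to theta
freq = {i: draws.count(i) / len(draws) for i in [1, 2, 3, 4]}
assert abs(freq[2] - 0.3) < 0.01
```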
Discrete Uniform distribution
- Story. A set of discrete outcomes that can be indexed with sequential integers each has equal probability, like rolling a fair die.
- Example. A monkey can choose from any of N bananas with equal probability.
- Parameters. The distribution is parametrized by the high and low allowed values, yhigh and ylow.
- Support. The Discrete Uniform distribution is supported on the set of integers ranging from ylow to yhigh, inclusive.
- Probability mass function. f(y; ylow, yhigh) = 1 / (yhigh - ylow + 1).
- Usage
Package | Syntax |
---|---|
NumPy | np.random.randint(low, high+1) |
SciPy | scipy.stats.randint(low, high+1) |
Stan | categorical(theta), with theta an array with all equal values |
- Related distributions.
- Notes.
- This distribution is not included in Stan. Instead, use a Categorical distribution with equal probabilities.
- In SciPy, this distribution is known as scipy.stats.randint. The high parameter is not inclusive; i.e., the set of allowed values includes the low parameter, but not the high. The same is true for numpy.random.randint(), which is used for sampling out of this distribution.
params = [dict(name='low', start=0, end=10, value=0, step=1),
dict(name='high', start=0, end=10, value=10, step=1)]
app = distribution_plot_app(x_min=0,
x_max=10,
scipy_dist=st.randint,
params=params,
transform=lambda low, high: (low, high+1),
x_axis_label='y',
title='Discrete Uniform')
bokeh.io.show(app, notebook_url=notebook_url)
Continuous distributions
Here, we will explain a series of continuous distributions.
Uniform distribution
- Story. Any outcome in a given range has equal probability.
- Example. Anything in which all possibilities are equally likely. This is, perhaps surprisingly, rarely encountered.
- Parameters. The Uniform distribution is not defined on an infinite or semi-infinite domain, so lower and upper bounds, α and β, respectively, are necessary parameters.
- Support. The Uniform distribution is supported on the interval [α, β].
- Probability density function. f(y; α, β) = 1/(β - α) for α ≤ y ≤ β.
- Usage
- Related distributions.
- The Uniform distribution on the interval [0, 1] (i.e., α = 0 and β = 1) is a special case of the Beta distribution where the parameters for the Beta distribution are α = β = 1 (not to be confused with the α and β used to parametrize the Uniform distribution).
Package | Syntax |
---|---|
NumPy | np.random.uniform(alpha, beta) |
SciPy | scipy.stats.uniform(alpha, beta - alpha) |
Stan | uniform(alpha, beta) |

Note that scipy.stats.uniform takes a location (the lower bound α) and a scale (the width β - α), not the upper bound itself.
params = [dict(name='α', start=-2, end=3, value=0, step=0.01),
dict(name='β', start=-2, end=3, value=1, step=0.01)]
app = distribution_plot_app(x_min=-2,
x_max=3,
scipy_dist=st.uniform,
params=params,
transform=lambda a, b: (a, b - a),
x_axis_label='y',
title='Uniform')
bokeh.io.show(app, notebook_url=notebook_url)
Gaussian, a.k.a. Normal, distribution
- Story. Any quantity that emerges as the sum of a large number of subprocesses tends to be Gaussian distributed provided none of the subprocesses is very broadly distributed.
- Example. We measure the length of many C. elegans eggs. The lengths are Gaussian distributed. Many biological measurements, like the height of people, are (approximately) Gaussian distributed. Many processes contribute to setting the length of an egg or the height of a person.
- Parameters. The Gaussian distribution has two parameters, the mean µ, which determines the location of its peak, and the standard deviation σ, which is strictly positive (the σ → 0 limit defines a Dirac delta function) and determines the width of the peak.
- Support. The Gaussian distribution is supported on the set of real numbers.
- Probability density function. f(y; µ, σ) = exp(-(y - µ)² / (2σ²)) / √(2πσ²).
- Usage
Package | Syntax |
---|---|
NumPy | np.random.normal(mu, sigma) |
SciPy | scipy.stats.norm(mu, sigma) |
Stan | normal(mu, sigma) |
- Related distributions. The Gaussian distribution is a limiting distribution in the sense of the central limit theorem, but also in that many distributions have a Gaussian distribution as a limit. This is seen by formally taking limits of, e.g., the Gamma, Student-t, Binomial distributions, which allows direct comparison of parameters.
- Notes.
- SciPy, NumPy, and Stan all refer to the Gaussian distribution as the Normal distribution.
params = [dict(name='µ', start=-0.5, end=0.5, value=0, step=0.01),
dict(name='σ', start=0.1, end=1.0, value=0.2, step=0.01)]
app = distribution_plot_app(x_min=-2,
x_max=2,
scipy_dist=st.norm,
params=params,
x_axis_label='y',
title='Gaussian, a.k.a. Normal')
bokeh.io.show(app, notebook_url=notebook_url)
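The story suggests a quick numerical check using only the standard library: sums of many uniform random numbers should look Gaussian by the central limit theorem. The tolerances below are loose statistical checks, not exact values:

```python
import random
import statistics

random.seed(0)
# Each sample is a sum of 50 uniform(0, 1) draws; by the CLT these sums
# are approximately Gaussian with mean 25 and variance 50/12
sums = [sum(random.random() for _ in range(50)) for _ in range(20_000)]

assert abs(statistics.mean(sums) - 25.0) < 0.1
assert abs(statistics.stdev(sums) - (50 / 12) ** 0.5) < 0.05
```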
Half-Normal distribution
- Story. The Half-Normal distribution is a Gaussian distribution with zero mean truncated to only have nonzero probability for positive real numbers.
- Parameters. Strictly speaking, the Half-Normal is parametrized by a single positive parameter σ. We could, however, translate it so that the truncated Gaussian has a maximum at µ and only values greater than or equal to µ have nonzero probability.
- Probability density function. f(y; µ, σ) = √(2/(πσ²)) exp(-(y - µ)² / (2σ²)) for y ≥ µ.
- Support. The Half-Normal distribution is supported on the set [µ, ∞). Usually, we have µ = 0, in which case the Half-Normal distribution is supported on the set of nonnegative real numbers.
- Usage
Package | Syntax |
---|---|
NumPy | mu + np.abs(np.random.normal(0, sigma)) |
SciPy | scipy.stats.halfnorm(mu, sigma) |
Stan | real<lower=mu> y; y ~ normal(mu, sigma) |
- Notes.
- In Stan, a Half-Normal is defined by putting bounds on the variable and then using a Normal distribution.
- The Half-Normal distribution is a useful prior for nonnegative parameters that should not be too large and may be very close to zero.
params = [dict(name='µ', start=-0.5, end=0.5, value=0, step=0.01),
dict(name='σ', start=0.1, end=1.0, value=0.2, step=0.01)]
app = distribution_plot_app(x_min=-0.5,
x_max=2,
scipy_dist=st.halfnorm,
params=params,
x_axis_label='y',
title='Half-Normal')
bokeh.io.show(app, notebook_url=notebook_url)
Log-Normal distribution
- Story. If ln y is Gaussian distributed, y is Log-Normally distributed.
- Example. A measure of fold change in gene expression can be Log-Normally distributed.
- Parameters. As for a Gaussian, there are two parameters, the mean, µ, and the variance, σ², of the Gaussian distribution of ln y.
- Support. The Log-Normal distribution is supported on the set of positive real numbers.
- Probability density function. f(y; µ, σ) = exp(-(ln y - µ)² / (2σ²)) / (y √(2πσ²)).
- Usage
Package | Syntax |
---|---|
NumPy | np.random.lognormal(mu, sigma) |
SciPy | scipy.stats.lognorm(sigma, loc=0, scale=np.exp(mu)) |
Stan | lognormal(mu, sigma) |
- Notes.
- Be careful not to get confused. The Log-Normal distribution describes the distribution of y given that ln y is Gaussian distributed. It does not describe the distribution of ln y.
- The way location, scale, and shape parameters work in SciPy for the Log-Normal distribution is confusing. If you want to specify a Log-Normal distribution as we have defined it using scipy.stats, use a shape parameter equal to σ, a location parameter of zero, and a scale parameter given by e^µ. For example, to compute the PDF, you would use scipy.stats.lognorm.pdf(x, sigma, loc=0, scale=np.exp(mu)).
- The definition of the Log-Normal in the numpy.random module matches what we have defined above and what is defined in Stan.
params = [dict(name='µ', start=0.01, end=0.5, value=0.1, step=0.01),
dict(name='σ', start=0.1, end=1.0, value=0.2, step=0.01)]
app = distribution_plot_app(x_min=0,
x_max=4,
scipy_dist=st.lognorm,
params=params,
transform=lambda mu, sigma: (sigma, 0, np.exp(mu)),
x_axis_label='y',
title='Log-Normal')
bokeh.io.show(app, notebook_url=notebook_url)
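The change of variables y = e^x can be verified directly: the Log-Normal density is the Gaussian density of ln y divided by y (both helper functions below are our own illustrative names):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian PDF."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def lognormal_pdf(y, mu, sigma):
    """Log-Normal PDF: ln(y) is Gaussian distributed with parameters mu, sigma."""
    if y <= 0:
        return 0.0
    return gaussian_pdf(math.log(y), mu, sigma) / y

# At y = 1, ln(y) = 0, so the density equals the Gaussian density at 0
assert abs(lognormal_pdf(1.0, 0.0, 1.0) - gaussian_pdf(0.0, 0.0, 1.0)) < 1e-15
```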
Chi-square distribution
- Story. If X1, X2, …, Xν are Gaussian distributed with zero mean and unit variance, the sum of their squares, X1² + X2² + ⋯ + Xν², is Chi-square distributed. See also the story of the Gamma distribution, below.
- Example. The sample variance of n independent and identically distributed Gaussian random variables, after scaling, is Chi-square distributed. This is the most common use case of the Chi-square distribution.
- Parameters. There is only one parameter, the degrees of freedom ν.
- Support. The Chi-square distribution is supported on the positive real numbers.
- Probability density function. f(y; ν) = y^(ν/2 - 1) e^(-y/2) / (2^(ν/2) Γ(ν/2)).
- Usage
- Related distributions. The Chi-square distribution is a special case of the Gamma distribution with α = ν/2 and β = 1/2.
Package | Syntax |
---|---|
NumPy | np.random.chisquare(nu) |
SciPy | scipy.stats.chi2(nu) |
Stan | chi_square(nu) |
params = [dict(name='ν', start=1, end=20, value=10, step=0.01)]
app = distribution_plot_app(x_min=0,
x_max=40,
scipy_dist=st.chi2,
params=params,
x_axis_label='y',
title='Chi-square')
bokeh.io.show(app, notebook_url=notebook_url)
Student-t distribution
- Story. The story of the Student-t distribution largely derives from its relationships with other distributions. One way to think about it is as a Gaussian-like distribution with heavier tails.
- Parameters. The Student-t distribution is peaked, and its peak is located at µ. The peak's width is dictated by parameter σ. Finally, we define the "degrees of freedom" as ν. This last parameter imparts the distribution with heavy tails.
- Support. The Student-t distribution is supported on the set of real numbers.
- Probability density function. f(y; ν, µ, σ) = [Γ((ν + 1)/2) / (Γ(ν/2) √(πνσ²))] (1 + (y - µ)²/(νσ²))^(-(ν + 1)/2).
- Usage
- Related distributions.
- In the ν → ∞ limit, the Student-t distribution becomes a Gaussian distribution.
- The Cauchy distribution is a special case of the Student-t distribution in which ν = 1.
- Only the standard Student-t distribution (µ = 0 and σ = 1) is available in the numpy.random module. You can still draw out of the Student-t distribution by performing a transformation on the samples out of the standard Student-t distribution, as shown in the usage, above.
- We get this distribution whenever we marginalize an unknown σ out of a Gaussian distribution with an improper prior on σ of 1/σ.
Package | Syntax |
---|---|
NumPy | mu + sigma * np.random.standard_t(nu) |
SciPy | scipy.stats.t(nu, mu, sigma) |
Stan | student_t(nu, mu, sigma) |
params = [dict(name='ν', start=1, end=50, value=10, step=0.01),
dict(name='µ', start=-0.5, end=0.5, value=0, step=0.01),
dict(name='σ', start=0.1, end=1.0, value=0.2, step=0.01)]
app = distribution_plot_app(x_min=-2,
x_max=2,
scipy_dist=st.t,
params=params,
x_axis_label='y',
title='Student-t')
bokeh.io.show(app, notebook_url=notebook_url)
Cauchy distribution
- Story. The intercept on the x-axis of a beam of light coming from the point (µ, σ) is Cauchy distributed. This story is popular in physics (and is one of the first examples of Bayesian inference in Sivia's book), but is not particularly useful. You can think of the Cauchy distribution as a peaked distribution with enormously heavy tails.
- Parameters. The Cauchy distribution is peaked, and its peak is located at µ. The peak's width is dictated by parameter σ.
- Support. The Cauchy distribution is supported on the set of real numbers.
- Probability density function. f(y; µ, σ) = 1 / (πσ (1 + (y - µ)²/σ²)).
- Usage
Package | Syntax |
---|---|
NumPy | mu + sigma * np.random.standard_cauchy() |
SciPy | scipy.stats.cauchy(mu, sigma) |
Stan | cauchy(mu, sigma) |
- Related distributions.
- The Cauchy distribution is a special case of the Student-t distribution in which the degrees of freedom ν = 1.
- The numpy.random module only has the standard Cauchy distribution (µ = 0 and σ = 1), but you can draw out of a Cauchy distribution using the transformation shown in the NumPy usage above.
params = [dict(name='µ', start=-0.5, end=0.5, value=0, step=0.01),
dict(name='σ', start=0.1, end=1.0, value=0.2, step=0.01)]
app = distribution_plot_app(x_min=-2,
x_max=2,
scipy_dist=st.cauchy,
params=params,
x_axis_label='y',
title='Cauchy')
bokeh.io.show(app, notebook_url=notebook_url)
Exponential distribution
- Story. This is the waiting time for an arrival from a Poisson process. I.e., the inter-arrival time of a Poisson process is Exponentially distributed.
- Example. The time between conformational switches in a protein is Exponentially distributed (under simple mass action kinetics).
- Parameter. The single parameter is the average arrival rate, β. Alternatively, we can use τ = 1/β, a characteristic arrival time, as the parameter.
- Support. The Exponential distribution is supported on the set of nonnegative real numbers.
- Probability density function. f(y; β) = β e^(-βy).
- Related distributions.
- The Exponential distribution is the continuous analog of the Geometric distribution. The "rate" in the Exponential distribution is analogous to the probability of success of the Bernoulli trial. Note also that because they are uncorrelated, the amount of time between any two arrivals is independent of all other inter-arrival times.
- The Exponential distribution is a special case of the Gamma distribution with α = 1.
- Usage
- Notes.
- Alternatively, we could parametrize the Exponential distribution in terms of an average time between arrivals of a Poisson process, τ = 1/β, as f(y; τ) = e^(-y/τ) / τ.
- The implementation in the scipy.stats module also has a location parameter, which shifts the distribution left and right. For our purposes, you can ignore that parameter, but be aware that scipy.stats requires it. The scipy.stats Exponential distribution is parametrized in terms of the interarrival time, τ, and not β.
- The numpy.random.exponential() function does not need nor accept a location parameter. It is also parametrized in terms of τ.
Package | Syntax |
---|---|
NumPy | np.random.exponential(1/beta) |
SciPy | scipy.stats.expon(loc=0, scale=1/beta) |
Stan | exponential(beta) |
params = [dict(name='β', start=0.1, end=1, value=0.25, step=0.01)]
app = distribution_plot_app(0,
30,
st.expon,
params=params,
transform=lambda x: (0, 1/x),
x_axis_label='y',
title='Exponential')
bokeh.io.show(app, notebook_url=notebook_url)
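Memorylessness, the defining property of the Poisson process, can be checked directly from the CDF. A pure-Python sketch (exponential_cdf is our own name), parametrized by the arrival rate β as in this section:

```python
import math

def exponential_cdf(y, beta):
    """CDF of the Exponential distribution with arrival rate beta."""
    if y < 0:
        return 0.0
    return 1.0 - math.exp(-beta * y)

# Memorylessness: P(Y > s + t | Y > s) = P(Y > t)
beta, s, t = 0.25, 2.0, 3.0
conditional = (1 - exponential_cdf(s + t, beta)) / (1 - exponential_cdf(s, beta))
assert abs(conditional - (1 - exponential_cdf(t, beta))) < 1e-12
```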
Gamma distribution
- Story. The amount of time we have to wait for α arrivals of a Poisson process. More concretely, if we have events X1, X2, …, Xα that are Exponentially distributed, the sum X1 + X2 + ⋯ + Xα is Gamma distributed.
- Example. Any multistep process where each step happens at the same rate. This is common in molecular rearrangements.
- Parameters. The number of arrivals, α, and the rate of arrivals, β.
- Support. The Gamma distribution is supported on the set of positive real numbers.
- Probability density function. f(y; α, β) = β^α y^(α - 1) e^(-βy) / Γ(α).
- Related distributions.
- Usage
Package | Syntax |
---|---|
NumPy | np.random.gamma(alpha, 1/beta) |
SciPy | scipy.stats.gamma(alpha, loc=0, scale=1/beta) |
Stan | gamma(alpha, beta) |
- Notes.
- The Gamma distribution is useful as a prior for positive parameters. It imparts a heavier tail than the Half-Normal distribution (but not too heavy; it keeps parameters from growing too large), and allows the parameter value to come close to zero.
- SciPy has a location parameter, which should be set to zero, and its scale parameter is the reciprocal of the rate, 1/β.
params = [dict(name='α', start=1, end=5, value=2, step=0.01),
dict(name='β', start=0.1, end=5, value=2, step=0.01)]
app = distribution_plot_app(x_min=0,
x_max=50,
scipy_dist=st.gamma,
params=params,
transform=lambda a, b: (a, 0, 1/b),
x_axis_label='y',
title='Gamma')
bokeh.io.show(app, notebook_url=notebook_url)
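The rate parametrization used here is easy to confuse with SciPy's scale; the density itself is unambiguous. A pure-Python sketch (gamma_pdf is our own name), with a check that α = 1 recovers the Exponential distribution:

```python
import math

def gamma_pdf(y, alpha, beta):
    """Gamma PDF with shape alpha and arrival rate beta."""
    if y <= 0:
        return 0.0
    return beta ** alpha * y ** (alpha - 1) * math.exp(-beta * y) / math.gamma(alpha)

# alpha = 1 recovers the Exponential distribution with rate beta
beta, y = 0.5, 3.0
assert abs(gamma_pdf(y, 1.0, beta) - beta * math.exp(-beta * y)) < 1e-15
```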
Inverse Gamma distribution
- Story. If x is Gamma distributed, then 1/x is Inverse Gamma distributed.
- Parameters. The shape parameter α and the scale parameter β, inherited from the corresponding Gamma distribution.
- Support. The Inverse Gamma distribution is supported on the set of positive real numbers.
- Probability density function. f(y; α, β) = [β^α / Γ(α)] y^(-α - 1) e^(-β/y).
- Usage
- Notes.
- The Inverse Gamma distribution is useful as a prior for positive parameters. It imparts a quite heavy tail and keeps probability further from zero than the Gamma distribution.
- The
numpy.random
module does not have a function to sample directly from the Inverse Gamma distribution, but it can be achieved by sampling out of a Gamma distribution as shown in the NumPy usage above.
Package | Syntax |
---|---|
NumPy | 1 / np.random.gamma(alpha, 1/beta) |
SciPy | scipy.stats.invgamma(alpha, loc=0, scale=beta) |
Stan | inv_gamma(alpha, beta) |
params = [dict(name='α', start=0.01, end=2, value=0.5, step=0.01),
dict(name='β', start=0.1, end=2, value=1, step=0.01)]
app = distribution_plot_app(x_min=0,
x_max=20,
scipy_dist=st.invgamma,
params=params,
transform=lambda alpha, beta: (alpha, 0, beta),
x_axis_label='y',
title='Inverse Gamma')
bokeh.io.show(app, notebook_url=notebook_url)
Weibull distribution
- Story. Distribution of x^(1/α) if x is Exponentially distributed. For α > 1, the longer we have waited, the more likely the event is to come, and vice versa for α < 1.
- Example. This is a model for aging. The longer an organism lives, the more likely it is to die.
- Parameters. There are two parameters, both strictly positive: the shape parameter α, which dictates the shape of the curve, and the scale parameter σ, which dictates the rate of arrivals of the event.
- Support. The Weibull distribution has support on the positive real numbers.
- Probability density function. f(y; α, σ) = (α/σ) (y/σ)^(α - 1) e^(-(y/σ)^α).
- Usage
Package | Syntax |
---|---|
NumPy | np.random.weibull(alpha) * sigma |
SciPy | scipy.stats.weibull_min(alpha, loc=0, scale=sigma) |
Stan | weibull(alpha, sigma) |
- Related distributions.
- Notes.
params = [dict(name='α', start=0.1, end=5, value=1, step=0.01),
dict(name='σ', start=0.1, end=3, value=1.5, step=0.01)]
app = distribution_plot_app(x_min=0,
x_max=8,
scipy_dist=st.weibull_min,
params=params,
transform=lambda a, s: (a, 0, s),
x_axis_label='y',
title='Weibull')
bokeh.io.show(app, notebook_url=notebook_url)
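A pure-Python sketch of the density (weibull_pdf is our own name); setting α = 1 should recover the Exponential distribution with characteristic time σ:

```python
import math

def weibull_pdf(y, alpha, sigma):
    """Weibull PDF with shape alpha and scale sigma."""
    if y <= 0:
        return 0.0
    return (alpha / sigma) * (y / sigma) ** (alpha - 1) * math.exp(-((y / sigma) ** alpha))

# alpha = 1 recovers the Exponential distribution with rate 1/sigma
sigma, y = 1.5, 2.0
assert abs(weibull_pdf(y, 1.0, sigma) - math.exp(-y / sigma) / sigma) < 1e-15
```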
Beta distribution
- Story. Say you wait for two multistep processes to happen. The individual steps of each process happen at the same rate, but the first multistep process requires α steps and the second requires β steps. The fraction of the total waiting time taken by the first process is Beta distributed.
- Example.
- Parameters. There are two parameters, both strictly positive: α and β, defined in the above story.
- Support. The Beta distribution has support on the interval [0, 1].
- Probability density function. f(θ; α, β) = θ^(α - 1) (1 - θ)^(β - 1) / B(α, β), where B(α, β) = Γ(α)Γ(β)/Γ(α + β) is the Beta function.
- Usage
Package | Syntax |
---|---|
NumPy | np.random.beta(alpha, beta) |
SciPy | scipy.stats.beta(alpha, beta) |
Stan | beta(alpha, beta) |
- Related distributions.
- Notes.
- The story of the Beta distribution is difficult to parse. Most importantly for our purposes, the Beta distribution allows us to put probabilities on unknown probabilities. It is only defined on the interval [0, 1], and θ here can be interpreted as a probability, say of a Bernoulli trial.
- The case where α = β = 0 is not technically a probability distribution because the PDF cannot be normalized. Nonetheless, it can be used as an improper prior, and this prior is known as a Haldane prior, named after biologist J. B. S. Haldane. The case where α = β = 1/2 is sometimes called a Jeffreys prior.
params = [dict(name='α', start=0.01, end=10, value=1, step=0.01),
dict(name='β', start=0.01, end=10, value=1, step=0.01)]
app = distribution_plot_app(x_min=0,
x_max=1,
scipy_dist=st.beta,
params=params,
x_axis_label='θ',
title='Beta')
bokeh.io.show(app, notebook_url=notebook_url)
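As a quick sanity check of the usage table above, the following sketch evaluates the Beta PDF with SciPy and samples with NumPy. The parameter values α = 2, β = 5 are arbitrary choices for illustration.

```python
import numpy as np
import scipy.stats as st

# Arbitrary illustrative parameters
alpha, beta = 2.0, 5.0
dist = st.beta(alpha, beta)

# The mean of Beta(α, β) is α / (α + β)
print(dist.mean())                      # 2/7 ≈ 0.2857

# Density at θ = 0.3, per the PDF above
print(dist.pdf(0.3))

# Sampling with NumPy, as in the usage table
np.random.seed(3252)
samples = np.random.beta(alpha, beta, size=100000)
print(samples.mean())                   # close to 2/7
```

The empirical mean of the samples converges to α / (α + β) as the sample size grows.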
Discrete multivariate distributions
So far, we have looked at univariate distributions, but we will consider multivariate distributions in class, and you will encounter them in your research. First, we consider a discrete multivariate distribution, the Multinomial.
Multinomial distribution
- Story. This is a generalization of the Binomial distribution. Instead of a Bernoulli trial consisting of two outcomes, each trial has K outcomes. The probability of getting y_1 of outcome 1, y_2 of outcome 2, ..., and y_K of outcome K out of a total of N trials is Multinomially distributed.
- Example. There are two alleles in a population, A and a. Each individual may have genotype AA, Aa, or aa. The probability distribution describing having y_1 AA individuals, y_2 Aa individuals, and y_3 aa individuals in a population of N total individuals is Multinomially distributed.
- Parameters. N, the total number of trials, and θ = {θ_1, θ_2, ..., θ_K}, the probabilities of each outcome. Note that Σ_i θ_i = 1, and there is a further restriction that Σ_i y_i = N.
- Support. The K-nomial distribution is supported on the set of nonnegative integers y_1, y_2, ..., y_K such that Σ_i y_i = N.
- Usage. The usage below assumes that theta is a length-K array.

Package | Syntax |
---|---|
NumPy | np.random.multinomial(N, theta) |
SciPy | scipy.stats.multinomial(N, theta) |
Stan sampling | multinomial(theta) |
Stan rng | multinomial_rng(theta, N) |
- Probability mass function.

  f(y; N, θ) = (N! / (y_1! y_2! ⋯ y_K!)) θ_1^y_1 θ_2^y_2 ⋯ θ_K^y_K
- Related distributions.
- The Multinomial distribution generalizes the Binomial distribution to multiple dimensions.
- Notes.
  - For a sampling statement in Stan, the value of N is implied.
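The usage above can be sketched with a small example. The probabilities θ = (0.5, 0.3, 0.2) and N = 10 are arbitrary choices; the PMF value below follows directly from the formula N! / (y_1! y_2! y_3!) · θ_1^y_1 θ_2^y_2 θ_3^y_3.

```python
import numpy as np
import scipy.stats as st

# K = 3 outcomes with arbitrary probabilities theta; N = 10 trials
theta = np.array([0.5, 0.3, 0.2])
N = 10

# Sampling with NumPy, as in the usage table; the K counts always sum to N
np.random.seed(3252)
counts = np.random.multinomial(N, theta)
print(counts, counts.sum())             # counts.sum() == 10

# PMF of the specific outcome y = (5, 3, 2):
# 10! / (5! 3! 2!) * 0.5^5 * 0.3^3 * 0.2^2 = 0.08505
print(st.multinomial.pmf([5, 3, 2], n=N, p=theta))
```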
Continuous multivariate distributions
We will consider two continuous multivariate distributions here, the multivariate Gaussian and the Dirichlet. Generally plotting multivariate PDFs is difficult, but bivariate PDFs may be conveniently plotted as contour plots. In our investigation of the multivariate Gaussian distribution, I will also demonstrate how to make plots of bivariate PDFs.
Multivariate Gaussian, a.k.a. Multivariate Normal, distribution
- Story. This is a generalization of the univariate Gaussian.
- Example. Finch beaks are measured for beak depth and beak length. The resulting distribution of depths and lengths is Gaussian distributed. In this case, the Gaussian is bivariate, with a two-dimensional mean vector μ and a 2×2 covariance matrix Σ.
- Parameters. There is one vector-valued parameter, μ, and a matrix-valued parameter, Σ, referred to respectively as the mean and covariance matrix. The covariance matrix is symmetric and strictly positive definite.
- Support. The K-variate Gaussian distribution is supported on ℝ^K.
- Probability density function.

  f(y; μ, Σ) = exp(−(y − μ)ᵀ Σ⁻¹ (y − μ) / 2) / √((2π)^K det Σ)

- Usage. The usage below assumes that mu is a length-K array, Sigma is a K×K symmetric positive definite matrix, and L is a K×K lower-triangular matrix with strictly positive values on the diagonal that is a Cholesky factor.
- Related distributions.
- The Multivariate Gaussian is a generalization of the univariate Gaussian.
- Notes.
  - The covariance matrix may also be written as Σ = S · C · S, where S = √(diag(Σ)) is a diagonal matrix of the standard deviations σ_i, and entry i, j of the correlation matrix C is C_ij = Σ_ij / (σ_i σ_j). Furthermore, because Σ is symmetric and strictly positive definite, it can be uniquely defined in terms of its Cholesky decomposition, L, which satisfies Σ = L Lᵀ. In practice, you will almost always use the Cholesky representation of the Multivariate Gaussian distribution in Stan.
Package | Syntax |
---|---|
NumPy | np.random.multivariate_normal(mu, Sigma) |
SciPy | scipy.stats.multivariate_normal(mu, Sigma) |
Stan | multi_normal(mu, Sigma) |
NumPy Cholesky | np.random.multivariate_normal(mu, np.dot(L, L.T)) |
SciPy Cholesky | scipy.stats.multivariate_normal(mu, np.dot(L, L.T)) |
Stan Cholesky | multi_normal_cholesky(mu, L) |
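The table above can be exercised with a bivariate example. The mean vector and covariance matrix below are arbitrary choices; the sketch verifies Σ = L Lᵀ and compares the sample covariance of NumPy draws against Σ.

```python
import numpy as np
import scipy.stats as st

# Bivariate (K = 2) example with arbitrary mean and covariance
mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])          # symmetric, positive definite

# The Cholesky factor L satisfies Sigma = L @ L.T
L = np.linalg.cholesky(Sigma)
print(np.allclose(L @ L.T, Sigma))      # True

# SciPy gives the PDF; NumPy samples, as in the usage table
dist = st.multivariate_normal(mu, Sigma)
print(dist.pdf(mu))                     # density at the mean

np.random.seed(3252)
samples = np.random.multivariate_normal(mu, Sigma, size=100000)
print(np.cov(samples, rowvar=False))    # close to Sigma
```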
Lewandowski-Kurowicka-Joe (LKJ) distribution
- Story. Probability distribution for positive definite correlation matrices, or equivalently for their Cholesky factors (which is what we use in practice).
- Parameter. There is one positive scalar parameter, η, which tunes the strength of the correlations. If η = 1, the density is uniform over all correlation matrices. If η > 1, matrices with a stronger diagonal (and therefore smaller correlations) are favored. If η < 1, the diagonal is weak and strong correlations are favored.
- Support. The LKJ distribution is supported over the set of Cholesky factors of real symmetric positive definite K×K matrices.
- Probability density function. We cannot write the PDF in closed form.
- Usage
Package | Syntax |
---|---|
NumPy | not available |
SciPy | not available |
Stan | lkj_corr_cholesky(eta) |
- Notes.
  - The most common use case is as a prior for a covariance matrix. Note that the LKJ distribution gives Cholesky factors for correlation matrices, not covariance matrices. To get the covariance Cholesky factor from the correlation Cholesky factor, you need to multiply the correlation Cholesky factor by a diagonal matrix constructed from the standard deviations (the square roots of the variances) of the individual variates. Here is an example for a multivariate Gaussian in Stan.
parameters {
  // Vector of means
  vector[K] mu;

  // Cholesky factor for the correlation matrix
  cholesky_factor_corr[K] L_Omega;

  // Sqrt of variances for each variate
  vector<lower=0>[K] L_std;
}

model {
  // Cholesky factor for covariance matrix
  L_Sigma = diag_pre_multiply(L_std, L_Omega);

  // Prior on Cholesky decomposition of correlation matrix
  L_Omega ~ lkj_corr_cholesky(1);

  // Prior on standard deviations for each variate
  L_std ~ normal(0, 2.5);

  // Likelihood
  y ~ multi_normal_cholesky(mu, L_Sigma);
}
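The LKJ distribution itself is not available in NumPy or SciPy, but the bookkeeping in the Stan snippet above can be checked numerically: Stan's diag_pre_multiply(L_std, L_Omega) is a matrix product of a diagonal matrix of standard deviations with the correlation Cholesky factor. The correlation matrix and standard deviations below are arbitrary illustrative values.

```python
import numpy as np

# Arbitrary correlation matrix C (unit diagonal) and standard deviations
C = np.array([[1.0, 0.4],
              [0.4, 1.0]])
std = np.array([0.5, 2.0])

# Stan's diag_pre_multiply(L_std, L_Omega) is diag(std) @ L_Omega
L_Omega = np.linalg.cholesky(C)      # correlation Cholesky factor
L_Sigma = np.diag(std) @ L_Omega     # covariance Cholesky factor

# The implied covariance matrix is diag(std) @ C @ diag(std)
Sigma = L_Sigma @ L_Sigma.T
print(np.allclose(Sigma, np.diag(std) @ C @ np.diag(std)))   # True
```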
Dirichlet distribution
- Story. The Dirichlet distribution is a generalization of the Beta distribution. It is a probability distribution describing probabilities of outcomes. Instead of describing the probability of one of two outcomes of a Bernoulli trial, like the Beta distribution does, it describes the probabilities θ_1, θ_2, ..., θ_K of K outcomes. The Beta distribution is the special case of K = 2.
- Parameters. The parameters are α_1, α_2, ..., α_K, all strictly positive, defined analogously to α and β of the Beta distribution.
- Support. The Dirichlet distribution has support on the interval [0, 1] such that Σ_i θ_i = 1.
- Probability density function.

  f(θ; α) = (1 / B(α)) ∏_i θ_i^(α_i − 1),

  where B(α) = ∏_i Γ(α_i) / Γ(Σ_i α_i) is the multivariate Beta function.
- Related distributions.
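The story above can be sketched numerically: Dirichlet draws lie on the simplex, and the K = 2 case matches the Beta distribution. The concentration parameters below are arbitrary illustrative values.

```python
import numpy as np
import scipy.stats as st

# K = 3 concentration parameters (arbitrary values)
alpha = np.array([2.0, 3.0, 5.0])

# Each draw lies on the simplex: entries are in [0, 1] and sum to 1
np.random.seed(3252)
samples = np.random.dirichlet(alpha, size=100000)
print(samples[0].sum())              # 1.0

# The marginal mean of component i is α_i / Σ_j α_j
print(samples.mean(axis=0))          # close to [0.2, 0.3, 0.5]

# K = 2 recovers the Beta distribution: Dirichlet((α, β)) vs Beta(α, β)
print(st.dirichlet.mean([2.0, 5.0])[0], st.beta(2.0, 5.0).mean())  # both 2/7
```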
Conclusions
This document serves as a catalog of probability distributions that will be useful for you in your statistical modeling. As we will see, the mathematical expression of the PDF is not often needed. What is most important in your modeling is that you know the story of the distribution.
Computing environment
%load_ext watermark
%watermark -v -p numpy,scipy,bokeh,jupyterlab
CPython 3.6.6
IPython 6.5.0
numpy 1.15.1
scipy 1.1.0
bokeh 0.13.0
jupyterlab 0.34.9