A basic skill for any module that deals with statistics is the ability to program and plot the probability distributions commonly encountered in data science. This tutorial revisits several of the distributions introduced in this section, now expressed using the scipy.stats module.
Discrete distributions
Poisson distribution
- Story. Rare events occur with a rate \(λ\) per unit time. There is no "memory" of previous events; i.e., that rate is independent of time. A process that generates such events is called a Poisson process. The occurrence of a rare event in this context is referred to as an arrival. The number \(n\) of arrivals in unit time is Poisson distributed.
- Example. The number of mutations in a strand of DNA per unit length is Poisson distributed, since mutations are rare.
- Parameter. The single parameter is the rate \(λ\) of the rare events occurring.
- Support. The Poisson distribution is supported on the set of nonnegative integers.
- Probability mass function.
\(\begin{align}
f(n;\lambda) = \frac{\lambda^n}{n!}\,\mathrm{e}^{-\lambda}
\end{align}\).
- Usage. The table following this list gives the syntax in several packages.
- Related distributions.
- In the limit of \(N→∞\) and \(θ→0\) such that the quantity \(Nθ\) is fixed, the Binomial distribution becomes a Poisson distribution with parameter \(Nθ\). Thus, for large \(N\) and small \(θ\),
\(\begin{align}
f_\mathrm{Poisson}(n;\lambda) \approx f_\mathrm{Binomial}(n;N, \theta)
\end{align}\),
with \(λ=Nθ\). In the biological example of mutations, the count is in fact Binomially distributed: there are \(N\) bases, each with a probability \(θ\) of mutation, so the number of mutations \(n\) is Binomially distributed. Since \(θ\) is small and \(N\) is large, it is approximately Poisson distributed.
- Under the \((μ,ϕ)\) parametrization of the Negative Binomial distribution, taking the limit of large \(ϕ\) yields the Poisson distribution.
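The Binomial-to-Poisson limit described in the list above can be checked numerically. The following is a minimal sketch, assuming an illustrative value of λ = 5 and a few illustrative values of N; the helper name `max_pmf_difference` is hypothetical and is not part of scipy.stats.

```python
import numpy as np
import scipy.stats as st


def max_pmf_difference(N, lam, n_max=40):
    """Largest absolute gap between the Binomial(N, lam/N) and Poisson(lam)
    PMFs, evaluated on the counts 0, 1, ..., n_max."""
    n = np.arange(n_max + 1)
    theta = lam / N  # keeps the product N * theta fixed at lam
    return np.max(np.abs(st.binom.pmf(n, N, theta) - st.poisson.pmf(n, lam)))


lam = 5.0  # illustrative value (assumption)

# As N grows with N * theta held fixed, the two PMFs should approach each other.
differences = {N: max_pmf_difference(N, lam) for N in (10, 100, 10_000)}

for N, diff in differences.items():
    print(f"N = {N:>6d}, theta = {lam / N:.4g}, max |difference| = {diff:.2e}")
```

The printed gap should shrink as N increases, which is the sense in which the Poisson distribution approximates the Binomial for large N and small θ.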
Package | Syntax |
---|---|
NumPy | np.random.poisson(lam) |
SciPy | scipy.stats.poisson(lam) |
Stan | poisson(lam) |
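As a rough illustration of the NumPy and SciPy entries in the table, the sketch below draws samples and evaluates the probability mass function. The value λ = 5, the seed, and the helper `poisson_pmf` are assumptions made for this example. `rng.poisson` is the random-generator counterpart of `np.random.poisson`, used here so the draw is reproducible.

```python
from math import exp, factorial

import numpy as np
import scipy.stats as st

lam = 5.0  # illustrative rate (assumption)


def poisson_pmf(n, lam):
    """PMF written directly from the formula: lam**n / n! * exp(-lam)."""
    return lam**n / factorial(n) * exp(-lam)


# NumPy: draw samples from a seeded generator.
rng = np.random.default_rng(seed=3252)
samples = rng.poisson(lam, size=100_000)

# SciPy: scipy.stats.poisson(lam) returns a "frozen" distribution object
# whose methods (pmf, cdf, rvs, mean, ...) already know the parameter.
dist = st.poisson(lam)
pmf_scipy = dist.pmf(3)
pmf_by_hand = poisson_pmf(3, lam)

print("sample mean:", samples.mean())
print("pmf at n = 3 (SciPy):  ", pmf_scipy)
print("pmf at n = 3 (formula):", pmf_by_hand)
```

The sample mean should be close to λ, and the two PMF values should agree to floating-point precision.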
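import bokeh.io
import scipy.stats as st

# distribution_plot_app and notebook_url are not defined in this section;
# they are assumed to come from earlier in the tutorial.
params = [dict(name='λ', start=1, end=20, value=5, step=0.1)]
app = distribution_plot_app(x_min=0,
                            x_max=40,
                            scipy_dist=st.poisson,
                            params=params,
                            x_axis_label='n',
                            title='Poisson')
bokeh.io.show(app, notebook_url=notebook_url)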