In any module that deals with statistics, one basic skill you must have is the ability to compute with and plot the probability distributions typically encountered in data science. This tutorial revisits the distributions introduced in this section, now expressed using the scipy.stats module.
Discrete distributions
Poisson distribution
- Story. Rare events occur with a rate $\lambda$ per unit time. There is no "memory" of previous events; i.e., that rate is independent of time. A process that generates such events is called a Poisson process. The occurrence of a rare event in this context is referred to as an arrival. The number $n$ of arrivals in unit time is Poisson distributed.
- Example. The number of mutations per unit length in a strand of DNA is Poisson distributed, since mutations are rare.
- Parameter. The single parameter is the rate $\lambda$ of the rare events occurring.
- Support. The Poisson distribution is supported on the set of nonnegative integers.
- Probability mass function. $f(n; \lambda) = \frac{\lambda^n}{n!}\, e^{-\lambda}$
- Usage. The syntax for each package is given in the table that follows this list.
- Related distributions.
- In the limit of $N \to \infty$ and $\theta \to 0$ such that the quantity $N\theta$ is fixed, the Binomial distribution becomes a Poisson distribution with parameter $N\theta$. Thus, for large $N$ and small $\theta$, $f_\mathrm{Poisson}(n; \lambda) \approx f_\mathrm{Binomial}(n; N, \theta)$ with $\lambda = N\theta$. Consider the biological example of mutations: there are $N$ bases, each with a probability $\theta$ of mutation, so the number of mutations, $n$, is Binomially distributed. Since $\theta$ is small and $N$ is large, it is approximately Poisson distributed.
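The Binomial limit described above can be checked numerically. The following is a minimal sketch, not part of the original tutorial: it writes $N$ for the number of trials and $\theta$ for the per-trial success probability, uses illustrative values that are assumptions chosen here, and defines a hypothetical helper `max_pmf_gap` that compares the two probability mass functions via `scipy.stats.binom` and `scipy.stats.poisson`.

```python
import numpy as np
import scipy.stats as st


def max_pmf_gap(N, theta, n_max=20):
    """Largest absolute difference between the Binomial(N, theta) and
    Poisson(N * theta) PMFs over n = 0, 1, ..., n_max."""
    n = np.arange(n_max + 1)
    binom_pmf = st.binom.pmf(n, N, theta)
    poisson_pmf = st.poisson.pmf(n, N * theta)
    return np.max(np.abs(binom_pmf - poisson_pmf))


# Illustrative values only (assumed here): both cases have N * theta = 5.
gap_large_N = max_pmf_gap(N=10_000, theta=5e-4)  # large N, small theta
gap_small_N = max_pmf_gap(N=10, theta=0.5)       # small N, large theta

print(f"large N, small theta: max gap = {gap_large_N:.2e}")
print(f"small N, large theta: max gap = {gap_small_N:.2e}")
```

With these values the gap should be far smaller in the first case than in the second, which is the sense in which the Poisson approximation holds for large $N$ and small $\theta$.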
| Package | Syntax |
|---|---|
| NumPy | np.random.poisson(lam) |
| SciPy | scipy.stats.poisson(lam) |
| Stan | poisson(lam) |
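As a brief illustration of the NumPy and SciPy entries in the table, the sketch below draws samples and evaluates the probability mass function. The rate `lam = 5.0`, the seed, and the sample size are arbitrary assumptions made for this example. It uses NumPy's `default_rng` generator interface, whose `poisson` method plays the same role as the `np.random.poisson(lam)` call listed in the table.

```python
import math

import numpy as np
import scipy.stats as st

lam = 5.0  # rate parameter; arbitrary value chosen for illustration

# NumPy: draw samples (np.random.poisson(lam, size=...) is the legacy form)
rng = np.random.default_rng(seed=3252)
samples = rng.poisson(lam, size=100_000)

# SciPy: a "frozen" distribution object with pmf, cdf, mean, var, rvs, ...
dist = st.poisson(lam)
n = np.arange(0, 11)
pmf_scipy = dist.pmf(n)

# The same PMF written out by hand: lam**n * exp(-lam) / n!
pmf_by_hand = np.array(
    [lam ** int(k) * math.exp(-lam) / math.factorial(int(k)) for k in n]
)

print("SciPy and hand-written PMF agree:", np.allclose(pmf_scipy, pmf_by_hand))
print("mean, variance from SciPy:", dist.mean(), dist.var())
print("sample mean, variance:    ", samples.mean(), samples.var())
```

For a Poisson distribution the mean and the variance both equal the rate, so the sample mean and sample variance should each land close to `lam`.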
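
    import bokeh.io
    import scipy.stats as st

    # distribution_plot_app and notebook_url are assumed to be defined
    # earlier in the notebook; they are not part of SciPy or Bokeh.
    # Range, starting value, and step size for the parameter λ
    params = [dict(name='λ', start=1, end=20, value=5, step=0.1)]
    app = distribution_plot_app(x_min=0,
                                x_max=40,
                                scipy_dist=st.poisson,
                                params=params,
                                x_axis_label='n',
                                title='Poisson')
    bokeh.io.show(app, notebook_url=notebook_url)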
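The story behind the distribution can also be checked by simulation, independently of the plotting helper. The sketch below is an addition to the tutorial and relies on a standard fact not stated in the text: in a Poisson process with constant rate $\lambda$, the waiting times between arrivals are exponentially distributed with mean $1/\lambda$. The rate, seed, and number of intervals are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=3252)

lam = 5.0             # arrival rate per unit time (illustrative value)
n_intervals = 20_000  # number of unit-time intervals to simulate

# Memoryless waiting times between arrivals: exponential with mean 1 / lam.
# Draw about twice as many as expected so the whole time span is covered.
n_draws = int(2 * lam * n_intervals)
arrival_times = np.cumsum(rng.exponential(scale=1 / lam, size=n_draws))
arrival_times = arrival_times[arrival_times < n_intervals]

# Count the arrivals that fall in each unit interval [0, 1), [1, 2), ...
counts = np.bincount(arrival_times.astype(int), minlength=n_intervals)

print("mean arrivals per unit time:", counts.mean())
print("variance of arrivals:       ", counts.var())
```

If the counts are Poisson distributed, both printed values should be close to `lam`, and a histogram of `counts` should resemble the probability mass function shown by the interactive plot.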