Probability Distributions and their Stories

In any module that deals with statistics, a basic skill you need is the ability to program and plot the probability distributions commonly encountered in data science. This tutorial revisits the distributions introduced in this section, now expressed using the scipy.stats module.

Discrete distributions

Poisson distribution

  • Story. Rare events occur with a rate λ per unit time. There is no "memory" of previous events; i.e., the rate is independent of time. A process that generates such events is called a Poisson process. The occurrence of a rare event in this context is referred to as an arrival. The number n of arrivals in unit time is Poisson distributed.
  • Example. The number of mutations per unit length of a DNA strand is Poisson distributed, since mutations are rare.
  • Parameter. The single parameter is the rate λ at which the rare events occur.
  • Support. The Poisson distribution is supported on the set of nonnegative integers.
  • Probability mass function.

    \begin{align}f(n;\lambda) = \frac{\lambda^n}{n!}\,\mathrm{e}^{-\lambda}.\end{align}

  • Usage

    Package   Syntax
    NumPy     np.random.poisson(lam)
    SciPy     scipy.stats.poisson(lam)
    Stan      poisson(lam)


  • Related distributions.
    • In the limit N→∞ and θ→0 with the product Nθ held fixed, the Binomial distribution becomes a Poisson distribution with parameter λ=Nθ. Thus, for large N and small θ,
    \begin{align}f_\mathrm{Poisson}(n;\lambda) \approx f_\mathrm{Binomial}(n;N, \theta).\end{align}
    In the biological example of mutations, the count is strictly Binomially distributed: there are N bases, each mutating with probability θ, so the number of mutations n is Binomially distributed. Because θ is small and N is large, it is approximately Poisson distributed.

    • Under the (μ,ϕ) parametrization of the Negative Binomial distribution, taking the limit of large ϕ yields the Poisson distribution with λ=μ.
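The Binomial-to-Poisson limit can be checked numerically. The sketch below, assuming only NumPy and scipy.stats, holds λ=Nθ=5 fixed (an arbitrary illustrative choice) and compares the Binomial PMF with θ=λ/N against the Poisson PMF for increasing N. The largest pointwise difference should shrink as N grows.

```python
import numpy as np
import scipy.stats as st

lam = 5.0                 # fixed product N * theta (illustrative value)
n = np.arange(0, 21)      # counts to compare; Poisson mass beyond 20 is negligible here
poisson_pmf = st.poisson.pmf(n, lam)

max_diffs = []
for N in [10, 100, 1000, 10000]:
    theta = lam / N       # shrink theta as N grows so that N * theta stays equal to lam
    binom_pmf = st.binom.pmf(n, N, theta)
    max_diffs.append(np.max(np.abs(binom_pmf - poisson_pmf)))
    print(f"N = {N:>5d}, theta = {theta:.4f}, max |difference| = {max_diffs[-1]:.2e}")
```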
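A similar check works for the Negative Binomial limit. SciPy's scipy.stats.nbinom is parametrized by (n, p) rather than (μ, ϕ), so this sketch assumes the conversion n=ϕ, p=ϕ/(μ+ϕ), which gives mean μ and variance μ+μ²/ϕ. As ϕ grows, the extra variance term vanishes and the PMF approaches the Poisson PMF with λ=μ. The values of μ and ϕ are arbitrary illustrative choices.

```python
import numpy as np
import scipy.stats as st

mu = 5.0                  # Negative Binomial mean; becomes the Poisson rate in the limit
n = np.arange(0, 31)
poisson_pmf = st.poisson.pmf(n, mu)

max_diffs = []
for phi in [1, 10, 100, 1000, 10000]:
    # Convert (mu, phi) to SciPy's (n, p): n = phi, p = phi / (mu + phi)
    nb = st.nbinom(phi, phi / (mu + phi))
    max_diffs.append(np.max(np.abs(nb.pmf(n) - poisson_pmf)))
    print(f"phi = {phi:>5d}, mean = {nb.mean():.3f}, variance = {nb.var():.3f}, "
          f"max |difference| = {max_diffs[-1]:.2e}")
```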
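import bokeh.io
import scipy.stats as st

# Assumed defined earlier in the notebook (not SciPy or Bokeh APIs):
# the helper distribution_plot_app and the server address notebook_url.
params = [dict(name='λ', start=1, end=20, value=5, step=0.1)]  # slider for λ
app = distribution_plot_app(x_min=0,
                            x_max=40,
                            scipy_dist=st.poisson,
                            params=params,
                            x_axis_label='n',
                            title='Poisson')
bokeh.io.show(app, notebook_url=notebook_url)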
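The PMF formula and the package syntax in the usage table can be cross-checked directly. This minimal sketch evaluates f(n;λ) by hand, compares it with scipy.stats.poisson, and draws samples with NumPy. It uses NumPy's Generator interface (np.random.default_rng) rather than the legacy np.random.poisson listed in the table; both sample from the same distribution. The value λ=5 and the seed are arbitrary.

```python
import math

import numpy as np
import scipy.stats as st

lam = 5.0
ns = np.arange(0, 15)

# PMF written out directly from f(n; λ) = λ^n e^{-λ} / n!
pmf_by_hand = np.array([lam**int(n) * math.exp(-lam) / math.factorial(int(n)) for n in ns])

# The same PMF from a frozen scipy.stats.poisson distribution
pmf_scipy = st.poisson(lam).pmf(ns)
print(np.allclose(pmf_by_hand, pmf_scipy))

# Random draws with NumPy; sample mean and variance should both be close to lam
rng = np.random.default_rng(seed=42)
draws = rng.poisson(lam, size=100_000)
print(draws.mean(), draws.var())
```

Because the Poisson mean and variance are both equal to λ, the two printed sample statistics should agree closely with each other and with λ.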
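The story itself can also be simulated. This sketch relies on a standard property of the Poisson process that the text above does not state explicitly: the waiting times between arrivals are exponentially distributed with mean 1/λ, which is what the lack of memory implies. Counting arrivals in many independent unit-time windows should then reproduce the Poisson PMF. The rate, number of trials, and seed are arbitrary choices, and arrivals_in_unit_time is a hypothetical helper written only for this example.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(seed=3252)

lam = 5.0          # arrival rate per unit time
n_trials = 20_000  # number of independent unit-time windows to simulate


def arrivals_in_unit_time(rate, rng):
    """Hypothetical helper: count arrivals in [0, 1) by summing exponential waiting times."""
    t = rng.exponential(1.0 / rate)       # time of the first arrival
    count = 0
    while t < 1.0:
        count += 1
        t += rng.exponential(1.0 / rate)  # memoryless waiting time to the next arrival
    return count


counts = np.array([arrivals_in_unit_time(lam, rng) for _ in range(n_trials)])

# Compare empirical frequencies of each count with the Poisson PMF
n = np.arange(0, 16)
empirical = np.array([np.mean(counts == k) for k in n])
theoretical = st.poisson.pmf(n, lam)
for k, emp, theo in zip(n, empirical, theoretical):
    print(f"n = {k:2d}  simulated = {emp:.4f}  Poisson PMF = {theo:.4f}")
```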