Probability Distributions and their Stories

One basic skill for any statistical work is being able to compute with and plot the probability distributions commonly encountered in data science. This tutorial revisits the distributions introduced in this section, now expressed using the scipy.stats module.

Continuous distributions

Gamma distribution

  • Story. The amount of time we have to wait for α arrivals of a Poisson process. More concretely, if the inter-arrival times X_1, X_2, …, X_α are independent and exponentially distributed with rate β, then X_1+X_2+⋯+X_α is Gamma distributed.
  • Example. Any multistep process where each step happens at the same rate. This is common in molecular rearrangements.
  • Parameters. The number of arrivals, α, and the rate of arrivals, β.
  • Support. The Gamma distribution is supported on the set of positive real numbers.
  • Probability density function.

    \begin{align}f(y;\alpha, \beta) = \frac{1}{\Gamma(\alpha)}\,\frac{(\beta y)^\alpha}{y}\,\mathrm{e}^{-\beta y}\end{align}

  • Related distributions.
    • The Gamma distribution is the continuous analog of the Negative Binomial distribution.
    • The special case where α=1 is an Exponential distribution.
    • The special case where α=ν/2 and β=1/2 is a Chi-square distribution parametrized by ν.
  • Usage

    Package   Syntax
    NumPy     np.random.gamma(alpha, 1/beta)
    SciPy     scipy.stats.gamma(alpha, loc=0, scale=1/beta)
    Stan      gamma(alpha, beta)

  • Notes.
    • The Gamma distribution is useful as a prior for positive parameters. It imparts a heavier tail than the Half-Normal distribution (but not too heavy; it keeps parameters from growing too large), and allows the parameter value to come close to zero.
    • SciPy has a location parameter, which should be set to zero. SciPy and NumPy parametrize the distribution by scale rather than rate, so the scale parameter is 1/β.
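
The density above is written in terms of the rate β, whereas NumPy and SciPy take a scale parameter, which is the reciprocal of the rate. As a minimal sketch of how one might check this, the snippet below codes the density by hand and compares it with scipy.stats.gamma. The helper name gamma_pdf, the parameter values, and the seed are illustrative choices, not part of the tutorial.

```python
import numpy as np
import scipy.special
import scipy.stats as st

alpha, beta = 2.0, 0.5  # illustrative shape (α) and rate (β)


def gamma_pdf(y, alpha, beta):
    """Gamma density in the rate parametrization (hypothetical helper)."""
    return (beta * y) ** alpha / y * np.exp(-beta * y) / scipy.special.gamma(alpha)


y = np.linspace(0.1, 20.0, 50)

# SciPy expects a scale, so pass scale = 1 / beta
scipy_pdf = st.gamma.pdf(y, alpha, loc=0, scale=1 / beta)
pdf_matches = np.allclose(gamma_pdf(y, alpha, beta), scipy_pdf)

# NumPy's sampler also takes (shape, scale)
rng = np.random.default_rng(seed=3252)
samples = rng.gamma(alpha, 1 / beta, size=100_000)
sample_mean = samples.mean()  # should be close to the mean, alpha / beta
```

If the parametrization is handled correctly, pdf_matches should be True and sample_mean should land near α/β.
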
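import bokeh.io
import scipy.stats as st

# distribution_plot_app and notebook_url are assumed to be defined earlier in the tutorial.
# Sliders for the shape parameter α and the rate parameter β
params = [dict(name='α', start=1, end=5, value=2, step=0.01),
          dict(name='β', start=0.1, end=5, value=2, step=0.01)]
app = distribution_plot_app(x_min=0,
                            x_max=50,
                            scipy_dist=st.gamma,
                            params=params,
                            # SciPy expects (shape, loc, scale), with scale = 1/β
                            transform=lambda a, b: (a, 0, 1/b),
                            x_axis_label='y',
                            title='Gamma')
bokeh.io.show(app, notebook_url=notebook_url)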
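
The story and the related distributions listed for the Gamma distribution can also be checked numerically. The following is a rough sketch, not a definitive test: it sums α simulated exponential waiting times and compares the totals with the Gamma distribution, then compares the α=1 and (α=ν/2, β=1/2) special cases with the Exponential and Chi-square densities. The parameter values, sample size, and seed are arbitrary assumptions made for illustration.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(seed=3252)
alpha, beta = 3, 2.0  # number of arrivals (α) and rate of arrivals (β)

# Each row holds alpha independent exponential inter-arrival times with rate beta
waits = rng.exponential(scale=1 / beta, size=(50_000, alpha))
total_wait = waits.sum(axis=1)

# Kolmogorov-Smirnov comparison of the summed waits with Gamma(alpha, rate beta)
ks = st.kstest(total_wait, st.gamma(alpha, loc=0, scale=1 / beta).cdf)

# Special cases from the list of related distributions
y = np.linspace(0.1, 10.0, 40)
nu = 4
expon_match = np.allclose(
    st.gamma.pdf(y, 1, scale=1 / beta), st.expon.pdf(y, scale=1 / beta)
)
chi2_match = np.allclose(
    st.gamma.pdf(y, nu / 2, scale=2), st.chi2.pdf(y, nu)  # beta = 1/2 means scale = 2
)
```

A small ks.statistic indicates that the empirical distribution of the summed waiting times is close to the Gamma CDF, and both expon_match and chi2_match should be True.
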