Probability Distributions and their Stories

Any module that deals with statistics calls for one basic skill: being able to compute with and plot the probability distributions commonly encountered in data science. This tutorial revisits the distributions introduced in this section, now expressed using the scipy.stats module.

Discrete distributions

Geometric distribution

  • Story. We perform a series of Bernoulli trials with probability of success θ until we get a success. The number of failures y before the success is Geometrically distributed.
  • Example. Consider actin polymerization. At each time step, an actin monomer may add to the filament ("failure"), or an actin monomer may fall off ("success") with (usually very low) probability θ. The lengths of actin filaments are Geometrically distributed.
  • Parameter. The Geometric distribution is parametrized by a single value, θ, the probability that the Bernoulli trial is successful.
  • Support. The Geometric distribution, as defined here, has support on the nonnegative integers.
  • Probability mass function.
f(y;θ)=(1−θ)^y θ.

  • Usage.
      Package   Syntax
      NumPy     np.random.geometric(theta) - 1
      SciPy     scipy.stats.geom(theta, loc=-1)
      Stan      neg_binomial(1, theta/(1-theta))

  • Related distributions.
    • The Geometric distribution is a special case of the Negative Binomial distribution in which α=1 and θ=β/(1+β).
    • The continuous analog of the Geometric distribution is the Exponential distribution.
  • Notes.
    • The Geometric distribution is supported on nonnegative integers y.
    • The Geometric distribution is not implemented in Stan. You can instead use a Negative Binomial distribution, fixing the parameter α to unity and relating the parameter β of the Negative Binomial distribution to θ by θ=β/(1+β).
    • The Geometric distribution is defined differently in SciPy, which replaces y with y−1. In SciPy's parametrization, the Geometric distribution describes the number of successive Bernoulli trials (not just the failures; the success is included) needed to get a success. To adjust for this, we use the loc=-1 kwarg. NumPy's np.random.geometric uses the same trial-counting convention, so subtract 1 from its draws to obtain the number of failures.
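The parametrization notes above can be checked numerically. The following is a minimal sketch, not part of the original tutorial: the helper name geometric_pmf and the value theta = 0.3 are illustrative assumptions. It compares the probability mass function written out by hand with scipy.stats.geom shifted by loc=-1, and with SciPy's Negative Binomial. Note that scipy.stats.nbinom takes (n, p) rather than Stan's (α, β), so the equivalent call here uses n = 1 and p = θ.

```python
import numpy as np
import scipy.stats as st


def geometric_pmf(y, theta):
    """PMF for the number of failures y before the first success.

    Hypothetical helper implementing f(y; theta) = (1 - theta)**y * theta.
    """
    y = np.asarray(y)
    return (1 - theta) ** y * theta


theta = 0.3  # arbitrary illustrative value
y = np.arange(0, 21)

# PMF written out directly from the formula
pmf_formula = geometric_pmf(y, theta)

# SciPy counts trials (support 1, 2, ...); loc=-1 shifts the support to 0, 1, ...
pmf_scipy = st.geom.pmf(y, theta, loc=-1)

# Negative Binomial with a single required success, in SciPy's (n, p) form
pmf_nbinom = st.nbinom.pmf(y, 1, theta)

print(np.allclose(pmf_formula, pmf_scipy))
print(np.allclose(pmf_formula, pmf_nbinom))
```

Omitting loc=-1 would return zero probability at y = 0, which is a quick way to spot a mismatch between the two conventions.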
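import bokeh.io
import scipy.stats as st

# distribution_plot_app and notebook_url are assumed to be defined earlier in this notebook.
# Range, starting value, and step for the parameter θ
params = [dict(name='θ', start=0, end=1, value=0.5, step=0.01)]
app = distribution_plot_app(x_min=0,
                            x_max=20,
                            scipy_dist=st.geom,
                            params=params,
                            # Pass (theta, loc) to st.geom; loc=-1 shifts the support to start at 0
                            transform=lambda theta: (theta, -1),
                            x_axis_label='y',
                            title='Geometric')
bokeh.io.show(app, notebook_url=notebook_url)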
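The story behind the Geometric distribution can also be simulated directly. The sketch below is an illustration under stated assumptions rather than part of the original tutorial: the function name failures_before_success, the seed, the sample size, and theta = 0.25 are all arbitrary choices. It runs Bernoulli trials until the first success, counts the failures, and compares the sample mean with NumPy's built-in sampler (shifted by 1, because NumPy counts trials rather than failures) and with the standard result that the mean number of failures is (1−θ)/θ.

```python
import numpy as np


def failures_before_success(theta, rng):
    """Run Bernoulli trials until the first success; return the number of failures.

    Hypothetical helper: each trial succeeds with probability theta.
    """
    n_failures = 0
    # rng.random() is uniform on [0, 1), so a draw >= theta is a failure
    while rng.random() >= theta:
        n_failures += 1
    return n_failures


rng = np.random.default_rng(seed=3252)  # arbitrary seed for reproducibility
theta = 0.25
n_samples = 20000

# Samples generated by literally following the story
story_samples = np.array(
    [failures_before_success(theta, rng) for _ in range(n_samples)]
)

# NumPy counts trials up to and including the success, so subtract 1
numpy_samples = rng.geometric(theta, size=n_samples) - 1

# Mean number of failures before the first success
expected_mean = (1 - theta) / theta

print(story_samples.mean(), numpy_samples.mean(), expected_mean)
```

The two sample means should both land close to the expected value, with the gap shrinking as n_samples grows.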