Probability Distributions and their Stories

A basic skill in any work that involves statistics is being able to compute and plot the probability distributions commonly encountered in data science. This tutorial revisits the distributions introduced in this section, now expressed using the scipy.stats module.

Continuous distributions

Exponential distribution

  • Story. This is the waiting time for an arrival from a Poisson process. I.e., the inter-arrival time of a Poisson process is Exponentially distributed.
  • Example. The time between conformational switches in a protein is Exponentially distributed (under simple mass action kinetics).
  • Parameter. The single parameter is the average arrival rate, β. Alternatively, we can use τ=1/β as the parameter, in this case a characteristic arrival time.
  • Support. The Exponential distribution is supported on the set of nonnegative real numbers.
  • Probability density function.

    f(y;β)=βe^{−βy}

  • Related distributions.
    • The Exponential distribution is the continuous analog of the Geometric distribution. The "rate" in the Exponential distribution is analogous to the probability of success of the Bernoulli trial. Note also that because arrivals in a Poisson process are independent of one another, the amount of time between any two arrivals is independent of all other inter-arrival times.
    • The Exponential distribution is a special case of the Gamma distribution with parameter α=1.
  • Usage

    Package   Syntax
    NumPy     np.random.exponential(1/beta)
    SciPy     scipy.stats.expon(loc=0, scale=1/beta)
    Stan      exponential(beta)

  • Notes.
    • Alternatively, we could parametrize the Exponential distribution in terms of an average time between arrivals of a Poisson process, τ, as

      f(y;τ)=\dfrac{1}{τ}e^{−y/τ}

    • The implementation in the scipy.stats module also has a location parameter, loc, which shifts the distribution left and right. For our purposes, you can leave it at its default of 0, but be aware that it comes before scale when arguments are passed positionally. The scipy.stats Exponential distribution is parametrized in terms of the inter-arrival time, τ (passed as scale), and not β.
    • The numpy.random.exponential() function neither needs nor accepts a location parameter. It is also parametrized in terms of τ.
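
The parametrization notes above can be checked numerically. The following is a minimal sketch, assuming an illustrative rate β = 0.25 and an arbitrary seed (neither is prescribed by the text). It uses NumPy's newer Generator interface rather than np.random.exponential(); both take the scale τ = 1/β.

```python
import numpy as np
import scipy.stats as st

beta = 0.25       # arrival rate (illustrative value, an assumption)
tau = 1 / beta    # characteristic arrival time

# scipy.stats is parametrized by the scale tau = 1/beta; loc stays at 0.
dist = st.expon(loc=0, scale=tau)

y = np.linspace(0, 30, 200)
pdf_scipy = dist.pdf(y)
pdf_formula = beta * np.exp(-beta * y)   # f(y; beta) = beta * exp(-beta * y)

# Largest pointwise disagreement between the library and the formula.
max_abs_diff = np.max(np.abs(pdf_scipy - pdf_formula))

# NumPy also takes the scale tau, and has no location parameter.
rng = np.random.default_rng(seed=3252)   # arbitrary seed for reproducibility
samples = rng.exponential(scale=tau, size=100_000)
sample_mean = samples.mean()             # should be close to tau

print(f"max |pdf difference| = {max_abs_diff:.2e}")
print(f"sample mean = {sample_mean:.3f}, tau = {tau:.3f}")
```

Passing scale by keyword avoids accidentally setting loc, which is the first positional parameter after the evaluation points in the scipy.stats methods.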
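
import bokeh.io
import scipy.stats as st

# distribution_plot_app and notebook_url are assumed to be defined earlier in the tutorial.
# Slider for the rate β.
params = [dict(name='β', start=0.1, end=1, value=0.25, step=0.01)]
app = distribution_plot_app(0,
                            30,
                            st.expon,
                            params=params,
                            # st.expon takes (loc, scale), so map β to (0, 1/β)
                            transform=lambda x: (0, 1/x),
                            x_axis_label='y',
                            title='Exponential')
bokeh.io.show(app, notebook_url=notebook_url)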
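
The related-distribution claims above can also be checked outside the interactive app. This is a rough sketch under stated assumptions: β = 0.25, a time step dt = 0.01, and the seed are all illustrative choices, and the Poisson process is only approximated by one Bernoulli trial per small time step with success probability β·dt.

```python
import numpy as np
import scipy.stats as st

beta = 0.25                       # arrival rate (illustrative value)
y = np.linspace(0.01, 30, 200)

# A Gamma distribution with alpha = 1 and the same scale reduces to the Exponential.
pdf_expon = st.expon.pdf(y, loc=0, scale=1 / beta)
pdf_gamma = st.gamma.pdf(y, a=1, loc=0, scale=1 / beta)
gamma_matches = np.allclose(pdf_expon, pdf_gamma)

# Geometric analog: one Bernoulli trial per small time step dt,
# with success probability p = beta * dt (an approximation of a Poisson process).
rng = np.random.default_rng(seed=3252)   # arbitrary seed
dt = 0.01
p = beta * dt
steps = rng.geometric(p, size=100_000)   # number of trials up to and including the first success
waiting_times = steps * dt
mean_wait = waiting_times.mean()         # should be close to 1/beta

print(f"Gamma(alpha=1) matches Exponential: {gamma_matches}")
print(f"mean discretized waiting time = {mean_wait:.3f}, 1/beta = {1 / beta:.3f}")
```

As dt shrinks, the discretized waiting times approach the Exponential distribution, which is the sense in which the rate plays the role of the Bernoulli success probability.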