Probability Distributions and their Stories

In any module that deals with statistics, a basic skill you need is the ability to compute and plot the probability distributions commonly encountered in data science. This tutorial reviews the distributions introduced in this section, now expressed using the scipy.stats module.

Discrete distributions

Discrete Uniform distribution

  • Story. Each of a set of discrete outcomes that can be indexed with sequential integers has equal probability, like rolling a fair die.
  • Example. A monkey can choose from any of n bananas with equal probability.
  • Parameters. The distribution is parametrized by the lowest and highest allowed values, y_{low} and y_{high}.
  • Support. The Discrete Uniform distribution is supported on the set of integers ranging from y_{low} to y_{high}, inclusive.
  • Probability mass function.

    \begin{align}f(y;y_\mathrm{low}, y_\mathrm{high}) = \frac{1}{y_\mathrm{high} - y_\mathrm{low} + 1}\end{align}

  • Usage

    Package   Syntax
    NumPy     np.random.randint(low, high+1)
    SciPy     scipy.stats.randint(low, high+1)
    Stan      categorical(theta), where theta is an array with all equal values

  • Related distributions.
    • The Discrete Uniform distribution is a special case of the Categorical distribution where all θ_y are equal.
  • Notes.
    • This distribution is not included in Stan. Instead, use a Categorical distribution with equal probabilities.
    • In SciPy, this distribution is known as scipy.stats.randint. Its high parameter is exclusive; i.e., the set of allowed values includes the low parameter but not the high one. The same is true for numpy.random.randint(), which is used for sampling out of this distribution.
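As a quick check of the PMF formula and of SciPy's exclusive upper bound, the sketch below evaluates `scipy.stats.randint` for a fair six-sided die (the values `low = 1` and `high = 6` are chosen only for illustration) and compares the result with 1 / (y_high - y_low + 1). It also builds the equivalent equal-probability Categorical distribution with `scipy.stats.rv_discrete`; using `rv_discrete` for this is an illustrative choice, not something this section prescribes.

```python
import numpy as np
import scipy.stats as st

# Illustrative parameters: a fair six-sided die
low, high = 1, 6

# st.randint excludes its upper argument, so pass high + 1
# to include y_high in the support.
die = st.randint(low, high + 1)

# Every allowed value y_low, ..., y_high
y = np.arange(low, high + 1)
pmf_scipy = die.pmf(y)

# PMF from the formula: 1 / (y_high - y_low + 1) for each allowed value
pmf_formula = np.full(len(y), 1.0 / (high - low + 1))

# Values just outside the support have zero probability
pmf_outside = die.pmf([low - 1, high + 1])

# The same distribution written as a Categorical with equal probabilities
# (illustrative use of scipy.stats.rv_discrete)
categorical = st.rv_discrete(values=(y, pmf_formula))

print(pmf_scipy)
print(pmf_outside)
print(categorical.pmf(y))
```

All three PMFs agree on the support, and `pmf_outside` is zero at both `low - 1` and `high + 1`, which confirms that `high + 1` is needed to include y_high.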
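import bokeh.io
import scipy.stats as st

# distribution_plot_app and notebook_url are assumed to be defined earlier
# in the notebook; they are not part of SciPy or Bokeh.

# Slider settings for the two parameters, y_low and y_high
params = [dict(name='low', start=0, end=10, value=0, step=1),
          dict(name='high', start=0, end=10, value=10, step=1)]
app = distribution_plot_app(x_min=0,
                            x_max=10,
                            scipy_dist=st.randint,
                            params=params,
                            # st.randint excludes its upper bound, so shift high by one
                            transform=lambda low, high: (low, high+1),
                            x_axis_label='y',
                            title='Discrete Uniform')
bokeh.io.show(app, notebook_url=notebook_url)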
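The interactive app relies on the notebook's `distribution_plot_app` helper. Without it, a minimal sketch like the following can check the same distribution numerically, assuming the app's default slider values (`low = 0`, `high = 10`). It draws samples with NumPy's `Generator.integers`, whose `endpoint=True` option includes `high` directly, and with `st.randint(low, high + 1).rvs`, then compares the empirical frequencies with the PMF. The seed and sample size are arbitrary choices.

```python
import numpy as np
import scipy.stats as st

# Default slider values from the app above
low, high = 0, 10
n_samples = 100_000

rng = np.random.default_rng(seed=3252)  # arbitrary seed for reproducibility

# Both draws cover {low, ..., high}: Generator.integers includes the endpoint
# when endpoint=True, while st.randint needs high + 1.
samples = rng.integers(low, high, size=n_samples, endpoint=True)
samples_scipy = st.randint(low, high + 1).rvs(size=n_samples, random_state=rng)

# Empirical frequency of each allowed value versus the theoretical PMF
y = np.arange(low, high + 1)
empirical = np.bincount(samples - low, minlength=len(y)) / n_samples
theoretical = st.randint(low, high + 1).pmf(y)

for yi, emp, theo in zip(y, empirical, theoretical):
    print(f"y = {yi:2d}   empirical = {emp:.4f}   PMF = {theo:.4f}")
```

With 100,000 draws, each empirical frequency should fall within a few thousandths of 1/11 ≈ 0.0909.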