Probability Distributions and their Stories

Any course that deals with statistics requires one basic skill: being able to compute with and plot the probability distributions commonly encountered in data science. This tutorial revisits the distributions introduced in this section, now expressed using the scipy.stats module.

Discrete distributions

Bernoulli distribution

  • Story. A Bernoulli trial is an experiment that has two outcomes that can be encoded as success (y=1) or failure (y=0). The result y of a Bernoulli trial is Bernoulli distributed.
  • Example. Check to see if a given bacterium is competent, given that it has probability θ of being competent.
  • Parameter. The Bernoulli distribution is parametrized by a single value, θ, the probability that the trial is successful.
  • Support. The Bernoulli distribution is supported on the set {0, 1}; its probability mass function is zero for every other value of y.
  • Probability mass function.

    \begin{align}f(y;\theta) = \left\{ \begin{array}{ccc}1-\theta & & y = 0 \\[0.5em]\theta & & y = 1.\end{array}\right.\end{align}

  • Usage.

    Package   Syntax
    NumPy     np.random.choice([0, 1], p=[1-theta, theta])
    SciPy     scipy.stats.bernoulli(theta)
    Stan      bernoulli(theta)

  • Related distributions.
    • The Bernoulli distribution is a special case of the Binomial distribution with N=1.
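
The probability mass function and the relationship to the Binomial distribution listed above can be checked numerically. The following is a minimal sketch using scipy.stats; the value theta = 0.3 is an arbitrary choice for illustration and does not come from the text.

```python
import numpy as np
import scipy.stats as st

theta = 0.3  # illustrative value (assumption, not from the text)

# Frozen Bernoulli distribution with success probability theta
dist = st.bernoulli(theta)

# PMF on the support {0, 1}: f(0) = 1 - theta, f(1) = theta
y = np.array([0, 1])
pmf = dist.pmf(y)

# The PMF is zero outside the support
pmf_outside = dist.pmf(2)

# A Binomial distribution with N = 1 has the same PMF
binom_pmf = st.binom(1, theta).pmf(y)
same_as_binomial = np.allclose(pmf, binom_pmf)

print(pmf, pmf_outside, same_as_binomial)
```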
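
# Interactive plot of the Bernoulli PMF with a slider for θ.
# distribution_plot_app and notebook_url are not defined in this section;
# they are assumed to be set up earlier in the tutorial.
import bokeh.io
import scipy.stats as st

# Slider specification for the single parameter θ
params = [dict(name='θ', start=0, end=1, value=0.5, step=0.01)]
app = distribution_plot_app(x_min=0,
                            x_max=1,
                            scipy_dist=st.bernoulli,
                            params=params,
                            x_axis_label='y',
                            title='Bernoulli')
bokeh.io.show(app, notebook_url=notebook_url)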
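
As a complement to the plot, the sampling syntax from the usage entries can be tried with a quick simulation. This is a sketch under stated assumptions: theta = 0.3, the sample size, and the seed are arbitrary, and it uses NumPy's newer Generator interface (np.random.default_rng) in place of the legacy np.random.choice call shown above. With many samples, the fraction of successes should be close to θ.

```python
import numpy as np
import scipy.stats as st

theta = 0.3         # illustrative success probability (assumption)
n_samples = 10_000  # illustrative sample size (assumption)

rng = np.random.default_rng(seed=0)

# NumPy: choose between 0 and 1 with probabilities (1 - theta, theta)
y_numpy = rng.choice([0, 1], size=n_samples, p=[1 - theta, theta])

# SciPy: draw from the frozen Bernoulli distribution
y_scipy = st.bernoulli(theta).rvs(size=n_samples, random_state=rng)

# The mean of 0/1 outcomes is the empirical fraction of successes
frac_numpy = y_numpy.mean()
frac_scipy = y_scipy.mean()

print(frac_numpy, frac_scipy)
```

Both approaches draw from the same distribution; the SciPy object additionally exposes methods such as pmf and cdf, which is convenient when the same distribution is used for both sampling and plotting.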