Probability Distributions and their Stories

A basic skill for any statistics module is being able to compute and plot the probability distributions commonly encountered in data science. This tutorial revisits the distributions introduced in this section, now expressed using the scipy.stats module.

Continuous distributions

Log-Normal distribution

  • Story. If ln y is Gaussian distributed, y is Log-Normally distributed.
  • Example. A measure of fold change in gene expression can be Log-Normally distributed.
  • Parameters. As for a Gaussian, there are two parameters, μ and σ^2. They are the mean and variance of ln y, not of y itself.
  • Support. The Log-Normal distribution is supported on the set of positive real numbers.
  • Probability density function.

    \begin{align}f(y;\mu, \sigma) = \frac{1}{y\sqrt{2\pi \sigma^2}}\,\mathrm{e}^{-(\ln y - \mu)^2/2\sigma^2}\end{align}

  • Usage

    Package   Syntax
    NumPy     np.random.lognormal(mu, sigma)
    SciPy     scipy.stats.lognorm(sigma, loc=0, scale=np.exp(mu))
    Stan      lognormal(mu, sigma)

  • Notes.
    • Be careful not to get confused. The Log-Normal distribution describes the distribution of y given that ln y is Gaussian distributed. It does not describe the distribution of ln y.
    • The way location, scale, and shape parameters work in SciPy for the Log-Normal distribution is confusing. If you want to specify a Log-Normal distribution as we have defined it using scipy.stats, use a shape parameter equal to σ, a location parameter of zero, and a scale parameter given by e^μ. For example, to compute the PDF at x, you would use scipy.stats.lognorm.pdf(x, sigma, loc=0, scale=np.exp(mu)).
    • The definition of the Log-Normal distribution in the numpy.random module matches what we have defined above and what is defined in Stan.
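
The SciPy parameterization described in the notes can be checked numerically. The following is a minimal sketch, not part of the original tutorial code: `lognormal_pdf` is a hypothetical helper written directly from the PDF formula above, and the values of `mu` and `sigma` are arbitrary illustrative choices.

```python
import numpy as np
import scipy.stats as st


def lognormal_pdf(y, mu, sigma):
    """Log-Normal PDF written directly from the formula (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    norm_const = y * np.sqrt(2 * np.pi * sigma**2)
    return np.exp(-((np.log(y) - mu) ** 2) / (2 * sigma**2)) / norm_const


# Illustrative parameter values (assumed, chosen only for this check)
mu, sigma = 0.1, 0.2
y = np.linspace(0.1, 4.0, 200)

# PDF from the formula
pdf_formula = lognormal_pdf(y, mu, sigma)

# PDF from SciPy: shape = sigma, loc = 0, scale = exp(mu)
pdf_scipy = st.lognorm.pdf(y, sigma, loc=0, scale=np.exp(mu))

# Equivalent "frozen" distribution object with the same parameters
dist = st.lognorm(sigma, loc=0, scale=np.exp(mu))
pdf_frozen = dist.pdf(y)

print("formula matches SciPy:", np.allclose(pdf_formula, pdf_scipy))
print("frozen matches direct call:", np.allclose(pdf_scipy, pdf_frozen))
```

If the two checks agree, the mapping (shape, loc, scale) = (σ, 0, e^μ) reproduces the distribution as defined here. Passing x as the first argument of scipy.stats.lognorm itself would instead be interpreted as the shape parameter, which is why the PDF is evaluated through the .pdf method.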
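
import numpy as np
import scipy.stats as st
import bokeh.io

# distribution_plot_app and notebook_url are assumed to be defined earlier in the tutorial.
# Slider settings for the two Log-Normal parameters
params = [dict(name='µ', start=0.01, end=0.5, value=0.1, step=0.01),
          dict(name='σ', start=0.1, end=1.0, value=0.2, step=0.01)]

# transform maps (µ, σ) to SciPy's (shape, loc, scale) = (σ, 0, e^µ)
app = distribution_plot_app(x_min=0,
                            x_max=4,
                            scipy_dist=st.lognorm,
                            params=params,
                            transform=lambda mu, sigma: (sigma, 0, np.exp(mu)),
                            x_axis_label='y',
                            title='Log-Normal')

bokeh.io.show(app, notebook_url=notebook_url)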
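
As a non-interactive complement to the plot, the story of the distribution can be checked by sampling. This is a rough sketch under stated assumptions: it uses NumPy's newer Generator interface (`np.random.default_rng`) rather than the legacy `np.random.lognormal` call listed under Usage, and the seed, sample size, and parameter values are arbitrary.

```python
import numpy as np
import scipy.stats as st

# Arbitrary seed, sample size, and parameter values for illustration
rng = np.random.default_rng(seed=3252)
mu, sigma = 0.1, 0.2
n_samples = 100_000

# Draw y directly from NumPy's Log-Normal sampler
y = rng.lognormal(mean=mu, sigma=sigma, size=n_samples)

# Story check: ln y should be Gaussian with mean mu and standard deviation sigma
log_mean = np.log(y).mean()
log_std = np.log(y).std()

# The mean of y itself is not e^mu; compare it to SciPy's value for the same
# parameters, using shape = sigma, loc = 0, scale = exp(mu)
sample_mean = y.mean()
scipy_mean = st.lognorm(sigma, loc=0, scale=np.exp(mu)).mean()

print("mean of ln y:", log_mean, "(target mu =", mu, ")")
print("std of ln y:", log_std, "(target sigma =", sigma, ")")
print("mean of y:", sample_mean, "vs SciPy:", scipy_mean)
```

The sample statistics of ln y should land close to μ and σ, while the mean of y is compared against the SciPy distribution built with the same shape, location, and scale convention used in the plot above.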