Probability Distributions and their Stories

A basic skill for any work involving statistics is being able to code up and plot the probability distributions commonly encountered in data science. This tutorial revisits the distributions introduced in this section, now expressed using the scipy.stats module.

Continuous distributions

Student-t distribution

  • Story. The story of the Student-t distribution largely derives from its relationships with other distributions. One way to think about it is as a Gaussian-like distribution with heavier tails.
  • Parameters. The Student-t distribution is peaked, and its peak is located at μ. The width of the peak is set by the parameter σ. Finally, the "degrees of freedom" parameter ν controls how heavy the tails are: the smaller ν is, the heavier the tails.
  • Support. The Student-t distribution is supported on the set of real numbers.
  • Probability density function.
    \begin{align}f(y;\mu, \sigma, \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi \nu \sigma^2}}\,\left(1 + \frac{(y-\mu)^2}{\nu\sigma^2}\right)^{-\frac{\nu+1}{2}}\end{align}

  • Usage.

    Package   Syntax
    NumPy     mu + sigma * np.random.standard_t(nu)
    SciPy     scipy.stats.t(nu, mu, sigma)
    Stan      student_t(nu, mu, sigma)

  • Related distributions.
    • In the ν→∞ limit, the Student-t distribution becomes a Gaussian distribution.
    • The Cauchy distribution is a special case of the Student-t distribution in which ν=1.
    • Only the standard Student-t distribution (μ=0 and σ=1) is available in the numpy.random module. You can still sample from a general Student-t distribution by shifting and scaling samples drawn from the standard Student-t distribution, as shown in the usage, above.
    • We get this distribution whenever we marginalize an unknown σ out of a Gaussian distribution with an improper prior on σ of 1/σ.
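
As a sanity check on the density and the usage syntax listed above, here is a minimal sketch that evaluates the PDF formula by hand and compares it with scipy.stats.t. The helper name student_t_pdf, the parameter values, and the random seed are illustrative assumptions, not part of the original material. Sampling uses NumPy's newer Generator interface rather than the legacy np.random.standard_t call; the shift-and-scale transformation is the same.

```python
import numpy as np
import scipy.stats as st
from scipy.special import gammaln


def student_t_pdf(y, mu, sigma, nu):
    """Evaluate the Student-t PDF directly from the formula (hypothetical helper)."""
    # Work on the log scale so the gamma functions do not overflow for large nu
    log_norm = (gammaln((nu + 1) / 2) - gammaln(nu / 2)
                - 0.5 * np.log(np.pi * nu * sigma**2))
    log_kernel = -(nu + 1) / 2 * np.log1p((y - mu)**2 / (nu * sigma**2))
    return np.exp(log_norm + log_kernel)


# Illustrative parameter values (assumptions chosen for this sketch)
nu, mu, sigma = 10.0, 0.0, 0.2
y = np.linspace(-2, 2, 401)

# SciPy takes the degrees of freedom first, then the location and scale
pdf_scipy = st.t(nu, mu, sigma).pdf(y)
pdf_manual = student_t_pdf(y, mu, sigma, nu)
max_abs_diff = np.max(np.abs(pdf_scipy - pdf_manual))
print(f"largest difference between formula and SciPy: {max_abs_diff:.2e}")

# NumPy only offers the standard Student-t, so shift and scale its draws
rng = np.random.default_rng(3252)
samples = mu + sigma * rng.standard_t(nu, size=100_000)
print(f"sample median: {np.median(samples):.4f}")
```

The two densities should agree to within floating-point error, and the sample median should sit close to μ.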
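
import bokeh.io
import scipy.stats as st

# distribution_plot_app and notebook_url are assumed to be defined earlier in the notebook.
# Each dict configures one slider: its label, range, starting value, and step size.
params = [dict(name='ν', start=1, end=50, value=10, step=0.01),
          dict(name='μ', start=-0.5, end=0.5, value=0, step=0.01),
          dict(name='σ', start=0.1, end=1.0, value=0.2, step=0.01)]

# Build the interactive plot of the Student-t distribution over -2 ≤ y ≤ 2
app = distribution_plot_app(x_min=-2,
                            x_max=2,
                            scipy_dist=st.t,
                            params=params,
                            x_axis_label='y',
                            title='Student-t')
bokeh.io.show(app, notebook_url=notebook_url)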
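
The interactive plot lets you explore the related distributions by eye. The same relationships can also be checked numerically. The sketch below is a rough check under assumed parameter values (μ=0, σ=0.2, and the particular ν values are illustrative choices): it compares the ν=1 case with scipy.stats.cauchy, tracks how the gap to scipy.stats.norm shrinks as ν grows, and compares tail probabilities.

```python
import numpy as np
import scipy.stats as st

# Illustrative location and scale (assumptions chosen for this sketch)
mu, sigma = 0.0, 0.2
y = np.linspace(-2, 2, 401)

# With nu = 1 the Student-t density should coincide with the Cauchy density
cauchy_diff = np.max(np.abs(st.t(1, mu, sigma).pdf(y)
                            - st.cauchy(mu, sigma).pdf(y)))
print(f"max |t(nu=1) - Cauchy| = {cauchy_diff:.2e}")

# As nu grows, the largest gap to the Gaussian density should shrink
norm_pdf = st.norm(mu, sigma).pdf(y)
gauss_diffs = {
    nu: np.max(np.abs(st.t(nu, mu, sigma).pdf(y) - norm_pdf))
    for nu in (1, 10, 100, 1000)
}
for nu, diff in gauss_diffs.items():
    print(f"nu = {nu:5d}: max |t - Gaussian| = {diff:.2e}")

# Heavy tails: probability of landing more than 4 sigma away from mu
tail_t = 2 * st.t(3, mu, sigma).sf(mu + 4 * sigma)
tail_norm = 2 * st.norm(mu, sigma).sf(mu + 4 * sigma)
print(f"P(|y - mu| > 4 sigma): t(nu=3) = {tail_t:.2e}, Gaussian = {tail_norm:.2e}")
```

The tail comparison makes the "heavier tails" story concrete: for small ν, events several σ away from the peak are far more probable than under a Gaussian with the same μ and σ.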