Probability Distributions and Their Stories

In any module that deals with statistics, one basic skill you need is the ability to program and plot the probability distributions typically encountered in data science. This tutorial reviews the distributions introduced in this section, this time expressed using the scipy.stats module.

Continuous distributions

Chi-square distribution

  • Story. If X_1, X_2, …, X_n are independent standard normal random variables (Gaussian with mean 0 and variance 1), then X_1^2 + X_2^2 + ⋯ + X_n^2 is χ^2-distributed with ν = n degrees of freedom. See also the story of the Gamma distribution, below.
  • Example. If S^2 is the sample variance of n independent and identically distributed Gaussian random variables with variance σ^2, then the scaled quantity (n−1)S^2/σ^2 is Chi-square distributed with ν = n−1 degrees of freedom. This is among the most common uses of the Chi-square distribution.
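  • Parameters. There is a single parameter, the number of degrees of freedom ν > 0. In the story above ν is a positive integer, but the distribution is defined for any positive real ν.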
  • Support. The Chi-square distribution is supported on the positive real numbers.
  • Probability density function.
    \begin{align}f(y;\nu) = \frac{1}{2^{\nu/2}\,\Gamma\left(\frac{\nu}{2}\right)}\,y^{\frac{\nu}{2}-1}\,\mathrm{e}^{-y/2}\end{align}
  • Usage.
    Package   Syntax
    NumPy     np.random.chisquare(nu)
    SciPy     scipy.stats.chi2(nu)
    Stan      chi_square(nu)

  • Related distributions. The Chi-square distribution is a special case of the Gamma distribution with shape α = ν/2 and rate β = 1/2 (equivalently, scale 1/β = 2, which is how scipy.stats.gamma is parameterized).
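The story and the example can both be checked by simulation. The following is a minimal sketch, assuming NumPy's default_rng generator; the seed, the number of trials, and the values n = 5, m = 8 and σ = 3 are arbitrary illustrative choices. It compares sample moments against the Chi-square mean ν and variance 2ν reported by scipy.stats.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(seed=3252)
n_trials = 200_000

# Story: sum of squares of n independent standard normal draws
n = 5
z = rng.standard_normal(size=(n_trials, n))
sum_sq = np.sum(z**2, axis=1)

# Chi-square with nu = n has mean nu and variance 2*nu
print(sum_sq.mean(), st.chi2(n).mean())   # both close to 5
print(sum_sq.var(), st.chi2(n).var())     # both close to 10

# Example: scaled sample variance of m Gaussian draws with standard deviation sigma
m, sigma = 8, 3.0
x = rng.normal(loc=1.0, scale=sigma, size=(n_trials, m))
scaled_var = (m - 1) * x.var(axis=1, ddof=1) / sigma**2

# This should follow a Chi-square distribution with nu = m - 1 degrees of freedom
print(scaled_var.mean(), st.chi2(m - 1).mean())   # both close to 7
print(scaled_var.var(), st.chi2(m - 1).var())     # both close to 14
```

Note the ddof=1 argument, which gives the unbiased sample variance S^2; the mean of the Gaussian draws (here 1.0) does not affect the result.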
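As a sanity check on the density formula and on the Gamma relationship, the sketch below codes the PDF by hand and compares it with scipy.stats.chi2.pdf. The function chi2_pdf is a hypothetical helper written for illustration, not part of SciPy. Because scipy.stats.gamma takes a shape a and a scale rather than a rate, the rate β = 1/2 corresponds to scale = 2.

```python
import numpy as np
import scipy.special
import scipy.stats as st


def chi2_pdf(y, nu):
    """Chi-square PDF written out from the formula, valid for y > 0 (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    norm = 1.0 / (2.0**(nu / 2) * scipy.special.gamma(nu / 2))
    return norm * y**(nu / 2 - 1) * np.exp(-y / 2)


nu = 10
y = np.linspace(0.1, 40, 400)

# Hand-written PDF versus SciPy's implementation
print(np.allclose(chi2_pdf(y, nu), st.chi2.pdf(y, nu)))   # True

# Gamma with shape alpha = nu/2 and rate beta = 1/2, i.e. scale = 2
print(np.allclose(st.gamma.pdf(y, a=nu / 2, scale=2), st.chi2.pdf(y, nu)))   # True
```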
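The interactive plot of the Chi-square distribution, with a slider for ν, assumes that distribution_plot_app is the plotting helper defined earlier in this tutorial; it is not part of SciPy or Bokeh.

```python
import bokeh.io
import scipy.stats as st

bokeh.io.output_notebook()
notebook_url = 'localhost:8888'  # address of your Jupyter server; adjust as needed

# Slider for the degrees of freedom ν
params = [dict(name='ν', start=1, end=20, value=10, step=0.01)]

# distribution_plot_app: helper assumed to be defined earlier in the tutorial
app = distribution_plot_app(x_min=0,
                            x_max=40,
                            scipy_dist=st.chi2,
                            params=params,
                            x_axis_label='y',
                            title='Chi-square')
bokeh.io.show(app, notebook_url=notebook_url)
```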