Probability Distributions and their Stories

Discrete distributions

Hypergeometric distribution

Story. Consider an urn with a white balls and b black balls. Draw N balls from this urn without replacement. The number of white balls drawn, n, is Hypergeometrically distributed.
  • Example. There are a+b finches on an island, and a of them are tagged (and therefore b of them are untagged). You capture N finches. The number of tagged finches n is Hypergeometrically distributed, f(n;N,a,b), as defined below.
  • Parameters. There are three parameters: the number of draws N, the number of white balls a, and the number of black balls b.
  • Support. The Hypergeometric distribution is supported on the set of integers between max(0,N−b) and min(N,a), inclusive.
  • Probability mass function.

    \begin{align}f(n;N, a, b) = \frac{\begin{pmatrix}a\\n\end{pmatrix}\begin{pmatrix}b\\N-n\end{pmatrix}}{\begin{pmatrix}a+b\\N\end{pmatrix}}.\end{align}

  • Usage.

    Package   Syntax
    NumPy     np.random.hypergeometric(a, b, N)
    SciPy     scipy.stats.hypergeom(a+b, a, N)
    Stan      hypergeometric(N, a, b)

  • Related distributions.
    • In the limit a+b→∞ with a/(a+b) held fixed, the Hypergeometric distribution becomes a Binomial distribution with parameters N and θ=a/(a+b).
  • Notes.
  • This distribution is analogous to the Binomial distribution, except that the Binomial distribution describes draws from an urn with replacement. In the analogy, θ=a/(a+b).
  • SciPy uses a different parametrization. Let M=a+b be the total number of balls in the urn. Then, with the parameters in the order that scipy.stats.hypergeom expects, the probability mass function is

    \begin{align}f(n;M, a, N) = \frac{\begin{pmatrix}a\\n\end{pmatrix}\begin{pmatrix}M-a\\N-n\end{pmatrix}}{\begin{pmatrix}M\\N\end{pmatrix}}.\end{align}

  • The random number generator in numpy.random uses a different parametrization than the scipy.stats module. The numpy.random.hypergeometric() function uses the same parametrization as Stan, except that the parameters are given in the order a, b, N rather than N, a, b as in Stan.
  • When using the sliders below, you will only get a plot if N ≤ a+b because you cannot draw more balls out of the urn than are actually in there.
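The probability mass function and support described above can be sketched with Python's standard library alone. The helper name hypergeom_pmf and the values N = a = b = 10 are illustrative assumptions, not part of the original text; the code only evaluates the formula for f(n;N,a,b) and checks two properties one would expect of it.

```python
from math import comb


def hypergeom_pmf(n, N, a, b):
    """Evaluate f(n; N, a, b): the probability of drawing n white balls
    in N draws without replacement from a white and b black balls."""
    lo, hi = max(0, N - b), min(N, a)
    if not lo <= n <= hi:
        return 0.0  # n lies outside the support
    return comb(a, n) * comb(b, N - n) / comb(a + b, N)


# Illustrative values (assumed for this sketch): 10 draws, 10 white, 10 black
N, a, b = 10, 10, 10
support = range(max(0, N - b), min(N, a) + 1)
pmf = [hypergeom_pmf(n, N, a, b) for n in support]

# The probabilities should sum to 1 up to floating-point rounding,
# and the mean should match N * a / (a + b).
total = sum(pmf)
mean = sum(n * p for n, p in zip(support, pmf))
```

If SciPy is installed, scipy.stats.hypergeom(a + b, a, N).pmf(n) should agree with hypergeom_pmf(n, N, a, b), which is a convenient way to confirm the argument order.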
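import bokeh.io
import scipy.stats as st

# distribution_plot_app and notebook_url are not defined in this cell;
# they are assumed to be defined earlier in the notebook.
# Slider ranges and starting values for the three parameters
params = [dict(name='N', start=1, end=20, value=10, step=1),
          dict(name='a', start=1, end=20, value=10, step=1),
          dict(name='b', start=1, end=20, value=10, step=1)]
# transform maps the (N, a, b) sliders to SciPy's (M, a, N) argument order
app = distribution_plot_app(x_min=0,
                            x_max=40,
                            scipy_dist=st.hypergeom,
                            params=params,
                            transform=lambda N, a, b: (a+b, a, N),
                            x_axis_label='n',
                            title='Hypergeometric')
bokeh.io.show(app, notebook_url=notebook_url)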
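The Binomial limit noted under Related distributions can be checked numerically. The sketch below is only an illustration under assumed values (N = 10, θ = 0.25, and urn sizes from 40 to 40000); the function names are hypothetical helpers. It holds a/(a+b) fixed, grows the urn, and records the largest pointwise gap between the two probability mass functions.

```python
from math import comb


def hypergeom_pmf(n, N, a, b):
    """f(n; N, a, b) for draws without replacement."""
    return comb(a, n) * comb(b, N - n) / comb(a + b, N)


def binom_pmf(n, N, theta):
    """Binomial PMF for N draws with success probability theta."""
    return comb(N, n) * theta**n * (1 - theta) ** (N - n)


# Illustrative values, assumed for this sketch
N, theta = 10, 0.25


def max_abs_diff(urn_size):
    """Largest pointwise gap between the two PMFs for a given urn size,
    keeping the fraction of white balls at theta."""
    a = round(theta * urn_size)
    b = urn_size - a
    return max(
        abs(hypergeom_pmf(n, N, a, b) - binom_pmf(n, N, theta))
        for n in range(N + 1)
    )


# The gap should shrink as the urn grows with a / (a + b) held fixed
diffs = [max_abs_diff(size) for size in (40, 400, 4000, 40000)]
```

Printing diffs should show the gap shrinking as the urn grows, which is the sense in which drawing without replacement from a very large urn behaves like drawing with replacement.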