In any module that deals with statistics, one basic skill you need is the ability to program and plot the probability distributions typically encountered in data science. This tutorial revisits the distributions introduced in this section, now expressed using the scipy.stats module.
Discrete distributions
Hypergeometric distribution
Story. Consider an urn with $a$ white balls and $b$ black balls. Draw $N$ balls from this urn without replacement. The number of white balls drawn, $n$, is Hypergeometrically distributed.
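The story can be checked by simulation. The sketch below assumes NumPy's `Generator` API and arbitrary example values $a = 7$, $b = 5$, $N = 6$. It shuffles the urn many times, treats the first $N$ balls of each shuffle as one draw without replacement, and compares the resulting frequencies of $n$ with `scipy.stats.hypergeom.pmf`.

```python
import numpy as np
import scipy.stats as st

# Arbitrary example values: a white balls, b black balls, N draws.
a, b, N = 7, 5, 6
n_trials = 100_000

rng = np.random.default_rng(seed=3252)

# Build the urn: 1 marks a white ball, 0 marks a black ball.
urn = np.array([1] * a + [0] * b)

# Shuffle each row independently; the first N balls of a uniformly random
# permutation are a draw of N balls without replacement.
shuffled = rng.permuted(np.tile(urn, (n_trials, 1)), axis=1)
n_white = shuffled[:, :N].sum(axis=1)

# Empirical frequency of each count versus the SciPy PMF
# (SciPy's argument order is total balls, white balls, draws).
ns = np.arange(0, N + 1)
empirical = np.array([np.mean(n_white == n) for n in ns])
theoretical = st.hypergeom.pmf(ns, a + b, a, N)

for n, emp, theo in zip(ns, empirical, theoretical):
    print(f"n = {n}: simulated {emp:.4f}, PMF {theo:.4f}")
```

With these values $n = 0$ never occurs, because drawing 6 balls from an urn with only 5 black balls always yields at least one white ball.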
- Example. There are $a + b$ finches on an island, and $a$ of them are tagged (and therefore $b$ of them are untagged). You capture $N$ finches. The number of tagged finches $n$ is Hypergeometrically distributed, $n \sim \text{Hypergeom}(N, a, b)$, as defined below.
- Parameters. There are three parameters: the number of draws $N$, the number of white balls $a$, and the number of black balls $b$.
- Support. The Hypergeometric distribution is supported on the set of integers between $\max(0, N - b)$ and $\min(N, a)$, inclusive.
- Probability mass function. $f(n; N, a, b) = \dfrac{\binom{a}{n}\binom{b}{N-n}}{\binom{a+b}{N}}$
- Usage.
  - NumPy: np.random.hypergeometric(a, b, N)
  - SciPy: scipy.stats.hypergeom(a+b, a, N)
  - Stan: hypergeometric(N, a, b)
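- Related distributions. In the limit $a + b \to \infty$ with $a/(a+b)$ held fixed, the Hypergeometric distribution becomes a Binomial distribution with $N$ trials and success probability $\theta = a/(a+b)$.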
- Notes.
- This distribution is analogous to the Binomial distribution, except that the Binomial distribution describes draws from an urn with replacement. In the analogy, the Binomial success probability is $\theta = a/(a+b)$.
- SciPy uses a different parametrization. Let $M = a + b$ be the total number of balls in the urn. Then, in the parameter order that scipy.stats.hypergeom expects, $n \sim \text{Hypergeom}(M, a, N)$.
- The random number generator in numpy.random has a different parametrization than the scipy.stats module. The numpy.random.hypergeometric() function uses the same parametrization as Stan, except that the parameters are given in the order a, b, N rather than N, a, b as in Stan.
- When using the sliders below, you will only get a plot if $N \le a + b$, because you cannot draw more balls out of the urn than are actually in it.
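As a consistency check on the PMF and the three parametrizations, the sketch below evaluates $f(n; N, a, b) = \binom{a}{n}\binom{b}{N-n} / \binom{a+b}{N}$ directly with `math.comb`, compares it with `scipy.stats.hypergeom.pmf(n, a+b, a, N)`, and compares both with samples from NumPy. The helper name `hypergeom_pmf` and the parameter values are illustrative assumptions; the sampler used is `Generator.hypergeometric`, which takes the same (white, black, draws) order as the legacy `np.random.hypergeometric`.

```python
from math import comb

import numpy as np
import scipy.stats as st


def hypergeom_pmf(n, N, a, b):
    """Illustrative helper: Hypergeometric PMF in the (N, a, b)
    parametrization, i.e. the probability of drawing n white balls in
    N draws without replacement from an urn with a white and b black balls."""
    if n < max(0, N - b) or n > min(N, a):
        return 0.0  # outside the support
    return comb(a, n) * comb(b, N - n) / comb(a + b, N)


# Arbitrary example parameter values.
N, a, b = 10, 8, 12
ns = np.arange(0, N + 1)

# 1. PMF from the binomial-coefficient formula.
by_hand = np.array([hypergeom_pmf(int(n), N, a, b) for n in ns])

# 2. SciPy expects (total balls, white balls, draws).
scipy_pmf = st.hypergeom.pmf(ns, a + b, a, N)

# 3. NumPy's sampler expects (white balls, black balls, draws).
rng = np.random.default_rng(seed=8675)
samples = rng.hypergeometric(a, b, N, size=100_000)
empirical = np.bincount(samples, minlength=N + 1) / samples.size

for n in ns:
    print(f"n = {n:2d}: formula {by_hand[n]:.4f}, "
          f"SciPy {scipy_pmf[n]:.4f}, NumPy sample {empirical[n]:.4f}")
```

Passing SciPy's arguments in Stan's order, for example `st.hypergeom.pmf(ns, N, a, b)`, would silently describe a different urn, which is why it pays to check the mapping numerically once.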
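The Binomial connection can also be seen numerically: as the urn grows with the fraction of white balls $a/(a+b)$ held fixed, drawing without replacement becomes nearly indistinguishable from drawing with replacement. The sketch below assumes $N = 10$ draws and a white fraction of $\theta = 0.3$, both arbitrary choices, and reports the largest pointwise gap between the Hypergeometric and Binomial PMFs.

```python
import numpy as np
import scipy.stats as st

N = 10        # number of draws (arbitrary)
theta = 0.3   # fixed fraction of white balls, a / (a + b) (arbitrary)

ns = np.arange(0, N + 1)
binom_pmf = st.binom.pmf(ns, N, theta)

max_diffs = []
# Grow the urn while holding a / (a + b) at theta.
for total in [20, 100, 1_000, 100_000]:
    a = round(theta * total)
    b = total - a
    hyper_pmf = st.hypergeom.pmf(ns, a + b, a, N)
    max_diff = np.max(np.abs(hyper_pmf - binom_pmf))
    max_diffs.append(max_diff)
    print(f"a + b = {total:>7}: max |Hypergeom - Binomial| = {max_diff:.2e}")
```

The gap shrinks as $a + b$ grows, because removing a few balls barely changes the composition of a very large urn.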
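import bokeh.io
import scipy.stats as st

# distribution_plot_app is assumed to be a course-provided helper (not part of SciPy or Bokeh).
bokeh.io.output_notebook()
notebook_url = 'localhost:8888'  # assumed address of the running Jupyter server

# Slider ranges for the Stan-style parameters N (draws), a (white), b (black).
params = [dict(name='N', start=1, end=20, value=10, step=1),
          dict(name='a', start=1, end=20, value=10, step=1),
          dict(name='b', start=1, end=20, value=10, step=1)]

app = distribution_plot_app(x_min=0,
                            x_max=40,
                            scipy_dist=st.hypergeom,
                            params=params,
                            # Reorder (N, a, b) into SciPy's (M, n, N) = (a+b, a, N).
                            transform=lambda N, a, b: (a+b, a, N),
                            x_axis_label='n',
                            title='Hypergeometric')
bokeh.io.show(app, notebook_url=notebook_url)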
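The app relies on distribution_plot_app, which is not defined in this section. As a hedged, non-interactive sketch of what the plot shows for a single slider setting, the hypothetical function `hypergeom_plot_points` below applies the same (N, a, b) to (a+b, a, N) reordering and returns `None` when $N > a + b$, mirroring the rule that no plot appears in that case. It is an illustration, not the helper's actual implementation.

```python
import numpy as np
import scipy.stats as st


def hypergeom_plot_points(N, a, b, x_min=0, x_max=40):
    """Hypothetical helper: return (n values, PMF values) for one setting
    of the Stan-style parameters, or None when N > a + b."""
    if N > a + b:
        # Cannot draw more balls than the urn holds, so there is nothing to plot.
        return None
    n = np.arange(x_min, x_max + 1)
    # Same reordering as the transform passed to the app: (N, a, b) -> (a+b, a, N).
    pmf = st.hypergeom.pmf(n, a + b, a, N)
    return n, pmf


# Default slider values from the app: N = a = b = 10.
points = hypergeom_plot_points(10, 10, 10)
```

For $N = a = b = 10$, every PMF value beyond $n = 10$ is zero, so most of the 0 to 40 range in the plot is empty; a tighter `x_max` would be a reasonable design choice if the slider ranges stay this small.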