- Story. A probability is assigned to each of a set of discrete outcomes.
- Example. A hen will peck at grain A with probability $\theta_\mathrm{A}$, grain B with probability $\theta_\mathrm{B}$, and grain C with probability $\theta_\mathrm{C}$.
- Parameters. The distribution is parametrized by the probabilities assigned to each outcome. We define $\theta_y$ to be the probability assigned to outcome $y$. The set of $\theta_y$'s are the parameters, and they are constrained by $\sum_y \theta_y = 1$.
- Support. If we index the categories with sequential integers from 1 to $N$, the distribution is supported on the integers 1 to $N$, inclusive.
- Probability mass function. $f(y;\{\theta_y\}) = \theta_y$.
- Usage (with `theta` of length $n$; the NumPy and SciPy calls return categories indexed $0$ to $n-1$)

  | Package | Syntax |
  |---|---|
  | NumPy | `np.random.choice(len(theta), p=theta)` |
  | SciPy | `scipy.stats.rv_discrete(values=(range(len(theta)), theta)).rvs()` |
  | Stan | `categorical(theta)` |
- Related distributions.
  - The Discrete Uniform distribution is a special case where all $\theta_y$ are equal.
  - The Bernoulli distribution is a special case where there are two categories that can be encoded as having outcomes of zero or one. In this case, the parameter for the Bernoulli distribution is $\theta = \theta_1 = 1 - \theta_0$, the probability of the category encoded as one.
- Notes.
  - If you are using the scipy.stats module, this distribution must be constructed manually with `scipy.stats.rv_discrete()`, and the categories need to be encoded by an index. For the interactive plot below, we also need to specify a custom PMF and CDF.
  - To sample from a Categorical distribution, use `numpy.random.choice()`, specifying the values of $\theta_y$ with the `p` kwarg.
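As a sketch of the usage table and notes above, the snippet below builds the hen example with `scipy.stats.rv_discrete()` and samples it with NumPy. The probabilities 0.2, 0.5, and 0.3 for grains A, B, and C are hypothetical values chosen for illustration, and the categories are encoded as 1, 2, 3 to match the support convention rather than the zero-based indices used in the table. Sampling uses the newer `Generator.choice()` method, which accepts the same `p` kwarg as `numpy.random.choice()`.

```python
import numpy as np
import scipy.stats

# Hypothetical probabilities for the hen example: grains A, B, C.
theta = np.array([0.2, 0.5, 0.3])

# Encode the categories as 1, 2, 3 so the support runs from 1 to N.
categories = np.arange(1, len(theta) + 1)
categorical = scipy.stats.rv_discrete(values=(categories, theta))

pmf_vals = categorical.pmf(categories)  # should reproduce theta
cdf_vals = categorical.cdf(categories)  # running sums of theta

# Draw samples with NumPy, weighting each category by theta via p.
rng = np.random.default_rng(seed=3252)
samples = rng.choice(categories, size=100_000, p=theta)

# Empirical frequency of each category, which should be close to theta.
freqs = np.array([np.mean(samples == k) for k in categories])
print(freqs)
```

With many samples the empirical frequencies approach $\theta_\mathrm{A}$, $\theta_\mathrm{B}$, and $\theta_\mathrm{C}$, which is a quick check that the encoding and the `p` weights line up.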
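```python
import numpy as np
import bokeh.io

# distribution_plot_app() and notebook_url are defined elsewhere in this
# document's setup code; they are not part of NumPy or Bokeh.

def categorical_pmf(x, θ1, θ2, θ3):
    """PMF for four categories labeled 1-4, with θ4 = 1 - θ1 - θ2 - θ3."""
    thetas = np.array([θ1, θ2, θ3, 1 - θ1 - θ2 - θ3])
    # Slider settings that give a negative probability are invalid; plot NaNs.
    if (thetas < 0).any():
        return np.array([np.nan] * len(x))
    # Shift the 1-based category labels to 0-based array indices.
    return thetas[np.asarray(x, dtype=int) - 1]

def categorical_cdf_indiv(x, thetas):
    """CDF at a single, possibly non-integer, value x."""
    if x < 1:
        return 0
    elif x >= 4:
        return 1
    else:
        # Total probability of categories 1 through floor(x).
        return np.sum(thetas[:int(x)])

def categorical_cdf(x, θ1, θ2, θ3):
    """CDF evaluated elementwise over the array x."""
    thetas = np.array([θ1, θ2, θ3, 1 - θ1 - θ2 - θ3])
    if (thetas < 0).any():
        return np.array([np.nan] * len(x))
    return np.array([categorical_cdf_indiv(x_val, thetas) for x_val in x])

# Slider specifications for the three free parameters.
params = [
    dict(name='θ1', start=0, end=1, value=0.2, step=0.01),
    dict(name='θ2', start=0, end=1, value=0.3, step=0.01),
    dict(name='θ3', start=0, end=1, value=0.1, step=0.01),
]

app = distribution_plot_app(
    x_min=1,
    x_max=4,
    custom_pmf=categorical_pmf,
    custom_cdf=categorical_cdf,
    params=params,
    x_axis_label='category',
    title='Discrete categorical',
)
bokeh.io.show(app, notebook_url=notebook_url)
```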
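The CDF plotted above is a step function: $F(y) = \sum_{k=1}^{\lfloor y \rfloor} \theta_k$ for $1 \le y < N$, with $F(y) = 0$ below 1 and $F(y) = 1$ from $N$ on. Inverting this step function gives one way to draw samples without `numpy.random.choice()`: draw $u$ uniformly on $[0, 1)$ and return the smallest category whose cumulative probability exceeds $u$. The function `sample_categorical` below is a hypothetical helper written for illustration, not a NumPy API; it uses the default slider values $\theta_1 = 0.2$, $\theta_2 = 0.3$, $\theta_3 = 0.1$, so $\theta_4 = 0.4$.

```python
import numpy as np

def sample_categorical(theta, size, rng):
    """Hypothetical helper: inverse-transform sampling of categories 1..N."""
    theta = np.asarray(theta, dtype=float)
    cdf = np.cumsum(theta)
    # Guard against round-off so every u in [0, 1) maps to some category.
    cdf[-1] = 1.0
    u = rng.uniform(size=size)
    # side="right" finds the first index whose cdf value exceeds u;
    # adding 1 converts that 0-based index to a 1-based category label.
    return np.searchsorted(cdf, u, side="right") + 1

rng = np.random.default_rng(seed=42)
theta = [0.2, 0.3, 0.1, 0.4]  # slider defaults, with θ4 = 1 - θ1 - θ2 - θ3
samples = sample_categorical(theta, 200_000, rng)

# Empirical frequencies of categories 1-4 (index 0 of bincount is unused).
freqs = np.bincount(samples, minlength=5)[1:] / samples.size
print(freqs)
```

Using `side="right"` matters when some $\theta_k = 0$: a category with zero probability has the same cumulative value as its predecessor, so it is never selected.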