Probability Distributions and their Stories

Discrete multivariate distributions

So far, we have looked at univariate distributions, but we will consider multivariate distributions in class, and you will encounter them in your research. First, we consider a discrete multivariate distribution, the Multinomial.


Multinomial distribution

  • Story. This is a generalization of the Binomial distribution. Instead of a Bernoulli trial consisting of two outcomes, each trial has K outcomes. The probability of getting y_1 of outcome 1, y_2 of outcome 2, ..., and y_K of outcome K out of a total of N trials is Multinomially distributed.
  • Example. There are two alleles in a population, A and a. Each individual may have genotype AA, Aa, or aa. The probability distribution describing having y_1 AA individuals, y_2 Aa individuals, and y_3 aa individuals in a population of N total individuals is Multinomially distributed.
  • Parameters. N, the total number of trials, and θ={θ_1,θ_2,…,θ_K}, the probabilities of each outcome. Note that ∑_i θ_i=1 and there is a further restriction that ∑_i y_i=N.
  • Support. The K-nomial distribution is supported on the vectors \mathbf{y} \in \mathbb{N}^K of nonnegative integers that satisfy ∑_i y_i=N.
  • Usage. The usage below assumes that theta is a length K array.

    Package        Syntax
    NumPy          np.random.multinomial(N, theta)
    SciPy          scipy.stats.multinomial(N, theta)
    Stan sampling  multinomial(theta)
    Stan rng       multinomial_rng(theta, N)

  • Probability mass function.

    \begin{align}f(\mathbf{y};\mathbf{\theta}, N) = \frac{N!}{y_1!\,y_2!\cdots y_K!}\,\theta_1^{y_1}\,\theta_2^{y_2}\cdots \theta_K^{y_K}\end{align}

  • Related distributions.
    • The Multinomial distribution generalizes the Binomial distribution to multiple dimensions.
  • Notes.
    • For a sampling statement in Stan, the value of N is implied by the sum of the observed counts, ∑_i y_i.
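
To make the story and the mass function concrete, here is a minimal Python sketch based on the genotype example. The probabilities in theta, the value of N, the seed, and the helper name multinomial_pmf are illustrative assumptions, not values from the text. It draws one sample with NumPy's Generator interface (a newer alternative to the np.random.multinomial call listed under Usage) and evaluates the formula directly so the result can be compared with SciPy.

```python
import math

import numpy as np
import scipy.stats

# Assumed genotype probabilities for AA, Aa, aa (illustrative values only)
theta = np.array([0.25, 0.5, 0.25])
N = 20  # assumed total number of individuals

# Draw one set of genotype counts; the entries of y always sum to N
rng = np.random.default_rng(seed=3252)
y = rng.multinomial(N, theta)


def multinomial_pmf(y, theta, N):
    """Evaluate the Multinomial mass function directly from its formula.

    Hypothetical helper written for illustration; not a library function.
    """
    y = np.asarray(y)
    theta = np.asarray(theta, dtype=float)

    # Outside the support (negative counts or wrong total), the mass is zero
    if np.any(y < 0) or y.sum() != N:
        return 0.0

    # Multinomial coefficient N! / (y_1! y_2! ... y_K!) in exact integer arithmetic
    coeff = math.factorial(N)
    for y_i in y:
        coeff //= math.factorial(int(y_i))

    return coeff * float(np.prod(theta**y))


# Compare the hand-written formula with SciPy's implementation
p_formula = multinomial_pmf(y, theta, N)
p_scipy = scipy.stats.multinomial.pmf(y, N, theta)
print(y, p_formula, p_scipy)
```

The two printed probabilities should agree to within floating-point error. For large N, working with scipy.stats.multinomial.logpmf is usually a safer choice than multiplying factorials and powers directly.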