Discrete multivariate distributions

So far, we have looked at univariate distributions, but we will consider multivariate distributions in class, and you will encounter them in your research. First, we consider a discrete multivariate distribution, the Multinomial.


Multinomial distribution

  • Story. This is a generalization of the Binomial distribution. Instead of a Bernoulli trial consisting of two outcomes, each trial has \(K\) possible outcomes. The numbers \(y_1\) of outcome 1, \(y_2\) of outcome 2, ..., and \(y_K\) of outcome \(K\) obtained out of a total of \(N\) trials are Multinomially distributed.
  • Example. There are two alleles in a population, A and a. Each individual may have genotype AA, Aa, or aa. The numbers of individuals with each genotype, \(y_1\) AA, \(y_2\) Aa, and \(y_3\) aa, in a population of \(N\) total individuals are Multinomially distributed.
  • Parameters. \(N\), the total number of trials, and \(θ=\{θ_1,θ_2,…,θ_K\}\), the probabilities of each outcome. Note that \(∑_i θ_i=1\), and the counts are further restricted by \(∑_i y_i=N\).
  • Support. The K-nomial distribution is supported on the set of \(\mathbf{y} \in \mathbb{N}^K\) (with \(\mathbb{N}\) including zero) that satisfy \(∑_i y_i=N\).
  • Usage. The usage below assumes that theta is a length K array.

    Package          Syntax
    NumPy            np.random.multinomial(N, theta)
    SciPy            scipy.stats.multinomial(N, theta)
    Stan sampling    multinomial(theta)
    Stan rng         multinomial_rng(theta, N)

  • Probability mass function.

    \(\begin{align}
    f(\mathbf{y};\mathbf{\theta}, N) = \frac{N!}{y_1!\,y_2!\cdots y_K!}\,\theta_1^{y_1}\,\theta_2^{y_2}\cdots \theta_K^{y_K}
    \end{align}\)


  • Related distributions.
    • The Multinomial distribution generalizes the Binomial distribution to multiple dimensions. The Binomial distribution is the special case with \(K=2\).
  • Notes.
    • For a sampling statement in Stan, the value of N is implied by the sum of the observed counts.
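
The usage and probability mass function above can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the values of N and theta (taken here as genotype probabilities for AA, Aa, and aa), the seed, the test counts, and the helper multinomial_pmf are all introduced for this example. It uses NumPy's Generator interface (rng.multinomial) rather than the legacy np.random.multinomial call. The sketch draws samples, confirms that every draw sums to N, compares the formula with scipy.stats.multinomial.pmf, and checks that the K = 2 case agrees with the Binomial.

```python
from math import factorial

import numpy as np
import scipy.stats

# Illustrative values (assumptions, not from the text): probabilities of
# genotypes AA, Aa, aa and a population of N individuals.
N = 20
theta = np.array([0.25, 0.5, 0.25])

rng = np.random.default_rng(seed=3252)

# Draw 1000 samples; each row is one draw (y_1, y_2, y_3).
samples = rng.multinomial(N, theta, size=1000)

# Every draw satisfies the constraint sum_i y_i = N.
all_sum_to_N = bool(np.all(samples.sum(axis=1) == N))


def multinomial_pmf(y, theta, N):
    """Evaluate the Multinomial PMF directly from its formula.

    Hypothetical helper for illustration; returns 0 outside the support.
    """
    y = np.asarray(y)
    if np.any(y < 0) or y.sum() != N:
        return 0.0

    # Multinomial coefficient N! / (y_1! y_2! ... y_K!), in exact integers.
    coeff = factorial(N)
    for y_i in y:
        coeff //= factorial(int(y_i))

    return coeff * np.prod(np.asarray(theta, dtype=float) ** y)


# Compare the formula with SciPy at one point in the support.
y = [5, 10, 5]
p_formula = multinomial_pmf(y, theta, N)
p_scipy = scipy.stats.multinomial.pmf(y, n=N, p=theta)

# With K = 2, the Multinomial PMF matches the Binomial PMF.
y1, t = 7, 0.3
p_multi_k2 = scipy.stats.multinomial.pmf([y1, N - y1], n=N, p=[t, 1 - t])
p_binom = scipy.stats.binom.pmf(y1, N, t)

print("all draws sum to N:", all_sum_to_N)
print("formula matches SciPy:", np.isclose(p_formula, p_scipy))
print("K = 2 matches Binomial:", np.isclose(p_multi_k2, p_binom))
```

Computing the coefficient with integer division keeps it exact before multiplying by the floating-point product of the probabilities. For large N, working with logarithms (for example via scipy.stats.multinomial.logpmf) is the more numerically stable choice.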