In any module that deals with statistics, a basic skill you need is the ability to program and plot the probability distributions typically encountered in data science. This tutorial revisits distributions introduced in this section, now expressed using the scipy.stats module.
Discrete multivariate distributions
So far, we have looked at univariate distributions, but we will consider multivariate distributions in class, and you will encounter them in your research. First, we consider a discrete multivariate distribution, the Multinomial.
Multinomial distribution
- Story. This is a generalization of the Binomial distribution. Instead of a Bernoulli trial consisting of two outcomes, each trial has $K$ outcomes. The probability of getting $y_1$ of outcome 1, $y_2$ of outcome 2, ..., and $y_K$ of outcome $K$ out of a total of $N$ trials is Multinomially distributed.
- Example. There are two alleles in a population, A and a. Each individual may have genotype AA, Aa, or aa. The numbers of AA, Aa, and aa individuals, $n_{AA}$, $n_{Aa}$, and $n_{aa}$, in a population of $N$ total individuals are Multinomially distributed.
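- Parameters. $N$, the total number of trials, and $\theta = \{\theta_1, \theta_2, \ldots, \theta_K\}$, the probabilities of each outcome. Note that $0 \le \theta_i \le 1$ for every $i$, and there is a further restriction that $\sum_{i=1}^{K} \theta_i = 1$.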
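- Support. The $K$-nomial distribution is supported on count vectors $y = (y_1, \ldots, y_K)$ of non-negative integers satisfying $\sum_{i=1}^{K} y_i = N$.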
- Usage. The usage below assumes that theta is a length-$K$ array.
  Package          Syntax
  NumPy            np.random.multinomial(N, theta)
  SciPy            scipy.stats.multinomial(N, theta)
  Stan sampling    multinomial(theta)
  Stan rng         multinomial_rng(theta, N)
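- Probability mass function. For a count vector $y$ in the support,
  $$f(y; \theta, N) = \frac{N!}{y_1!\, y_2! \cdots y_K!}\, \theta_1^{y_1}\, \theta_2^{y_2} \cdots \theta_K^{y_K}.$$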
- Related distributions.
- The Multinomial distribution generalizes the Binomial distribution from two outcomes to $K$ outcomes; with $K = 2$ it reduces to the Binomial.
- Notes.
- For a sampling statement in Stan, the value of $N$ is implied: it equals the sum of the observed counts, $y_1 + \cdots + y_K$.
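The Story of the Multinomial can be made concrete with a minimal simulation sketch. It assumes, for illustration only, $K = 3$ outcomes with hypothetical probabilities theta = (0.2, 0.5, 0.3) and $N = 10$ trials. Each trial is run individually with NumPy's Generator.choice, and the tally of outcomes is one Multinomial draw.

```python
import numpy as np

# Hypothetical example values: K = 3 outcomes, N = 10 trials.
rng = np.random.default_rng(seed=42)
N = 10
theta = np.array([0.2, 0.5, 0.3])
K = len(theta)

# Run N independent trials; each yields one outcome label in 0, ..., K-1
# with probabilities theta.
outcomes = rng.choice(K, size=N, p=theta)

# Count how often each outcome occurred. The count vector (y_1, ..., y_K)
# is a single Multinomial(N, theta) draw.
y = np.bincount(outcomes, minlength=K)
print(y, y.sum())  # the counts always sum to N
```

Simulating trial by trial mirrors the Story directly; in practice, drawing the counts in one call (for example with a multinomial sampler) is faster.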
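The genotype Example does not specify the genotype probabilities. The sketch below assumes, purely for illustration, Hardy-Weinberg proportions with a hypothetical frequency p = 0.7 for allele A. That makes theta = (p^2, 2p(1-p), (1-p)^2) for (AA, Aa, aa). The population size N = 100 is also a hypothetical choice.

```python
import numpy as np

# Assumption for illustration: Hardy-Weinberg proportions with a
# hypothetical frequency p of allele A.
p = 0.7
theta = np.array([p**2, 2 * p * (1 - p), (1 - p) ** 2])  # AA, Aa, aa

N = 100  # hypothetical population size
rng = np.random.default_rng(seed=1)

# One Multinomial draw gives the genotype counts for N individuals.
n_AA, n_Aa, n_aa = rng.multinomial(N, theta)
print(n_AA, n_Aa, n_aa)
```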
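The Support item can be checked by brute force for small $N$ and $K$. This sketch uses hypothetical values $N = 4$ and theta = (0.2, 0.5, 0.3). It enumerates every vector of $K$ non-negative integers summing to $N$, confirms there are $\binom{N+K-1}{K-1}$ of them, and confirms that scipy.stats.multinomial.pmf sums to 1 over that set.

```python
import itertools
import math

from scipy import stats

N = 4
theta = [0.2, 0.5, 0.3]  # hypothetical probabilities, K = 3
K = len(theta)

# All vectors of K non-negative integers that sum to N: the support.
support = [y for y in itertools.product(range(N + 1), repeat=K) if sum(y) == N]

# Number of such vectors is C(N + K - 1, K - 1) ("stars and bars").
print(len(support), math.comb(N + K - 1, K - 1))  # 15 15

# Summing the PMF over the whole support gives 1, up to rounding.
total = sum(stats.multinomial.pmf(y, N, theta) for y in support)
print(total)
```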
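The NumPy and SciPy rows of the Usage table can be exercised as follows. This is a minimal sketch with hypothetical values $N = 20$ and theta = (0.2, 0.5, 0.3). For NumPy it uses the newer Generator interface, whose multinomial method plays the same role as the legacy np.random.multinomial listed in the table.

```python
import numpy as np
from scipy import stats

N = 20
theta = np.array([0.2, 0.5, 0.3])  # hypothetical length-K array, K = 3

# NumPy: five Multinomial(N, theta) draws, one per row.
rng = np.random.default_rng(seed=0)
samples_np = rng.multinomial(N, theta, size=5)

# SciPy: a "frozen" distribution object exposing pmf, logpmf, rvs, mean, cov.
dist = stats.multinomial(N, theta)
samples_sp = dist.rvs(size=5, random_state=0)
prob = dist.pmf([4, 10, 6])  # probability of the count vector (4, 10, 6)

print(samples_np.shape, samples_sp.shape)  # (5, 3) (5, 3)
print(prob)
print(dist.mean())  # equals N * theta
```

Freezing the SciPy distribution keeps $N$ and theta in one object, which is convenient when the same distribution is evaluated or sampled repeatedly.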
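The Multinomial PMF, $N!/(y_1! \cdots y_K!)\,\prod_i \theta_i^{y_i}$, can also be coded directly and compared with SciPy. The helper multinomial_pmf below is hypothetical, written for illustration rather than taken from any library. It works on the log scale (via scipy.special.gammaln) so the factorials do not overflow, and it uses scipy.special.xlogy so that $0 \cdot \log 0$ is treated as 0.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln, xlogy


def multinomial_pmf(y, N, theta):
    """Hypothetical helper: Multinomial PMF evaluated on the log scale."""
    y = np.asarray(y)
    theta = np.asarray(theta, dtype=float)
    # Outside the support: negative counts or counts not summing to N.
    if np.any(y < 0) or y.sum() != N:
        return 0.0
    # log N! - sum_i log y_i!, using log n! = gammaln(n + 1)
    log_coef = gammaln(N + 1) - gammaln(y + 1).sum()
    # sum_i y_i log theta_i, with xlogy giving 0 when y_i = 0
    log_theta_term = xlogy(y, theta).sum()
    return float(np.exp(log_coef + log_theta_term))


y, N, theta = [4, 10, 6], 20, [0.2, 0.5, 0.3]  # hypothetical values
print(multinomial_pmf(y, N, theta), stats.multinomial.pmf(y, N, theta))
```

For a quick hand check, two fair outcomes with $N = 2$ give $2!/(1!\,1!) \cdot 0.5^2 = 0.5$ for $y = (1, 1)$.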
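The relationship in Related distributions can be verified numerically. With $K = 2$, the probability of the count vector $(k, N - k)$ under a Multinomial with theta = (q, 1 - q) should match the Binomial probability of $k$ successes. The values $N = 12$ and $q = 0.35$ are hypothetical.

```python
import numpy as np
from scipy import stats

N, q = 12, 0.35  # hypothetical number of trials and success probability

# Every possible number of successes k, paired with N - k failures.
k = np.arange(N + 1)
counts = np.column_stack([k, N - k])  # shape (N + 1, 2)

multi = stats.multinomial.pmf(counts, N, [q, 1 - q])
binom = stats.binom.pmf(k, N, q)
print(np.allclose(multi, binom))  # True
```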