I encountered the term multinoulli distribution in the following sentence from Chapter 4 (Numerical Computation) of the Deep Learning book:

The softmax function is often used to predict the probabilities associated with a multinoulli distribution.

My guess is that a multinoulli distribution is any probability distribution defined on a random variable that takes multiple values. I know that the softmax function converts a vector into another vector of the same length whose entries are probabilities, each indicating the chance that the input falls into the corresponding class.

Suppose $C$ is a random variable with support $\{c_1, c_2, c_3, \cdots, c_k\}$. Then I am guessing that any probability distribution on $C$ is a multinoulli distribution. Softmax is an example of such a multinoulli distribution, one that uses the expression $\dfrac{e^{x_i}}{\sum_{j=1}^{k} e^{x_j}}$ to calculate the probability of class $c_i$.

Is my guess about the multinoulli distribution correct? I have doubts because I have never come across the word multinoulli before and cannot find it anywhere on the internet. If my guess is wrong, where can I read about the multinoulli distribution?

nbro
hanugm

2 Answers

1

You can find a description of this distribution (also known as the categorical distribution, which you have probably already heard of) in section 2.3.2 (p. 35) of the book Machine Learning: A Probabilistic Perspective (by K. Murphy). That section and the previous one also describe the related Bernoulli, binomial and multinomial (the most general of the four) distributions. The word multinoulli is meant to remind you that this distribution is a generalization of the Bernoulli.

In any case, this is how you may remember these four probability distributions.

  • Bernoulli: you throw a coin only once ($n=1$), and a coin has $k = 2$ outcomes (heads or tails)
  • Binomial: you throw a coin $n$ times, where $n$ can be greater than $1$, and a coin has $k = 2$ outcomes
  • Categorical: you throw a die with $k$ sides only once ($n=1$), where $k$ can be greater than $2$ (e.g. the outcome may be side $1$ or side $5$)
  • Multinomial: you throw a die with $k$ sides $n$ times

So, these are all discrete probability distributions, each with an associated probability mass function (p.m.f.). All four can be described in terms of two quantities, $n$ (the number of trials of the experiment) and $k$ (the number of possible outcomes per trial), together with the probability of each outcome.
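As a rough illustration of how the four cases relate, here is a minimal Python sketch that uses only the standard library. The helper name `sample_multinomial` and the fair-coin and fair-die probabilities are my own choices for illustration (they are not taken from Murphy's book). The idea is that a single multinomial sampler covers all four cases once you pick $n$ and $k$.

```python
import random

def sample_multinomial(n, probs):
    """Run n independent trials over len(probs) outcomes.

    Returns a list of counts, one per outcome, that sums to n.
    (Hypothetical helper written for this illustration.)
    """
    counts = [0] * len(probs)
    for outcome in random.choices(range(len(probs)), weights=probs, k=n):
        counts[outcome] += 1
    return counts

coin = [0.5, 0.5]     # k = 2 outcomes (assumed fair coin)
die = [1 / 6] * 6     # k = 6 outcomes (assumed fair die)

bernoulli = sample_multinomial(1, coin)     # n = 1,  k = 2
binomial = sample_multinomial(10, coin)     # n = 10, k = 2
categorical = sample_multinomial(1, die)    # n = 1,  k = 6 (the "multinoulli")
multinomial = sample_multinomial(10, die)   # n = 10, k = 6

print(bernoulli, binomial, categorical, multinomial)
```

With $n = 1$ the result is a one-hot vector of counts, which is one common way to represent a single categorical (multinoulli) draw.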

I don't think it is correct to say that a softmax is a probability distribution. The softmax is a function used to compute the probabilities that you associate with a categorical distribution, i.e. you use the softmax to produce a probability vector (although some people will say that the softmax produces a probability distribution), as the excerpt that you quote states. In principle, I think you could use other functions to do that (an alternative to the regular softmax, sigsoftmax, is proposed in this paper, section 3.3, p. 6). So, the softmax is used to model a categorical distribution, but I wouldn't say it's a categorical distribution. You can find an explanation of why the softmax is used instead of e.g. just normalizing by the sum here (and a long list of probability distributions here).
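To make the distinction concrete, here is a small sketch in plain Python. The scores are made-up numbers and the function and variable names are my own. The softmax only maps a vector of real-valued scores to a probability vector; the categorical distribution is what you get when you use that vector as the parameters of a random variable and, for example, sample from it.

```python
import math
import random

def softmax(scores):
    """Map real-valued scores to a probability vector.

    Subtracting the maximum first does not change the result
    but avoids overflow in exp for large scores.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]   # hypothetical model outputs for 3 classes
probs = softmax(scores)    # parameters of a categorical distribution

# One draw from the categorical distribution defined by probs.
label = random.choices(range(len(probs)), weights=probs, k=1)[0]

print(probs, label)
```

Any other function that returns non-negative values summing to $1$ could be used in place of `softmax` here without changing the sampling step, which is why I would keep the two notions separate.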

nbro
0

You can find a definition in the Wikipedia article on the categorical distribution.

In short, it is a generalization of the Bernoulli distribution to multiple variables (multivariate Bernoulli distribution).

nbro
codecypher
    I think most of this answer is correct, except for "multiple variables" and "multivariate Bernoulli distribution". The categorical distribution is a generalization of the Bernoulli, but for a categorical variable, i.e. a variable that can take 1 of $k$ possible values (where $k = 2$, in the case of the Bernoulli). – nbro Aug 19 '21 at 10:15