I’m reading the new statistics textbook ‘Statistical Rethinking’ by Richard McElreath. Chapter 2 is heavily about the topic of Bayesian updating. That is, producing a posterior distribution from a prior distribution and a new piece of data.

I’ll paraphrase one of the exercises here:

Suppose there are two species of panda bear. Both are equally common in the wild and live in the same places. They are exactly alike but differ in family size. Species A gives birth to twins 10% of the time, and one infant the remainder of the time. Species B has twins 20% of the time. Suppose a female panda of unknown species gives birth to twins. What is the probability her next birth will also be twins.

-Paraphrased from ‘Statistical Rethinking’ by Richard McElreath

Without explicit knowledge of Bayesian updating, I don’t immediately know how to proceed with this question. I have some vague notion that the probability should be around 15%, or possibly higher because the first birth was twins. But an exact numerical answer is out of reach.

Having read the chapter, such an exercise is straightforward. It’s a pure example of Bayesian updating.

The first step is to declare a **data story**, or method of framing the problem mathematically. We will assume that either panda variety is equally likely without further information. This is called the **principle of indifference**, in which possible outcomes are considered equally likely in the absence of data. This is our original **prior distribution**. We will also assume that our knowledge about panda variants is correct, without uncertainty, which is probably not a safe assumption in general, but is made explicit here as an example.

The first birth of twins is a data point, Twins1, that we can use to update our model. McElreath phrases Bayes Theorem as follows:

Posterior = Likelihood * Prior / Average Likelihood

Posterior is the distribution after including new data, Likelihood is the likelihood of observing the data under any of the real world states under consideration, Prior is the prior distribution, and Average Likelihood is the expected value of the data *under the prior distribution*.

We will first phrase Bayes as follows, for our case.

Pr(*p*|Twins1) = Pr(Twins1|*p*) * Pr(*p*) / Pr(Twins1)

Here, *p* is a slightly abstracted variable which includes likelihoods of both species A and species B. Twins1 is the datum of obeserving the first birth to be twins.

Pr(*p*|Twins1) is the posterior distribution, in other words, the distribution *p* given that we observe Twins1. This is what we want to calculate.

Pr(Twins1|*p*) is the likelihood of observing Twins1 under any of the known conditions, in other words, 10% likelihood of twins for species A, or 20% likelihood for species B.

Pr(*p*) is the prior distribution, which we decided based on the principle of indifference to be 50% species A, 50% species B.

Pr(Twins1) is the overal probability of observing Twins1, *under the prior distribution*. In other words:

Pr(Twins1) = Pr (Twins1|*p*) * Pr(*p*)

We can break this down into explicit terms for the different scenarios represented by p. SpA and SpB represent the probabilities of species A & B respectively.

Pr(Twins1) = Pr(Twins1|SpA) * Pr(SpA) + Pr (Twins1|SpB) * Pr(SpB)

Under our current prior.

Pr(Twins1) = 0.1 * 0.5 + 0.2 * 0.5 = 0.15

If we break down *p *for the overall equation, we get two separate ones:

Pr(SpA|Twins1) = Pr(Twins1|SpA) * Pr(SpA) / Pr(Twins1)

Pr(SpB|Twins1) = Pr(Twins1|SpB) * Pr(SpB) / Pr(Twins1)

Our posterior distribution is thus:

Pr(SpA|Twins1) = 0.1 * 0.5 / 0.15 = 0.33

Pr(SpA|Twins1) = 0.2 * 0.5 / 0.15 = 0.67

Thus, the posterior distribution looks like this:

The probability of observing another set of twins, Twins2, is thus:

Pr(Twins2) = Pr(Twins2|SpA) * Pr(SpA) + Pr(Twins2|SpB) * Pr(SpB)

Pr(Twins2) = 0.1 * 0.33 + 0.2 * 0.67 = 0.167 = 16.7%

This is a relatively simple example. In other cases, we are not dealing with discrete categories (SpA & SpB) but a continuous parameter. For example, we may like to determine the actual likelihood of twin birth, call it *q*, without having known categories. I’ll deal with this case in an upcoming post.

For a more complete exploration of Bayes, see Arbital’s treatment, the LessWrong treatment or Wikipedia. Or even better, get ‘Statistical Rethinking’.