Bayes’ Theorem

Anything with "Bayes" in its name can be considered a buzzword in practically any environment. I want to go on a journey to demystify some of this buzziness and land on some real understanding.


Before moving on, if you have not looked at my posts on Simple Probability and Conditional Probability, please do so now.


A Tribute to Bayes

It’s a buzzword, or rather a buzz name.

The result is important, and Bayes was probably the first person to present it the way he did.

However, there is nothing in my mind that warrants calling it Bayes. This is not to say that I don’t think we should recognize his contribution to probability theory. I also don’t think that people will ever stop using Bayes to describe what I am about to explain. But calling things Bayes’ Theorem, Bayes’ Formula, Bayes’ Law, or Bayes’ Rule does not add any extra understanding to it all. At least it doesn’t for me.

I tip my hat to Bayes. I’m going to explain something that I probably understand less well than Bayes himself did.


Bayes without Bayes

Let’s look at the Fair/Unfair coin example from the Conditional post. Again, we have a bag that holds a Fair coin and an Unfair coin. We reach in, randomly select a coin, flip it, and note the result.

\(\text{F: fair coin}\)
\(\text{F’: unfair coin}\)

\(P(F)=0.5\)
\(P(F’)=0.5\)

\(P(H|F)=0.5\)
\(P(T|F)=0.5\)

\(P(H|F’)=0.80\)
\(P(T|F’)=0.20\)

And the Venn diagram:

Now consider the following. You are an outside observer, and you can’t tell, when the coin is flipped, whether it was the fair or the unfair coin. You observe a single flip of heads, and you wonder: what is the probability that the fair coin was selected?

So you want to know the following:

\(P(F|H)=?\)

Applying the formula we already know for conditionals:

\(P(A|B)=\frac{P(A \cap B)}{P(B)}\)

\(P(F|H)=\frac{P(F \cap H)}{P(H)}\)

Based on the tree diagram, we can fill in the numerator: \(P(F \cap H)=P(F)P(H|F)=0.5 \cdot 0.5=0.25\).

\(P(F|H)=\frac{0.25}{P(H)}\)

Heads can occur in two ways. It can happen when the fair coin is selected or when the unfair coin is selected.

\(P(H)=P(H \cap F)+P(H \cap F’)=0.25+0.4=0.65\)

So

\(P(F|H)=\frac{0.25}{0.65}=\frac{5}{13} \approx 0.3846\)
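As a sanity check, the whole computation can be reproduced in a few lines of Python. The `fractions` module keeps the arithmetic exact, so the \(\frac{5}{13}\) falls out directly (the variable names are my own):

```python
from fractions import Fraction

# Probabilities from the setup above
p_F = Fraction(1, 2)             # P(F): fair coin selected
p_Fp = Fraction(1, 2)            # P(F'): unfair coin selected
p_H_given_F = Fraction(1, 2)     # P(H|F)
p_H_given_Fp = Fraction(4, 5)    # P(H|F') = 0.80

# Intersections via P(A ∩ B) = P(A)P(B|A)
p_F_and_H = p_F * p_H_given_F      # 0.25
p_Fp_and_H = p_Fp * p_H_given_Fp   # 0.40

# Heads can happen with either coin, so add the two pieces
p_H = p_F_and_H + p_Fp_and_H       # 0.65

# The conditional we wanted
p_F_given_H = p_F_and_H / p_H
print(p_F_given_H)                 # 5/13
print(float(p_F_given_H))          # 0.3846...
```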

Just with what we learned about conditionals, we have applied what is called Bayes’ Formula.


Bayes with Bayes’ Formula

Let’s go over what we started with in the previous example.

\(P(F)\)
\(P(F’)\)
\(P(H|F)\)
\(P(T|F)\)
\(P(H|F’)\)
\(P(T|F’)\)

And what we wanted was

\(P(F|H)=?\)

The goal will be to express what we want in terms of what we started with and nothing more.

The conditional formula

\(P(F|H)=\frac{P(F \cap H)}{P(H)}\)

The denominator can be split up as follows:

\(P(F|H)=\frac{P(F \cap H)}{P(F \cap H)+P(F’ \cap H)}\)

What we started with had no intersections, but recall that we have a way to express intersections in terms of conditionals.

\(P(A \cap B)=P(A)P(B|A)\)
\(P(A \cap B)=P(B)P(A|B)\)

We have two intersections to rewrite:

\(P(F \cap H)=P(F)P(H|F)\)
\(P(F’ \cap H)=P(F’)P(H|F’)\)

Making these substitutions:

\(P(F|H)=\frac{P(F)P(H|F)}{P(F)P(H|F)+P(F’)P(H|F’)}\)
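The formula we just derived translates directly into a small function. A minimal sketch, with names of my own choosing, for the two-hypothesis case (a hypothesis \(A\) and its complement \(A'\)):

```python
def bayes_two(p_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) via Bayes' Formula, for a hypothesis A and its complement A'.

    p_a            = P(A)
    p_b_given_a    = P(B|A)
    p_b_given_not_a = P(B|A')
    """
    p_not_a = 1 - p_a
    numerator = p_a * p_b_given_a
    denominator = numerator + p_not_a * p_b_given_not_a
    return numerator / denominator

# The coin example: A = fair coin selected, B = heads observed
print(bayes_two(0.5, 0.5, 0.8))  # ≈ 0.3846, matching 5/13 above
```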

Bam! Bayes’ Formula

Let’s take a look at the tree diagram again.

One way of thinking about Bayes is this: in a two-stage experiment, we know what happened in the second stage, but we have a question about the first stage.

In our example, we knew that the coin flip was heads (the second stage), but we wanted to know the likelihood that the fair coin was selected (the first stage).

This type of question comes up quite often. I like to call them “Bayes type questions.”

A classic example of these “Bayes type questions” is a test for a particular condition versus whether the condition is actually present.

While inconvenient, it is possible that a test says someone has a horrible disease when they don’t actually have it. The other way around, it is possible that a test says a person does not have the disease when they actually do.

We often know the likelihood of a condition.

\(\text{C: a particular condition is present}\)

\(P(C)\)
\(P(C’)\)

And when evaluating a new test for this condition, one would perform the test both when the condition was present and when it was not.

\(\text{T: test says the condition is present}\)

\(P(T|C)\)
\(P(T’|C)\)
\(P(T|C’)\)
\(P(T’|C’)\)

For this situation the “Bayes type questions” would be in the form:

Given the result of the test saying the condition is present or not, what is the likelihood of the condition actually being present or not?

I will leave it to the reader to try to apply Bayes’ Formula to the following. I will note that I find setting up a tree diagram very helpful in setting up Bayes’ Formula.

\(P(C|T)\)
\(P(C’|T)\)
\(P(C|T’)\)
\(P(C’|T’)\)
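If you want to check your work, here is a sketch of the first two. The prevalence and test accuracies below are made-up, illustrative assumptions of mine, not real data:

```python
# Illustrative, assumed numbers for the condition/test example
p_C = 0.01            # P(C): condition present (assumed prevalence)
p_T_given_C = 0.95    # P(T|C): positive test given condition (assumed)
p_T_given_Cp = 0.02   # P(T|C'): positive test given no condition (assumed)

p_Cp = 1 - p_C                    # P(C')
p_Tp_given_C = 1 - p_T_given_C    # P(T'|C)
p_Tp_given_Cp = 1 - p_T_given_Cp  # P(T'|C')

# P(C|T): condition actually present, given a positive test
p_T = p_C * p_T_given_C + p_Cp * p_T_given_Cp
p_C_given_T = p_C * p_T_given_C / p_T
print(round(p_C_given_T, 4))

# P(C'|T): condition absent despite a positive test
p_Cp_given_T = 1 - p_C_given_T
print(round(p_Cp_given_T, 4))
```

Note how, with a rare condition, \(P(C|T)\) can be far smaller than \(P(T|C)\); this gap is exactly what makes these questions interesting.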


Generalized Bayes’ Formula

Consider a sample space that is partitioned into some arbitrary number of different sets, \(A_1,A_2, \cdots ,A_{n-1},A_n\). Examples of this partition could be countries of origin, school districts, species, what have you. And now consider an overarching set that spans this partition, \(B\). This overarching set could represent liking apple pie, getting into college, death from an oil spill.

A question might be: given that \(B\) has occurred, what is the likelihood that a particular \(A_j\) from our partition was involved?

Given a person likes apple pie, what is the likelihood they came from France?

Given we know someone made it into college, what is the likelihood they were from school district Beta?

Given we know a specimen died from the Deepwater Horizon oil spill, what was the likelihood it was a dolphin?

And the generalized formula

\(P(A_j|B)=\frac{P(A_j)P(B|A_j)}{P(A_1)P(B|A_1)+P(A_2)P(B|A_2)+ \cdots +P(A_{n-1})P(B|A_{n-1})+P(A_n)P(B|A_n)}\)
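The generalized formula is just as short in code. A sketch, assuming the partition is given as two parallel lists (one prior and one likelihood per set \(A_i\)):

```python
def bayes_general(priors, likelihoods, j):
    """P(A_j | B) for a partition A_1, ..., A_n of the sample space.

    priors[i]      = P(A_i)
    likelihoods[i] = P(B | A_i)
    j              = index of the set we are asking about
    """
    # Denominator: total probability of B across the whole partition
    total = sum(p * l for p, l in zip(priors, likelihoods))
    return priors[j] * likelihoods[j] / total

# The coin example again, as a 2-set partition: A_1 = F, A_2 = F'
print(bayes_general([0.5, 0.5], [0.5, 0.8], 0))  # ≈ 0.3846
```

With \(n=2\) this reduces to exactly the two-hypothesis formula from the coin example.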


Final Thoughts

This post just covered the basics of the topic of Bayes. I often don’t think about it in terms of Bayes at all, but in terms of conditionals.

I purposely did not go into any of the terminology that is commonly used around Bayes. I sometimes find it arbitrary and not very useful. Actually, I feel like the reason some find Bayes confusing is that people have assigned vague words to things that should be described with math rather than words.