Conceptual Motivation for Bayes' Theorem in Contrast to Another Formula

Metronome · May 9, 2023

According to Wikipedia...

With Bayesian probability interpretation, [Bayes'] theorem expresses how a degree of belief, expressed as a probability, should rationally change to account for the availability of related evidence.

Suppose I want to relate $\Pr(W\ |\ C)$, a degree of rational belief in $W$ after evidence $C$ becomes available, back to $\Pr(W)$, a degree of rational belief in $W$ before evidence $C$ becomes available.

Bayes' Theorem is just a double application of the conditional probability formula, thus:

$$\boldsymbol{\Pr(W\ |\ C)}\ =\ \frac{\Pr(W\ \cap\ C)}{\Pr(C)} = \frac{\boldsymbol{\Pr(W)}\Pr(C\ |\ W)}{\Pr(C)}$$
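As a quick sanity check of the two equalities above, here is a minimal Python sketch on a hypothetical joint distribution over the truth values of $W$ and $C$. The four cell probabilities are made up purely for illustration, and exact fractions are used so the equalities can be tested without floating-point noise.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over (W, C); illustrative numbers only.
# The four cells sum to 1.
joint = {
    (True, True): F(3, 20),    # Pr(W and C)
    (True, False): F(1, 20),   # Pr(W and not C)
    (False, True): F(2, 20),   # Pr(not W and C)
    (False, False): F(14, 20), # Pr(not W and not C)
}

def pr(event):
    """Total probability of the cells (w, c) for which event(w, c) is true."""
    return sum(p for (w, c), p in joint.items() if event(w, c))

pr_w = pr(lambda w, c: w)
pr_c = pr(lambda w, c: c)
pr_w_and_c = pr(lambda w, c: w and c)

# Conditional probability formula, applied in each direction.
pr_w_given_c = pr_w_and_c / pr_c
pr_c_given_w = pr_w_and_c / pr_w

# Bayes' Theorem: Pr(W|C) = Pr(W) * Pr(C|W) / Pr(C)
bayes = pr_w * pr_c_given_w / pr_c

print(pr_w_given_c, bayes)  # prints: 3/5 3/5
```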

Alternatively, however, I could apply the conditional probability formula once, and then the inclusion-exclusion formula $\Pr(W\ \cap\ C) = \Pr(W) + \Pr(C) - \Pr(W\ \cup\ C)$ in the numerator to obtain:

$$\boldsymbol{\Pr(W\ |\ C)} = \frac{\boldsymbol{\Pr(W)}\ +\ \Pr(C)\ -\ \Pr(W\ \cup\ C)}{\Pr(C)}$$

This formula also relates $\Pr(W\ |\ C)$ back to $\Pr(W)$. My question is: what warrants embracing the top relation (Bayes' Theorem) and rejecting the bottom relation as the correct mathematical representation of belief updating?
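For concreteness, a minimal Python sketch confirming that the top and bottom relations return the same number. It assumes a hypothetical joint distribution over $W$ and $C$ (the cell probabilities are invented for illustration) and uses exact fractions.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over (W, C); illustrative numbers only.
joint = {
    (True, True): F(3, 20),
    (True, False): F(1, 20),
    (False, True): F(2, 20),
    (False, False): F(14, 20),
}

def pr(event):
    """Total probability of the cells (w, c) for which event(w, c) is true."""
    return sum(p for (w, c), p in joint.items() if event(w, c))

pr_w = pr(lambda w, c: w)
pr_c = pr(lambda w, c: c)
pr_w_and_c = pr(lambda w, c: w and c)
pr_w_or_c = pr(lambda w, c: w or c)
pr_c_given_w = pr_w_and_c / pr_w

# Top relation (Bayes' Theorem).
top = pr_w * pr_c_given_w / pr_c
# Bottom relation (inclusion-exclusion in the numerator).
bottom = (pr_w + pr_c - pr_w_or_c) / pr_c

print(top, bottom)  # prints: 3/5 3/5
```

Both are algebraically valid identities, so agreement on any valid distribution is expected; the check only confirms that neither relation is wrong as an equation.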

My first attempt at an answer was that the bottom relation is unhelpful because $\Pr(W\ \cup\ C)$ is not algebraically independent of $\Pr(W)$, and thus the appearance of $\Pr(W)$ may be "artificial," analogous to a more blatantly artificial introduction of $\Pr(W)$ such as

$$\boldsymbol{\Pr(W\ |\ C)}\ =\ \frac{\Pr(W\ \cap\ C)}{\Pr(C)}\ +\ \boldsymbol{\Pr(W)}\ -\ \boldsymbol{\Pr(W)}.$$

However, the $\Pr(C\ |\ W)$ that appears in the top relation is also not algebraically independent of $\Pr(W)$, since the conditional probability formula already applied, $\Pr(C\ |\ W) = \Pr(W\ \cap\ C)/\Pr(W)$, is the bridge between the two.
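To make this dependence concrete, a small Python sketch with hypothetical numbers: hold $\Pr(W\ \cap\ C)$ and $\Pr(C)$ fixed and vary $\Pr(W)$. Both auxiliary quantities, $\Pr(C\ |\ W)$ in the top relation and $\Pr(W\ \cup\ C)$ in the bottom relation, move with $\Pr(W)$, while both relations keep returning the same $\Pr(W\ |\ C)$.

```python
from fractions import Fraction as F

# Hypothetical fixed quantities; illustrative only.
pr_w_and_c = F(3, 20)  # Pr(W and C)
pr_c = F(1, 4)         # Pr(C)

rows = []
# Each Pr(W) below satisfies Pr(W) >= Pr(W and C) and Pr(W or C) <= 1,
# so every combination is a valid probability assignment.
for pr_w in (F(1, 5), F(2, 5), F(3, 5)):
    pr_c_given_w = pr_w_and_c / pr_w            # depends on Pr(W)
    pr_w_or_c = pr_w + pr_c - pr_w_and_c        # also depends on Pr(W)
    top = pr_w * pr_c_given_w / pr_c            # Bayes' Theorem
    bottom = (pr_w + pr_c - pr_w_or_c) / pr_c   # inclusion-exclusion form
    rows.append((pr_w, pr_c_given_w, pr_w_or_c, top, bottom))
    print(f"Pr(W)={pr_w}  Pr(C|W)={pr_c_given_w}  "
          f"Pr(W or C)={pr_w_or_c}  top={top}  bottom={bottom}")
```

Across the three rows, $\Pr(C\ |\ W)$ takes the values 3/4, 3/8, 1/4 and $\Pr(W\ \cup\ C)$ takes 3/10, 1/2, 7/10, while both relations return 3/5 each time.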

Another thought I had is that the top and bottom relations may actually be comparably good mathematical representations of belief updating, and the top relation is merely preferred because it acts on $\Pr(W)$ via multiplication, whereas the bottom relation acts on $\Pr(W)$ via addition and multiplication. However, this seems too weak to explain the ubiquity of the top relation, since it is, in general, far from true that the most useful mathematical expression is always the simplest.

Why should I herald Bayes' Theorem as the unique intuition for belief updating, and not the other expression?
