Bayes' Theorem – Introduction

Bayes' Theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.

Bayes' Theorem Formula:

P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}

Where:

- P(A|B) is the posterior probability: the probability of A given that B has occurred.
- P(B|A) is the likelihood: the probability of B given that A has occurred.
- P(A) is the prior probability of A.
- P(B) is the marginal probability of B (the evidence), which normalizes the result.
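
To make the formula concrete, here is a minimal numeric sketch in Python. The prior, likelihood, and evidence values below are made up purely for illustration.

```python
# Hypothetical values, chosen only to illustrate the formula.
p_a = 0.01         # P(A): prior probability of the event A
p_b_given_a = 0.9  # P(B|A): likelihood of observing B when A holds
p_b = 0.05         # P(B): overall (marginal) probability of B

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # ≈ 0.18
```
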
Naive Bayes Assumption

"Naive" refers to the assumption of feature independence, meaning:

P(x_1, x_2, ..., x_n | y) = P(x_1|y) \cdot P(x_2|y) \cdot ... \cdot P(x_n|y)

This greatly simplifies the computation: instead of estimating a potentially high-dimensional joint probability $P(x_1, x_2, ..., x_n | y)$, we only need to estimate each individual conditional probability $P(x_i | y)$ and multiply them together.
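
The following short Python sketch illustrates the factorization with hypothetical per-feature probabilities (in practice these would be estimated from training data, e.g., by counting feature frequencies per class).

```python
import math

# Hypothetical conditional probabilities P(x_i | y) for one class y,
# one entry per feature of a single example.
p_xi_given_y = [0.8, 0.3, 0.6]

# Naive Bayes assumption: the joint conditional probability factorizes
# into the product of the individual conditional probabilities.
p_x_given_y = math.prod(p_xi_given_y)
print(p_x_given_y)  # ≈ 0.144
```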

Impact on the Equation:

Applied to a class $y$ and features $x_1, x_2, ..., x_n$, Bayes' Theorem reads:

P(y|x_1, x_2, ..., x_n) = \frac{P(y) \cdot P(x_1, x_2, ..., x_n | y)}{P(x_1, x_2, ..., x_n)}

Under the Naive Bayes assumption, the likelihood in the numerator factorizes, and the posterior becomes:

P(y|x_1, ..., x_n) \propto P(y) \cdot \prod_{i=1}^n P(x_i | y)

The denominator $P(x_1, ..., x_n)$ can be ignored for classification since it is the same for every class; we simply predict the class $y$ that maximizes the right-hand side.
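
Putting the pieces together, here is a minimal classification sketch, assuming the priors and per-feature likelihoods have already been estimated (the class names and probability values are hypothetical). It works in log space, a common trick to avoid numerical underflow when many small probabilities are multiplied.

```python
import math

# Hypothetical, pre-estimated parameters for two classes.
priors = {"spam": 0.4, "ham": 0.6}   # P(y)
likelihoods = {                       # P(x_i | y) for the features of one example
    "spam": [0.7, 0.2, 0.9],
    "ham":  [0.1, 0.5, 0.3],
}

def classify(priors, likelihoods):
    scores = {}
    for y in priors:
        # log P(y) + sum_i log P(x_i | y); the denominator P(x_1, ..., x_n)
        # is dropped because it is identical for every class.
        scores[y] = math.log(priors[y]) + sum(math.log(p) for p in likelihoods[y])
    # Predict the class with the highest (unnormalized) log-posterior score.
    return max(scores, key=scores.get)

print(classify(priors, likelihoods))  # "spam" for these particular numbers
```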