Probability Estimation for Discrete-Valued Features in Naive Bayes

When features are discrete (categorical), for example a color feature with the values red, green, and blue, Naive Bayes estimates the conditional probabilities from observed frequencies in the training data.


1. Basic Probability Estimation

Given a feature $x_i$ that can take the value $v_j$ and a class label $y$ that can take the value $c_k$, we estimate:

$$ P(x_i = v_j \mid y = c_k) = \frac{\text{count}(x_i = v_j, y = c_k)}{\text{count}(y = c_k)} $$
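
The frequency estimate above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `estimate_conditional` and the toy `colors`/`labels` data are assumptions made up for the example.

```python
from collections import Counter

def estimate_conditional(feature_values, labels):
    """Return {(value, label): P(x = value | y = label)} from raw counts.

    Hypothetical helper: pairs that never occur in the data are simply
    absent from the result, i.e. their estimate is implicitly zero.
    """
    joint = Counter(zip(feature_values, labels))  # count(x_i = v_j, y = c_k)
    per_class = Counter(labels)                   # count(y = c_k)
    return {(v, c): n / per_class[c] for (v, c), n in joint.items()}

# Toy data (invented for illustration): one color feature, a yes/no class.
colors = ["red", "red", "green", "blue", "green"]
labels = ["yes", "yes", "yes", "no", "no"]

probs = estimate_conditional(colors, labels)
print(probs[("red", "yes")])   # 2 of the 3 "yes" examples are red
print(probs[("blue", "no")])   # 1 of the 2 "no" examples is blue
```

Note that the pair (blue, yes) never occurs in this toy data, so no estimate is stored for it; under the plain frequency formula its probability would be 0.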


2. Problem: Zero Probabilities

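If a feature value never occurs together with a class in the training data, its estimated conditional probability is zero. Because Naive Bayes multiplies the per-feature probabilities, a single zero factor makes the whole product zero, no matter how strongly the other features support that class:

$$ P(y = c_k \mid x_1, \dots, x_n) \propto P(y = c_k) \prod_{i=1}^{n} P(x_i \mid y = c_k) = 0 $$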
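
A small numeric sketch shows the effect. The prior and the likelihood values below are invented purely for illustration; only the zero entry matters.

```python
import math

# Hypothetical numbers: a class prior and four per-feature likelihoods,
# one of which is zero because that value was never seen with this class.
prior = 0.6
likelihoods = [0.9, 0.8, 0.0, 0.95]

# Unnormalized Naive Bayes score: prior times the product of likelihoods.
score = prior * math.prod(likelihoods)
print(score)  # 0.0, despite three strongly supportive features
```

The three large likelihoods are wiped out by the single zero factor, so the class can never be predicted for this input.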


3. Solution: Laplace Smoothing

To avoid zero probabilities, add one to every count before normalizing:

$$ P(x_i = v_j \mid y = c_k) = \frac{\text{count}(x_i = v_j, y = c_k) + 1}{\text{count}(y = c_k) + k} $$

Where: