
Probability Estimation for Discrete-Valued Features in Naive Bayes

When features are discrete (categorical), such as color taking the values red, green, or blue, Naive Bayes estimates the class-conditional probabilities directly from observed frequencies in the training data.

1. Basic Probability Estimation

Given:

Feature $x_i$ with possible values $v_1, v_2, \ldots, v_k$
Class label $y \in \{c_1, c_2, \ldots, c_m\}$

We estimate:

$$ P(x_i = v_j \mid y = c_l) = \frac{\text{count}(x_i = v_j, y = c_l)}{\text{count}(y = c_l)} $$
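As a minimal sketch of this counting estimate, assuming one feature's values and the class labels arrive as two aligned Python lists, the hypothetical helper `estimate_conditional` below tallies the joint (value, class) counts and the per-class counts with `collections.Counter`. The color/fruit data are made up for illustration.

```python
from collections import Counter

def estimate_conditional(xs, ys):
    """Frequency (maximum-likelihood) estimate of P(x = v | y = c).

    xs: observed values of one discrete feature
    ys: class labels aligned with xs
    Returns a dict mapping (v, c) -> probability for every pair seen in training.
    """
    class_counts = Counter(ys)            # count(y = c)
    joint_counts = Counter(zip(xs, ys))   # count(x = v, y = c)
    return {(v, c): n / class_counts[c] for (v, c), n in joint_counts.items()}

# Hypothetical toy data: feature "color", class "fruit"
colors = ["red", "red", "green", "yellow", "green", "red"]
fruit = ["apple", "apple", "apple", "banana", "banana", "apple"]

probs = estimate_conditional(colors, fruit)
print(probs[("red", "apple")])  # 3 of 4 apples are red -> 0.75
```

Pairs that never co-occur in training, such as `("red", "banana")`, do not appear in the result at all, which amounts to an estimate of 0.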

2. Problem: Zero Probabilities

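If a feature value never occurs together with a class in the training data, its estimated probability for that class is zero. Because Naive Bayes multiplies the per-feature probabilities, that single zero makes the whole posterior score for the class zero, no matter how strongly the other features support it:

$$ P(y = c_l \mid x_1, \ldots, x_n) \propto P(y = c_l) \prod_{i=1}^{n} P(x_i \mid y = c_l) = 0 $$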
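A small numeric illustration of this failure, using made-up likelihood values: the hypothetical function `naive_bayes_score` multiplies a class prior by the per-feature likelihoods (the unnormalized posterior), so one zero wipes out a class that otherwise fits the data far better than its competitor.

```python
import math

def naive_bayes_score(prior, likelihoods):
    """Unnormalized posterior: P(y = c) times the product of P(x_i | y = c)."""
    return prior * math.prod(likelihoods)  # math.prod requires Python 3.8+

# Hypothetical likelihoods for class A: three features fit well, but the third
# feature's value was never seen with class A in training, so its estimate is 0.
score_a = naive_bayes_score(0.5, [0.9, 0.8, 0.0, 0.95])

# Hypothetical likelihoods for class B: every feature fits poorly, none is zero.
score_b = naive_bayes_score(0.5, [0.1, 0.2, 0.3, 0.1])

print(score_a)            # 0.0
print(score_b > score_a)  # True: the weaker class wins by default
```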

3. Solution: Laplace Smoothing

To avoid zero probabilities, Laplace (add-one) smoothing modifies the estimate to:

$$ P(x_i = v_j \mid y = c_l) = \frac{\text{count}(x_i = v_j, y = c_l) + 1}{\text{count}(y = c_l) + k} $$

Where:

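$k$ = number of possible values of feature $x_i$
The $+1$ in the numerator adds a pseudo-count of 1 to every possible value; the $+k$ in the denominator accounts for those $k$ extra counts, so the estimates for $x_i$ still sum to 1 within each class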
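The add-one rule can be sketched by extending the same counting approach, again assuming aligned Python lists, made-up color/fruit data, and a hypothetical helper name (`laplace_conditional`). The caller passes the full list of the $k$ possible values explicitly, since a value that never appears in training would otherwise be missing from the counts.

```python
from collections import Counter

def laplace_conditional(xs, ys, values):
    """Add-one (Laplace) estimate of P(x = v | y = c).

    xs: observed values of one discrete feature
    ys: class labels aligned with xs
    values: all k possible values of the feature, including unseen ones
    Returns a dict mapping (v, c) -> smoothed probability for every v and class c.
    """
    k = len(values)
    class_counts = Counter(ys)            # count(y = c)
    joint_counts = Counter(zip(xs, ys))   # missing pairs read as 0 in a Counter
    return {
        (v, c): (joint_counts[(v, c)] + 1) / (class_counts[c] + k)
        for c in class_counts
        for v in values
    }

# Hypothetical toy data: feature "color" (k = 3 values), class "fruit"
colors = ["red", "red", "green", "yellow", "green", "red"]
fruit = ["apple", "apple", "apple", "banana", "banana", "apple"]

probs = laplace_conditional(colors, fruit, values=["red", "green", "yellow"])
print(probs[("red", "banana")])  # (0 + 1) / (2 + 3) = 0.2
print(probs[("red", "apple")])   # (3 + 1) / (4 + 3), about 0.571
```

With this toy data, `("red", "banana")` never occurs in training, yet it now receives an estimate of 0.2 instead of 0.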