Information Gain (IG) is a metric used to select the best feature to split the data in a decision tree. It measures how much uncertainty (entropy) is reduced after the split.
$$\text{Information Gain} = \text{Entropy (before split)} - \text{Weighted Entropy (after split)}$$
Where:
$$\text{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$
Here $p_i$ is the proportion of samples in $S$ belonging to class $i$, and $c$ is the number of classes.
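As a minimal sketch, assuming the labels and feature values are NumPy arrays of discrete values, the two quantities can be computed like this (the names `entropy` and `information_gain` are just illustrative, not from any particular library):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature_values):
    """Entropy before the split minus the weighted entropy of the
    child partitions created by each distinct feature value."""
    parent_entropy = entropy(labels)
    weighted_child_entropy = 0.0
    for v in np.unique(feature_values):
        mask = feature_values == v
        weight = mask.sum() / len(labels)
        weighted_child_entropy += weight * entropy(labels[mask])
    return parent_entropy - weighted_child_entropy
```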
Problem: Information Gain is biased toward features with many distinct values, such as IDs or other near-unique attributes.
Such features create many small, pure partitions, which looks good because it reduces entropy, but doesn't generalize well.
If you split on a feature like "Customer ID", each value can go to its own branch: entropy becomes 0 and Information Gain is maximal, but the tree becomes overfitted and useless on new data.
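Continuing the sketch above with made-up toy data (the arrays `y`, `weather`, and `customer_id` are purely hypothetical), an ID-like feature wins the Information Gain comparison even though it carries no predictive signal:

```python
import numpy as np
# uses entropy / information_gain from the sketch above

# Toy data: 8 samples, binary target.
y = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# A normal two-valued feature vs. an ID-like feature that is unique per sample.
weather = np.array(["sun", "sun", "rain", "rain", "sun", "rain", "sun", "rain"])
customer_id = np.arange(len(y))

print(information_gain(y, weather))      # modest gain (~0.19 bits)
print(information_gain(y, customer_id))  # equals entropy(y): every leaf is pure
```

The ID feature achieves the maximum possible gain simply by memorizing each sample, which is exactly the overfitting described above.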