Problem:

Decision trees tend to overfit the training data because, left unconstrained, they keep splitting until every training point is perfectly classified, even when those splits capture noise rather than real signal.
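
One way to see this concretely (a minimal sketch using scikit-learn; the synthetic dataset is purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data; flip_y injects label noise the tree will memorize
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree keeps splitting until training accuracy is perfect
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", tree.score(X_train, y_train))  # ~1.0
print("test accuracy:", tree.score(X_test, y_test))     # noticeably lower
```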


🛑 How to Avoid Overfitting

1. Early Stopping (Pre-pruning)

Stop the tree from growing before it fully fits the training data, for example by capping its depth or requiring a minimum number of samples per leaf.
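
In scikit-learn these constraints map to constructor parameters (a minimal sketch reusing the train/test split from the snippet above; the values are illustrative, not tuned):

```python
from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: constrain growth up front
pruned = DecisionTreeClassifier(
    max_depth=4,           # cap how deep the tree can grow
    min_samples_leaf=10,   # every leaf must cover at least 10 samples
    min_samples_split=20,  # a node needs 20+ samples before it may split
    random_state=0,
).fit(X_train, y_train)
print("test accuracy:", pruned.score(X_test, y_test))
```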


2. Post-pruning (Pruning After the Tree Is Grown)

Let the tree grow fully, then prune back the branches that contribute little to performance on held-out data.

✅ This balances model accuracy and simplicity, helping generalization.
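
In scikit-learn, post-pruning is available as cost-complexity pruning via the ccp_alpha parameter (a sketch continuing from the snippets above; in practice you would choose alpha by cross-validation rather than on the test set):

```python
from sklearn.tree import DecisionTreeClassifier

# Grow the full tree, then ask it for the candidate pruning strengths
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full.cost_complexity_pruning_path(X_train, y_train)

# Refit once per alpha and keep the tree that generalizes best
# (scored on the test set here only for illustration)
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda t: t.score(X_test, y_test),
)
print("best test accuracy:", best.score(X_test, y_test))
```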


💡 Summary

| Technique | Type | Key Idea | Trade-off |
|---|---|---|---|
| Max Depth | Pre-pruning | Limit how deep the tree can grow | May cut useful splits early |
| Min Samples Leaf | Pre-pruning | Require a minimum number of data points per leaf | Prevents over-splitting on small subsets |
| Cost Complexity | Post-pruning | Prune nodes that don't improve cost | Balances accuracy with simplicity |