Problem:
Decision trees tend to overfit the training data because they keep splitting until every data point is perfectly classified, even if the splits capture noise.
Solution 1: Pre-pruning (early stopping)
Stop the tree from growing too deep by constraining it during training (both limits are sketched below):
- max_depth: Set a limit on how deep the tree can grow.
- min_samples_leaf: Require a minimum number of training samples in each leaf.
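A minimal sketch of both limits using scikit-learn's DecisionTreeClassifier (the dataset and the specific values max_depth=3 and min_samples_leaf=5 are illustrative, not recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unconstrained tree: splits until leaves are pure, so it can memorize noise.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruned tree: stop early via a depth cap and a minimum leaf size.
pruned = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=5, random_state=0
).fit(X_train, y_train)

print("full   depth:", full.get_depth(), " test acc:", full.score(X_test, y_test))
print("pruned depth:", pruned.get_depth(), " test acc:", pruned.score(X_test, y_test))
```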
Solution 2: Post-pruning (cost complexity pruning)
Let the tree grow fully, then prune branches that don't contribute much.
Define a cost function:
Cost = Error + α × Complexity
where Error is the tree's training error, Complexity is the number of leaf nodes, and α ≥ 0 controls how heavily size is penalized. For each non-leaf node: compare the cost of keeping its subtree against the cost of collapsing it into a single leaf, and prune whenever collapsing lowers the total cost. Larger values of α prune more aggressively.
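scikit-learn exposes this idea through the ccp_alpha parameter, which plays the role of α above. A sketch of picking α from the candidate values the library computes (selecting on the test set here just keeps the example short; cross-validation would be the usual choice):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Enumerate the alpha values at which successive subtrees get pruned away.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train
)

# Refit once per candidate alpha and keep the best tree on held-out data.
best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    score = tree.fit(X_train, y_train).score(X_test, y_test)
    if score > best_score:
        best_alpha, best_score = alpha, score

print(f"best alpha = {best_alpha:.4f}, test accuracy = {best_score:.3f}")
```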
✅ This balances accuracy against complexity, so the pruned tree generalizes better to unseen data.
| Technique | Type | Key Idea | Trade-off |
|---|---|---|---|
| Max Depth | Pre-pruning | Limit how deep the tree can grow | May cut off useful splits early |
| Min Samples Leaf | Pre-pruning | Require a minimum number of samples in each leaf | May underfit if the minimum is too high |
| Cost Complexity | Post-pruning | Prune nodes that don't improve the cost | Requires growing the full tree first, and α must be tuned |