🌳 Decision Trees: Quick Overview
- A Decision Tree splits the data recursively on feature values to predict an output (minimal sketch below).
- It’s easy to interpret, but...
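To make the overview concrete, here is a minimal sketch of fitting a single tree with scikit-learn (the iris dataset and `random_state` are illustrative choices, not part of these notes):

```python
# Minimal sketch: fit one decision tree and predict (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)  # recursively splits on feature thresholds
tree.fit(X, y)
print(tree.predict(X[:3]))  # predicted classes for the first three samples
```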
❌ Issues with Decision Trees
⚠️ Overfitting
- Trees tend to memorize the training data; deep trees are especially prone to this.
- The symptom: near-perfect accuracy on training data but poor accuracy on unseen data, as the sketch below shows.
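A quick way to see this in action is to train an unconstrained tree and compare train vs. test accuracy (the synthetic dataset and split below are assumptions for the demo):

```python
# Sketch: an unconstrained tree typically memorizes the training set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # no depth limit
print("train accuracy:", deep.score(X_tr, y_tr))  # typically 1.0 (memorized)
print("test accuracy: ", deep.score(X_te, y_te))  # noticeably lower
```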
✅ Techniques to Reduce Overfitting
- Max Depth
- Limit how deep the tree can grow.
- Shallower trees tend to generalize better.
- Min Samples Split / Leaf
- Only split a node that contains at least a minimum number of samples, and require every leaf to keep a minimum number as well.
- Prevents small, overly specific branches that fit noise.
- Cost Complexity Pruning (α-pruning)
- Grow the tree, then prune back branches whose improvement in fit doesn’t justify the added complexity; larger α prunes more aggressively (all three controls appear in the sketch after this list).
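All three controls map directly onto scikit-learn hyperparameters; a minimal sketch (the specific values are illustrative, not tuned recommendations):

```python
# Sketch: depth limit, minimum-sample constraints, and alpha-pruning together.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
pruned = DecisionTreeClassifier(
    max_depth=4,           # cap how deep the tree can grow
    min_samples_split=10,  # a node needs at least 10 samples before it may split
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    ccp_alpha=0.01,        # cost-complexity pruning: larger alpha prunes more
    random_state=0,
).fit(X, y)
print("depth:", pruned.get_depth(), "leaves:", pruned.get_n_leaves())
```

In practice, candidate α values can be enumerated with `DecisionTreeClassifier.cost_complexity_pruning_path` and then chosen by cross-validation.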
🌲 Random Forests: Extension to Decision Trees
- Random Forest = Many Decision Trees
- Each tree is trained on:
- A random subset of the training rows, drawn with replacement (bootstrapping)
- A random subset of features at each split
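Both sources of randomness correspond to scikit-learn arguments; a sketch with illustrative values:

```python
# Sketch: bootstrap row sampling + random feature subsets per split.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the ensemble
    bootstrap=True,       # each tree trains on a bootstrap sample of the rows
    max_features="sqrt",  # each split considers a random subset of the features
    random_state=0,
).fit(X, y)
print(forest.predict(X[:3]))  # prediction aggregated over the 100 trees
```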
🔄 How Random Forests Help
- Diversity in Trees
- Random row and feature sampling makes each tree different.
- Lower correlation between trees means their individual errors tend to cancel when predictions are aggregated, reducing variance.
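One way to see the payoff: compare a single unconstrained tree against a forest on held-out data (dataset and split are assumptions for the demo; exact numbers will vary):

```python
# Sketch: aggregating many decorrelated trees usually generalizes better
# than any single deep tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("single tree test accuracy: ", tree.score(X_te, y_te))
print("random forest test accuracy:", forest.score(X_te, y_te))
```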