🌳 Decision Trees: Quick Overview
- A Decision Tree splits the data recursively on feature values to predict an output (minimal sketch below).
- It’s easy to interpret, but...
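To make the overview concrete, here is a minimal sketch of fitting a single tree with scikit-learn (the iris dataset and `random_state` are illustrative choices, not part of these notes):

```python
# Minimal sketch: fit one decision tree and predict (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)  # recursively splits on feature thresholds
tree.fit(X, y)
print(tree.predict(X[:3]))  # predicted classes for the first three samples
```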
❌ Issues with Decision Trees
⚠️ Overfitting
- Trees tend to memorize the training data; deep trees are especially prone to this.
- The symptom: near-perfect accuracy on training data but poor accuracy on unseen data, as the sketch below shows.
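A quick way to see this in action is to train an unconstrained tree and compare train vs. test accuracy (the synthetic dataset and split below are assumptions for the demo):

```python
# Sketch: an unconstrained tree typically memorizes the training set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # no depth limit
print("train accuracy:", deep.score(X_tr, y_tr))  # typically 1.0 (memorized)
print("test accuracy: ", deep.score(X_te, y_te))  # noticeably lower
```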
✅ Techniques to Reduce Overfitting
- Max Depth
- Limit how deep the tree can grow.
- Shallower trees tend to generalize better.
- Min Samples Split / Leaf
- Only split a node that contains at least a minimum number of samples, and require every leaf to keep a minimum number as well.
- Prevents small, overly specific branches that fit noise.
- Cost Complexity Pruning (α-pruning)
- Grow the tree, then prune back branches whose improvement in fit doesn’t justify the added complexity; larger α prunes more aggressively (all three controls appear in the sketch after this list).
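All three controls map directly onto scikit-learn hyperparameters; a minimal sketch (the specific values are illustrative, not tuned recommendations):

```python
# Sketch: depth limit, minimum-sample constraints, and alpha-pruning together.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
pruned = DecisionTreeClassifier(
    max_depth=4,           # cap how deep the tree can grow
    min_samples_split=10,  # a node needs at least 10 samples before it may split
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    ccp_alpha=0.01,        # cost-complexity pruning: larger alpha prunes more
    random_state=0,
).fit(X, y)
print("depth:", pruned.get_depth(), "leaves:", pruned.get_n_leaves())
```

In practice, candidate α values can be enumerated with `DecisionTreeClassifier.cost_complexity_pruning_path` and then chosen by cross-validation.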
🌲 Random Forests: Extension to Decision Trees
- Random Forest = Many Decision Trees
- Each tree is trained on:
- A random subset of the training rows, drawn with replacement (bootstrapping)
- A random subset of features at each split
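Both sources of randomness correspond to scikit-learn arguments; a sketch with illustrative values:

```python
# Sketch: bootstrap row sampling + random feature subsets per split.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the ensemble
    bootstrap=True,       # each tree trains on a bootstrap sample of the rows
    max_features="sqrt",  # each split considers a random subset of the features
    random_state=0,
).fit(X, y)
print(forest.predict(X[:3]))  # prediction aggregated over the 100 trees
```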
🔄 How Random Forests Help
- Diversity in Trees
- Random row and feature sampling makes each tree different.
- Lower correlation between trees means their individual errors tend to cancel when predictions are aggregated, reducing variance.
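One way to see the payoff: compare a single unconstrained tree against a forest on held-out data (dataset and split are assumptions for the demo; exact numbers will vary):

```python
# Sketch: aggregating many decorrelated trees usually generalizes better
# than any single deep tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("single tree test accuracy: ", tree.score(X_te, y_te))
print("random forest test accuracy:", forest.score(X_te, y_te))
```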