🧪 Bagging (Bootstrap Aggregating)
Bagging is a key ensemble learning technique used in Random Forests.
🧱 Steps in Bagging:
- Select k rows with replacement
- From x total data points, sample k rows randomly with replacement.
- Some data points may appear multiple times, others may be left out.
- This introduces data diversity.
- Train a decision tree on this bootstrap sample (a minimal sketch of one round follows below).
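A minimal sketch of a single bagging round, assuming NumPy arrays X, y and scikit-learn's DecisionTreeClassifier; the function name bagging_round and the toy data are illustrative, not a fixed API:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_round(X, y, k, rng):
    """Draw one bootstrap sample of k rows and fit a tree on it."""
    # Sample k row indices with replacement:
    # some rows appear multiple times, others are left out.
    idx = rng.integers(0, len(X), size=k)
    tree = DecisionTreeClassifier()
    tree.fit(X[idx], y[idx])
    return tree

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))                 # toy data: 100 rows, 5 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # toy binary labels
trees = [bagging_round(X, y, k=len(X), rng=rng) for _ in range(10)]
```

Repeating the round produces an ensemble; for classification, the individual trees' predictions are then aggregated by majority vote.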
🌲 Random Forest = Bagging + Feature Sampling
Random Forests enhance bagging by adding a second layer of randomness:
🔁 Two Randomizations in Random Forests:
1. Row Sampling (Bagging)
- Each tree gets a random subset of rows with replacement.
- Helps reduce variance and avoids overfitting to specific data points.
2. Feature Sampling
- At each split, the tree randomly selects k features out of n (without replacement) to consider.
- Usually:
- For classification: k = √n
- For regression: k = n/3
This ensures trees make decisions using different subsets of features, leading to less correlation between trees.
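A possible way to express both randomizations with scikit-learn's standard random forest estimators (an assumption about tooling; the document itself does not name a library). Row sampling is on by default via bootstrap=True, and max_features follows the heuristics above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: consider k = sqrt(n) features at each split.
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", bootstrap=True)

# Regression: consider k = n/3 features at each split.
# There is no "n/3" keyword, so a float fraction is passed instead.
reg = RandomForestRegressor(n_estimators=100, max_features=1/3, bootstrap=True)

X, y = make_classification(n_samples=200, n_features=16, random_state=0)
clf.fit(X, y)
```

When max_features is a float, scikit-learn considers max(1, int(max_features * n_features)) features at each split.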
⚡ Result: More Diverse Trees
Because of:
- Different subsets of data points (row sampling)
- Different subsets of features considered at each split (feature sampling)