🧠 Key Idea:
Instead of splitting on values directly (like categories), we find a threshold to split the data into two parts:
- Those with feature values less than or equal to the threshold
- Those with feature values greater than the threshold
🔍 How It Works:
For a continuous feature (e.g., "age", "temperature"):
- Sort the data based on that feature.
- Identify potential split points:
- These are typically midpoints between adjacent values where the class label changes.
- Example: If sorted values are 5, 7, 10, 12 and labels change between 7 and 10, try threshold = (7+10)/2 = 8.5
- For each possible threshold:
- Split the data into two groups (<= threshold, > threshold)
- Compute the split criterion (e.g., information gain, Gini)
- Choose the threshold that gives the best result.
📦 Example:
For a feature "age" and dataset:
| Age |
Class |
| 22 |
Yes |
| 25 |
No |
| 28 |
No |
| 30 |
Yes |
| 35 |
Yes |
Potential thresholds:
(22+25)/2 = 23.5,
(25+28)/2 = 26.5,
(28+30)/2 = 29,
(30+35)/2 = 32.5