Handling Continuous-Valued Features in Decision Trees

A continuous feature cannot be split on each distinct value the way a categorical feature can. Instead, we find a threshold that splits the data into two parts.

For a continuous feature (e.g., "age", "temperature"):

1. Sort the data based on that feature.
2. Identify potential split points:
   - These are typically midpoints between adjacent sorted values where the class label changes.
   - Example: If sorted values are 5, 7, 10, 12 and the label changes between 7 and 10, try threshold = (7+10)/2 = 8.5
3. For each possible threshold:
   - Split the data into two groups (<= threshold, > threshold)
   - Compute the split criterion (e.g., information gain, Gini impurity)
4. Choose the threshold that gives the best value of the criterion (highest information gain or lowest weighted Gini impurity).
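
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (gini, candidate_thresholds, best_threshold) and the class labels in the usage example are hypothetical, weighted Gini impurity is chosen as the split criterion, and the feature values are assumed to be distinct.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def candidate_thresholds(values, labels):
    """Midpoints between adjacent sorted values where the class label changes.

    Assumes feature values are distinct (an assumption of this sketch).
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    thresholds = []
    for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]):
        if v1 != v2 and y1 != y2:
            thresholds.append((v1 + v2) / 2)
    return thresholds


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) for the best split, or None if no candidate exists."""
    n = len(values)
    best = None
    for t in candidate_thresholds(values, labels):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Values from the example above; the labels are made up for illustration.
values = [5, 7, 10, 12]
labels = ["no", "no", "yes", "yes"]
print(candidate_thresholds(values, labels))  # [8.5]
print(best_threshold(values, labels))        # (8.5, 0.0)
```

With these labels, the only label change lies between 7 and 10, so 8.5 is the single candidate, and it separates the two classes perfectly (weighted Gini of 0.0).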

Example: for a feature "age" with sorted values 22, 25, 28, 30, 35, the potential thresholds are the midpoints between adjacent values:

- (22+25)/2 = 23.5
- (25+28)/2 = 26.5
- (28+30)/2 = 29
- (30+35)/2 = 32.5
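
The same midpoints can be reproduced with a short Python snippet. This is only a sketch: the variable names are illustrative, and it keeps every adjacent midpoint rather than filtering by class-label changes.

```python
# Age values taken from the threshold calculations above.
ages = [22, 25, 28, 30, 35]

ages_sorted = sorted(ages)
# Midpoint between each pair of adjacent sorted values.
thresholds = [(a + b) / 2 for a, b in zip(ages_sorted, ages_sorted[1:])]

print(thresholds)  # [23.5, 26.5, 29.0, 32.5]
```

A feature with n distinct values yields at most n - 1 candidate thresholds, so each one can be evaluated against the split criterion in turn.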