A Decision Tree is a flowchart-like structure used for classification and regression. Each internal node represents a test on a feature, each branch represents the outcome of the test, and each leaf node represents a final decision or classification.
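As a quick illustration, a small tree is just nested feature tests ending in decisions. The feature names and thresholds below are made up for the example, not taken from any real dataset:

```python
# A tiny hand-written decision tree for a hypothetical "play outside?" problem.
# Internal nodes test a feature, branches follow the test outcome, leaves decide.
def classify(sample: dict) -> str:
    if sample["outlook"] == "sunny":       # internal node: test on "outlook"
        if sample["humidity"] > 75:        # internal node: test on "humidity"
            return "don't play"            # leaf: final decision
        return "play"                      # leaf
    return "play"                          # leaf for the other "outlook" branch

classify({"outlook": "sunny", "humidity": 80})   # -> "don't play"
```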
🏗️ Building a Decision Tree
- Start with the full dataset.
- At each node, choose the best feature to split on.
- Repeat the process recursively for each child node (see the sketch after this list) until:
  - All samples at the node belong to the same class (the node is pure), or
  - A stopping criterion is met (e.g., maximum depth or a minimum number of samples per node).
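A minimal sketch of this recursion in Python, written for this note rather than taken from any library; `best_split` (the greedy feature selection) and the `criterion` it uses are assumed helpers, sketched in the sections below:

```python
from collections import Counter, namedtuple

Leaf = namedtuple("Leaf", "prediction")                     # terminal decision
Node = namedtuple("Node", "feature threshold left right")   # feature test

def build_tree(X, y, criterion, depth=0, max_depth=5, min_samples=2):
    # Stop if the node is pure, the depth limit is reached, or too few samples remain.
    if len(set(y)) == 1 or depth >= max_depth or len(y) < min_samples:
        return Leaf(Counter(y).most_common(1)[0][0])          # majority-class leaf
    # Greedily pick the best feature/threshold (best_split is sketched in a later section).
    feature, threshold = best_split(X, y, criterion)
    if feature is None:                                       # no valid split exists
        return Leaf(Counter(y).most_common(1)[0][0])
    left  = [i for i, row in enumerate(X) if row[feature] <= threshold]
    right = [i for i, row in enumerate(X) if row[feature] >  threshold]
    return Node(feature, threshold,
                build_tree([X[i] for i in left],  [y[i] for i in left],  criterion, depth + 1, max_depth, min_samples),
                build_tree([X[i] for i in right], [y[i] for i in right], criterion, depth + 1, max_depth, min_samples))
```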
🎯 Getting to the "Best" Decision Tree
- Goal: Find the tree that best classifies the data.
- Challenge: Finding the optimal decision tree is computationally infeasible.
  - Constructing an optimal tree is an NP-hard problem.
  - The number of possible trees grows exponentially with the number of features and data points.
  - At each node, we’d need to try every feature and every possible way to split on it, and then do the same for every possible subtree below.
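For a sense of scale (a standard counting argument, not tied to any particular dataset): with n binary features there are 2^(2^n) distinct Boolean classifiers, each realizable by some decision tree, so even n = 10 already gives 2^1024, roughly 10^308, candidate functions to consider.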
⚙️ A Practical Approach: Greedy Algorithms
Since finding the best tree is impractical, we use a greedy heuristic:
- At each step:
  - Evaluate all available features.
  - Choose the one that gives the best immediate improvement according to the splitting criterion (e.g., accuracy, information gain, Gini impurity).
  - Split on that feature and repeat recursively (a sketch of this greedy selection follows the list).
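A sketch of that greedy step, matching the `build_tree` sketch above and assuming a `criterion(parent, left, right)` function where a higher score means a better split (information gain and a Gini-based score are sketched in the next section):

```python
def best_split(X, y, criterion):
    """Greedy step: try every feature and every observed threshold, keep the best score."""
    best_feature, best_threshold, best_score = None, None, float("-inf")
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            y_left  = [label for row, label in zip(X, y) if row[feature] <= threshold]
            y_right = [label for row, label in zip(X, y) if row[feature] >  threshold]
            if not y_left or not y_right:           # degenerate split, skip it
                continue
            score = criterion(y, y_left, y_right)   # e.g., information gain
            if score > best_score:
                best_feature, best_threshold, best_score = feature, threshold, score
    return best_feature, best_threshold
```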
❓ How Do We Choose the Best Feature?
- For each candidate feature:
  - Compute the resulting accuracy (or another splitting metric) when the data are split on that feature.
  - Choose the feature with the best score: highest accuracy or information gain, or lowest impurity, depending on the metric.
- Common criteria (two of which are sketched in code below):
  - Accuracy (simple but not always robust)
  - Information Gain (used in ID3)
  - Gini Impurity (used in CART)
  - Gain Ratio (used in C4.5), Chi-square, etc.
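To make two of these concrete, here is a sketch of entropy-based information gain (ID3-style) and Gini impurity (CART-style), written to match the `criterion` signature assumed above; an illustrative sketch, not the exact implementation of any particular library:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(y_parent, y_left, y_right):
    """Parent entropy minus the size-weighted entropy of the children (ID3-style)."""
    n = len(y_parent)
    children = (len(y_left) / n) * entropy(y_left) + (len(y_right) / n) * entropy(y_right)
    return entropy(y_parent) - children

def gini(labels):
    """Gini impurity: chance that two samples drawn from the node disagree on label."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(y_parent, y_left, y_right):
    """Decrease in Gini impurity after the split (CART-style, higher is better)."""
    n = len(y_parent)
    return gini(y_parent) - ((len(y_left) / n) * gini(y_left) + (len(y_right) / n) * gini(y_right))
```

With these pieces, `build_tree(X, y, criterion=information_gain)` (or `criterion=gini_gain`) ties the whole greedy procedure together.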