A Decision Tree is a flowchart-like structure used for classification and regression. Each internal node represents a test on a feature, each branch represents the outcome of the test, and each leaf node represents a final decision or classification.
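As a quick illustration, a small tree is just nested feature tests ending in decisions. The feature names and thresholds below are made up for the example, not taken from any real dataset:

```python
# A tiny hand-written decision tree for a hypothetical "play outside?" problem.
# Internal nodes test a feature, branches follow the test outcome, leaves decide.
def classify(sample: dict) -> str:
    if sample["outlook"] == "sunny":       # internal node: test on "outlook"
        if sample["humidity"] > 75:        # internal node: test on "humidity"
            return "don't play"            # leaf: final decision
        return "play"                      # leaf
    return "play"                          # leaf for the other "outlook" branch

classify({"outlook": "sunny", "humidity": 80})   # -> "don't play"
```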
🏗️ Building a Decision Tree
- Start with the full dataset.
- At each node, choose the best feature to split on.
- Repeat the process recursively for each child node (see the sketch after this list) until:
  - All samples at the node belong to the same class (the node is pure), or
  - A stopping criterion is met (e.g., maximum depth or a minimum number of samples per node).
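A minimal sketch of this recursion in Python, written for this note rather than taken from any library; `best_split` (the greedy feature selection) and the `criterion` it uses are assumed helpers, sketched in the sections below:

```python
from collections import Counter, namedtuple

Leaf = namedtuple("Leaf", "prediction")                     # terminal decision
Node = namedtuple("Node", "feature threshold left right")   # feature test

def build_tree(X, y, criterion, depth=0, max_depth=5, min_samples=2):
    # Stop if the node is pure, the depth limit is reached, or too few samples remain.
    if len(set(y)) == 1 or depth >= max_depth or len(y) < min_samples:
        return Leaf(Counter(y).most_common(1)[0][0])          # majority-class leaf
    # Greedily pick the best feature/threshold (best_split is sketched in a later section).
    feature, threshold = best_split(X, y, criterion)
    if feature is None:                                       # no valid split exists
        return Leaf(Counter(y).most_common(1)[0][0])
    left  = [i for i, row in enumerate(X) if row[feature] <= threshold]
    right = [i for i, row in enumerate(X) if row[feature] >  threshold]
    return Node(feature, threshold,
                build_tree([X[i] for i in left],  [y[i] for i in left],  criterion, depth + 1, max_depth, min_samples),
                build_tree([X[i] for i in right], [y[i] for i in right], criterion, depth + 1, max_depth, min_samples))
```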
🎯 Getting to the "Best" Decision Tree
- Goal: Find the tree that best classifies the data.
- Challenge: Finding the optimal decision tree is computationally infeasible.
  - Constructing an optimal tree is an NP-hard problem.
  - The number of possible trees grows exponentially with the number of features and data points.
  - At each node, we’d need to try every feature and every possible way to split on it, and then do the same for every possible subtree below.
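For a sense of scale (a standard counting argument, not tied to any particular dataset): with n binary features there are 2^(2^n) distinct Boolean classifiers, each realizable by some decision tree, so even n = 10 already gives 2^1024, roughly 10^308, candidate functions to consider.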
⚙️ A Practical Approach: Greedy Algorithms
Since finding the best tree is impractical, we use a greedy heuristic:
- At each step:
  - Evaluate all available features.
  - Choose the one that gives the best immediate improvement according to the splitting criterion (e.g., accuracy, information gain, Gini impurity).
  - Split on that feature and repeat recursively (a sketch of this greedy selection follows the list).
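A sketch of that greedy step, matching the `build_tree` sketch above and assuming a `criterion(parent, left, right)` function where a higher score means a better split (information gain and a Gini-based score are sketched in the next section):

```python
def best_split(X, y, criterion):
    """Greedy step: try every feature and every observed threshold, keep the best score."""
    best_feature, best_threshold, best_score = None, None, float("-inf")
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            y_left  = [label for row, label in zip(X, y) if row[feature] <= threshold]
            y_right = [label for row, label in zip(X, y) if row[feature] >  threshold]
            if not y_left or not y_right:           # degenerate split, skip it
                continue
            score = criterion(y, y_left, y_right)   # e.g., information gain
            if score > best_score:
                best_feature, best_threshold, best_score = feature, threshold, score
    return best_feature, best_threshold
```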
❓ How Do We Choose the Best Feature?
- For each candidate feature:
  - Compute the resulting accuracy (or another splitting metric) when the data are split on that feature.
  - Choose the feature with the best score: highest accuracy or information gain, or lowest impurity, depending on the metric.
- Common criteria (two of which are sketched in code below):
  - Accuracy (simple but not always robust)
  - Information Gain (used in ID3)
  - Gini Impurity (used in CART)
  - Gain Ratio (used in C4.5), Chi-square, etc.
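To make two of these concrete, here is a sketch of entropy-based information gain (ID3-style) and Gini impurity (CART-style), written to match the `criterion` signature assumed above; an illustrative sketch, not the exact implementation of any particular library:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(y_parent, y_left, y_right):
    """Parent entropy minus the size-weighted entropy of the children (ID3-style)."""
    n = len(y_parent)
    children = (len(y_left) / n) * entropy(y_left) + (len(y_right) / n) * entropy(y_right)
    return entropy(y_parent) - children

def gini(labels):
    """Gini impurity: chance that two samples drawn from the node disagree on label."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(y_parent, y_left, y_right):
    """Decrease in Gini impurity after the split (CART-style, higher is better)."""
    n = len(y_parent)
    return gini(y_parent) - ((len(y_left) / n) * gini(y_left) + (len(y_right) / n) * gini(y_right))
```

With these pieces, `build_tree(X, y, criterion=information_gain)` (or `criterion=gini_gain`) ties the whole greedy procedure together.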