Problems of linear regression for classification

Linear regression is designed for predicting continuous values, not categorical outcomes. Here's a breakdown of the issues:

⚠️ 1. Inappropriate Output Range

Linear regression outputs any real number, from −∞ to +∞.
For binary classification, we need outputs between 0 and 1, to represent probabilities.
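To make the range problem concrete, here is a minimal sketch in plain Python. The toy data and the helper name `fit_line` are assumptions made for illustration only; the line is fit with the usual closed-form least squares for a single feature.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


# Hypothetical toy data: one feature, binary labels encoded as 0 and 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = fit_line(xs, ys)

# Evaluate the fitted line inside and well outside the training range.
predictions = {x: w * x + b for x in (-5.0, 3.5, 15.0)}
for x, y_hat in predictions.items():
    print(f"x = {x:5.1f} -> output = {y_hat:.3f}")
```

On this data the output is negative at x = -5 and greater than 1 at x = 15, so it cannot be read as a probability.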

To compensate, a threshold (e.g., 0.5) is chosen:

Output ≥ 0.5 → class 1
Output < 0.5 → class 0

But this introduces problems:

📉 2. Poor Decision Boundary with More Data

With limited data, a best-fit line might seem to work.
As more data is added (e.g., outliers or new samples further along the x-axis), the line shifts.
- This shift moves the point where the line crosses the threshold, i.e., the decision boundary.
- Points that were classified correctly may now be misclassified.

➡️ Classification accuracy can degrade as more data is added, even when the new points are themselves easy to classify.
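The boundary shift can be reproduced with a small sketch. The data, the extra points at x = 20, 25, 30, and the helper names (`fit_line`, `decision_boundary`, `count_errors`) are all hypothetical and chosen only to illustrate the effect.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    return w, mean_y - w * mean_x


def decision_boundary(w, b, threshold=0.5):
    """x value where the fitted line crosses the threshold."""
    return (threshold - b) / w


def count_errors(xs, ys, w, b, threshold=0.5):
    """Number of points whose thresholded prediction differs from the label."""
    return sum(int(w * x + b >= threshold) != y for x, y in zip(xs, ys))


# Hypothetical toy data: class 0 on the left, class 1 on the right.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = fit_line(xs, ys)
boundary_before = decision_boundary(w, b)
errors_before = count_errors(xs, ys, w, b)

# Add class-1 samples far along the x-axis; they are easy to classify,
# yet they pull the least-squares line and move the boundary.
xs_more = xs + [20.0, 25.0, 30.0]
ys_more = ys + [1, 1, 1]

w2, b2 = fit_line(xs_more, ys_more)
boundary_after = decision_boundary(w2, b2)
errors_after = count_errors(xs_more, ys_more, w2, b2)

print(f"boundary before: {boundary_before:.2f}, errors: {errors_before}")
print(f"boundary after:  {boundary_after:.2f}, errors: {errors_after}")
```

With these numbers the boundary moves from 3.5 to roughly 4.31, so the class-1 point at x = 4 ends up on the wrong side even though none of the added points is ambiguous.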

🔁 3. Linear Assumptions Don't Match Classification Needs

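Linear regression minimizes the squared error between its outputs and the 0/1 labels, so it also penalizes outputs far above 1 or far below 0, even when they fall on the correct side of the threshold.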
But classification aims to minimize classification error (i.e., misclassifications).
These are not the same objective, so linear regression is not optimized for classification accuracy.
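The gap between the two objectives can be checked numerically. In the sketch below, the data and helper names are again hypothetical; the "alternative" line is simply the least-squares fit on the first six points, used here only as a comparison line and not as a recommended method.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    return w, mean_y - w * mean_x


def mean_squared_error(xs, ys, w, b):
    """Average squared difference between line output and 0/1 label."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def count_errors(xs, ys, w, b, threshold=0.5):
    """Number of points whose thresholded prediction differs from the label."""
    return sum(int(w * x + b >= threshold) != y for x, y in zip(xs, ys))


# Hypothetical toy data, including class-1 points far along the x-axis.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 20.0, 25.0, 30.0]
ys = [0, 0, 0, 1, 1, 1, 1, 1, 1]

# Line that minimizes squared error on all points.
w_ls, b_ls = fit_line(xs, ys)

# Comparison line: fit on the first six points only (an assumption for this demo).
w_alt, b_alt = fit_line(xs[:6], ys[:6])

mse_ls = mean_squared_error(xs, ys, w_ls, b_ls)
mse_alt = mean_squared_error(xs, ys, w_alt, b_alt)
errors_ls = count_errors(xs, ys, w_ls, b_ls)
errors_alt = count_errors(xs, ys, w_alt, b_alt)

print(f"least squares: MSE = {mse_ls:.3f}, misclassified = {errors_ls}")
print(f"alternative:   MSE = {mse_alt:.3f}, misclassified = {errors_alt}")
```

Here the least-squares line has the lower squared error but misclassifies a point, while the comparison line has a much higher squared error and misclassifies none, which shows that a lower squared error does not imply fewer misclassifications.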