Overfitting vs Underfitting Impact Example with Simple Python
1. We’ll simulate this by comparing a too-simple linear model, a too-complex polynomial-style model, and the true underlying function.
import random

# Step 1: Generate Data (y = 2x + 1 + noise)
def generate_data(n=20):
    data = []
    for _ in range(n):
        x = random.uniform(0, 10)
        noise = random.uniform(-1, 1)
        y = 2 * x + 1 + noise  # True underlying function
        data.append((x, y))
    return data

train_data = generate_data(20)
test_data = generate_data(10)

# Step 2: Predict with simple linear model (Underfitting)
def predict_linear(x, w, b):
    return w * x + b

# Step 3: Predict with complex model (Overfitting mimic with high-degree poly)
def predict_complex(x, weights):
    # Simulate high-degree polynomial: y = w0 + w1*x + w2*x^2 + ... + w5*x^5
    return sum(weights[i] * (x ** i) for i in range(len(weights)))

# Step 4: Evaluate Mean Squared Error
def mse(data, predict_fn):
    error = 0
    for x, y in data:
        y_pred = predict_fn(x)
        error += (y - y_pred) ** 2
    return error / len(data)

# Underfitting: Model too simple
w_simple = 0.5
b_simple = 0.5
underfit_predict = lambda x: predict_linear(x, w_simple, b_simple)

# Overfitting: Model too complex
w_complex = [1, 0.5, -0.2, 0.1, -0.05, 0.02]  # over-tuned weights
overfit_predict = lambda x: predict_complex(x, w_complex)

# Ground truth model (close to reality)
true_predict = lambda x: 2 * x + 1

print("Training MSE:")
print("Underfitting Model:", mse(train_data, underfit_predict))
print("Overfitting Model :", mse(train_data, overfit_predict))
print("True Model :", mse(train_data, true_predict))

print("\nTesting MSE:")
print("Underfitting Model:", mse(test_data, underfit_predict))
print("Overfitting Model :", mse(test_data, overfit_predict))
print("True Model :", mse(test_data, true_predict))
Sample Output You May See:
Training MSE:
Underfitting Model: 59.55756432821365
Overfitting Model : 78237.63722256606
True Model : 0.2031184902109579

Testing MSE:
Underfitting Model: 77.43513174946756
Overfitting Model : 60178.331390988395
True Model : 0.36663658392975257
Here the underfitting model fails on both sets, while the true model generalizes well. Note that the complex model also fails on both sets, because its weights are hand-picked rather than fitted to the training data; a genuinely overfitted model would instead score well on training and poorly on test.
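To see that low-training/high-test pattern, the high-capacity model has to be actually trained on the small training set. Below is a minimal sketch, assuming numpy is available and reusing train_data, test_data, and mse from the code above; the degree-9 polynomial is an arbitrary choice for illustration.

import numpy as np

train_x = np.array([x for x, _ in train_data])
train_y = np.array([y for _, y in train_data])

# High-capacity model fitted to only 20 noisy points (may emit a RankWarning).
overfit_coeffs = np.polyfit(train_x, train_y, 9)
fitted_overfit = lambda x: float(np.polyval(overfit_coeffs, x))

# For contrast, a fitted straight line (degree 1) matches the true relationship.
linear_coeffs = np.polyfit(train_x, train_y, 1)
fitted_linear = lambda x: float(np.polyval(linear_coeffs, x))

print("Fitted degree-9 poly -> train MSE:", mse(train_data, fitted_overfit),
      " test MSE:", mse(test_data, fitted_overfit))
print("Fitted straight line -> train MSE:", mse(train_data, fitted_linear),
      " test MSE:", mse(test_data, fitted_linear))

Exact numbers depend on the random draw, but the fitted degree-9 polynomial should show a noticeably lower training MSE than test MSE, which is the characteristic overfitting signature.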
2. Statement:
A model that appears to be overfitted on a small dataset might perform just fine (i.e., not overfitted) on a much larger dataset of the same nature.
Why This Happens:
1. Overfitting is about memorizing noise
- On a small dataset, a complex model might memorize specific patterns or noise, leading to poor generalization.
- But on a larger dataset, the model has more diverse examples to learn from, making it harder to memorize and easier to learn general patterns.
2. Larger dataset gives better generalization
- A bigger dataset tends to have:
- Less chance of coincidental correlations.
- Better representation of the real-world variance.
- A smoother, more stable empirical picture of the underlying function.
3. Model capacity stays the same, but data gets richer
- If the model capacity (number of layers, neurons, etc.) is fixed, increasing the dataset reduces the overfitting risk, because the model is no longer too large relative to the data (see the sketch after this list).
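A small experiment can illustrate this: keep the model capacity fixed (a degree-9 polynomial, an arbitrary choice here) and vary only the amount of training data. This sketch assumes numpy and reuses generate_data and mse from the earlier code; exact numbers vary run to run, but the gap between training and test MSE should shrink as n grows.

import numpy as np

def fit_poly_and_evaluate(n_train, degree=9, n_test=200):
    # Same model capacity every time; only the training set size changes.
    train = generate_data(n_train)
    test = generate_data(n_test)
    xs = np.array([x for x, _ in train])
    ys = np.array([y for _, y in train])
    coeffs = np.polyfit(xs, ys, degree)
    predict = lambda x: float(np.polyval(coeffs, x))
    return mse(train, predict), mse(test, predict)

for n in (15, 100, 1000):
    train_mse, test_mse = fit_poly_and_evaluate(n)
    print(f"n={n:5d}  train MSE={train_mse:10.4f}  test MSE={test_mse:10.4f}")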
Example Analogy:
Imagine training a student with:
- 5 sample questions → the student memorizes (overfits).
- 500 sample questions of the same pattern → the student starts learning the actual concept, not just the answers.
Important:
Increasing data alone doesn’t guarantee that overfitting is solved, but it helps greatly, especially when:
- The data is clean and diverse.
- The model is not excessively large even relative to the bigger dataset.
- Proper validation techniques (like cross-validation, sketched below) are used.
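As one illustration of that last point, here is a minimal k-fold cross-validation sketch in the same plain-Python style as the earlier example. The helper fit_line (closed-form least squares for a straight line) is introduced here purely for illustration; k_fold_mse reuses generate_data and mse from the code above.

import random

def fit_line(train):
    # Ordinary least squares for y = w*x + b, closed form.
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in train)
    var = sum((x - mean_x) ** 2 for x, _ in train)
    w = cov / var
    b = mean_y - w * mean_x
    return lambda x: w * x + b

def k_fold_mse(data, fit_fn, k=5):
    # Average validation MSE over k held-out folds.
    data = data[:]  # copy so the caller's list is not reordered
    random.shuffle(data)
    fold_size = len(data) // k
    scores = []
    for i in range(k):
        val = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        predictor = fit_fn(train)
        scores.append(mse(val, predictor))
    return sum(scores) / len(scores)

print("5-fold CV MSE (fitted line):", k_fold_mse(generate_data(100), fit_line, k=5))

A cross-validated score like this is a more honest estimate of generalization error than a single train/test split, because every point is held out exactly once.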
Empirical Rule of Thumb:
- Complex models + Small Data → High risk of overfitting
- Complex models + Large Data → Better generalization
- Simple models + Large Data → Often the safest and most robust
Overfitting vs Underfitting Impact in Neural Networks – Basic Math Concepts