Basic Math Concepts – Regularization in Neural Networks
1. Mean Squared Error (MSE):
Used to measure prediction error:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ is the true value, $\hat{y}_i$ is the model's prediction, and $n$ is the number of samples.
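As a quick sanity check, here is a minimal NumPy sketch of that formula (the sample arrays are made up for illustration):

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error: average of squared prediction errors."""
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])
print(mse(y_true, y_pred))  # ~0.02
```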
2. L2 Regularization (Ridge):
Penalizes large weights by adding a penalty term to the loss (a sketch follows the definitions below):

$$\text{Loss} = \text{MSE} + \lambda \sum_i w_i^2$$
Where:
- λ: Regularization strength
- $w_i$: Individual weights
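A minimal sketch of the regularized loss, assuming the weights are collected in a NumPy array; the default `lam=0.01` is purely illustrative:

```python
import numpy as np

def l2_regularized_loss(y_true: np.ndarray, y_pred: np.ndarray,
                        weights: np.ndarray, lam: float = 0.01) -> float:
    """MSE plus the L2 penalty: lam * sum of squared weights."""
    mse = np.mean((y_true - y_pred) ** 2)
    penalty = lam * np.sum(weights ** 2)  # grows quickly as weights grow
    return float(mse + penalty)
```

Larger `lam` pushes the optimizer harder toward small weights; `lam = 0` recovers plain MSE.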
3. Gradient Descent:
Updates weights using:

$$w_i \leftarrow w_i - \eta \frac{\partial \text{Loss}}{\partial w_i}$$

where $\eta$ is the learning rate. With L2 regularization, the gradient of the loss picks up an extra $2\lambda w_i$ term:

$$w_i \leftarrow w_i - \eta \left(\frac{\partial \text{MSE}}{\partial w_i} + 2\lambda w_i\right)$$

The extra term shrinks every weight toward zero on each step, which is why L2 regularization is often called weight decay.
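A minimal sketch of this update rule in NumPy, assuming the gradient of the MSE term has already been computed; the `lr` and `lam` defaults are illustrative, not recommendations:

```python
import numpy as np

def sgd_step(w: np.ndarray, grad_mse: np.ndarray,
             lr: float = 0.1, lam: float = 0.01) -> np.ndarray:
    """One gradient-descent update with an optional L2 term.

    Setting lam=0 recovers the plain update; lam>0 adds the
    2*lam*w gradient of the penalty, shrinking weights each step.
    """
    return w - lr * (grad_mse + 2 * lam * w)
```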
Summary
| Concept | Without Regularization | With Regularization |
|---|---|---|
| Overfitting | High chance | Reduced |
| Weights | Can grow large | Kept small |
| Generalization | Poor | Better |
| Loss Function | MSE only | MSE + weight penalty |
[Chart omitted: training loss over epochs, with and without regularization]
Explanation of the Chart
| Feature | Without Regularization | With Regularization |
|---|---|---|
| Loss Drop | Very sharp initially | Steady and smoother |
| Final Loss | Lower (but misleading) | Slightly higher (but stable) |
| Risk | May overfit – model memorizes | Lower overfitting – better generalization |
| Weight Growth | May become large | Penalized, remains controlled |
Key Insight
- Without regularization: The model quickly minimizes the training error, but might do so by over-relying on specific features (weights explode).
- With regularization: The model trades off a small increase in error for better simplicity and generalization.
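A toy experiment makes this trade-off concrete. The sketch below fits a linear model by gradient descent on synthetic data (shapes, seed, and hyperparameters are all illustrative) and compares the final weight norms with and without the L2 term:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y depends mainly on the first of 20 features,
# so an unregularized fit is free to inflate the other weights.
X = rng.normal(size=(30, 20))
y = 0.5 * X[:, 0] + rng.normal(scale=0.1, size=30)

def fit(lam: float, epochs: int = 500, lr: float = 0.05) -> np.ndarray:
    w = np.zeros(20)
    for _ in range(epochs):
        grad_mse = 2 * X.T @ (X @ w - y) / len(y)  # d(MSE)/dw
        w -= lr * (grad_mse + 2 * lam * w)         # L2 term shrinks w
    return w

print("||w|| without regularization:", np.linalg.norm(fit(lam=0.0)))
print("||w|| with L2 regularization:", np.linalg.norm(fit(lam=0.1)))
```

The regularized run should end with a visibly smaller weight norm, at the cost of a slightly higher training loss.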