Basic Math Concepts – Regularization in Neural Networks

1. Mean Squared Error (MSE):

Used to measure prediction error:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$

where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples.
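
As a minimal sketch (using NumPy; the array names and values here are illustrative, not from a specific library), MSE can be computed directly from this formula:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of the squared prediction errors."""
    return np.mean((y_true - y_pred) ** 2)

# Illustrative values: small errors yield a small MSE
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])
print(mse(y_true, y_pred))  # ~0.02
```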

2. L2 Regularization (Ridge):

Penalizes large weights by adding a penalty term to the loss:

$$\text{Loss} = \text{MSE} + \lambda \sum_{i} w_i^2$$

Where:

  • $\lambda$: regularization strength
  • $w_i$: individual weights
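
A minimal sketch of this penalized loss (NumPy again; `lam` standing in for $\lambda$ and `weights` for the weight vector are illustrative names):

```python
import numpy as np

def l2_regularized_loss(y_true, y_pred, weights, lam):
    """MSE plus the L2 penalty: lam * sum of squared weights."""
    mse = np.mean((y_true - y_pred) ** 2)
    penalty = lam * np.sum(weights ** 2)
    return mse + penalty
```

A larger `lam` pushes the optimum toward smaller weights; `lam = 0` recovers plain MSE.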

3. Gradient Descent:

Updates weights using:

$$w_i \leftarrow w_i - \eta \, \frac{\partial \, \text{MSE}}{\partial w_i}$$

where $\eta$ is the learning rate.

With regularization:

$$w_i \leftarrow w_i - \eta \left( \frac{\partial \, \text{MSE}}{\partial w_i} + 2 \lambda w_i \right)$$
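
As a sketch of one update step, here the MSE gradient is derived for a toy linear model `y = X @ w` (all names are illustrative, not a specific library's API):

```python
import numpy as np

def gradient_step(w, X, y, lr, lam=0.0):
    """One gradient descent step on MSE, with an optional L2 term (lam > 0)."""
    grad_mse = 2 * X.T @ (X @ w - y) / len(y)  # d(MSE)/dw for a linear model
    grad_penalty = 2 * lam * w                 # d(lam * sum(w_i^2))/dw
    return w - lr * (grad_mse + grad_penalty)
```

With `lam > 0`, every step also shrinks the weights slightly toward zero, which is why L2 regularization is often called weight decay.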

Summary

Concept           Without Regularization     With Regularization
Overfitting       High chance                Reduced
Weights           Can grow large             Kept small
Generalization    Poor                       Better
Loss Function     MSE only                   MSE + weight penalty

Here’s a visual chart showing how loss behaves over epochs with and without regularization:

[Chart: training loss over epochs, with and without regularization]

Explanation of the Chart

Feature           Without Regularization            With Regularization
Loss Drop         Very sharp initially              Steady and smoother
Final Loss        Lower (but misleading)            Slightly higher (but stable)
Risk              May overfit (model memorizes)     Lower overfitting, better generalization
Weight Growth     May become large                  Penalized, remains controlled

Key Insight

  • Without regularization: The model quickly minimizes the training error, but may do so by over-relying on specific features, letting the weights grow unchecked.
  • With regularization: The model trades a small increase in training error for a simpler model that generalizes better; the sketch below makes this trade-off concrete.
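
Here is a small self-contained sketch on synthetic data (all names and values are illustrative) that trains the same linear model with and without the L2 term and compares the resulting weight norms:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
true_w = rng.normal(size=10)
y = X @ true_w + rng.normal(scale=0.5, size=50)  # noisy targets

def train(lam, lr=0.05, epochs=200):
    w = np.zeros(10)
    for _ in range(epochs):
        grad_mse = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * (grad_mse + 2 * lam * w)  # L2 term shrinks weights every step
    return w

w_plain = train(lam=0.0)
w_reg = train(lam=0.1)
# The regularized weight vector ends up with a smaller norm
print(np.linalg.norm(w_plain), np.linalg.norm(w_reg))
```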

Next – L1 Regularization in Neural Networks