Basic Math Concepts – L2 Regularization in Neural Networks

L2 regularization adds a penalty term to the loss function:

Loss Without Regularization:

  L = Error(y, ŷ)

Loss With L2 Regularization:

  L = Error(y, ŷ) + λ · Σⱼ wⱼ²

Where:

  • λ is the regularization strength (a hyperparameter that controls how strongly large weights are penalized)
  • wⱼ are the model’s weights, and the sum runs over all of them (a worked example follows this list)
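For concreteness, here is a minimal sketch of how the penalty enters the loss, assuming a mean-squared-error base loss and NumPy; the data, weights, and the name lmbda are illustrative, not from a specific model:

```python
import numpy as np

# Tiny linear model y_hat = X @ w with made-up data (illustrative only)
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([5.0, 11.0, 17.0])
w = np.array([1.5, 2.5])              # model weights wⱼ
lmbda = 0.01                          # regularization strength λ

y_hat = X @ w
mse = np.mean((y - y_hat) ** 2)       # loss without regularization
l2_penalty = lmbda * np.sum(w ** 2)   # λ · Σⱼ wⱼ²
loss = mse + l2_penalty               # loss with L2 regularization

print(f"MSE only: {mse:.4f}   with L2 penalty: {loss:.4f}")
```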

Why Does It Help?

  • Minimizing this loss not only makes the predictions accurate but also keeps the weights small.
  • Smaller weights mean a simpler, smoother model, which is less prone to overfitting (see the sketch after this list).
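One way to see why the weights shrink: the gradient of the L2 term λ · Σⱼ wⱼ² is 2λw, so every gradient-descent step pulls the weights toward zero in addition to reducing the prediction error (the "weight decay" view of L2 regularization). A minimal sketch, assuming plain gradient descent; the function name sgd_step and the values of lr and lmbda are illustrative, with λ exaggerated so the shrinkage is visible:

```python
import numpy as np

def sgd_step(w, grad_loss, lr=0.1, lmbda=0.5):
    """One gradient-descent step on Error + λ · Σⱼ wⱼ²."""
    # The L2 term contributes 2λw to the gradient, shrinking w every step.
    return w - lr * (grad_loss + 2 * lmbda * w)

w = np.array([4.0, -3.0])
zero_grad = np.zeros_like(w)      # pretend the data gradient is zero for clarity
for _ in range(5):
    w = sgd_step(w, zero_grad)    # each step: w ← (1 − 2·lr·λ)·w = 0.9·w
print(w)                          # ≈ [ 2.36 -1.77], shrunk from [ 4. -3.]
```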

Summary

Aspect         | Without L2            | With L2
Loss           | Just prediction error | Error + weight penalty
Weights        | Can become large      | Encouraged to stay small
Generalization | May overfit           | Better on unseen data

Plotting how the weight values evolve over epochs, with and without L2 regularization, shows the following (a sketch that reproduces such a chart appears after the bullet points):

  • Without L2: The weights can grow larger as the model tries to minimize the error aggressively.
  • With L2: The weight values are more controlled and converge smoothly, thanks to the regularization penalty.
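A minimal sketch of how such a chart could be produced, assuming NumPy for a small linear-regression fit by gradient descent and matplotlib for plotting; the synthetic data, the train helper, and the λ values are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                          # synthetic features
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=100)

def train(lmbda, epochs=200, lr=0.05):
    """Gradient descent on MSE + λ · Σⱼ wⱼ²; returns ‖w‖ after each epoch."""
    w = np.ones(X.shape[1])
    norms = []
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * lmbda * w
        w -= lr * grad
        norms.append(np.linalg.norm(w))
    return norms

plt.plot(train(lmbda=0.0), label="Without L2")
plt.plot(train(lmbda=1.0), label="With L2 (λ = 1.0)")
plt.xlabel("Epoch")
plt.ylabel("Weight norm ‖w‖")
plt.legend()
plt.show()
```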

Next – L1 vs L2 Regularization: how to choose between them for different use cases in Neural Networks