Basic Math Concepts – Xavier Initialization Applicability in Neural Network

Concept – Explanation

  • Uniform Distribution – values are drawn randomly from [-limit, +limit]
  • Variance Matching – keeps the output variance approximately equal to the input variance
  • ReLU Activation – helps prevent the vanishing gradient problem
  • Square Root Scaling – derived from how variance propagates through layers (see the code sketch below)
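As a minimal sketch of how these pieces fit together (using NumPy, with illustrative layer sizes that are not from the original notes), Xavier uniform initialization can be written as:

```python
import numpy as np

def xavier_uniform(n_in, n_out, seed=0):
    """Xavier/Glorot uniform init: W ~ U(-limit, +limit), limit = sqrt(6 / (n_in + n_out))."""
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (n_in + n_out))      # square-root scaling from variance propagation
    return rng.uniform(-limit, limit, size=(n_in, n_out))

# Illustrative sizes: a layer with 256 inputs and 128 outputs
W = xavier_uniform(256, 128)
print(W.min(), W.max())              # every value lies inside [-limit, +limit]
print(W.var(), 2.0 / (256 + 128))    # empirical variance is close to 2 / (n_in + n_out)
```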

Xavier Formula Again:

  limit = sqrt(6 / (n_in + n_out))
  W ~ Uniform(-limit, +limit)

where n_in is the number of inputs to the layer (fan-in) and n_out is the number of outputs (fan-out).
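One step the formula compresses (standard algebra, added here for reference): a symmetric uniform distribution Uniform(-a, +a) has variance a²/3, so

  Var(W) = (1/3) · 6 / (n_in + n_out) = 2 / (n_in + n_out),

which is a compromise between the forward-pass condition Var(W) = 1/n_in and the backward-pass condition Var(W) = 1/n_out.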

This is based on preserving variance:

  • Variance of inputs ≈ Variance of outputs (checked numerically in the sketch below)
  • Keeps gradients stable, which helps gradient descent converge reliably
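To make these points concrete, here is a small NumPy check (illustrative sizes, with equal fan-in and fan-out so the match is exact) that a Xavier-initialized linear layer leaves the activation variance roughly unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in = n_out = 256                               # equal fan-in/fan-out so the match is exact

limit = np.sqrt(6.0 / (n_in + n_out))
W = rng.uniform(-limit, limit, size=(n_in, n_out))

x = rng.normal(0.0, 1.0, size=(10_000, n_in))    # unit-variance inputs
y = x @ W                                        # pre-activation outputs of the layer

print(x.var())   # ~1.0
print(y.var())   # ~1.0: n_in * Var(W) = n_in * 2 / (n_in + n_out) = 1 when n_in == n_out
```

When n_in and n_out differ, the factor 2 / (n_in + n_out) is a compromise between the forward and backward conditions, so the output variance stays close to, rather than exactly equal to, the input variance.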

Next – Sparse Initialization Applicability in Neural Network