Basic Math Concepts – Sparse Initialization Applicability in Neural Networks
To understand sparse initialization, we should know:
Concept | Explanation |
---|---|
Matrix | Neural network weights are stored as matrices. |
Sparsity | The fraction of entries that are zero. If 70% of the entries are zero, the sparsity is 0.7. |
Dot Product | In the forward pass, only the non-zero weights contribute to the result. |
Random Sampling | Picking a small subset of entries to initialize as non-zero, as illustrated in the sketch below. |
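To make these concepts concrete, here is a minimal NumPy sketch that zeroes out roughly 70% of a small weight matrix (matching the sparsity = 0.7 example in the table), measures its sparsity, and checks that the zero weights contribute nothing to the dot product of the forward pass. The matrix shape and random seed are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small weight matrix with roughly 70% of its entries zeroed out,
# matching the sparsity = 0.7 example in the table above.
W = rng.standard_normal((4, 5))
zero_mask = rng.random(W.shape) < 0.7
W[zero_mask] = 0.0

# Sparsity = fraction of zero entries.
sparsity = np.mean(W == 0)
print(f"sparsity = {sparsity:.2f}")

# Forward pass: zero weights add nothing, so only the non-zero
# weights contribute to the dot product.
x = rng.standard_normal(5)
print(np.allclose(W @ x, np.where(W != 0, W, 0.0) @ x))  # True
```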
Mathematical Expression: If W is the weight matrix:
- Sparse init: each entry is non-zero only with probability p:

$$W_{ij} = \begin{cases} w_{ij} & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$$

Where $w_{ij}$ is a randomly sampled value (for example, from a normal distribution) and p is the non-zero proportion (e.g., 0.2, meaning 20% of the weights are non-zero).
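A minimal sketch of how such a sparse initializer could be implemented with NumPy is shown below. The layer sizes, the normal distribution for the non-zero weights, and the `scale` parameter are assumptions for illustration, not part of the original text.

```python
import numpy as np

def sparse_init(n_in, n_out, p=0.2, scale=0.01, rng=None):
    """Return an (n_out, n_in) weight matrix with only a proportion p
    of entries non-zero; all other entries stay exactly zero.

    p     : non-zero proportion (e.g., 0.2 means 20% of weights are non-zero)
    scale : standard deviation of the non-zero weights (illustrative choice)
    """
    rng = rng or np.random.default_rng()
    W = np.zeros((n_out, n_in))
    # Random sampling: choose which entries receive a non-zero value.
    mask = rng.random(W.shape) < p
    W[mask] = rng.normal(0.0, scale, size=mask.sum())
    return W

W = sparse_init(784, 128, p=0.2)
print(f"non-zero proportion: {np.mean(W != 0):.2f}")  # ~0.20
```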
Why Not Always Use Dense Initialization?
Sparse Initialization | Dense Initialization |
---|---|
Lighter memory footprint (see the sketch after this table) | Heavier memory usage |
Faster at the start of training | Slower on large datasets |
Better suited to sparse data | Can overfit on sparse data |
Lower risk of exploding gradients | Higher risk if not handled carefully |
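To make the memory column concrete, the sketch below compares the storage needed for a dense weight matrix against the same-sized matrix with only 20% non-zero entries kept in a compressed sparse format. The use of `scipy.sparse`, the 1000×1000 shape, and the 20% density are illustrative assumptions; the original comparison does not specify a storage scheme.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Dense 1000x1000 weight matrix vs. a matrix of the same shape with only
# 20% non-zero entries, stored in CSR (compressed sparse row) format.
dense_W = rng.standard_normal((1000, 1000))
sparse_W = sparse.random(1000, 1000, density=0.2, format="csr", random_state=0)

dense_bytes = dense_W.nbytes
sparse_bytes = sparse_W.data.nbytes + sparse_W.indices.nbytes + sparse_W.indptr.nbytes
print(f"dense:  {dense_bytes / 1e6:.1f} MB")   # ~8.0 MB
print(f"sparse: {sparse_bytes / 1e6:.1f} MB")  # ~2.4 MB
```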
Sparse Initialization Applicability in Neural Networks – Visual Roadmap