Basic Math Concepts: Minimizing the Objective in a Neural Network
1. The Model Function
We assume a simple linear model:
ŷ = w ⋅ x
Where:
- x is the input,
- w is the weight (what we want to learn),
- ŷ is the predicted output.
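As a concrete illustration, here is a minimal sketch of this model in Python (the function name `predict` is our own choice, not part of any particular library):

```python
def predict(w: float, x: float) -> float:
    """Linear model: the prediction is simply weight times input."""
    return w * x

# Example: with w = 2.0 and x = 3.0 the model predicts 6.0.
print(predict(2.0, 3.0))  # 6.0
```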
2. Loss Function: Mean Squared Error (MSE)
To measure how “wrong” a prediction is, we use the squared error for a single training example (x, y):
Loss = (ŷ - y)^2 = (w ⋅ x - y)^2
This is a non-negative number representing the error; the smaller, the better. The MSE is just the average of this squared error over all training examples.
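A minimal sketch of this loss in Python (the name `squared_error` is our own, hypothetical choice):

```python
def squared_error(w: float, x: float, y: float) -> float:
    """Squared error between the prediction w * x and the target y."""
    y_hat = w * x
    return (y_hat - y) ** 2

# Example: w = 2.0, x = 3.0, y = 5.0 -> prediction 6.0, loss (6.0 - 5.0)^2 = 1.0
print(squared_error(2.0, 3.0, 5.0))  # 1.0
```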
3. Minimization via Gradient Descent
We want to adjust w to minimize the loss.
So we compute the gradient, i.e., the slope of the loss function with respect to w:
d/dw [(w ⋅ x - y)^2] = 2(w ⋅ x - y) ⋅ x
This derivative tells us how fast the loss increases or decreases if we change w a little.
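The same gradient as a small code sketch (again with our own hypothetical function name):

```python
def gradient(w: float, x: float, y: float) -> float:
    """Derivative of (w * x - y)^2 with respect to w: 2 * (w * x - y) * x."""
    return 2 * (w * x - y) * x

# Example: w = 2.0, x = 3.0, y = 5.0 gives 2 * (6.0 - 5.0) * 3.0 = 6.0.
# The positive sign means increasing w would increase the loss here,
# so gradient descent will decrease w.
print(gradient(2.0, 3.0, 5.0))  # 6.0
```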
4. Weight Update Rule
We adjust the weight in the opposite direction of the gradient:
w = w - η ⋅ dLoss/dw
Where:
- η (eta) is the learning rate, a small positive number that controls the step size (a full update loop is sketched below).
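Putting the pieces together, here is a minimal gradient descent loop for a single (x, y) pair; the function name `train` and the default values for the learning rate and number of steps are assumptions chosen for illustration:

```python
def train(x: float, y: float, w: float = 0.0, lr: float = 0.01, steps: int = 100) -> float:
    """Repeatedly apply w <- w - lr * dLoss/dw for a single training example."""
    for _ in range(steps):
        grad = 2 * (w * x - y) * x   # dLoss/dw
        w = w - lr * grad            # step in the opposite direction of the gradient
    return w

# Example: fit the target y = 5.0 for input x = 3.0.
# The learned weight should approach y / x = 5 / 3 ≈ 1.667.
print(train(x=3.0, y=5.0))
```

Each pass through the loop moves the prediction w ⋅ x a little closer to y; with a learning rate that is too large, the updates can overshoot and diverge instead.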
Summary of Math Concepts Needed:
| Concept | Description |
|---|---|
| Linear Equation | ŷ = w ⋅ x |
| Squared Error Loss | (ŷ - y)^2 |
| Derivative (Gradient) | To measure how the weight affects the loss |
| Gradient Descent Update | w = w - η ⋅ gradient |
Optional, but Helpful:
- Understanding functions and slopes (from calculus)
- Chain rule (if we go deeper into neural networks with multiple layers)
- Intuition about convex functions: why the squared loss gives us a single minimum (a short worked expansion follows below)
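To make that last point concrete, here is a short worked expansion (our own addition) showing that the single-example squared loss is a parabola in w:

Loss(w) = (w ⋅ x - y)^2 = x^2 ⋅ w^2 - 2xy ⋅ w + y^2

For x ≠ 0 this is an upward-opening parabola in w, so it has exactly one minimum, at w = y/x, where the prediction matches the target exactly.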