Basic Math Concepts – Minimizing the Objective in a Neural Network

1. The Model Function

We assume a simple linear model:

ŷ = w ⋅ x

Where:

  • x is the input,
  • w is the weight (what we want to learn),
  • ŷ is the predicted output.
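
As a minimal sketch in plain Python (the values of w and x below are purely illustrative, not from the text), the model is a single multiplication:

```python
def predict(w, x):
    """Linear model: prediction y_hat = w * x."""
    return w * x

# Illustrative values: weight 0.5, input 2.0
w, x = 0.5, 2.0
y_hat = predict(w, x)  # 1.0
```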

2. Loss Function: Mean Squared Error (MSE)

To measure how “wrong” a prediction is, we use the squared error of a single training example (averaging this over all examples gives the mean squared error):

Loss = (ŷ − y)² = (w ⋅ x − y)²

This gives a non-negative number representing error — the smaller, the better.
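
A small sketch of this loss, reusing the illustrative values above together with a hypothetical target y = 3.0:

```python
def squared_error(w, x, y):
    """Squared-error loss for one example: (w*x - y)^2."""
    y_hat = w * x
    return (y_hat - y) ** 2

# Prediction is 0.5 * 2.0 = 1.0, so the loss is (1.0 - 3.0)^2 = 4.0
loss = squared_error(0.5, 2.0, 3.0)
```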

3. Minimization via Gradient Descent

We want to adjust w to minimize the loss.
So we compute the gradient (i.e., the slope of the loss function with respect to w):

d/dw [(w ⋅ x − y)²] = 2 (w ⋅ x − y) ⋅ x

This derivative tells us how fast the loss increases or decreases if we change w a little.
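
The same derivative in code, with the illustrative values used earlier:

```python
def gradient(w, x, y):
    """Derivative of (w*x - y)^2 with respect to w: 2*(w*x - y)*x."""
    return 2 * (w * x - y) * x

# 2 * (1.0 - 3.0) * 2.0 = -8.0
# The negative sign says: increasing w a little would decrease the loss.
g = gradient(0.5, 2.0, 3.0)
```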

4. Weight Update Rule

We adjust the weight in the opposite direction of the gradient:

w = w – η ⋅ dLoss / dw

Where:

  • η (eta) is the learning rate — a small number controlling step size.
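
Putting the pieces together, here is a sketch of the full update loop (the learning rate and step count below are arbitrary illustrative choices):

```python
def train(w, x, y, eta=0.1, steps=20):
    """Repeatedly apply w <- w - eta * dLoss/dw for a single (x, y) pair."""
    for _ in range(steps):
        grad = 2 * (w * x - y) * x   # dLoss/dw
        w = w - eta * grad           # step in the opposite direction of the gradient
    return w

# With the illustrative values, w converges toward y / x = 1.5
w_final = train(w=0.5, x=2.0, y=3.0)
```

Each step moves w against the gradient; because the squared loss is convex in w, the updates settle at its single minimum.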

Summary of Math Concepts Needed:

  • Linear equation: ŷ = w ⋅ x
  • Squared error loss: (ŷ − y)²
  • Derivative (gradient): measures how changing the weight affects the loss
  • Gradient descent update: w = w − η ⋅ gradient

Optional, but Helpful:

  • Understanding functions and slopes (from calculus)
  • Chain rule (if we go deeper into neural networks with multiple layers)
  • Intuition of convex functions — why squared loss gives us a single minimum

Next – Gradient Descent in Neural Networks