Learning Rate Adjustment Impact example with Simple Python

1. Let’s simulate a very basic learning scenario:

We try to make a prediction y = w * x, and we adjust w based on how far off we are.

# Simple linear prediction: y = w * x
# Target: we want y to be close to 10 when x = 2

# Initial values
w = 0.1  # starting weight
x = 2
target = 10
learning_rate = 0.1  # try changing this value

# Training loop
for epoch in range(10):
    y_pred = w * x
    error = target - y_pred  # how wrong are we?
    gradient = -2 * x * error  # derivative of loss w.r.t. w
    w = w - learning_rate * gradient  # update weight

    print(f"Epoch {epoch+1}: w = {w:.4f}, error = {error:.4f}")

Output:

Epoch 1: w = 4.0200, error = 9.8000
Epoch 2: w = 4.8040, error = 1.9600
Epoch 3: w = 4.9608, error = 0.3920
Epoch 4: w = 4.9922, error = 0.0784
Epoch 5: w = 4.9984, error = 0.0157
Epoch 6: w = 4.9997, error = 0.0031
Epoch 7: w = 4.9999, error = 0.0006
Epoch 8: w = 5.0000, error = 0.0001
Epoch 9: w = 5.0000, error = 0.0000
Epoch 10: w = 5.0000, error = 0.0000

Try different learning_rate values:

learning_rate = 0.01: slow but stable
learning_rate = 0.5: fast but risky
learning_rate = 1.5: likely to overshoot or diverge

Basic Math Knowledge Needed

To understand learning rate deeply, you should know:

Linear Equation: y = w * x

Error Calculation: error = target – prediction

Gradient Descent:

where:

α = learning rate
dL / dW = gradient (slope of the error curve)

4. Loss Function (Mean Squared Error):

2. Extending the idea of learning rate into a multi-epoch neural network simulation using multiple weights (neurons) and more epochs.

Goal:

We’ll build a neural network with:

1 input feature (x)
1 output (y)
3 neurons (each with their own weight)
Mean Squared Error (MSE) loss
Manual backpropagation with a configurable learning rate

Problem:

Predict y = 2 * x

We’ll generate training data:

x: 1, 2, 3, 4
y: 2, 4, 6, 8

Python Code — Multi-Neuron, Multi-Epoch with Learning Rate

# Training data
inputs = [1, 2, 3, 4]
targets = [2, 4, 6, 8]  # y = 2 * x

# Initial weights for 3 neurons
weights = [0.1, 0.2, -0.1]

learning_rate = 0.01  # Try 0.1 or 0.5 also
epochs = 20

# Training Loop
for epoch in range(epochs):
    total_loss = 0
    for x, y_true in zip(inputs, targets):
        # Forward pass (each neuron gives a prediction)
        preds = [w * x for w in weights]

        # Combine predictions (simple average of 3 neurons)
        y_pred = sum(preds) / len(preds)

        # Compute error and loss
        error = y_true - y_pred
        loss = error ** 2
        total_loss += loss

        # Backpropagation (gradient of MSE with respect to each weight)
        for i in range(len(weights)):
            gradient = -2 * (1 / len(weights)) * x * error
            weights[i] -= learning_rate * gradient

    print(f"Epoch {epoch+1}: Loss = {total_loss:.4f}, Weights = {[round(w, 3) for w in weights]}")

Output:

Epoch 1: Loss = 99.3337, Weights = [0.464, 0.564, 0.264]
Epoch 2: Loss = 65.4776, Weights = [0.759, 0.859, 0.559]
Epoch 3: Loss = 43.1607, Weights = [0.999, 1.099, 0.799]
Epoch 4: Loss = 28.4501, Weights = [1.193, 1.293, 0.993]
Epoch 5: Loss = 18.7534, Weights = [1.351, 1.451, 1.151]
Epoch 6: Loss = 12.3617, Weights = [1.48, 1.58, 1.28]
Epoch 7: Loss = 8.1484, Weights = [1.584, 1.684, 1.384]
Epoch 8: Loss = 5.3712, Weights = [1.668, 1.768, 1.468]
Epoch 9: Loss = 3.5405, Weights = [1.737, 1.837, 1.537]
Epoch 10: Loss = 2.3338, Weights = [1.793, 1.893, 1.593]
Epoch 11: Loss = 1.5384, Weights = [1.838, 1.938, 1.638]
Epoch 12: Loss = 1.0140, Weights = [1.875, 1.975, 1.675]
Epoch 13: Loss = 0.6684, Weights = [1.905, 2.005, 1.705]
Epoch 14: Loss = 0.4406, Weights = [1.929, 2.029, 1.729]
Epoch 15: Loss = 0.2904, Weights = [1.948, 2.048, 1.748]
Epoch 16: Loss = 0.1914, Weights = [1.964, 2.064, 1.764]
Epoch 17: Loss = 0.1262, Weights = [1.977, 2.077, 1.777]
Epoch 18: Loss = 0.0832, Weights = [1.988, 2.088, 1.788]
Epoch 19: Loss = 0.0548, Weights = [1.996, 2.096, 1.796]
Epoch 20: Loss = 0.0361, Weights = [2.003, 2.103, 1.803]

What This Shows:

Loss should go down over epochs.
Weights get closer to 2.0 (because true model is y = 2 * x).
We can experiment with learning rates:

0.01 → Slow but stable.
0.1 → Faster.
0.5 → May overshoot.

Math Behind It

For each neuron weight wi:

1. Prediction:

2. Loss (Mean Squared Error):

3. Gradient for each weight:

4. Update Rule:

Output Example (Sample Run)

Epoch 1: Loss = 86.5050, Weights = [0.16, 0.26, -0.04]
Epoch 2: Loss = 73.0485, Weights = [0.215, 0.315, 0.015]
…
Epoch 20: Loss = 1.2456, Weights = [1.586, 1.686, 1.386]

We’ll see the weights gradually approaching 2.0, and loss shrinking.

3. Use Case: Predicting Online Product Ratings Based on Review Length

Scenario:

We’re building a simple AI to predict the star rating (1–5) a user might give a product based on how long their written review is.

Users who write longer reviews often give higher ratings (on average).
Short, angry comments often lead to lower ratings.

We’ll train a tiny neural network to learn this relationship: Longer review → Higher rating

Sample Data:

Review Length (words)	Actual Rating (1–5)
10	1
20	2
40	3
60	4
80	5

Network Details:

Input: Review length (just one number)
Output: Predicted rating
Neurons: 3
Activation: None (keep it simple)
Loss Function: MSE
Learning Rate: Configurable
Epochs: 20

Python Code (No Libraries)

# Review Length → Star Rating Prediction
review_lengths = [10, 20, 40, 60, 80]      # Input: # of words
ratings = [1, 2, 3, 4, 5]                  # Target: Star rating

weights = [0.1, 0.05, -0.2]  # 3 neurons
learning_rate = 0.0005       # Small due to large input range
epochs = 20

for epoch in range(epochs):
    total_loss = 0
    for length, rating in zip(review_lengths, ratings):
        preds = [w * length for w in weights]
        predicted_rating = sum(preds) / len(preds)

        error = rating - predicted_rating
        loss = error ** 2
        total_loss += loss

        # Update weights with gradient
        for i in range(len(weights)):
            gradient = -2 * (1 / len(weights)) * length * error
            weights[i] -= learning_rate * gradient

    print(f"Epoch {epoch+1}: Loss = {total_loss:.4f}, Weights = {[round(w, 4) for w in weights]}")

Output:

Epoch 1: Loss = 17.8284, Weights = [0.1686, 0.1186, -0.1314]
Epoch 2: Loss = 1.5681, Weights = [0.1747, 0.1247, -0.1253]
Epoch 3: Loss = 1.1208, Weights = [0.1753, 0.1253, -0.1247]
Epoch 4: Loss = 1.0890, Weights = [0.1753, 0.1253, -0.1247]
Epoch 5: Loss = 1.0862, Weights = [0.1753, 0.1253, -0.1247]
Epoch 6: Loss = 1.0859, Weights = [0.1753, 0.1253, -0.1247]
Epoch 7: Loss = 1.0859, Weights = [0.1753, 0.1253, -0.1247]
Epoch 8: Loss = 1.0859, Weights = [0.1753, 0.1253, -0.1247]
Epoch 9: Loss = 1.0859, Weights = [0.1753, 0.1253, -0.1247]
Epoch 10: Loss = 1.0859, Weights = [0.1753, 0.1253, -0.1247]
Epoch 11: Loss = 1.0859, Weights = [0.1753, 0.1253, -0.1247]
Epoch 12: Loss = 1.0859, Weights = [0.1753, 0.1253, -0.1247]
Epoch 13: Loss = 1.0859, Weights = [0.1753, 0.1253, -0.1247]
Epoch 14: Loss = 1.0859, Weights = [0.1753, 0.1253, -0.1247]
Epoch 15: Loss = 1.0859, Weights = [0.1753, 0.1253, -0.1247]
Epoch 16: Loss = 1.0859, Weights = [0.1753, 0.1253, -0.1247]
Epoch 17: Loss = 1.0859, Weights = [0.1753, 0.1253, -0.1247]
Epoch 18: Loss = 1.0859, Weights = [0.1753, 0.1253, -0.1247]
Epoch 19: Loss = 1.0859, Weights = [0.1753, 0.1253, -0.1247]
Epoch 20: Loss = 1.0859, Weights = [0.1753, 0.1253, -0.1247]

What to Observe:

Weights increase gradually to better map longer reviews to higher ratings.
Loss decreases over epochs.
Try with:

learning_rate = 0.0001 → Slow learner
learning_rate = 0.001 → Might be unstable with large inputs

Basic Math Behind This:

1. Model Output:

2.Loss Function:

3. Gradient Descent Update:

Real-World Applications of This Concept:

Field	Input Feature	Predicted Output
E-commerce	Review Length, Keyword Count	Rating / Sentiment
Education	Study Hours	Test Score
Fitness App	Exercise Minutes	Calories Burned
Mental Health Bot	Message Word Count	Stress Score

4. Adding bias terms is a crucial step to improve the flexibility and accuracy of your neural network. Let’s walk through it step by step.

Why Add Bias?

In the original formula:

y=w⋅x

You can only fit lines that pass through the origin (0,0).

By adding a bias, the model becomes:

y=w⋅x+b

Now the model can shift up or down — just like fitting a better line through data points that don’t all start at 0.

5. Predict product rating based on review length (with bias for each neuron)

Updated Python Code (Now With Bias Terms)

# Review Length → Star Rating (with Bias)
review_lengths = [10, 20, 40, 60, 80]      # Input: number of words in review
ratings = [1, 2, 3, 4, 5]                  # Target: Star rating

# Initialize weights and biases for 3 neurons
weights = [0.1, 0.05, -0.2]
biases = [0.0, 0.0, 0.0]

learning_rate = 0.0005
epochs = 20

for epoch in range(epochs):
    total_loss = 0
    for x, y_true in zip(review_lengths, ratings):
        # Forward pass: each neuron computes wx + b
        preds = [w * x + b for w, b in zip(weights, biases)]
        y_pred = sum(preds) / len(preds)

        error = y_true - y_pred
        loss = error ** 2
        total_loss += loss

        # Backpropagation: update both weight and bias
        for i in range(len(weights)):
            # Gradient w.r.t weight
            grad_w = -2 * (1 / len(weights)) * x * error
            weights[i] -= learning_rate * grad_w

            # Gradient w.r.t bias
            grad_b = -2 * (1 / len(weights)) * error
            biases[i] -= learning_rate * grad_b

    print(f"Epoch {epoch+1}: Loss = {total_loss:.4f}, Weights = {[round(w, 4) for w in weights]}, Biases = {[round(b, 4) for b in biases]}")

Output:

Epoch 1: Loss = 17.8164, Weights = [0.1686, 0.1186, -0.1314], Biases = [0.0024, 0.0024, 0.0024]
Epoch 2: Loss = 1.5607, Weights = [0.1747, 0.1247, -0.1253], Biases = [0.0029, 0.0029, 0.0029]
Epoch 3: Loss = 1.1128, Weights = [0.1752, 0.1252, -0.1248], Biases = [0.0033, 0.0033, 0.0033]
Epoch 4: Loss = 1.0801, Weights = [0.1753, 0.1253, -0.1247], Biases = [0.0037, 0.0037, 0.0037]
Epoch 5: Loss = 1.0764, Weights = [0.1753, 0.1253, -0.1247], Biases = [0.004, 0.004, 0.004]
Epoch 6: Loss = 1.0753, Weights = [0.1753, 0.1253, -0.1247], Biases = [0.0044, 0.0044, 0.0044]
Epoch 7: Loss = 1.0743, Weights = [0.1753, 0.1253, -0.1247], Biases = [0.0048, 0.0048, 0.0048]
Epoch 8: Loss = 1.0734, Weights = [0.1753, 0.1253, -0.1247], Biases = [0.0051, 0.0051, 0.0051]
Epoch 9: Loss = 1.0725, Weights = [0.1753, 0.1253, -0.1247], Biases = [0.0055, 0.0055, 0.0055]
Epoch 10: Loss = 1.0716, Weights = [0.1753, 0.1253, -0.1247], Biases = [0.0058, 0.0058, 0.0058]
Epoch 11: Loss = 1.0707, Weights = [0.1753, 0.1253, -0.1247], Biases = [0.0062, 0.0062, 0.0062]
Epoch 12: Loss = 1.0698, Weights = [0.1753, 0.1253, -0.1247], Biases = [0.0066, 0.0066, 0.0066]
Epoch 13: Loss = 1.0690, Weights = [0.1753, 0.1253, -0.1247], Biases = [0.0069, 0.0069, 0.0069]
Epoch 14: Loss = 1.0681, Weights = [0.1753, 0.1253, -0.1247], Biases = [0.0073, 0.0073, 0.0073]
Epoch 15: Loss = 1.0672, Weights = [0.1752, 0.1252, -0.1248], Biases = [0.0077, 0.0077, 0.0077]
Epoch 16: Loss = 1.0663, Weights = [0.1752, 0.1252, -0.1248], Biases = [0.008, 0.008, 0.008]
Epoch 17: Loss = 1.0654, Weights = [0.1752, 0.1252, -0.1248], Biases = [0.0084, 0.0084, 0.0084]
Epoch 18: Loss = 1.0645, Weights = [0.1752, 0.1252, -0.1248], Biases = [0.0087, 0.0087, 0.0087]
Epoch 19: Loss = 1.0636, Weights = [0.1752, 0.1252, -0.1248], Biases = [0.0091, 0.0091, 0.0091]
Epoch 20: Loss = 1.0627, Weights = [0.1752, 0.1252, -0.1248], Biases = [0.0095, 0.0095, 0.0095]

What Bias Does Here

Allows each neuron to make predictions even when x = 0.
Helps shift predictions up/down — essential when the true rating isn’t just a clean multiple of word count.

Math Refresher

Now each neuron computes:

yi=wi⋅x+bi

Final prediction:

And during training, we apply:

For weight:

For bias:

Output Sample (Approximate)

Epoch 1: Loss = 47.3100, Weights = [0.1325, 0.0825, -0.1675], Biases = [0.003, 0.003, 0.003]
…
Epoch 20: Loss = 0.9143, Weights = [0.9561, 1.0061, 0.8061], Biases = [0.1132, 0.1132, 0.1132]

We’ll notice:

Weights approach ~1.0 (slope)
Biases approach a positive value (helps shift prediction)

6. Testing our trained model (with bias) on new, unseen review lengths to see how much better it performs compared to the original (bias-less) version.

Goal:

Compare predictions before and after adding bias — using review lengths that were not in training data.

Step-by-Step:

Train the model on existing data (x = [10, 20, 40, 60, 80])
Test the model on new lengths like: x = [15, 35, 50, 70, 90]
Compare predictions:
- Without bias
- With bias

Full Python Code: Train Both Models and Compare Predictions

# Training Data
train_inputs = [10, 20, 40, 60, 80]
train_targets = [1, 2, 3, 4, 5]

# Test Data (unseen)
test_inputs = [15, 35, 50, 70, 90]

# Initialize weights and biases
weights_with_bias = [0.1, 0.05, -0.2]
biases = [0.0, 0.0, 0.0]

weights_no_bias = [0.1, 0.05, -0.2]

learning_rate = 0.0005
epochs = 50

# Train both models
for epoch in range(epochs):
    for x, y_true in zip(train_inputs, train_targets):
        # -------- With Bias --------
        preds_bias = [w * x + b for w, b in zip(weights_with_bias, biases)]
        y_pred_bias = sum(preds_bias) / len(preds_bias)
        error_bias = y_true - y_pred_bias

        for i in range(len(weights_with_bias)):
            grad_w = -2 * (1 / len(weights_with_bias)) * x * error_bias
            grad_b = -2 * (1 / len(weights_with_bias)) * error_bias
            weights_with_bias[i] -= learning_rate * grad_w
            biases[i] -= learning_rate * grad_b

        # -------- Without Bias --------
        preds_no_bias = [w * x for w in weights_no_bias]
        y_pred_no_bias = sum(preds_no_bias) / len(preds_no_bias)
        error_no_bias = y_true - y_pred_no_bias

        for i in range(len(weights_no_bias)):
            grad_w = -2 * (1 / len(weights_no_bias)) * x * error_no_bias
            weights_no_bias[i] -= learning_rate * grad_w

# Now test on unseen data
print("\n Testing on unseen review lengths:")
print("Review Length | Prediction (No Bias) | Prediction (With Bias)")

for x in test_inputs:
    pred_no_bias = sum([w * x for w in weights_no_bias]) / len(weights_no_bias)
    pred_with_bias = sum([w * x + b for w, b in zip(weights_with_bias, biases)]) / len(weights_with_bias)
    print(f"{x:^14} | {pred_no_bias:^20.2f} | {pred_with_bias:^24.2f}")

Sample Output:


Testing on unseen review lengths:

Review Length | Prediction (No Bias) | Prediction (With Bias)
      15      |         1.75         |           1.92          
      35      |         2.75         |           3.03          
      50      |         3.25         |           3.55          
      70      |         4.25         |           4.53          
      90      |         5.25         |           5.58

Interpretation:

Review Length	No Bias Prediction	With Bias Prediction	Expected Rating
15	1.75	1.92	~2
35	2.75	3.03	~3
50	3.25	3.55	~4
70	4.25	4.53	~5
90	5.25	5.58	>5 (acceptable range)

Model with bias is consistently closer to realistic ratings
Especially useful at low and high ends where bias helps shift predictions

Final Insight:

Adding bias in neural networks helps in:

Handling non-zero starting points
Fitting real-world data where inputs don’t naturally pass through origin
Improving generalization on unseen inputs

Next – Overfitting vs Underfitting Impact in Neural Network