Backpropagation (Multiple Neurons) Example with Simple Python

1. A simple Python simulation of backpropagation with multiple neurons

Setup

  • 2 input neurons
  • 1 hidden layer with 2 neurons
  • 1 output neuron
  • Sigmoid activation
  • Mean Squared Error loss
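
In equation form, the network in this setup computes (using the same names that appear in the code below):

  h1 = sigmoid(w1*x1 + w2*x2 + b1)
  h2 = sigmoid(w3*x1 + w4*x2 + b2)
  y_pred = sigmoid(w5*h1 + w6*h2 + b3)
  error = (y_pred - y_actual)^2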

Python Code (No Libraries)

import math

# Sigmoid and its derivative
def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

# Inputs and target
x1, x2 = 0.5, 0.1
y_actual = 1

# Initial weights and biases (arbitrary hand-picked starting values)
w1, w2 = 0.4, 0.3   # weights for hidden neuron 1
w3, w4 = 0.6, 0.1   # weights for hidden neuron 2
w5, w6 = 0.5, 0.7   # weights from hidden to output

b1, b2 = 0.0, 0.0   # biases for hidden neurons
b3 = 0.0            # bias for output

lr = 0.1  # learning rate

# ----- Forward Pass -----
h1_input = w1*x1 + w2*x2 + b1
h1_output = sigmoid(h1_input)

h2_input = w3*x1 + w4*x2 + b2
h2_output = sigmoid(h2_input)

output_input = w5*h1_output + w6*h2_output + b3
y_pred = sigmoid(output_input)

# ----- Calculate Error -----
error = (y_pred - y_actual) ** 2
print("Initial Error:", round(error, 4))

# ----- Backward Pass -----
# Output layer gradients
dE_dypred = 2 * (y_pred - y_actual)
dypred_dz = sigmoid_derivative(output_input)

# Gradients for weights w5, w6
dz_dw5 = h1_output
dz_dw6 = h2_output

# Chain rule
dw5 = dE_dypred * dypred_dz * dz_dw5
dw6 = dE_dypred * dypred_dz * dz_dw6
db3 = dE_dypred * dypred_dz

# Gradients for hidden neurons
# h1 (dout_dh1 is dE/dh1_output: how the error changes with h1's output)
dout_dh1 = w5 * dE_dypred * dypred_dz
dh1_dz1 = sigmoid_derivative(h1_input)
dw1 = dout_dh1 * dh1_dz1 * x1
dw2 = dout_dh1 * dh1_dz1 * x2
db1 = dout_dh1 * dh1_dz1

# h2 (dout_dh2 is dE/dh2_output)
dout_dh2 = w6 * dE_dypred * dypred_dz
dh2_dz2 = sigmoid_derivative(h2_input)
dw3 = dout_dh2 * dh2_dz2 * x1
dw4 = dout_dh2 * dh2_dz2 * x2
db2 = dout_dh2 * dh2_dz2

# ----- Update Weights -----
w1 -= lr * dw1
w2 -= lr * dw2
w3 -= lr * dw3
w4 -= lr * dw4
w5 -= lr * dw5
w6 -= lr * dw6
b1 -= lr * db1
b2 -= lr * db2
b3 -= lr * db3

# ----- Forward again to check improvement -----
h1_input = w1*x1 + w2*x2 + b1
h1_output = sigmoid(h1_input)

h2_input = w3*x1 + w4*x2 + b2
h2_output = sigmoid(h2_input)

output_input = w5*h1_output + w6*h2_output + b3
y_pred = sigmoid(output_input)

error = (y_pred - y_actual) ** 2
print("Error after 1 backpropagation:", round(error, 4))

Output Sample

Initial Error: 0.1127
Error after 1 backpropagation step: 0.1089
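
Where these gradients come from: each weight gradient in the backward pass is a chain-rule product, written here with the same names as the code. For the output-layer weight w5 (where output_input = w5*h1_output + w6*h2_output + b3):

  dE/dw5 = dE/dy_pred * dy_pred/d(output_input) * d(output_input)/dw5
         = 2*(y_pred - y_actual) * sigmoid_derivative(output_input) * h1_output

For a hidden-layer weight such as w1, the chain is one step longer because the error first flows back through w5:

  dE/dw1 = 2*(y_pred - y_actual) * sigmoid_derivative(output_input) * w5 * sigmoid_derivative(h1_input) * x1

The bias gradients (db1, db2, db3) are the same products with the final input factor (x1, x2, h1_output, or h2_output) dropped, since the derivative of each weighted sum with respect to its bias is 1.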

Python Code – Multiple Epochs (No Libraries)

import math

# Activation and its derivative
def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

# Inputs and actual output
x1, x2 = 0.5, 0.1
y_actual = 1

# Initial weights and biases
w1, w2 = 0.4, 0.3
w3, w4 = 0.6, 0.1
w5, w6 = 0.5, 0.7
b1, b2, b3 = 0.0, 0.0, 0.0

lr = 0.1  # Learning rate
epochs = 50

# Training loop
for epoch in range(epochs):
    # Forward Pass
    h1_input = w1*x1 + w2*x2 + b1
    h1_output = sigmoid(h1_input)

    h2_input = w3*x1 + w4*x2 + b2
    h2_output = sigmoid(h2_input)

    output_input = w5*h1_output + w6*h2_output + b3
    y_pred = sigmoid(output_input)

    # Error
    error = (y_pred - y_actual) ** 2

    # Backward Pass
    dE_dypred = 2 * (y_pred - y_actual)
    dypred_dz = sigmoid_derivative(output_input)

    dw5 = dE_dypred * dypred_dz * h1_output
    dw6 = dE_dypred * dypred_dz * h2_output
    db3 = dE_dypred * dypred_dz

    # Gradients for hidden layer (dout_dh1 and dout_dh2 are dE/dh1_output and dE/dh2_output)
    dout_dh1 = w5 * dE_dypred * dypred_dz
    dh1_dz1 = sigmoid_derivative(h1_input)
    dw1 = dout_dh1 * dh1_dz1 * x1
    dw2 = dout_dh1 * dh1_dz1 * x2
    db1 = dout_dh1 * dh1_dz1

    dout_dh2 = w6 * dE_dypred * dypred_dz
    dh2_dz2 = sigmoid_derivative(h2_input)
    dw3 = dout_dh2 * dh2_dz2 * x1
    dw4 = dout_dh2 * dh2_dz2 * x2
    db2 = dout_dh2 * dh2_dz2

    # Update weights and biases
    w1 -= lr * dw1
    w2 -= lr * dw2
    w3 -= lr * dw3
    w4 -= lr * dw4
    w5 -= lr * dw5
    w6 -= lr * dw6
    b1 -= lr * db1
    b2 -= lr * db2
    b3 -= lr * db3

    # Print progress
    print(f"Epoch {epoch+1}: y_pred = {round(y_pred, 4)}, error = {round(error, 6)}")

What We’ll See

Each epoch prints:

  • The current predicted output
  • The error at that stage

We’ll observe:

  • y_pred moves closer to y_actual = 1
  • Error gradually decreases
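
To watch that trend without reading 50 lines of output, here is one compact variation: the same network and update rule as above, with the backward pass collapsed into per-neuron delta terms and progress printed only every 10th epoch. This is just a sketch; the names delta_out, delta_h1, delta_h2 and the 10-epoch print interval are illustrative choices, not part of the original code.

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

x1, x2 = 0.5, 0.1
y_actual = 1

w1, w2, w3, w4, w5, w6 = 0.4, 0.3, 0.6, 0.1, 0.5, 0.7
b1, b2, b3 = 0.0, 0.0, 0.0
lr = 0.1
epochs = 50

for epoch in range(epochs):
    # Forward pass (same computation as the training loop above)
    h1_input = w1*x1 + w2*x2 + b1
    h1_output = sigmoid(h1_input)
    h2_input = w3*x1 + w4*x2 + b2
    h2_output = sigmoid(h2_input)
    output_input = w5*h1_output + w6*h2_output + b3
    y_pred = sigmoid(output_input)
    error = (y_pred - y_actual) ** 2

    # Backward pass, collapsed into per-neuron "delta" terms
    # (delta_out is dE/d(output_input); delta_h1 and delta_h2 are dE/dh1_input and dE/dh2_input)
    delta_out = 2 * (y_pred - y_actual) * sigmoid_derivative(output_input)
    delta_h1 = delta_out * w5 * sigmoid_derivative(h1_input)
    delta_h2 = delta_out * w6 * sigmoid_derivative(h2_input)

    # Same gradient-descent updates as above
    w5 -= lr * delta_out * h1_output
    w6 -= lr * delta_out * h2_output
    b3 -= lr * delta_out
    w1 -= lr * delta_h1 * x1
    w2 -= lr * delta_h1 * x2
    b1 -= lr * delta_h1
    w3 -= lr * delta_h2 * x1
    w4 -= lr * delta_h2 * x2
    b2 -= lr * delta_h2

    # Report every 10th epoch so the trend is easy to scan
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}: y_pred = {round(y_pred, 4)}, error = {round(error, 6)}")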

Backpropagation with Multiple Neurons – Visual Roadmap