Deep Learning with Simple Python
We are simulating an XOR function (i.e., the neural network learns that 0 XOR 1 = 1, 1 XOR 1 = 0, etc.). XOR is not linearly separable, so:
- A shallow neural network (1 hidden layer) can represent XOR in principle, but training it with few neurons is often unreliable.
- A deep neural network (2+ hidden layers) handles it more easily, because each extra layer adds another stage of non-linear combination.
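To see why XOR defeats any single linear boundary, suppose a line $w_1 x_1 + w_2 x_2 + b = 0$ separated the classes. The four truth-table rows would demand:

$$
\begin{aligned}
b &< 0 && (0 \oplus 0 = 0) \\
w_2 + b &> 0 && (0 \oplus 1 = 1) \\
w_1 + b &> 0 && (1 \oplus 0 = 1) \\
w_1 + w_2 + b &< 0 && (1 \oplus 1 = 0)
\end{aligned}
$$

Adding the two middle inequalities gives $w_1 + w_2 + 2b > 0$, while adding the first and last gives $w_1 + w_2 + 2b < 0$: a contradiction, so no such line exists.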
1. Activation Function: Sigmoid
```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)
```
Explanation:
- sigmoid(x) maps input into (0, 1) range → creates non-linearity
- Derivative is needed for backpropagation (learning step)
In deep learning, non-linearity + multiple layers help learn complex patterns like XOR.
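A quick sanity check of both properties, using the functions just defined (the printed values are standard and easy to verify by hand):

```python
print(sigmoid(0))             # 0.5: the midpoint of the (0, 1) range
print(sigmoid(10))            # ~0.99995: saturates toward 1
print(sigmoid(-10))           # ~0.00005: saturates toward 0
print(sigmoid_derivative(0))  # 0.25: the derivative's maximum value
```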
2. The Data: XOR Truth Table
```python
data = [
    ([0, 0], 0),
    ([0, 1], 1),
    ([1, 0], 1),
    ([1, 1], 0),
]
```
This is the classic XOR problem: no purely linear model (a network with no hidden layer at all) can fit it, which is exactly why it is the standard test case here.
3. Layer Initialization
```python
import random

def init_layer(input_size, output_size):
    weights = [[random.uniform(-1, 1) for _ in range(input_size)]
               for _ in range(output_size)]
    biases = [random.uniform(-1, 1) for _ in range(output_size)]
    return weights, biases
```
Explanation:
- Randomly initializes weights and biases
- For example, from input (2 values) → hidden1 (3 neurons): init_layer(2, 3) returns a 3×2 weight matrix plus 3 biases, one per output neuron
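A quick shape check, reusing init_layer from above (the fixed seed is only so the example is reproducible; it is not part of the original code):

```python
import random
random.seed(0)  # assumption: fixed seed just for a reproducible demo

w, b = init_layer(2, 3)   # input (2 values) -> hidden1 (3 neurons)
print(len(w), len(w[0]))  # 3 2  (a 3x2 weight matrix)
print(len(b))             # 3    (one bias per hidden neuron)
```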
4. Forward Pass Function
```python
def dot_product(weights, inputs, bias):
    # one weighted sum plus bias per output neuron
    return [sum(w * i for w, i in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def forward(inputs, weights1, bias1, weights2, bias2, weights3, bias3):
    z1 = dot_product(weights1, inputs, bias1)
    a1 = [sigmoid(x) for x in z1]
    z2 = dot_product(weights2, a1, bias2)
    a2 = [sigmoid(x) for x in z2]
    z3 = dot_product(weights3, a2, bias3)
    a3 = [sigmoid(x) for x in z3]
    return z1, a1, z2, a2, z3, a3
```
Explanation:
- Inputs flow through 3 layers:
  - Layer 1: input → hidden1 (3 neurons)
  - Layer 2: hidden1 → hidden2 (3 neurons)
  - Layer 3: hidden2 → output (1 neuron)
- z = weighted sum + bias
- a = activation output (after sigmoid)
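In formula terms, each layer computes z = W·a_prev + b followed by a = sigmoid(z). A minimal untrained run, assuming the helpers above are already defined (the output is random because the weights are):

```python
w1, b1 = init_layer(2, 3)  # input -> hidden1
w2, b2 = init_layer(3, 3)  # hidden1 -> hidden2
w3, b3 = init_layer(3, 1)  # hidden2 -> output

_, _, _, _, _, a3 = forward([0, 1], w1, b1, w2, b2, w3, b3)
print(a3[0])  # an untrained guess somewhere in (0, 1)
```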
5. Backpropagation and Training Loop
```python
def train(epochs=10000, lr=0.5):
    w1, b1 = init_layer(2, 3)  # input -> hidden1
    w2, b2 = init_layer(3, 3)  # hidden1 -> hidden2
    w3, b3 = init_layer(3, 1)  # hidden2 -> output
```
- We create three layers of weights and biases: input→hidden1, hidden1→hidden2, hidden2→output
6. Training Over Epochs
```python
    for epoch in range(epochs):
        total_error = 0
        for x, y in data:
            z1, a1, z2, a2, z3, a3 = forward(x, w1, b1, w2, b2, w3, b3)
```
For every epoch (iteration), we go through all examples and calculate:
- z1, a1: first layer
- z2, a2: second layer
- z3, a3: final output
7. Error and Delta Computation
```python
            error = y - a3[0]
            total_error += error ** 2  # track squared error for monitoring
            delta3 = error * sigmoid_derivative(z3[0])
            # w3 is 1x3 (one output neuron), so its weights live in w3[0]
            delta2 = [delta3 * w3[0][i] * sigmoid_derivative(z2[i])
                      for i in range(3)]
            delta1 = [sum(delta2[j] * w2[j][i] for j in range(3))
                      * sigmoid_derivative(z1[i])
                      for i in range(3)]
```
Explanation:
- Calculates error between prediction and actual
- delta3: How wrong the output neuron is
- delta2: How much each hidden2 neuron contributed to error
- delta1: Same logic for hidden1 neurons
This is classic backpropagation: the chain rule applied layer by layer.
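In equation form, the three lines above compute (where $\sigma$ is the sigmoid and $w_{3,i}$ denotes the output neuron's weight for hidden2 neuron $i$):

$$
\begin{aligned}
\delta_3 &= (y - a_3)\,\sigma'(z_3) \\
\delta_{2,i} &= \delta_3\, w_{3,i}\,\sigma'(z_{2,i}) \\
\delta_{1,i} &= \Big(\sum_{j=1}^{3} \delta_{2,j}\, w_{2,ji}\Big)\,\sigma'(z_{1,i})
\end{aligned}
$$

Each layer's deltas reuse the deltas of the layer after it, which is what makes backpropagation efficient.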
8. Updating Weights and Biases
```python
            # hidden1 -> hidden2 weights (w2 is 3x3)
            for i in range(3):
                for j in range(3):
                    w2[i][j] += lr * delta2[i] * a1[j]
                b2[i] += lr * delta2[i]
```
Update each weight by: new_weight = old_weight + learning_rate × delta × input_activation, so each weight moves in proportion to how much its input contributed to the error.
The same update pattern applies to the w1 and w3 weights (and their biases), as sketched below.
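For completeness, a sketch of the remaining updates inside the same loop, following exactly the pattern above (the progress print and its 1000-epoch interval are additions for monitoring, not part of the original):

```python
            # hidden2 -> output weights (w3 is 1x3: one output neuron)
            for j in range(3):
                w3[0][j] += lr * delta3 * a2[j]
            b3[0] += lr * delta3

            # input -> hidden1 weights (w1 is 3x2)
            for i in range(3):
                for j in range(2):
                    w1[i][j] += lr * delta1[i] * x[j]
                b1[i] += lr * delta1[i]

        # optional: watch the squared error shrink over the epochs
        if epoch % 1000 == 0:
            print(f"epoch {epoch}: total error = {total_error:.4f}")
```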
9. Final Prediction
print("\nFinal predictions:") for x, _ in data: _, _, _, _, _, a3 = forward(x, w1, b1, w2, b2, w3, b3) print(f"{x} → {round(a3[0], 3)}")
After training, we test each XOR input and print the predicted value.
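Running the whole experiment is then a single call (with lr=0.5 and 10,000 epochs, the printed predictions typically land close to 0 or 1 for each input, though exact values vary with the random initialization):

```python
train()
```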
10. What If We Use a Shallow Neural Network (1 Hidden Layer)?
If we change the code to remove one hidden layer, like this:
```python
# Only one hidden layer
w1, b1 = init_layer(2, 3)
w2, b2 = init_layer(3, 1)
# the forward pass would skip z2, a2 (input -> hidden -> output)
```
We will typically observe:
- Training becomes much less reliable: some runs converge, others stall
- When a run stalls, the model hovers around 0.5 for some or all inputs
- A single hidden layer can represent XOR in principle, but with so few neurons there is little non-linear "room", so a bad random start can trap gradient descent before it finds the pattern
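For reference, a minimal sketch of the shallow variant's forward pass, assuming the same sigmoid and dot_product helpers from above (forward_shallow is a hypothetical name, not in the original):

```python
def forward_shallow(inputs, w1, b1, w2, b2):
    # single hidden layer: input -> hidden (3 neurons) -> output (1 neuron)
    z1 = dot_product(w1, inputs, b1)
    a1 = [sigmoid(x) for x in z1]
    z2 = dot_product(w2, a1, b2)
    a2 = [sigmoid(x) for x in z2]
    return z1, a1, z2, a2
```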
11. Conceptual Summary: Deep vs Shallow for XOR
| Point | Shallow NN | Deep NN |
|---|---|---|
| Layers | 1 hidden layer | 2+ hidden layers |
| Can solve XOR? | In principle yes, but training is often unreliable | Reliably |
| Feature extraction | Weak | Hierarchical |
| Real-world use | Simple, close-to-linearly-separable problems | All modern AI (e.g. GPT, image models) |
12. Real-Life Analogy: Mail Sorting in a Post Office
Imagine we’re running a modern post office and we want to separate spam emails from real emails:
Shallow Learning: One-Layer Classifier
We hire an assistant who looks at only the subject line of an email.
He’s trained like this:
- If email contains the word “lottery”, mark as spam
- If email contains “urgent”, mark as spam
- If email contains “meeting”, mark as real
But…
What goes wrong?
- “Meeting about lottery campaign” → Gets misclassified
- “URGENT! Team Lunch Today” → Gets marked spam even though it’s valid
Why shallow learning fails:
- Only works on simple rules
- Can’t understand context or deeper relationships
- Misses patterns in word combinations, email sender, tone, etc.
Deep Learning: Multi-Layer System
Now, we upgrade our post office with:
- Layer 1: Extract keywords and sender info
- Layer 2: Analyze tone, frequency of words, punctuation
- Layer 3: Determine meaning and intent (e.g., sarcasm, urgency)
- Final layer: Decide if it’s spam or not
Now it can correctly:
- Understand “lottery meeting for marketing” is not spam
- Detect “Hello friend, claim your prize” as spam due to pattern
- Learn from examples, not just rules
Now Let’s Map This to Code
We’ll simulate a very simplified version of spam detection, where we use a deep network to learn patterns across 2 features:
- contains_offer (0 or 1)
- is_from_known_sender (0 or 1)
Dataset (Real-World Like):
```python
# [contains_offer, is_from_known_sender] -> is_spam
data = [
    ([1, 0], 1),  # Offer from unknown sender -> spam
    ([0, 1], 0),  # No offer, known sender -> not spam
    ([1, 1], 0),  # Offer from known sender -> likely not spam
    ([0, 0], 1),  # No offer, unknown sender -> suspicious -> spam
]
```
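Because this dataset has the same shape as the XOR one (two binary inputs, one binary target), the 2-3-3-1 network and train() loop from above can be reused unchanged; only the reading of the output differs. A hypothetical helper for that last step (classify and its 0.5 threshold are assumptions, not in the original):

```python
def classify(prob, threshold=0.5):
    # map the output neuron's (0, 1) activation to a label
    return "spam" if prob > threshold else "not spam"

print(classify(0.91))  # -> spam
print(classify(0.07))  # -> not spam
```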
Real Interpretation of Each Layer:
- Layer 1: Finds basic features (e.g., offer, known sender)
- Layer 2: Combines features (e.g., “Offer + Known Sender”)
- Layer 3: Learns spam vs non-spam from combinations
What Happens Without Deep Layers?
With only one layer:
- The network might say:
  - Offer = spam (blindly)
  - Known sender = not spam (blindly)

→ It can't learn combo logic like: "An offer from a known sender is not necessarily spam"
Deep Learning Helps Because:
- It stacks multiple decision points
- Allows the network to build abstract ideas, like:
  - "trustworthiness"
  - "intended tone"
  - "frequency of spammy words"
Deep Learning with Neural Networks – Basic Math Concepts