Categorical Cross-Entropy Example in Simple Python
We’ll simulate a simple version of categorical cross-entropy loss for 3 output classes.
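For reference, this is the formula the code below implements: for a one-hot label y and a predicted probability distribution p over C classes,

$$
L(y, p) = -\sum_{i=1}^{C} y_i \log p_i
$$

Because y is one-hot, only the term for the true class survives, so the loss reduces to -log(p of the true class).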
```python
import math

# Ground truth (one-hot encoded) for class 0
actual = [1, 0, 0]

# Model prediction (probability distribution)
predicted = [0.7, 0.2, 0.1]

# Categorical Cross Entropy Formula
def categorical_cross_entropy(actual, predicted):
    loss = 0
    for i in range(len(actual)):
        if actual[i] == 1:
            loss = -math.log(predicted[i] + 1e-9)  # Avoid log(0)
    return loss

# Run it
loss_value = categorical_cross_entropy(actual, predicted)
print("Categorical Cross-Entropy Loss:", loss_value)
```
Try changing the predicted values to see how the model's confidence affects the loss.
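For instance, here is a minimal sketch (reusing `actual` and `categorical_cross_entropy` from above) that compares a few hypothetical predictions for the same ground truth:

```python
# A few hypothetical predictions for the same ground truth [1, 0, 0],
# ordered from confident-and-correct to confidently wrong.
candidates = [
    [0.95, 0.03, 0.02],  # confident and correct -> small loss
    [0.70, 0.20, 0.10],  # moderately confident  -> moderate loss
    [0.40, 0.30, 0.30],  # unsure                -> larger loss
    [0.10, 0.60, 0.30],  # confidently wrong     -> large loss
]

for p in candidates:
    print(p, "->", round(categorical_cross_entropy(actual, p), 4))
```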
1. Use Case: Text Sentiment Classification
Input: A “review score” from 1 to 5
Target Output:
- 1–2 → Negative
- 3 → Neutral
- 4–5 → Positive
We simulate this numerically rather than with real sentences, to focus on how the neural network and categorical cross-entropy work.
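As a minimal sketch of that mapping (the helper name `score_to_example` is ours, and the program below simply hard-codes its scaled scores), a 1–5 review score could be turned into the (input, one-hot label) pairs used for training:

```python
def score_to_example(review_score):
    """Map a 1-5 review score to (scaled input, one-hot label)."""
    x = (review_score - 1) / 4.0       # scale 1-5 into the 0-1 range
    if review_score <= 2:
        label = [1, 0, 0]              # Negative
    elif review_score == 3:
        label = [0, 1, 0]              # Neutral
    else:
        label = [0, 0, 1]              # Positive
    return x, label

print(score_to_example(1))   # (0.0, [1, 0, 0])
print(score_to_example(3))   # (0.5, [0, 1, 0])
print(score_to_example(5))   # (1.0, [0, 0, 1])
```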
Python Code with Explanation
```python
import math
import random

# STEP 1: Data Setup (Simulating reviews as numbers)
# input: review score (scaled from 0 to 1)
# output: one-hot [Negative, Neutral, Positive]
data = [
    (0.1, [1, 0, 0]),  # Negative
    (0.3, [1, 0, 0]),  # Negative
    (0.5, [0, 1, 0]),  # Neutral
    (0.7, [0, 0, 1]),  # Positive
    (0.9, [0, 0, 1])   # Positive
]

# STEP 2: Initialize Weights and Biases (1 input → 3 outputs)
weights = [random.uniform(-1, 1) for _ in range(3)]  # 1 input node to 3 output classes
biases = [0.0 for _ in range(3)]
learning_rate = 0.1

# STEP 3: Softmax Activation
def softmax(logits):
    exp_values = [math.exp(i) for i in logits]
    total = sum(exp_values)
    return [j / total for j in exp_values]

# STEP 4: Categorical Cross-Entropy Loss
def cross_entropy_loss(actual, predicted):
    loss = 0
    for i in range(len(actual)):
        if actual[i] == 1:
            loss = -math.log(predicted[i] + 1e-9)
    return loss

# STEP 5: Training Loop
for epoch in range(100):
    total_loss = 0
    for x, y_true in data:
        # Forward pass
        logits = [x * weights[i] + biases[i] for i in range(3)]
        y_pred = softmax(logits)

        # Loss calculation
        loss = cross_entropy_loss(y_true, y_pred)
        total_loss += loss

        # Backward pass (Gradient Descent)
        # ∂L/∂z = y_pred - y_true (for softmax + cross-entropy)
        for i in range(3):
            gradient = y_pred[i] - y_true[i]
            weights[i] -= learning_rate * gradient * x
            biases[i] -= learning_rate * gradient

    if epoch % 10 == 0:
        print(f"Epoch {epoch} — Loss: {total_loss:.4f}")

# STEP 6: Testing
print("\nFinal Predictions:")
for x, _ in data:
    logits = [x * weights[i] + biases[i] for i in range(3)]
    y_pred = softmax(logits)
    print(f"Review Score: {x:.1f} → Sentiment Prediction: {y_pred}")
```
Explanation Summary:
| Step | What’s Happening? |
|---|---|
| Data Setup | We simulate review scores (0–1) and label them as Negative/Neutral/Positive using one-hot encoding |
| Weights & Biases | These are randomly initialized and adjusted during training |
| Softmax | Converts raw scores (logits) into class probabilities |
| Cross-Entropy Loss | Compares predicted probability to actual one-hot label; punishes wrong confident guesses |
| Backpropagation | Calculates gradients (softmax output – actual), updates weights and biases |
| Training Loop | Over 100 epochs, we reduce loss by improving weights |
| Prediction | After training, we predict sentiment from review scores |
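The gradient used in the backward pass, ∂L/∂z = y_pred − y_true, is a standard identity for softmax combined with cross-entropy. A quick way to convince yourself is a numerical check; the sketch below reuses the `softmax` and `cross_entropy_loss` functions defined in the program above, and the logits `z` are arbitrary values chosen just for the test:

```python
# Numerical check of dL/dz = y_pred - y_true for softmax + cross-entropy
z = [0.5, -0.2, 0.1]   # arbitrary logits
y_true = [1, 0, 0]
eps = 1e-6

analytic = [softmax(z)[i] - y_true[i] for i in range(3)]

numeric = []
for i in range(3):
    z_plus, z_minus = list(z), list(z)
    z_plus[i] += eps
    z_minus[i] -= eps
    loss_plus = cross_entropy_loss(y_true, softmax(z_plus))
    loss_minus = cross_entropy_loss(y_true, softmax(z_minus))
    numeric.append((loss_plus - loss_minus) / (2 * eps))

print("analytic:", [round(g, 6) for g in analytic])
print("numeric: ", [round(g, 6) for g in numeric])
```

The two rows should agree to several decimal places, which is exactly why the training loop can update the weights with `y_pred[i] - y_true[i]` instead of computing the derivative numerically.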
2. Extending the Previous Example with Actual Text Sentences
We’ll simulate a Sentiment Analysis Neural Network using:
- Actual text like: “I loved this movie!”
- A handcrafted text-to-number feature extractor
- A Neural Network with 1 input → 3 outputs (Negative / Neutral / Positive) trained using Categorical Cross-Entropy
All in raw Python (no libraries).
Updated Use Case: Text Sentiment Classifier
Classes:
- Negative → [1, 0, 0]
- Neutral → [0, 1, 0]
- Positive → [0, 0, 1]
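If you would rather generate these vectors than write them out by hand, a tiny helper (purely illustrative; the program below hard-codes the labels) could look like this:

```python
labels = ["Negative", "Neutral", "Positive"]

def one_hot(class_index, num_classes=3):
    """Return a list with a 1 at class_index and 0 elsewhere."""
    return [1 if i == class_index else 0 for i in range(num_classes)]

print(one_hot(labels.index("Positive")))  # [0, 0, 1]
```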
Simple Text to Number Mapping (Feature Engineering)
We'll simply count positive and negative keywords in each sentence: primitive, but interpretable.
Updated Python Program (With Comments)
```python
import math
import random

# STEP 1: Sample Data (sentence, sentiment label)
dataset = [
    ("I hated this movie", [1, 0, 0]),              # Negative
    ("This was a terrible experience", [1, 0, 0]),
    ("The movie was okay", [0, 1, 0]),              # Neutral
    ("Not bad, not good", [0, 1, 0]),
    ("I loved this movie", [0, 0, 1]),              # Positive
    ("What a fantastic film", [0, 0, 1])
]

# STEP 2: Vocabulary for primitive feature extraction
positive_words = ["love", "loved", "great", "amazing", "fantastic", "good"]
negative_words = ["hate", "hated", "terrible", "bad", "worst"]

# STEP 3: Convert sentence to a simple numeric score
def extract_score(sentence):
    words = sentence.lower().split()
    score = 0
    for word in words:
        if word in positive_words:
            score += 1
        if word in negative_words:
            score -= 1
    return score / 3.0  # Normalize (rough scaling)

# STEP 4: Initialize Weights and Biases (1 input → 3 outputs)
weights = [random.uniform(-1, 1) for _ in range(3)]
biases = [0.0, 0.0, 0.0]
learning_rate = 0.1

# STEP 5: Softmax Function
def softmax(z):
    exps = [math.exp(i) for i in z]
    total = sum(exps)
    return [j / total for j in exps]

# STEP 6: Categorical Cross-Entropy
def cross_entropy(actual, predicted):
    loss = 0
    for i in range(len(actual)):
        if actual[i] == 1:
            loss = -math.log(predicted[i] + 1e-9)
    return loss

# STEP 7: Training Loop
for epoch in range(100):
    total_loss = 0
    for sentence, label in dataset:
        x = extract_score(sentence)
        logits = [x * weights[i] + biases[i] for i in range(3)]
        probs = softmax(logits)
        loss = cross_entropy(label, probs)
        total_loss += loss

        # Backpropagation
        for i in range(3):
            gradient = probs[i] - label[i]
            weights[i] -= learning_rate * gradient * x
            biases[i] -= learning_rate * gradient

    if epoch % 10 == 0:
        print(f"Epoch {epoch} — Total Loss: {total_loss:.4f}")

# STEP 8: Inference (Test on unseen data)
test_sentences = [
    "The movie was bad",
    "It was a decent watch",
    "I absolutely loved every part",
    "Worst acting ever",
    "Quite okay and boring"
]

print("\nTest Predictions:")
for sentence in test_sentences:
    x = extract_score(sentence)
    logits = [x * weights[i] + biases[i] for i in range(3)]
    probs = softmax(logits)
    label_index = probs.index(max(probs))
    sentiment = ["Negative", "Neutral", "Positive"][label_index]
    print(f"\"{sentence}\" → {sentiment} (Confidence: {max(probs):.2f})")
```
What This Program Teaches
| Part | What It Represents |
|---|---|
| sentence → score | A very basic NLP feature-extraction step (keyword scoring), standing in for a real embedding |
| weights & biases | A one-layer neural network |
| softmax | Converts raw output into class probabilities |
| cross_entropy | Measures how far predictions are from truth |
| gradient update | Core of learning: adjust weights by feedback |
| test phase | Try new sentences and predict sentiment |
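As a follow-up usage sketch (the `predict` helper below is ours, not part of the program above), the test-phase logic can be wrapped into a single reusable function once training has finished. It relies on the trained `weights` and `biases` plus the `extract_score` and `softmax` functions defined above:

```python
def predict(sentence):
    """Classify a sentence using the trained weights and biases from above."""
    x = extract_score(sentence)
    logits = [x * weights[i] + biases[i] for i in range(3)]
    probs = softmax(logits)
    label_index = probs.index(max(probs))
    return ["Negative", "Neutral", "Positive"][label_index], max(probs)

sentiment, confidence = predict("What a great and amazing film")
print(sentiment, round(confidence, 2))
```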
Categorical Cross-Entropy Relevance in Neural Networks – Basic Math Concepts
