Categorical Cross-Entropy Example with Simple Python
We’ll implement a simple version of the categorical cross-entropy loss for 3 output classes.
```python
import math

# Ground truth (one-hot encoded) for class 0
actual = [1, 0, 0]

# Model prediction (probability distribution)
predicted = [0.7, 0.2, 0.1]

# Categorical Cross-Entropy formula
def categorical_cross_entropy(actual, predicted):
    loss = 0
    for i in range(len(actual)):
        if actual[i] == 1:
            loss = -math.log(predicted[i] + 1e-9)  # Avoid log(0)
    return loss

# Run it
loss_value = categorical_cross_entropy(actual, predicted)
print("Categorical Cross-Entropy Loss:", loss_value)
```
Try changing the predicted values to see how confidence affects the loss; the sketch below compares a few cases.
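This minimal sketch reuses the `categorical_cross_entropy` function defined above; the candidate distributions are arbitrary examples.

```python
# Compare the loss for different levels of confidence in the correct class (class 0).
actual = [1, 0, 0]
candidates = [
    [0.9, 0.05, 0.05],  # confident and correct → small loss
    [0.7, 0.2, 0.1],    # the original prediction
    [0.4, 0.3, 0.3],    # unsure → larger loss
    [0.1, 0.2, 0.7],    # confident but wrong → very large loss
]
for predicted in candidates:
    loss = categorical_cross_entropy(actual, predicted)
    print(predicted, "→ loss:", round(loss, 4))
```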
1. Use Case: Text Sentiment Classification
Input: A “review score” from 1 to 5
Target Output:
- 1–2 → Negative
- 3 → Neutral
- 4–5 → Positive
We simulate this numerically rather than with real sentences, to keep the focus on how the neural network and categorical cross-entropy work. One way to turn a 1–5 review score into the network’s 0–1 input and its one-hot label is sketched below.
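This helper is purely illustrative and is not part of the program that follows (which hard-codes its scaled inputs); the name `star_rating_to_example` and the exact rescaling are assumptions.

```python
# Hypothetical helper (not used by the training program below): map a 1–5 star
# review score to a 0–1 input and a one-hot [Negative, Neutral, Positive] label.
def star_rating_to_example(stars):
    x = (stars - 1) / 4.0        # 1 → 0.0, 3 → 0.5, 5 → 1.0
    if stars <= 2:
        label = [1, 0, 0]        # Negative
    elif stars == 3:
        label = [0, 1, 0]        # Neutral
    else:
        label = [0, 0, 1]        # Positive
    return x, label

print(star_rating_to_example(1))  # (0.0, [1, 0, 0])
print(star_rating_to_example(3))  # (0.5, [0, 1, 0])
print(star_rating_to_example(5))  # (1.0, [0, 0, 1])
```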
Python Code with Explanation
```python
import math
import random

# STEP 1: Data Setup (simulating reviews as numbers)
# input: review score (scaled from 0 to 1)
# output: one-hot [Negative, Neutral, Positive]
data = [
    (0.1, [1, 0, 0]),  # Negative
    (0.3, [1, 0, 0]),  # Negative
    (0.5, [0, 1, 0]),  # Neutral
    (0.7, [0, 0, 1]),  # Positive
    (0.9, [0, 0, 1])   # Positive
]

# STEP 2: Initialize Weights and Biases (1 input → 3 outputs)
weights = [random.uniform(-1, 1) for _ in range(3)]  # 1 input node to 3 output classes
biases = [0.0 for _ in range(3)]
learning_rate = 0.1

# STEP 3: Softmax Activation
def softmax(logits):
    exp_values = [math.exp(i) for i in logits]
    total = sum(exp_values)
    return [j / total for j in exp_values]

# STEP 4: Categorical Cross-Entropy Loss
def cross_entropy_loss(actual, predicted):
    loss = 0
    for i in range(len(actual)):
        if actual[i] == 1:
            loss = -math.log(predicted[i] + 1e-9)
    return loss

# STEP 5: Training Loop
for epoch in range(100):
    total_loss = 0
    for x, y_true in data:
        # Forward pass
        logits = [x * weights[i] + biases[i] for i in range(3)]
        y_pred = softmax(logits)

        # Loss calculation
        loss = cross_entropy_loss(y_true, y_pred)
        total_loss += loss

        # Backward pass (Gradient Descent)
        # ∂L/∂z = y_pred - y_true (for softmax + cross-entropy)
        for i in range(3):
            gradient = y_pred[i] - y_true[i]
            weights[i] -= learning_rate * gradient * x
            biases[i] -= learning_rate * gradient

    if epoch % 10 == 0:
        print(f"Epoch {epoch} — Loss: {total_loss:.4f}")

# STEP 6: Testing
print("\nFinal Predictions:")
for x, _ in data:
    logits = [x * weights[i] + biases[i] for i in range(3)]
    y_pred = softmax(logits)
    print(f"Review Score: {x:.1f} → Sentiment Prediction: {y_pred}")
```
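As a quick follow-up, the sketch below reuses the trained `weights`, `biases`, and `softmax` from the program above to turn the probability vector into a readable label; the score 0.4 is an arbitrary, unseen input.

```python
# Turn the model's probabilities into a human-readable sentiment label.
def predict_sentiment(x):
    logits = [x * weights[i] + biases[i] for i in range(3)]
    probs = softmax(logits)
    classes = ["Negative", "Neutral", "Positive"]
    best = probs.index(max(probs))
    return classes[best], probs[best]

label, confidence = predict_sentiment(0.4)  # 0.4 is an arbitrary unseen score
print(f"Score 0.4 → {label} (confidence {confidence:.2f})")
```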
Explanation Summary:
| Step | What’s Happening? |
| --- | --- |
| Data Setup | We simulate review scores (0–1) and label them as Negative/Neutral/Positive using one-hot encoding |
| Weights & Biases | Randomly initialized and adjusted during training |
| Softmax | Converts raw scores (logits) into class probabilities |
| Cross-Entropy Loss | Compares the predicted probabilities to the actual one-hot label; punishes confident wrong guesses |
| Backpropagation | Calculates the gradient (softmax output − actual) and updates the weights and biases; see the sanity check below |
| Training Loop | Over 100 epochs, the loss is reduced as the weights improve |
| Prediction | After training, we predict sentiment from review scores |
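The "softmax output − actual" gradient used in the backward pass is the standard shortcut for softmax followed by cross-entropy: ∂L/∂z_i = p_i − y_i. The standalone sketch below (with its own softmax and loss, and arbitrary example logits) checks that formula against a finite-difference estimate.

```python
import math

# Sanity check: for softmax + cross-entropy, dL/dz_i = p_i - y_i.
def softmax(z):
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y, p):
    return -sum(y[i] * math.log(p[i] + 1e-12) for i in range(len(y)))

z = [0.5, -0.2, 0.1]   # arbitrary logits
y = [0, 0, 1]          # one-hot target
p = softmax(z)

analytic = [p[i] - y[i] for i in range(3)]

eps = 1e-6
numeric = []
for i in range(3):
    z_plus, z_minus = list(z), list(z)
    z_plus[i] += eps
    z_minus[i] -= eps
    numeric.append((cross_entropy(y, softmax(z_plus)) -
                    cross_entropy(y, softmax(z_minus))) / (2 * eps))

print("analytic:", [round(g, 6) for g in analytic])
print("numeric: ", [round(g, 6) for g in numeric])  # should match closely
```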
2. Extending the Previous Example to Actual Text Sentences
To make the task easier to relate to, we’ll simulate a Sentiment Analysis Neural Network using:
- Actual text like: “I loved this movie!”
- A handcrafted text-to-number feature extractor
- A Neural Network with 1 input → 3 outputs (Negative / Neutral / Positive) trained using Categorical Cross-Entropy
All in plain Python, using only the standard library (no external packages).
Updated Use Case: Text Sentiment Classifier
Classes:
- Negative → [1, 0, 0]
- Neutral → [0, 1, 0]
- Positive → [0, 0, 1]
Simple Text-to-Number Mapping (Feature Engineering)
We’ll simply count positive and negative keywords in each sentence: primitive, but interpretable.
Updated Python Program (With Comments)
```python
import math
import random

# STEP 1: Sample Data (sentence, sentiment label)
dataset = [
    ("I hated this movie", [1, 0, 0]),             # Negative
    ("This was a terrible experience", [1, 0, 0]), # Negative
    ("The movie was okay", [0, 1, 0]),             # Neutral
    ("Not bad, not good", [0, 1, 0]),              # Neutral
    ("I loved this movie", [0, 0, 1]),             # Positive
    ("What a fantastic film", [0, 0, 1])           # Positive
]

# STEP 2: Vocabulary for primitive feature extraction
positive_words = ["love", "loved", "great", "amazing", "fantastic", "good"]
negative_words = ["hate", "hated", "terrible", "bad", "worst"]

# STEP 3: Convert a sentence to a simple numeric score
def extract_score(sentence):
    words = sentence.lower().split()
    score = 0
    for word in words:
        word = word.strip(".,!?")  # strip punctuation so "bad," still matches "bad"
        if word in positive_words:
            score += 1
        if word in negative_words:
            score -= 1
    return score / 3.0  # Normalize (rough scaling)

# STEP 4: Initialize Weights and Biases (1 input → 3 outputs)
weights = [random.uniform(-1, 1) for _ in range(3)]
biases = [0.0, 0.0, 0.0]
learning_rate = 0.1

# STEP 5: Softmax Function
def softmax(z):
    exps = [math.exp(i) for i in z]
    total = sum(exps)
    return [j / total for j in exps]

# STEP 6: Categorical Cross-Entropy
def cross_entropy(actual, predicted):
    loss = 0
    for i in range(len(actual)):
        if actual[i] == 1:  # only the true class contributes for one-hot labels
            loss = -math.log(predicted[i] + 1e-9)
    return loss

# STEP 7: Training Loop
for epoch in range(100):
    total_loss = 0
    for sentence, label in dataset:
        x = extract_score(sentence)

        # Forward pass
        logits = [x * weights[i] + biases[i] for i in range(3)]
        probs = softmax(logits)

        loss = cross_entropy(label, probs)
        total_loss += loss

        # Backpropagation
        for i in range(3):
            gradient = probs[i] - label[i]
            weights[i] -= learning_rate * gradient * x
            biases[i] -= learning_rate * gradient

    if epoch % 10 == 0:
        print(f"Epoch {epoch} — Total Loss: {total_loss:.4f}")

# STEP 8: Inference (Test on unseen data)
test_sentences = [
    "The movie was bad",
    "It was a decent watch",
    "I absolutely loved every part",
    "Worst acting ever",
    "Quite okay and boring"
]

print("\nTest Predictions:")
for sentence in test_sentences:
    x = extract_score(sentence)
    logits = [x * weights[i] + biases[i] for i in range(3)]
    probs = softmax(logits)
    label_index = probs.index(max(probs))
    sentiment = ["Negative", "Neutral", "Positive"][label_index]
    print(f"\"{sentence}\" → {sentiment} (Confidence: {max(probs):.2f})")
```
What This Program Teaches
| Part | What It Represents |
| --- | --- |
| sentence → score | A basic NLP embedding (just keyword scoring); see the sketch below |
| weights & biases | A one-layer neural network |
| softmax | Converts raw output into class probabilities |
| cross_entropy | Measures how far predictions are from the truth |
| gradient update | Core of learning: adjust weights by feedback |
| test phase | Try new sentences and predict sentiment |
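To see what the "sentence → score" row means in practice, the short sketch below reuses `extract_score` and `dataset` from the program above and prints the single numeric feature the network actually receives for each training sentence.

```python
# Inspect the keyword-based feature for each training sentence.
for sentence, label in dataset:
    print(f"{sentence!r:38} feature = {extract_score(sentence):+.2f}  label = {label}")
```

Pure keyword counting is blind to context (a phrase like "not good" still counts "good" as positive), which is why real sentiment models use richer text representations; here it is enough to demonstrate the loss function.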
Categorical Cross-Entropy Relevance in Neural Networks – Basic Math Concepts