Categorical Cross-Entropy Example with Simple Python

We’ll simulate a simple version of categorical cross-entropy loss for 3 output classes.

import math

# Ground truth (one-hot encoded) for class 0
actual = [1, 0, 0]

# Model prediction (probability distribution)
predicted = [0.7, 0.2, 0.1]

# Categorical Cross-Entropy formula: L = -sum_i actual[i] * log(predicted[i])
def categorical_cross_entropy(actual, predicted):
    loss = 0.0
    for i in range(len(actual)):
        # For a one-hot label, only the true class (actual[i] == 1) contributes
        loss -= actual[i] * math.log(predicted[i] + 1e-9)  # 1e-9 avoids log(0)
    return loss

# Run it
loss_value = categorical_cross_entropy(actual, predicted)
print("Categorical Cross-Entropy Loss:", loss_value)

Try changing the predicted values to see how the model's confidence affects the loss.
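
For instance, here is a minimal self-contained sketch that does exactly that; the probability vectors are illustrative values (not the output of any trained model), chosen to compare a confident correct guess, an unsure guess, and a confident wrong guess:

import math

def categorical_cross_entropy(actual, predicted):
    # Full formula: L = -sum_i actual[i] * log(predicted[i])
    return -sum(a * math.log(p + 1e-9) for a, p in zip(actual, predicted))

actual = [1, 0, 0]  # true class is class 0
for predicted in ([0.95, 0.03, 0.02],   # very confident and correct
                  [0.70, 0.20, 0.10],   # fairly confident and correct
                  [0.34, 0.33, 0.33],   # unsure
                  [0.10, 0.20, 0.70]):  # confident but wrong
    print(predicted, "-> loss:", round(categorical_cross_entropy(actual, predicted), 4))

The more probability the model assigns to the true class, the smaller the loss; a confident wrong prediction is punished hardest.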

1. Use Case: Text Sentiment Classification

Input: A “review score” from 1 to 5
Target Output:

  • 1–2 → Negative
  • 3 → Neutral
  • 4–5 → Positive

We simulate this numerically rather than with real sentences, to keep the focus on how the neural network and categorical cross-entropy work.
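
To make the mapping concrete before looking at the network, here is a small sketch; the helper name encode_review and the exact scaling formula are illustrative assumptions, chosen only so the outputs line up with the hand-written data table in the program below:

def encode_review(score):
    # Map a 1-5 review score to a (scaled_input, one_hot_label) pair.
    # The scaling below is an assumption chosen to match the data table.
    x = (score - 0.5) / 5.0        # 1..5 -> 0.1, 0.3, 0.5, 0.7, 0.9
    if score <= 2:
        label = [1, 0, 0]          # Negative
    elif score == 3:
        label = [0, 1, 0]          # Neutral
    else:
        label = [0, 0, 1]          # Positive
    return x, label

print(encode_review(1))  # (0.1, [1, 0, 0])
print(encode_review(3))  # (0.5, [0, 1, 0])
print(encode_review(5))  # (0.9, [0, 0, 1])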

Python Code with Explanation

import math
import random

# STEP 1: Data Setup (Simulating reviews as numbers)
# input: review score (scaled from 0 to 1)
# output: one-hot [Negative, Neutral, Positive]

data = [
    (0.1, [1, 0, 0]),  # Negative
    (0.3, [1, 0, 0]),  # Negative
    (0.5, [0, 1, 0]),  # Neutral
    (0.7, [0, 0, 1]),  # Positive
    (0.9, [0, 0, 1])   # Positive
]

# STEP 2: Initialize Weights and Biases (1 input → 3 outputs)
weights = [random.uniform(-1, 1) for _ in range(3)]  # 1 input node to 3 output classes
biases  = [0.0 for _ in range(3)]
learning_rate = 0.1

# STEP 3: Softmax Activation
def softmax(logits):
    exp_values = [math.exp(i) for i in logits]
    total = sum(exp_values)
    return [j / total for j in exp_values]

# STEP 4: Categorical Cross-Entropy Loss
# L = -sum_i actual[i] * log(predicted[i]); only the true class contributes for one-hot labels
def cross_entropy_loss(actual, predicted):
    loss = 0.0
    for i in range(len(actual)):
        loss -= actual[i] * math.log(predicted[i] + 1e-9)  # 1e-9 avoids log(0)
    return loss

# STEP 5: Training Loop
for epoch in range(100):
    total_loss = 0

    for x, y_true in data:
        # Forward pass
        logits = [x * weights[i] + biases[i] for i in range(3)]
        y_pred = softmax(logits)

        # Loss calculation
        loss = cross_entropy_loss(y_true, y_pred)
        total_loss += loss

        # Backward pass (Gradient Descent)
        # ∂L/∂z = y_pred - y_true (for softmax + cross-entropy)
        for i in range(3):
            gradient = y_pred[i] - y_true[i]
            weights[i] -= learning_rate * gradient * x
            biases[i]  -= learning_rate * gradient

    if epoch % 10 == 0:
        print(f"Epoch {epoch} — Loss: {total_loss:.4f}")

# STEP 6: Testing
print("\nFinal Predictions:")
for x, _ in data:
    logits = [x * weights[i] + biases[i] for i in range(3)]
    y_pred = softmax(logits)
    print(f"Review Score: {x:.1f} → Sentiment Prediction: {y_pred}")

Explanation Summary:

Step               | What's Happening?
-------------------|---------------------------------------------------------------------------------------------------
Data Setup         | We simulate review scores (0–1) and label them as Negative/Neutral/Positive using one-hot encoding
Weights & Biases   | These are randomly initialized and adjusted during training
Softmax            | Converts raw scores (logits) into class probabilities
Cross-Entropy Loss | Compares predicted probability to actual one-hot label; punishes wrong confident guesses
Backpropagation    | Calculates gradients (softmax output – actual), updates weights and biases
Training Loop      | Over 100 epochs, we reduce loss by improving weights
Prediction         | After training, we predict sentiment from review scores
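
The Backpropagation row relies on the identity ∂L/∂z = y_pred − y_true for the softmax-plus-cross-entropy combination, the same formula the training loop above uses. A small self-contained sketch can check it numerically with finite differences; the logits and label below are arbitrary values chosen for illustration:

import math

def softmax(logits):
    exp_values = [math.exp(v) for v in logits]
    total = sum(exp_values)
    return [e / total for e in exp_values]

def cross_entropy(actual, predicted):
    return -sum(a * math.log(p + 1e-9) for a, p in zip(actual, predicted))

def loss_from_logits(logits, y_true):
    return cross_entropy(y_true, softmax(logits))

logits = [0.5, -0.2, 0.1]  # arbitrary example logits
y_true = [0, 0, 1]         # true class is index 2

# Analytic gradient: softmax output minus one-hot label
analytic = [p - t for p, t in zip(softmax(logits), y_true)]

# Numerical gradient via finite differences
eps = 1e-6
numeric = []
for i in range(len(logits)):
    bumped = list(logits)
    bumped[i] += eps
    numeric.append((loss_from_logits(bumped, y_true) - loss_from_logits(logits, y_true)) / eps)

print("analytic:", [round(g, 4) for g in analytic])
print("numeric: ", [round(g, 4) for g in numeric])

Both lines should print approximately the same three numbers, which is why the training loop can use y_pred[i] - y_true[i] directly instead of differentiating the loss step by step.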

2. Extending the previous example with actual text sentences, so the reader can relate to the task directly.

We’ll simulate a Sentiment Analysis Neural Network using:

  • Actual text like: “I loved this movie!”
  • A handcrafted text-to-number feature extractor
  • A Neural Network with 1 input → 3 outputs (Negative / Neutral / Positive) trained using Categorical Cross-Entropy

All in plain Python, with no external libraries.

Updated Use Case: Text Sentiment Classifier
Classes:

  • Negative → [1, 0, 0]
  • Neutral → [0, 1, 0]
  • Positive → [0, 0, 1]

Simple Text-to-Number Mapping (Feature Engineering)

We'll simply count positive and negative keywords in each sentence: primitive, but interpretable.

Updated Python Program (With Comments)

import math
import random

# STEP 1: Sample Data (sentence, sentiment label)
dataset = [
    ("I hated this movie", [1, 0, 0]),      # Negative
    ("This was a terrible experience", [1, 0, 0]),
    ("The movie was okay", [0, 1, 0]),      # Neutral
    ("Not bad, not good", [0, 1, 0]),
    ("I loved this movie", [0, 0, 1]),      # Positive
    ("What a fantastic film", [0, 0, 1])
]

# STEP 2: Vocabulary for primitive feature extraction
positive_words = ["love", "loved", "great", "amazing", "fantastic", "good"]
negative_words = ["hate", "hated", "terrible", "bad", "worst"]

# STEP 3: Convert a sentence to a simple numeric score
def extract_score(sentence):
    words = sentence.lower().split()
    score = 0
    for word in words:
        word = word.strip(".,!?")  # strip punctuation so "bad," still matches "bad"
        if word in positive_words:
            score += 1
        if word in negative_words:
            score -= 1
    return score / 3.0  # Normalize (rough scaling)

# STEP 4: Initialize Weights and Biases (1 input → 3 outputs)
weights = [random.uniform(-1, 1) for _ in range(3)]
biases = [0.0, 0.0, 0.0]
learning_rate = 0.1

# STEP 5: Softmax Function
def softmax(z):
    exps = [math.exp(i) for i in z]
    total = sum(exps)
    return [j / total for j in exps]

# STEP 6: Categorical Cross-Entropy
# L = -sum_i actual[i] * log(predicted[i])
def cross_entropy(actual, predicted):
    loss = 0.0
    for i in range(len(actual)):
        loss -= actual[i] * math.log(predicted[i] + 1e-9)  # 1e-9 avoids log(0)
    return loss

# STEP 7: Training Loop
for epoch in range(100):
    total_loss = 0
    for sentence, label in dataset:
        x = extract_score(sentence)
        logits = [x * weights[i] + biases[i] for i in range(3)]
        probs = softmax(logits)
        loss = cross_entropy(label, probs)
        total_loss += loss

        # Backpropagation
        for i in range(3):
            gradient = probs[i] - label[i]
            weights[i] -= learning_rate * gradient * x
            biases[i]  -= learning_rate * gradient

    if epoch % 10 == 0:
        print(f"Epoch {epoch} — Total Loss: {total_loss:.4f}")

# STEP 8: Inference (Test on unseen data)
test_sentences = [
    "The movie was bad",
    "It was a decent watch",
    "I absolutely loved every part",
    "Worst acting ever",
    "Quite okay and boring"
]

print("\nTest Predictions:")
for sentence in test_sentences:
    x = extract_score(sentence)
    logits = [x * weights[i] + biases[i] for i in range(3)]
    probs = softmax(logits)
    label_index = probs.index(max(probs))
    sentiment = ["Negative", "Neutral", "Positive"][label_index]
    print(f"\"{sentence}\" → {sentiment} (Confidence: {max(probs):.2f})")

What This Program Teaches

Part             | What It Represents
-----------------|----------------------------------------------
sentence → score | A basic NLP embedding (just keyword scoring)
weights & biases | A one-layer neural network
softmax          | Converts raw output into class probabilities
cross_entropy    | Measures how far predictions are from truth
gradient update  | Core of learning: adjust weights by feedback
test phase       | Try new sentences and predict sentiment
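
To make the "sentence → score" row concrete, here is a small self-contained sketch that reuses the same keyword lists and punctuation-stripping scorer from the program above. The third sentence is an illustrative assumption added to show the extractor's main blind spot: it ignores word order, so a negated phrase like "not good" still counts as positive.

positive_words = ["love", "loved", "great", "amazing", "fantastic", "good"]
negative_words = ["hate", "hated", "terrible", "bad", "worst"]

def extract_score(sentence):
    score = 0
    for word in sentence.lower().split():
        word = word.strip(".,!?")  # drop surrounding punctuation
        if word in positive_words:
            score += 1
        if word in negative_words:
            score -= 1
    return score / 3.0

for s in ["I loved this movie!",       # +1 keyword            -> +0.33
          "Not bad, not good",         # -1 and +1 cancel out  ->  0.00
          "This movie was not good"]:  # negation is invisible -> +0.33
    print(f"{s!r} -> {extract_score(s):+.2f}")

That limitation is fine here, because the goal is to illustrate categorical cross-entropy, not to build a robust sentiment model.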

Categorical Cross-Entropy Relevance in Neural Networks – Basic Math Concepts