Bucketing example with simple Python
1. We’ll show how to:
- Create a list of input sentences
- Convert them to numerical encoding (simplified)
- Bucket them by length ranges
- Pad each to max in its bucket
# Step 1: Sample data (sentence tokens)
sentences = [
    ["Hi"],
    ["Hello", "friend"],
    ["Good", "morning", "everyone"],
    ["Welcome", "to", "the", "AI", "session"],
    ["This", "is", "an", "example", "of", "bucketing"]
]
# Step 2: Simple encoding (word to integer ID)
vocab = {}
counter = 1
for sentence in sentences:
    for word in sentence:
        if word not in vocab:
            vocab[word] = counter
            counter += 1

# Convert words to numeric tokens
encoded_sentences = [[vocab[word] for word in sentence] for sentence in sentences]
# Step 3: Bucketing
buckets = {
    "1-2": [],
    "3-5": [],
    "6+": []
}

for sentence in encoded_sentences:
    length = len(sentence)
    if length <= 2:
        buckets["1-2"].append(sentence)
    elif length <= 5:
        buckets["3-5"].append(sentence)
    else:
        buckets["6+"].append(sentence)
# Step 4: Padding each bucket
def pad_sentences(bucket):
    max_len = max(len(s) for s in bucket)
    padded = [s + [0] * (max_len - len(s)) for s in bucket]
    return padded
# Step 5: Display padded buckets
for label, bucket in buckets.items():
    if bucket:
        padded = pad_sentences(bucket)
        print(f"\nBucket {label} (max length {len(padded[0])}):")
        for p in padded:
            print(p)
Bucket 1-2 (max length 2):
[1, 0]
[2, 3]

Bucket 3-5 (max length 5):
[4, 5, 6, 0, 0]
[7, 8, 9, 10, 11]

Bucket 6+ (max length 6):
[12, 13, 14, 15, 16, 17]
2. Connect bucketing to a neural network training loop with a real-life impact example
Real-Life Use Case: Customer Support Chatbot
Imagine we’re building an AI-powered customer support chatbot.
- It receives user queries of different lengths:
  - “Hi”
  - “I need help with my internet connection”
  - “My account has been suspended after I moved to another country”
Without bucketing, we’d pad every query to the longest one in the dataset (e.g., 20 tokens), which wastes memory and slows training.
With bucketing, we:
- Reduce wasted padding,
- Build batches of similar-length queries, so each training step does less useless work,
- Keep the model from spending most of its computation on runs of meaningless pad tokens.
The quick check below puts a number on the padding saving for the toy sentences from the first example.
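The snippet reuses the encoded_sentences and buckets variables from the first example and simply counts pad tokens; treat it as a rough sketch, since the actual saving depends entirely on how your data’s lengths are distributed.

# Pad tokens needed if everything is padded to one global max length
global_max = max(len(s) for s in encoded_sentences)
global_pad = sum(global_max - len(s) for s in encoded_sentences)

# Pad tokens needed if each bucket is padded only to its own max length
bucket_pad = 0
for bucket in buckets.values():
    if bucket:
        bucket_max = max(len(s) for s in bucket)
        bucket_pad += sum(bucket_max - len(s) for s in bucket)

print(f"Pad tokens with one global length: {global_pad}")
print(f"Pad tokens with per-bucket lengths: {bucket_pad}")

For the five toy sentences this works out to 13 pad tokens versus 3; on a real dataset with thousands of variable-length queries, that gap is exactly what turns into memory and speed savings.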
Neural Network Training (Simplified)
We’ll simulate:
- A mini neural network that predicts intent (“greeting”, “event”, or “technical”)
- Trained using bucketed padded inputs
- With step-by-step explanation
Step-by-Step Pure Python Simulation
Setup
We’ll:
- Define fake intents (labels)
- Bucket and pad as before
- Train a toy neural net (dot-product + sigmoid) using gradient updates
Step 1: Data
# Same sentences as before, now with intent labels
sentences = [
    ["Hi"],                                              # Greeting
    ["Hello", "friend"],                                 # Greeting
    ["Good", "morning", "everyone"],                     # Greeting
    ["Welcome", "to", "the", "AI", "session"],           # Event
    ["This", "is", "an", "example", "of", "bucketing"]   # Technical
]
labels = [0, 0, 0, 1, 2]  # 0: Greeting, 1: Event, 2: Technical

# Encoding (word to integer ID), as in the first example
vocab = {}
counter = 1
for sent in sentences:
    for word in sent:
        if word not in vocab:
            vocab[word] = counter
            counter += 1

encoded = [[vocab[word] for word in sent] for sent in sentences]
Step 2: Bucketing + Padding
def bucket_data(inputs, labels):
    buckets = {"1-2": [], "3-5": [], "6+": []}
    label_buckets = {"1-2": [], "3-5": [], "6+": []}
    for i, sent in enumerate(inputs):
        length = len(sent)
        if length <= 2:
            key = "1-2"
        elif length <= 5:
            key = "3-5"
        else:
            key = "6+"
        buckets[key].append(sent)
        label_buckets[key].append(labels[i])
    return buckets, label_buckets

def pad_sentences(bucket):
    max_len = max(len(s) for s in bucket)
    return [s + [0] * (max_len - len(s)) for s in bucket]

buckets, label_buckets = bucket_data(encoded, labels)
Step 3: Mini Neural Net (pure Python)
We’ll simulate:
- Input layer → weights → output layer
- Dot product for simplicity
- Sigmoid activation
- Gradient descent for weight update
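In equation form, the update the code below performs for each class c is a plain squared-error gradient step on a sigmoid output (written here with the same names the code uses; it adds nothing beyond what the loop does):

logit_c = W_c · x (dot product of the class-c weight vector and the input)
pred_c = sigmoid(logit_c)
error_c = pred_c − target_c (target_c is 1 for the true class, 0 otherwise)
W_c ← W_c − lr · error_c · sigmoid′(logit_c) · x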
import random
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_deriv(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

# Initialize weights (one weight vector per class, same length as the input vector)
def init_weights(input_len, num_classes):
    return [[random.uniform(-0.1, 0.1) for _ in range(input_len)]
            for _ in range(num_classes)]

# Train per bucket
def train_bucket(X, y, num_classes, epochs=50, lr=0.01):
    input_len = len(X[0])
    W = init_weights(input_len, num_classes)
    for epoch in range(epochs):
        for xi, yi in zip(X, y):
            # Forward pass
            logits = [sum(w * x for w, x in zip(W[class_i], xi))
                      for class_i in range(num_classes)]
            preds = [sigmoid(logit) for logit in logits]
            # Compute error and backprop
            for class_i in range(num_classes):
                target = 1 if yi == class_i else 0
                error = preds[class_i] - target
                grad = [error * sigmoid_deriv(logits[class_i]) * x for x in xi]
                # Update weights
                W[class_i] = [w - lr * g for w, g in zip(W[class_i], grad)]
    return W

# Training each bucket
trained_weights = {}
num_classes = 3  # Greeting, Event, Technical
for bucket_key in buckets:
    if buckets[bucket_key]:
        X_pad = pad_sentences(buckets[bucket_key])
        trained_weights[bucket_key] = train_bucket(X_pad, label_buckets[bucket_key], num_classes)
Step 4: Predict from Trained Buckets
def predict(x, W):
    logits = [sum(w * v for w, v in zip(w_row, x)) for w_row in W]
    preds = [sigmoid(logit) for logit in logits]
    return preds.index(max(preds))  # Return class with highest probability

# Example test prediction
test_sent = ["Hello", "there"]
encoded_test = [vocab.get(w, 0) for w in test_sent]  # Unknown words map to 0
bucket_key = "1-2" if len(encoded_test) <= 2 else "3-5" if len(encoded_test) <= 5 else "6+"
max_len = len(trained_weights[bucket_key][0])  # Input length this bucket was trained with
test_pad = encoded_test + [0] * (max_len - len(encoded_test))
pred_class = predict(test_pad, trained_weights[bucket_key])
print(f"Predicted class: {pred_class} → {'Greeting' if pred_class == 0 else 'Event' if pred_class == 1 else 'Technical'}")
