Feature Engineering Example with Simple Python

1. Goal: Predict if someone will buy ice cream

Given:

  • temperature (numeric)
  • is_holiday (0 or 1)

We’ll:

  1. Train a simple linear model (like a 1-layer neural net) without feature engineering.
  2. Train the same model with feature engineering: add an interaction feature (temperature × is_holiday) and normalize the inputs.
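
Both versions below use the classic perceptron update rule: for each training example with true label y and hard-thresholded prediction pred,

w := w + lr * (y - pred) * x
bias := bias + lr * (y - pred)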

Python Program (Pure Python, Simple Logic)

# Sample dataset (note: buy == 1 exactly when temp >= 30 in this sample)
data = [
    {'temp': 35, 'holiday': 0, 'buy': 1},
    {'temp': 30, 'holiday': 1, 'buy': 1},
    {'temp': 25, 'holiday': 0, 'buy': 0},
    {'temp': 40, 'holiday': 1, 'buy': 1},
    {'temp': 20, 'holiday': 0, 'buy': 0},
    {'temp': 38, 'holiday': 0, 'buy': 1},
    {'temp': 18, 'holiday': 1, 'buy': 0},
    {'temp': 22, 'holiday': 0, 'buy': 0},
    {'temp': 45, 'holiday': 1, 'buy': 1},
    {'temp': 27, 'holiday': 0, 'buy': 0}
]
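
# Side note (a sanity check added for clarity, not part of the original
# walkthrough): on this toy sample, `buy` coincides exactly with temp >= 30,
# so temperature alone can separate the labels here.
assert all((row['temp'] >= 30) == bool(row['buy']) for row in data)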

# --------- Step 1: Raw data learning (simple weights) ---------
def train_simple_model(data):
    w_temp = 0.0
    w_holiday = 0.0
    bias = 0.0
    lr = 0.01  # learning rate

    for epoch in range(1000):
        for row in data:
            x1 = row['temp']
            x2 = row['holiday']
            y = row['buy']
            z = w_temp * x1 + w_holiday * x2 + bias
            pred = 1 if z > 30 else 0  # hard threshold; the hand-picked 30 acts like a built-in bias
            error = y - pred

            # Update weights
            w_temp += lr * error * x1
            w_holiday += lr * error * x2
            bias += lr * error
    return w_temp, w_holiday, bias

def test_model(data, w_temp, w_holiday, bias):
    correct = 0
    for row in data:
        x1 = row['temp']
        x2 = row['holiday']
        y = row['buy']
        z = w_temp * x1 + w_holiday * x2 + bias
        pred = 1 if z > 30 else 0
        if pred == y:
            correct += 1
    accuracy = correct / len(data)
    return accuracy

# --------- Step 2: Add Feature Engineering ---------
def normalize(val, min_val, max_val):
    # Min-max scaling to [0, 1] (assumes max_val > min_val)
    return (val - min_val) / (max_val - min_val)

def train_engineered_model(data):
    w_temp = 0.0
    w_holiday = 0.0
    w_interact = 0.0
    bias = 0.0
    lr = 0.01

    temps = [row['temp'] for row in data]
    min_temp, max_temp = min(temps), max(temps)

    for epoch in range(1000):
        for row in data:
            # Normalized features + interaction
            x1 = normalize(row['temp'], min_temp, max_temp)
            x2 = row['holiday']
            x3 = x1 * x2  # interaction
            y = row['buy']
            z = w_temp * x1 + w_holiday * x2 + w_interact * x3 + bias
            pred = 1 if z > 0.5 else 0
            error = y - pred

            # Update weights
            w_temp += lr * error * x1
            w_holiday += lr * error * x2
            w_interact += lr * error * x3
            bias += lr * error
    return w_temp, w_holiday, w_interact, bias, min_temp, max_temp

def test_engineered(data, w_temp, w_holiday, w_interact, bias, min_temp, max_temp):
    correct = 0
    for row in data:
        x1 = normalize(row['temp'], min_temp, max_temp)
        x2 = row['holiday']
        x3 = x1 * x2
        y = row['buy']
        z = w_temp * x1 + w_holiday * x2 + w_interact * x3 + bias
        pred = 1 if z > 0.5 else 0
        if pred == y:
            correct += 1
    return correct / len(data)

# --------- Run both models ---------
print("Training without feature engineering...")
w1, w2, b = train_simple_model(data)
acc_raw = test_model(data, w1, w2, b)
print("Accuracy (Raw):", acc_raw)

print("\nTraining with feature engineering...")
w1, w2, w3, b2, min_t, max_t = train_engineered_model(data)
acc_eng = test_engineered(data, w1, w2, w3, b2, min_t, max_t)
print("Accuracy (Engineered):", acc_eng)


Illustrative Output:

Training without feature engineering...
Accuracy (Raw): 0.6

Training with feature engineering...
Accuracy (Engineered): 1.0

(Exact numbers depend on the data, threshold, and learning rate. This toy sample is in fact separable by temperature alone, so the raw model can also score well here; the contrast above is what to expect when buying genuinely depends on temperature and holiday together.)
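
If scikit-learn happens to be available, the same comparison can be cross-checked with logistic regression. This is only a sketch (scikit-learn is an extra dependency, not part of the pure-Python example):

# Optional cross-check with scikit-learn
from sklearn.linear_model import LogisticRegression

temps = [row['temp'] for row in data]
lo, hi = min(temps), max(temps)
y = [row['buy'] for row in data]
X_raw = [[row['temp'], row['holiday']] for row in data]
X_eng = [[(row['temp'] - lo) / (hi - lo),
          row['holiday'],
          (row['temp'] - lo) / (hi - lo) * row['holiday']]
         for row in data]

for name, X in [('Raw', X_raw), ('Engineered', X_eng)]:
    clf = LogisticRegression().fit(X, y)
    print(name, 'training accuracy:', clf.score(X, y))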

Summary

  • The raw model can struggle because a plain weighted sum cannot represent patterns like "holiday AND hot day → higher sales".
  • The engineered model tends to learn faster and more accurately because we gave it a meaningful combined feature and normalized inputs.

2. Why does adding engineered features improve prediction?

Story Recap:

Imagine we’re trying to guess if people will buy ice cream. We have only:

  • temperature (e.g., 35°C)
  • holiday (yes or no → 1 or 0)

Now let’s observe two scenarios:

Scenario 1: Raw Inputs Only

  • Model sees:
    • temp = 30
    • holiday = 1
  • Model tries:
    score = w_temp * 30 + w_holiday * 1 + bias

    But the model is linear. It can’t easily learn: “Sales are high when both temperature is high AND it’s a holiday.”

    This is a non-linear interaction, and raw linear models can’t combine two inputs multiplicatively unless we help them.
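
To see the limitation concretely, here is a tiny sketch with hand-picked, purely illustrative weights (not learned values). In a purely additive model, the holiday contribution is the same fixed constant at any temperature:

# Illustrative (hand-picked) weights for a purely additive model
w_temp, w_holiday, bias = 0.02, 0.3, 0.0

def additive_score(temp, holiday):
    return w_temp * temp + w_holiday * holiday + bias

# The holiday "bonus" is the same (~0.3) at 40 degrees as at 18 degrees:
print(additive_score(40, 1) - additive_score(40, 0))
print(additive_score(18, 1) - additive_score(18, 0))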

Scenario 2: Engineered Feature Added → Interaction

We created:

x3 = x1 * x2 → normalized_temp × holiday

This feature only activates when both conditions are true.

So for:

  • Hot day (temp = 40 → normalized = ~0.8)
  • Holiday (1)

Interaction feature becomes:
x3 = 0.8 × 1 = 0.8

Whereas:

  • Cold day + holiday → 0.2 × 1 = 0.2
  • Hot day + no holiday → 0.8 × 0 = 0

New model:
score = w_temp * temp + w_holiday * holiday + w_interact * (temp × holiday) + bias

Now the model can assign special importance to combinations like: "High temperature AND holiday → people buy ice cream".
That logic couldn’t be learned by a simple sum of weights.
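
To make this concrete, here is the same kind of sketch with hand-picked (not learned) weights, now including an interaction term. The holiday effect grows with temperature:

# Illustrative (hand-picked) weights for a model with an interaction term
w_temp, w_holiday, w_interact, bias = 0.5, 0.1, 2.0, 0.0

def interact_score(norm_temp, holiday):
    return (w_temp * norm_temp + w_holiday * holiday
            + w_interact * norm_temp * holiday + bias)

# The holiday effect now depends on temperature:
print(interact_score(0.8, 1) - interact_score(0.8, 0))  # 0.1 + 2.0*0.8 ≈ 1.7
print(interact_score(0.2, 1) - interact_score(0.2, 0))  # 0.1 + 2.0*0.2 ≈ 0.5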

Simple Analogy:

Without interaction: You’re saying “Hot days” are good, “Holidays” are good — independently.

With interaction: You’re saying “Hot Holidays” are especially good!

Technically Speaking:

A simple model without interactions:

z = w1 * temp + w2 * holiday + bias

is a linear separator, i.e., it draws a straight line.

Adding:

z = w1 * temp + w2 * holiday + w3 * (temp × holiday) + bias

lets it fit curved or conditional boundaries, i.e., it can:

  • Adjust slope depending on combinations.
  • Learn contextual influence (e.g., holiday matters only if hot).
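
Grouping terms makes the conditional slope explicit:

z = (w1 + w3 * holiday) * temp + w2 * holiday + bias

When holiday = 0, the temperature slope is w1; when holiday = 1, it becomes w1 + w3. The same input can matter more (or less) depending on context.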

This is feature engineering: making the model's job easier by expressing the logic explicitly in the features.

In our code

Before:

  • The model has no way to learn the temperature × holiday effect on its own.
  • It tends to plateau at mediocre accuracy (around 60% in the illustrative run above).

After:

  • The model can see that when temp is high and holiday is 1, customers buy.
  • It fits faster and more precisely, up to 100% accuracy on this toy set.

Conclusion

Adding the temp × holiday interaction tells the model: “Hey, don’t treat temp and holiday separately. Sometimes they work together to drive behavior.”

This unlocks learning of deeper patterns.

Next – Encoding in Neural Networks