Lasso Regression Example with Simple Python

1. Goal:

Alex wants to predict how much a house will sell for, based on things like:

  • Size
  • Number of rooms
  • Distance from the city
  • Garden size
  • Fancy kitchen features
  • … and 50 more things

But Alex doesn’t know which ones actually matter.

Part 1: The Backpack Analogy

Alex is going on a hike. He has 50 items to pack, but can only carry 10.
Each item adds weight, and only some of them are actually useful.

He thinks: “Let me try packing everything, then slowly remove whatever doesn’t help me survive.”

This is Lasso Regression — we try all features (variables), but penalize the unhelpful ones and eventually drop them.

Part 2: How Lasso Works

Lasso helps Alex:

  • Start with all features (carry everything)
  • Check each one’s usefulness in predicting house price
  • Punish useless features by adding a cost (a “penalty”)
  • If the feature is not helpful, its weight (importance) shrinks toward zero
  • If it becomes zero, it’s like throwing the item out of the bag

So, Lasso = Linear Prediction + Penalty for Carrying Extra Items

2. Math (in Simple Words)

Let’s say Alex predicts the house price using:

    price = w₁ ⋅ size + w₂ ⋅ rooms + w₃ ⋅ garden + b

Where w₁, w₂, and w₃ are weights showing how important each factor is. Without regularization, Alex just finds the weights that best match the data. But with Lasso, we say:

“Don’t just match the data. Make sure you’re not carrying useless things.”

So we add a penalty to the formula:

Loss = Error + λ (|w₁| + |w₂| + |w₃| + …)

  • Error → How wrong the prediction is
  • λ (lambda) → How strict Alex is about dropping useless items
  • |w| → Absolute value of the weight (penalty added for just using it)
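
To make the formula above concrete, here is a minimal Python sketch of that loss (the function name and the example numbers are only for illustration, not part of a real fitting routine):

# Lasso loss = prediction error (MSE) + lambda * sum of absolute weights
def lasso_loss(y_true, y_pred, weights, lam):
    n = len(y_true)
    mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n
    l1_penalty = lam * sum(abs(w) for w in weights)
    return mse + l1_penalty

# Two hypothetical models with the same prediction error:
# the one carrying more / larger weights pays a bigger penalty
print(lasso_loss([500, 700], [510, 690], weights=[0.4, 12.0], lam=1.0))       # 112.4
print(lasso_loss([500, 700], [510, 690], weights=[0.4, 12.0, 9.0], lam=1.0))  # 121.4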

3. Why Each Step Matters

Step | What Happens | Why It’s Done
Start with all features | Alex uses every detail to guess house prices | Assumes all information might help
Measure prediction error | Compares predicted vs. actual prices | To learn from mistakes
Add penalty for using too many features | Charges a fee for every feature | Forces Alex to be efficient
Shrink small weights | If a feature is not useful, its weight goes down | Eventually gets dropped
Some weights become zero | Useless features are completely ignored | Only key items are kept in the backpack

4. Real-Life Use Case: House Price Prediction

Initial model: Uses 50+ features

After Lasso:

Keeps only the useful ones like:

  • Square footage
  • Location rating
  • Year built

Drops noisy ones like:

  • Number of plants in garden
  • Distance to donut shop

Why? Because those dropped features don’t consistently help in making predictions across different houses.

5. Second Use Case: Disease Risk Detection

Imagine predicting if someone will get diabetes using 100 health indicators.

Lasso finds:

  • Glucose level, BMI, age → very predictive
  • Eye color, number of siblings → not predictive

Lasso keeps the important ones and discards the noise.
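
Here is an optional sketch of that idea on purely synthetic data (no real medical data; it assumes NumPy and scikit-learn are installed, and that only the first three of 100 made-up indicators actually influence the target):

# Synthetic illustration: 100 "health indicators", only 3 of them truly matter
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))        # 200 synthetic patients, 100 indicators
true_coef = np.zeros(100)
true_coef[:3] = [5.0, 3.0, 2.0]        # pretend only indicators 0-2 are predictive
y = X @ true_coef + rng.normal(scale=0.5, size=200)

model = Lasso(alpha=0.1).fit(X, y)
print("Indicators kept (non-zero weights):", int(np.sum(model.coef_ != 0)))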

6. Why some features are useful and others are not (with a real-world example)

Real-Life Scenario: Predicting House Price

Alex is trying to predict the price of a house. Suppose he has 3 features (variables):

  • Area (sq.ft) → likely to impact price
  • Number of Rooms → also likely to matter
  • Number of Plants in the Garden → probably not useful

And he has some data:

Area (x₁) | Rooms (x₂) | Plants (x₃) | Price (y)
1000 | 3 | 20 | 500,000
1500 | 4 | 25 | 700,000
1200 | 3 | 30 | 550,000
1800 | 5 | 18 | 800,000

7. Step-by-Step: Without Regularization (Normal Linear Regression)

We try to fit a line (or plane) like:

Predicted Price = w₁ ⋅ Area + w₂ ⋅ Rooms + w₃ ⋅ Plants + b

We find weights w₁, w₂, w₃ to minimize error (difference between predicted and actual price). This is done using Mean Squared Error (MSE):

MSE = (1/n) Σ (Predicted − Actual)²

So far, every variable tries to reduce the error — even if it’s only doing a little.
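
As a quick sketch, the MSE for the table above can be computed with an arbitrary, made-up set of weights (these values are not fitted; they only show the mechanics):

# Data from the table above: [Area, Rooms, Plants] and price in dollars
X = [[1000, 3, 20], [1500, 4, 25], [1200, 3, 30], [1800, 5, 18]]
y = [500_000, 700_000, 550_000, 800_000]

# Made-up weights, purely to illustrate the calculation
w1, w2, w3, b = 400.0, 20_000.0, 100.0, 0.0

mse = sum(
    ((w1 * x1 + w2 * x2 + w3 * x3 + b) - price) ** 2
    for (x1, x2, x3), price in zip(X, y)
) / len(y)
print(f"MSE with these made-up weights: {mse:,.0f}")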

Problem?
Even a useless feature like “plants” might slightly reduce the error. But that doesn’t mean it’s actually important.

Step-by-Step: Now Add Lasso Regularization

Lasso modifies the loss function to:

Loss = MSE + λ (|w₁| + |w₂| + |w₃|)

Now it’s not just about minimizing error — we’re penalizing each feature.

Now Let’s Compare Features

Let’s say after training:

  • w₁ = 300 → Area (impact is strong)
  • w₂ = 15,000 → Rooms (also strong)
  • w₃ = 5 → Plants (very small impact)

Insight:

  • Area and Rooms are needed to reduce error substantially.
  • Plants only reduced the error slightly — but it adds to the penalty.

Now, the optimizer (Lasso) thinks:

“The plants feature isn’t helping enough to justify the penalty. Let me set w₃ = 0 and drop it.”
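
A rough numeric sketch of that decision, using the illustrative weights above and made-up error and λ values (none of these numbers come from a real fit):

lam = 1_000                  # made-up penalty strength
mse_keep_plants = 1_000_000  # pretend error when w3 = 5 (plants kept)
mse_drop_plants = 1_000_400  # pretend error when w3 = 0 (fit is barely worse)

loss_keep = mse_keep_plants + lam * (abs(300) + abs(15_000) + abs(5))
loss_drop = mse_drop_plants + lam * (abs(300) + abs(15_000) + abs(0))

print(loss_keep, loss_drop)  # 16305000 16300400 -> dropping plants gives the lower total loss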

8. Visual Interpretation:

Feature | Contribution to Prediction | Lasso Penalty | Worth Keeping?
Area | High | Medium | Yes
Rooms | High | Medium | Yes
Plants in Garden | Low | Still adds a penalty | No

So, Lasso forces a trade-off:
“Only keep a feature if it helps a lot — enough to outweigh the cost.”

Real-Life Explanation:

Imagine you’re paying rent for each feature you use.

  • Area and Rooms give big returns → pay the rent.
  • Plants give very little → not worth keeping.

9. Final Prediction Model:

After Lasso, the model becomes:

Price = 300 ⋅ Area + 15,000 ⋅ Rooms + 0 ⋅ Plants + b

Plants are completely eliminated — and this simplifies the model.
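
As a small sketch, the pruned model can be written as a plain function (the weights are the illustrative values above; the intercept b is just a placeholder):

# Pruned model: the plants feature no longer appears in the prediction at all
def predict_price(area, rooms, b=0.0):
    # illustrative weights from the walkthrough above; b is a placeholder intercept
    return 300 * area + 15_000 * rooms + b

print(predict_price(area=1200, rooms=3))  # 405000.0 before adding any intercept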

10. Predict house prices using:

  • Area (sq.ft)
  • Rooms
  • Plants in garden (intentionally noisy)

Dataset

# [Area, Rooms, Plants] → Features
X = [
    [1000, 3, 20],
    [1500, 4, 25],
    [1200, 3, 30],
    [1800, 5, 18]
]

# Corresponding house prices (in $1,000s)
y = [500, 700, 550, 800]

Step-by-step Lasso Logic in Python (from scratch)

# Initialize weights and bias
w = [0.0, 0.0, 0.0]  # w1: area, w2: rooms, w3: plants
b = 0.0

alpha = 0.000001  # learning rate (kept small because the features are not scaled)
lambda_ = 0.1     # L1 penalty strength
epochs = 1000

n = len(X)

for epoch in range(epochs):
    dw = [0.0, 0.0, 0.0]
    db = 0.0

    # Compute gradients
    for i in range(n):
        x1, x2, x3 = X[i]
        y_pred = w[0]*x1 + w[1]*x2 + w[2]*x3 + b
        error = y_pred - y[i]

        dw[0] += error * x1
        dw[1] += error * x2
        dw[2] += error * x3
        db += error

    # Average gradients
    dw = [d / n for d in dw]
    db /= n

    # Add L1 penalty to gradients
    for j in range(3):
        if w[j] > 0:
            dw[j] += lambda_
        elif w[j] < 0:
            dw[j] -= lambda_
        # if w[j] == 0 → no penalty change

    # Update weights and bias
    for j in range(3):
        w[j] -= alpha * dw[j]
    b -= alpha * db

    if epoch % 100 == 0:
        print(f"Epoch {epoch}: Weights = {w}, Bias = {b:.2f}")

print("\n Final Model:")
print(f"Price = {w[0]:.2f} * Area + {w[1]:.2f} * Rooms + {w[2]:.2f} * Plants + {b:.2f}")

What We’ll Observe

  • w[0] (Area): will become the largest, since area has the biggest impact on price
  • w[1] (Rooms): will also grow, though it stays numerically small because room counts (3 to 5) are tiny compared to square footage
  • w[2] (Plants): will stay small or end up very close to zero

Sample Output (illustrative; your exact numbers will depend on the learning rate, λ, and number of epochs)

Epoch 0: Weights = [0.28, 0.01, 0.005], Bias = 0.20

Epoch 900: Weights = [0.31, 0.012, 0.0001], Bias = 1.90

Final Model:
Price = 0.31 * Area + 0.012 * Rooms + 0.0001 * Plants + 1.90

→ Notice how plants’ weight is almost 0. That’s Lasso kicking in, realizing it doesn’t add value.

Realization

  • The model learns that Area and Rooms contribute most to reducing error.
  • Plants barely help, but they add a cost (λ × |w|), so it’s better to drop them (shrink to 0).
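
For comparison, here is an optional sketch of the same idea using scikit-learn’s built-in Lasso (the exact coefficients depend on the alpha you pick and on feature scaling, so treat the comments as expectations, not guarantees):

# Optional cross-check with scikit-learn (pip install scikit-learn)
from sklearn.linear_model import Lasso

X = [[1000, 3, 20], [1500, 4, 25], [1200, 3, 30], [1800, 5, 18]]
y = [500, 700, 550, 800]  # prices in $1,000s, as in the from-scratch version

model = Lasso(alpha=1.0, max_iter=10_000)  # alpha plays the role of lambda
model.fit(X, y)

print("Weights:", model.coef_)        # expect the Plants weight to be at or near zero
print("Intercept:", model.intercept_)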

Lasso Regression – Basic Math Concepts