Lasso Regression Example with Simple Python
1. Goal:
Alex wants to predict how much a house will sell for, based on things like:
- Size
- Number of rooms
- Distance from the city
- Garden size
- Fancy kitchen features
- … and 50 more things
But Alex doesn’t know which ones actually matter.
Part 1: The Backpack Analogy
Alex is going on a hike. He has 50 items to pack, but can only carry 10.
Each item adds weight, and only some of them are actually useful.
He thinks: “Let me try packing everything, and slowly remove what doesn’t help me survive.”
This is Lasso Regression — we try all features (variables), but penalize the unhelpful ones and eventually drop them.
Part 2: How Lasso Works
Lasso helps Alex:
- Start with all features (carry everything)
- Check each one’s usefulness in predicting house price
- Punish useless features by adding a cost (a “penalty”)
- If the feature is not helpful, its weight (importance) shrinks toward zero
- If it becomes zero, it’s like throwing the item out of the bag
So, Lasso = Linear Prediction + Penalty for Carrying Extra Items
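In code, this whole idea fits in a few lines. Here is a minimal sketch using scikit-learn’s Lasso on synthetic data (the 50 features, the alpha value, and the make_regression setup are illustrative choices, not part of Alex’s story):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 50 candidate features, but only 5 actually drive the target
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

# alpha plays the role of lambda: how strictly we "charge rent" per feature
model = Lasso(alpha=1.0)
model.fit(X, y)

kept = int(np.sum(model.coef_ != 0))
print(f"Features kept: {kept} out of {X.shape[1]}")
```

The printed count shows how many of the 50 “items” survive in the backpack; raising alpha makes Alex stricter and drops more of them.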
2. Math (in Simple Words)
Let’s say
- Alex predicts house price using:
price = w₁ ⋅ size + w₂ ⋅ rooms + w₃ ⋅ garden + b
Where w₁, w₂, and w₃ are weights showing how important each factor is. Without regularization, Alex just finds the weights that best match the data. But with Lasso, we say:
“Don’t just match the data. Make sure you’re not carrying useless things.”
So we add a penalty to the formula:
Loss = Error + λ (|w₁| + |w₂| + |w₃| + …)
- Error → How wrong the prediction is
- λ (lambda) → How strict Alex is about dropping useless items
- |w| → Absolute value of the weight (a penalty is paid just for using the feature); a tiny numeric sketch of this loss follows below
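To make the formula concrete, here is a tiny helper that computes this loss for a given set of weights; the numbers passed in are made up purely to show the arithmetic:

```python
def lasso_loss(errors, weights, lam):
    """Loss = mean squared error + lam * sum of |weights|."""
    mse = sum(e ** 2 for e in errors) / len(errors)
    penalty = lam * sum(abs(w) for w in weights)
    return mse + penalty

# Made-up numbers: prediction errors for three houses, weights for three features
print(lasso_loss(errors=[10.0, -5.0, 8.0], weights=[300.0, 15000.0, 5.0], lam=0.1))
```

Raising lam makes the penalty term dominate, which pushes the optimizer to shrink weights harder.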
3. Why Each Step Matters
Step | What Happens | Why It’s Done |
---|---|---|
Start with all features | Alex uses every detail to guess house prices | Assumes all information might help |
Measure prediction error | Compares predicted vs actual prices | To learn from mistakes |
Add penalty for using too many features | Charges a fee for every feature | Forces Alex to be efficient |
Shrink small weights | If a feature is not useful, its weight goes down | Eventually gets dropped |
Some weights become zero | Useless features are completely ignored | Only key items are kept in the backpack |
4. Real-Life Use Case: House Price Prediction
Initial model: Uses 50+ features
After Lasso:
Keeps only the useful ones like:
- Square footage
- Location rating
- Year built
Drops noisy ones like:
- Number of plants in garden
- Distance to donut shop
Why? Because those dropped features don’t consistently help in making predictions across different houses.
5. Second Use Case: Disease Risk Detection
Imagine predicting if someone will get diabetes using 100 health indicators.
Lasso finds:
- Glucose level, BMI, age → very predictive
- Eye color, number of siblings → not predictive
Lasso keeps the important ones and discards the noise.
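As a rough stand-in for this scenario, scikit-learn ships a small diabetes dataset; note its target is a disease-progression score rather than a yes/no diagnosis, and the alpha below is an illustrative choice:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

# 10 standardized health indicators: age, sex, bmi, bp, s1..s6
data = load_diabetes()
X, y = data.data, data.target

model = Lasso(alpha=1.0)   # alpha is illustrative; in practice, tune it (e.g. with LassoCV)
model.fit(X, y)

for name, coef in zip(data.feature_names, model.coef_):
    status = "kept" if coef != 0 else "dropped"
    print(f"{name:>5}: {coef:9.2f}  ({status})")
```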
6. Why some features are useful and others are not (with a real-world example)
Real-Life Scenario: Predicting House Price
Alex is trying to predict the price of a house. Suppose he has 3 features (variables):
- Area (sq.ft) → likely to impact price
- Number of Rooms → also likely to matter
- Number of Plants in the Garden → probably not useful
And he has some data:
Area (x₁) | Rooms (x₂) | Plants (x₃) | Price (y) |
---|---|---|---|
1000 | 3 | 20 | 500,000 |
1500 | 4 | 25 | 700,000 |
1200 | 3 | 30 | 550,000 |
1800 | 5 | 18 | 800,000 |
7. Step-by-Step: Without Regularization (Normal Linear Regression)
We try to fit a line (or plane) like:
Predicted Price = w₁ ⋅ Area + w₂ ⋅ Rooms + w₃ ⋅ Plants + b
We find weights w₁, w₂, w₃ to minimize error (difference between predicted and actual price). This is done using Mean Squared Error (MSE):
MSE = (1/n) Σ (Predicted − Actual)²
So far, every variable tries to reduce the error — even if it’s only doing a little.
Problem?
Even a useless feature like “plants” might slightly reduce the error, but that doesn’t mean it’s actually important, as the quick fit below shows.
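To see this concretely, here is a quick ordinary least-squares fit of the small table above using NumPy, with no penalty yet:

```python
import numpy as np

# Data from the table above: [Area, Rooms, Plants] and the sale price
X = np.array([[1000, 3, 20],
              [1500, 4, 25],
              [1200, 3, 30],
              [1800, 5, 18]], dtype=float)
y = np.array([500_000, 700_000, 550_000, 800_000], dtype=float)

# Append a column of ones so lstsq also fits the bias term b
X_b = np.hstack([X, np.ones((len(X), 1))])
coeffs, *_ = np.linalg.lstsq(X_b, y, rcond=None)
w1, w2, w3, b = coeffs

mse = np.mean((X_b @ coeffs - y) ** 2)
print(f"w1 (area) = {w1:.2f}, w2 (rooms) = {w2:.2f}, w3 (plants) = {w3:.2f}, b = {b:.2f}")
# With 4 rows and 4 parameters the fit is (numerically) exact, so the MSE is ~0:
# even the noisy plants column gets a nonzero weight and "helps" reduce the error.
print(f"MSE = {mse:.6f}")
```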
Step-by-Step: Now Add Lasso Regularization
Lasso modifies the loss function to:
Loss = MSE + λ (|w₁| + |w₂| + |w₃|)
Now it’s not just about minimizing error — we’re penalizing each feature.
Now Let’s Compare Features
Let’s say after training:
- w₁ = 300 → Area (impact is strong)
- w₂ = 15,000 → Rooms (also strong)
- w₃ = 5 → Plants (very small impact)
Insight:
- Area and Rooms are needed to reduce error substantially.
- Plants only reduced the error slightly — but it adds to the penalty.
Now, the optimizer (Lasso) thinks:
“The plants feature isn’t helping enough to justify the penalty. Let me set w₃ = 0 and drop it.”
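Mechanically, many Lasso solvers make this decision through soft-thresholding: every weight is pulled toward zero by a fixed amount, and any weight smaller than that amount snaps to exactly zero. A minimal sketch (the threshold of 10 is illustrative; in a real solver it depends on λ and the feature scaling):

```python
def soft_threshold(w, threshold):
    """Shrink w toward zero; anything smaller than the threshold becomes exactly 0."""
    if w > threshold:
        return w - threshold
    if w < -threshold:
        return w + threshold
    return 0.0

# Illustrative weights from the comparison above, with an illustrative threshold
for name, w in [("area", 300.0), ("rooms", 15000.0), ("plants", 5.0)]:
    print(f"{name:>6}: {w:>8} -> {soft_threshold(w, threshold=10.0)}")
```

Area and Rooms are barely dented, while the tiny Plants weight collapses straight to zero.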
8. Visual Interpretation:
Feature | Contribution to Prediction | Lasso Penalty | Worth Keeping? |
---|---|---|---|
Area | High | Medium | Yes |
Rooms | High | Medium | Yes |
Plants in Garden | Low | Still adds cost | No |
So, Lasso forces a trade-off:
“Only keep a feature if it helps a lot — enough to outweigh the cost.”
Real-Life Explanation:
Imagine we’re paying rent for each feature we use.
- Area and Rooms give big returns → pay the rent.
- Plants give very little → not worth keeping.
9. Final Prediction Model:
After Lasso, the model becomes:
Price = 300 ⋅ Area + 15,000 ⋅ Rooms + 0 ⋅ Plants + b
Plants are completely eliminated — and this simplifies the model.
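Using the simplified model is just plugging numbers into the formula. A tiny sketch, where the bias b = 50,000 and the new house’s details are made-up placeholders, since the example above leaves them unspecified:

```python
# Weights from the Lasso-selected model above; b and the house details are placeholders
w_area, w_rooms, w_plants = 300, 15_000, 0
b = 50_000

area, rooms, plants = 1_400, 4, 35   # a new, hypothetical house
price = w_area * area + w_rooms * rooms + w_plants * plants + b
print(f"Predicted price: ${price:,}")
```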
10. Predict house prices using:
- Area (sq.ft)
- Rooms
- Plants in garden (intentionally noisy)
Dataset
```python
# [Area, Rooms, Plants] → Features
X = [
    [1000, 3, 20],
    [1500, 4, 25],
    [1200, 3, 30],
    [1800, 5, 18],
]

# Corresponding house prices (in $1,000s)
y = [500, 700, 550, 800]
```
Step-by-step Lasso Logic in Python (from scratch)
```python
# Initialize weights and bias
w = [0.0, 0.0, 0.0]   # w1: area, w2: rooms, w3: plants
b = 0.0
alpha = 0.000001      # learning rate (small for precision)
lambda_ = 0.1         # L1 penalty strength
epochs = 1000
n = len(X)

for epoch in range(epochs):
    dw = [0.0, 0.0, 0.0]
    db = 0.0

    # Compute gradients
    for i in range(n):
        x1, x2, x3 = X[i]
        y_pred = w[0]*x1 + w[1]*x2 + w[2]*x3 + b
        error = y_pred - y[i]
        dw[0] += error * x1
        dw[1] += error * x2
        dw[2] += error * x3
        db += error

    # Average gradients
    dw = [d / n for d in dw]
    db /= n

    # Add L1 penalty to gradients
    for j in range(3):
        if w[j] > 0:
            dw[j] += lambda_
        elif w[j] < 0:
            dw[j] -= lambda_
        # if w[j] == 0 → no penalty change

    # Update weights and bias
    for j in range(3):
        w[j] -= alpha * dw[j]
    b -= alpha * db

    if epoch % 100 == 0:
        print(f"Epoch {epoch}: Weights = {w}, Bias = {b:.2f}")

print("\nFinal Model:")
print(f"Price = {w[0]:.2f} * Area + {w[1]:.2f} * Rooms + {w[2]:.2f} * Plants + {b:.2f}")
```
What We’ll Observe
- w[0] (Area): will become large — big impact
- w[1] (Rooms): will also grow
- w[2] (Plants): will stay small or may become very close to zero
Output Sample (illustrative)
```
Epoch 0: Weights = [0.28, 0.01, 0.005], Bias = 0.20
...
Epoch 900: Weights = [0.31, 0.012, 0.0001], Bias = 1.90

Final Model:
Price = 0.31 * Area + 0.012 * Rooms + 0.0001 * Plants + 1.90
```
→ Notice how plants’ weight is almost 0. That’s Lasso kicking in, realizing it doesn’t add value.
Realization
- The model learns that Area and Rooms contribute most to reducing error.
- Plants barely help, but they add a cost (λ × |w|), so it’s better to drop them (shrink the weight to 0); a scikit-learn version of the same fit follows for comparison.
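For comparison, here is the same dataset fed to scikit-learn’s Lasso (a sketch; scikit-learn uses coordinate descent and a slightly different scaling of the loss, so its coefficients will not match the hand-rolled loop exactly):

```python
from sklearn.linear_model import Lasso

X = [[1000, 3, 20], [1500, 4, 25], [1200, 3, 30], [1800, 5, 18]]
y = [500, 700, 550, 800]   # prices in $1,000s

model = Lasso(alpha=0.1, max_iter=100_000)
model.fit(X, y)

w1, w2, w3 = model.coef_
print(f"Price = {w1:.3f} * Area + {w2:.3f} * Rooms + {w3:.3f} * Plants + {model.intercept_:.2f}")
```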
Lasso Regression – Basic Math Concepts