Ridge Regression Example with Simple Python
1. A non-library (pure Python + NumPy) implementation of Ridge Regression, showing how changing λ (lambda) affects predictions.
What We’ll Do:
- Generate noisy linear data
- Implement Ridge Regression from scratch
- Solve for the weights using the closed-form (ridge normal equation) solution: w = (XᵀX + λI)⁻¹ Xᵀy
- Fit and plot the resulting line for several λ values to compare
Full Python Code (No ML Libraries Used):
```python
import numpy as np
import matplotlib.pyplot as plt

# Step 1: Generate synthetic data
np.random.seed(42)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 3 * X.flatten() + 5 + np.random.randn(50) * 5  # true line + noise

# Add bias term (column of 1s) to X
X_b = np.hstack([np.ones((X.shape[0], 1)), X])  # shape: (50, 2)

# Step 2: Ridge Regression (manual implementation)
def ridge_regression(X, y, lambda_val):
    n_features = X.shape[1]
    I = np.eye(n_features)
    I[0, 0] = 0  # Don't regularize the bias term
    w = np.linalg.inv(X.T @ X + lambda_val * I) @ X.T @ y
    return w

# Step 3: Try different λ values
lambdas = [0, 1, 10, 100]
colors = ['blue', 'green', 'orange', 'red']

# Plot original data
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='black', label='Data')

# Step 4: Train and plot for each λ
for i, lam in enumerate(lambdas):
    w = ridge_regression(X_b, y, lam)
    y_pred = X_b @ w
    plt.plot(X, y_pred, color=colors[i], label=f"λ = {lam:.1f}")

plt.title("Effect of λ (lambda) on Ridge Regression (No Libraries)")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.grid(True)
plt.show()
```
What We’ll Observe:
- λ = 0 → Plain least squares; the line is free to chase the noise (can overfit)
- λ = 1, 10 → Smoother, more generalizable line
- λ = 100 → The slope is pulled noticeably toward zero (heading toward underfitting)
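To see this numerically as well as visually, you can print the fitted weights for each λ. A small follow-on snippet, reusing `ridge_regression`, `X_b` and `y` from the code above (the exact values depend on the random noise):

```python
# Quick numeric check of the shrinkage: the slope coefficient moves toward 0 as λ grows
for lam in [0, 1, 10, 100]:
    bias, slope = ridge_regression(X_b, y, lam)   # reuses the function and data defined above
    print(f"λ = {lam:>3}: bias = {bias:.2f}, slope = {slope:.2f}")
```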
Key Notes:
- This uses the normal equation for Ridge Regression.
- I[0, 0] = 0 ensures we don’t penalize the bias term, which is standard.
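A small implementation note: `np.linalg.inv` is fine for this tiny 2×2 system, but solving the linear system directly is the more numerically robust habit. A minimal sketch of the same fit using `np.linalg.solve` (`ridge_regression_solve` is just an illustrative name, not part of the code above):

```python
import numpy as np

def ridge_regression_solve(X, y, lambda_val):
    """Same closed-form ridge fit, but via np.linalg.solve rather than an explicit inverse."""
    I = np.eye(X.shape[1])
    I[0, 0] = 0  # still leave the bias term unpenalized
    # Solves (XᵀX + λI) w = Xᵀy directly, which avoids forming the matrix inverse
    return np.linalg.solve(X.T @ X + lambda_val * I, X.T @ y)
```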
2. The Story: Predicting House Prices — The Realtor’s Dilemma
Characters:
- Reena, a real estate agent
- The House Data: Size, Age, Distance from school, Crime Rate, etc.
- The Goal: Reena wants to predict house prices based on past data.
3. Step-by-Step Explanation of Ridge Regression (with Descriptive Visualization)
Step 1: Reena collects data
She gathers:
- 50 houses
- For each: square footage, age, crime rate, school distance, and actual price
She puts them into a table:
| Size (sq ft) | Age (years) | Crime Rate | Distance from School | Price |
|---|---|---|---|---|
| 1400 | 5 | 3 | 1.2 km | ₹50L |
| 1600 | 8 | 4 | 2.0 km | ₹55L |
| … | … | … | … | … |
Why?
She wants to find patterns. Bigger houses usually cost more — but some data points don’t follow the trend due to location or crime rate.
Step 2: She tries to fit a line to predict price
She uses basic Linear Regression. It tries to find the best equation:
Price = w₁×Size + w₂×Age + … + Bias
Why? She believes a formula will help her estimate prices for new houses in the future.
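In code, that formula is just a dot product between a feature vector and a weight vector. A tiny sketch to make it concrete (all of these weights and house numbers are invented for illustration, not taken from Reena’s table):

```python
import numpy as np

# Hypothetical weights for [size, age, crime rate, school distance] and a hypothetical bias
w = np.array([0.03, -0.5, -1.2, -2.0])
bias = 10.0

# One house: 1400 sq ft, 5 years old, crime index 3, 1.2 km from the school
house = np.array([1400, 5, 3, 1.2])

predicted_price = house @ w + bias   # Price = w₁×Size + w₂×Age + … + Bias
print(f"Predicted price: ₹{predicted_price:.1f}L")
```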
Problem: Overfitting
Reena adds 20+ features like:
- Nearby restaurants, traffic data, number of windows, color of walls
Now, her formula becomes super-complicated:
Price = 0.82×Size + 1.4×Age − 12.8×Crime + …
Why is this bad?
- It fits the past data perfectly (even noise).
- But fails terribly when used on new houses — overfit!
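One way to see this effect is to simulate it: generate house prices that truly depend only on size, add 20 meaningless features, and compare the weights plain least squares assigns to the junk against what ridge assigns. Everything below is invented for illustration and does not reuse Reena’s table:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
size = rng.uniform(800, 2500, n)
price = 0.03 * size + 8 + rng.normal(0, 3, n)    # price truly depends only on size (plus noise)
junk = rng.normal(size=(n, 20))                  # 20 irrelevant "features" (restaurants, wall colour, ...)

X = np.column_stack([np.ones(n), size, junk])    # bias column + size + junk
I = np.eye(X.shape[1])
I[0, 0] = 0                                      # leave the bias unpenalized, as before

w_plain = np.linalg.solve(X.T @ X,            X.T @ price)  # λ = 0: free to chase noise
w_ridge = np.linalg.solve(X.T @ X + 10.0 * I, X.T @ price)  # λ = 10: large weights are penalized

print("size of junk weights, plain least squares:", np.linalg.norm(w_plain[2:]))
print("size of junk weights, ridge (λ = 10):     ", np.linalg.norm(w_ridge[2:]))
```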
Step 3: Reena adds a Rule — Don’t trust crazy numbers!
She says: “I want to make accurate predictions, but I don’t want my formula to go crazy with huge numbers just to fit the past.”
So, she adds a penalty for large coefficients — like saying:
“Hey model, if you’re going to use ‘crime rate’ as a feature, fine — but don’t make it dominate everything unless it’s really important!”
This is Ridge Regression.
Step 4: Reena balances two things
1. Fitting the data → Get predictions close to actual house prices (like linear regression)
2. Keeping the formula simple → Don’t allow very large weights (penalize big numbers)
This new goal becomes:
Final Score = Error + λ × (Penalty for large numbers)
Why? She wants to find a middle ground — accurate yet stable predictions.
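In code, Reena’s scoring rule is only a couple of lines. A minimal sketch (`ridge_score` is just an illustrative name; `X_b` is assumed to carry the bias column, as in the earlier example):

```python
import numpy as np

def ridge_score(X_b, y, w, lam):
    """Final Score = Error + λ × (Penalty for large numbers): the quantity ridge regression minimizes."""
    error = np.sum((y - X_b @ w) ** 2)   # squared gap between predicted and actual prices
    penalty = np.sum(w[1:] ** 2)         # sum of squared weights, leaving the bias term alone
    return error + lam * penalty
```

The closed-form solution used in the first example is exactly the w that makes this score as small as possible for the chosen λ.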
Step 5: What does lambda (λ) mean in Reena’s world?
- λ = 0 → “Fit the data as tightly as you want” (may overfit)
- λ = 10 → “Be careful — don’t stretch too much to fit outliers”
- λ = 100 → “Stick to the middle — don’t trust noisy data”
Why is this useful?
Reena now builds a model that works well for new houses — it’s not too clever, not too dumb — just right.
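How does one actually pick λ? A common approach, not covered in the code above, is to hold out part of the data and keep the λ that predicts it best. A rough sketch, reusing the synthetic `X_b`, `y` and `ridge_regression` from the first example (with Reena’s real table the idea would be the same):

```python
import numpy as np

# Hold some data out and keep the λ that predicts the held-out points best
rng = np.random.default_rng(42)
idx = rng.permutation(len(y))
train, val = idx[:40], idx[40:]                  # 40 points to learn from, 10 kept aside

best_lam, best_err = None, np.inf
for lam in [0, 0.1, 1, 10, 100]:
    w = ridge_regression(X_b[train], y[train], lam)
    err = np.mean((y[val] - X_b[val] @ w) ** 2)  # error on data the model has never seen
    if err < best_err:
        best_lam, best_err = lam, err

print(f"λ that generalizes best on the held-out points: {best_lam}")
```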
4. Summary of Learning Ridge Regression
| Concept | Real Life Analogy | Why It’s Used |
|---|---|---|
| Linear Fit | Predict price based on past sales | To generalize patterns |
| Overfitting | Memorizing past houses too closely | Fails on new data |
| λ Penalty | Don’t trust extreme behaviors | Keeps the model grounded |
| Ridge Formula | Balances accuracy + simplicity | Gives better predictions on new data |
Ridge Regression – Basic Math Concepts