Ridge Regression Example with Simple Python

1. A non-library (plain Python + NumPy) implementation of Ridge Regression, showing how changing λ (lambda) affects predictions.

What We’ll Do:

  • Generate noisy linear data
  • Implement Ridge Regression from scratch
  • Solve for the weights using the closed-form equation: w = (XᵀX + λI)⁻¹ Xᵀy

Full Python Code (No ML Libraries Used):

import numpy as np
import matplotlib.pyplot as plt
 
# Step 1: Generate synthetic data
np.random.seed(42)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 3 * X.flatten() + 5 + np.random.randn(50) * 5  # true line + noise
 
# Add bias term (column of 1s) to X
X_b = np.hstack([np.ones((X.shape[0], 1)), X])  # shape: (50, 2)
 
# Step 2: Ridge Regression (manual implementation)
def ridge_regression(X, y, lambda_val):
    n_features = X.shape[1]
    I = np.eye(n_features)
    I[0, 0] = 0  # Don't regularize the bias term
    w = np.linalg.inv(X.T @ X + lambda_val * I) @ X.T @ y
    return w
 
# Step 3: Try different λ values
lambdas = [0, 1, 10, 100]
colors = ['blue', 'green', 'orange', 'red']
 
# Plot original data
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='black', label='Data')
 
# Step 4: Train and plot for each λ
for i, lam in enumerate(lambdas):
    w = ridge_regression(X_b, y, lam)
    y_pred = X_b @ w
    plt.plot(X, y_pred, color=colors[i], label=f"λ = {lam:.1f}")
 
plt.title("Effect of λ (lambda) on Ridge Regression (No Libraries)")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.grid(True)
plt.show()

What We’ll Observe:

  • λ = 0 → Line fits noise (overfit)
  • λ = 1, 10 → Smoother, more generalizable line
  • λ = 100 → Line becomes almost flat (underfit)
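
To see the shrinkage numerically as well as visually, we can print the fitted weights for each λ. This snippet continues from the script above (it reuses ridge_regression, X_b, and y as defined there):

# Print the fitted bias and slope for each lambda to watch the slope shrink
for lam in [0, 1, 10, 100]:
    w = ridge_regression(X_b, y, lam)
    print(f"lambda = {lam:>3}: bias = {w[0]:.3f}, slope = {w[1]:.3f}")

As λ grows, the slope is pulled toward zero (the flattening seen in the plot), while the bias stays free to move because we chose not to regularize it.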

Key Notes:

  • This uses the normal equation for Ridge Regression.
  • I[0, 0] = 0 ensures we don’t penalize the bias term, which is standard.
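
As a quick sanity check, at λ = 0 the ridge solution should reduce to ordinary least squares. Still NumPy-only, np.polyfit gives an independent fit to compare against (again reusing the variables from the script above):

# Sanity check: with lambda = 0, ridge is just ordinary least squares
w_ridge = ridge_regression(X_b, y, 0)
slope, intercept = np.polyfit(X.flatten(), y, deg=1)
print("ridge (lambda=0): intercept =", w_ridge[0], ", slope =", w_ridge[1])
print("np.polyfit      : intercept =", intercept, ", slope =", slope)
# Both pairs should agree to numerical precision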

2. The Story: Predicting House Prices — The Realtor’s Dilemma

Characters:

  • Reena, a real estate agent
  • The House Data: Size, Age, Distance from school, Crime Rate, etc.
  • The Goal: Reena wants to predict house prices based on past data.

3. Step-by-Step Explanation of Ridge Regression (with Descriptive Visualization)

Step 1: Reena collects data

She gathers:

  • 50 houses
  • For each: square footage, age, crime rate, school distance, and actual price

She puts them into a table:

Size (sq ft)   Age (years)   Crime Rate   Distance from School   Price
1400           5             3            1.2 km                 ₹50L
1600           8             4            2.0 km                 ₹55L

Why?
She wants to find patterns. Bigger houses usually cost more — but some data points don’t follow the trend due to location or crime rate.

Step 2: She tries to fit a line to predict price

She uses basic Linear Regression. It tries to find the best equation:

Price = w₁×Size + w₂×Age + … + Bias

Why? She believes a formula will help her estimate prices for new houses in the future.
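
In code, that formula is just a dot product between a feature vector and a weight vector. A minimal sketch with made-up (not fitted) weights, roughly calibrated to the first house in Reena's table:

import numpy as np

# Hypothetical weights [bias, w_size, w_age], price in lakhs (illustrative only)
w = np.array([10.0, 0.03, -0.5])
house = np.array([1.0, 1400.0, 5.0])   # [bias term, size in sq ft, age in years]
price = house @ w                      # 10 + 0.03*1400 - 0.5*5
print(price)                           # 49.5, i.e. about ₹50L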

Problem: Overfitting

Reena adds 20+ features like:

  • Nearby restaurants, traffic data, number of windows, color of walls

Now, her formula becomes super-complicated:

Price = 0.82×Size + 1.4×Age − 12.8×Crime + …

Why is this bad?

  • It fits the past data perfectly (even noise).
  • But fails terribly when used on new houses — overfit!

Step 3: Reena adds a Rule — Don’t trust crazy numbers!

She says: “I want to make accurate predictions, but I don’t want my formula to go crazy with huge numbers just to fit the past.”

So, she adds a penalty for large coefficients — like saying:

“Hey model, if you’re going to use ‘crime rate’ as a feature, fine — but don’t make it dominate everything unless it’s really important!”

This is Ridge Regression.

Step 4: Reena balances two things

1. Fitting the data → Get predictions close to actual house prices (like linear regression)

2. Keeping the formula simple → Don’t allow very large weights (penalize big numbers)

This new goal becomes:

Final Score = Error + λ × (Penalty for large numbers)

Why? She wants to find a middle ground — accurate yet stable predictions.
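
Reena’s “final score” is exactly the ridge objective. Here is a minimal sketch of that loss function (the closed-form solver above minimizes this directly, but writing it out makes the trade-off explicit):

import numpy as np

def ridge_loss(w, X, y, lam):
    """Mean squared error plus lambda times the squared non-bias weights."""
    error = np.mean((X @ w - y) ** 2)   # how far predictions are from actual prices
    penalty = np.sum(w[1:] ** 2)        # squared size of the weights, bias excluded
    return error + lam * penalty

At λ = 0 this is the ordinary least-squares loss; as λ grows, large weights become expensive and the optimum moves toward smaller, more stable coefficients.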

Step 5: What does lambda (λ) mean in Reena’s world?

  • λ = 0 → “Fit the data as tightly as you want” (may overfit)
  • λ = 10 → “Be careful — don’t stretch too much to fit outliers”
  • λ = 100 → “Stick to the middle — don’t trust noisy data”

Why is this useful?

Reena now builds a model that works well for new houses — it’s not too clever, not too dumb — just right.
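
How would Reena actually pick λ? A common recipe is a hold-out split: fit on most of the houses, score each λ on the held-out ones, and keep the λ with the lowest validation error. A minimal sketch, reusing ridge_regression, X_b, and y from the script above:

# Hold-out validation to choose lambda
rng = np.random.default_rng(0)
idx = rng.permutation(len(y))
train, val = idx[:40], idx[40:]   # 40 houses to fit, 10 to evaluate

best_lam, best_err = None, float("inf")
for lam in [0, 0.1, 1, 10, 100]:
    w = ridge_regression(X_b[train], y[train], lam)
    err = np.mean((X_b[val] @ w - y[val]) ** 2)   # validation MSE
    if err < best_err:
        best_lam, best_err = lam, err
print(f"best lambda = {best_lam} (validation MSE = {best_err:.2f})")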

4. Summary of Learning Ridge Regression

Concept         Real-Life Analogy                     Why It’s Used
Linear fit      Predict price based on past sales     To generalize patterns
Overfitting     Memorizing past houses too closely    Fails on new data
λ penalty       Don’t trust extreme behaviors         Keeps the model grounded
Ridge formula   Balances accuracy + simplicity        Gives better predictions on new data

Ridge Regression – Basic Math Concepts