Ridge Regression Example with Simple Python
1. A non-library (pure Python + NumPy) implementation of Ridge Regression, showing how changing λ (lambda) affects predictions.
What We’ll Do:
- Generate noisy linear data
- Implement Ridge Regression from scratch
- Solve for the weights using the closed-form equation: w = (XᵀX + λI)⁻¹ Xᵀy
Full Python Code (No ML Libraries Used):
```python
import numpy as np
import matplotlib.pyplot as plt

# Step 1: Generate synthetic data
np.random.seed(42)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 3 * X.flatten() + 5 + np.random.randn(50) * 5  # true line + noise

# Add bias term (column of 1s) to X
X_b = np.hstack([np.ones((X.shape[0], 1)), X])  # shape: (50, 2)

# Step 2: Ridge Regression (manual implementation)
def ridge_regression(X, y, lambda_val):
    n_features = X.shape[1]
    I = np.eye(n_features)
    I[0, 0] = 0  # Don't regularize the bias term
    w = np.linalg.inv(X.T @ X + lambda_val * I) @ X.T @ y
    return w

# Step 3: Try different λ values
lambdas = [0, 1, 10, 100]
colors = ['blue', 'green', 'orange', 'red']

# Plot original data
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='black', label='Data')

# Step 4: Train and plot for each λ
for i, lam in enumerate(lambdas):
    w = ridge_regression(X_b, y, lam)
    y_pred = X_b @ w
    plt.plot(X, y_pred, color=colors[i], label=f"λ = {lam:.1f}")

plt.title("Effect of λ (lambda) on Ridge Regression (No Libraries)")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.grid(True)
plt.show()
```
What We’ll Observe:
- λ = 0 → plain least-squares fit; the line follows the noisy data most closely (risk of overfitting)
- λ = 1, 10 → the slope shrinks slightly, giving a smoother, more generalizable line
- λ = 100 → the slope shrinks markedly toward zero and the line starts to underfit
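To see the shrinkage numerically rather than just in the plot, you can print the fitted weights for each λ. A minimal check, assuming ridge_regression, X_b, and y from the script above are already defined:

```python
# Assumes ridge_regression, X_b, and y from the script above are in scope.
for lam in [0, 1, 10, 100]:
    w = ridge_regression(X_b, y, lam)
    print(f"λ = {lam:>3}: bias = {w[0]:.2f}, slope = {w[1]:.2f}")
```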
Key Notes:
- This uses the normal equation for Ridge Regression.
- I[0, 0] = 0 ensures we don’t penalize the bias term, which is standard.
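One implementation caveat: np.linalg.inv can be numerically fragile when XᵀX + λI is ill-conditioned. A common alternative, sketched below as a drop-in variant of the function above (not what the original script uses), is to solve the linear system directly:

```python
import numpy as np

def ridge_regression_solve(X, y, lambda_val):
    """Same closed form, but solves (XᵀX + λI) w = Xᵀy as a linear system
    instead of forming the inverse explicitly (faster and more stable)."""
    n_features = X.shape[1]
    I = np.eye(n_features)
    I[0, 0] = 0  # still leave the bias term unpenalized
    return np.linalg.solve(X.T @ X + lambda_val * I, X.T @ y)
```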
2. The Story: Predicting House Prices — The Realtor’s Dilemma
Characters:
- Reena, a real estate agent
- The House Data: Size, Age, Distance from School, Crime Rate, etc.
- The Goal: Reena wants to predict house prices based on past data.
3. Step-by-Step Explanation of Ridge Regression (with Descriptive Visualization)
Step 1: Reena collects data
She gathers:
- 50 houses
- For each: square footage, age, crime rate, school distance, and actual price
She puts them into a table:
| Size (sq ft) | Age (years) | Crime Rate | Distance from School (km) | Price |
|---|---|---|---|---|
| 1400 | 5 | 3 | 1.2 | ₹50L |
| 1600 | 8 | 4 | 2.0 | ₹55L |
| … | … | … | … | … |
Why?
She wants to find patterns. Bigger houses usually cost more — but some data points don’t follow the trend due to location or crime rate.
Step 2: She tries to fit a line to predict price
She uses basic Linear Regression. It tries to find the best equation:
Price = w1 × Size + w2 × Age + … + Bias
Why? She believes a formula will help her estimate prices for new houses in the future.
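For instance, with made-up weights (hypothetical numbers, not fitted from the table above), the formula turns a house's features into a price:

```python
# Hypothetical weights, purely for illustration
bias, w_size, w_age = 10.0, 0.03, -0.5
size, age = 1400, 5  # the first house in Reena's table
price = bias + w_size * size + w_age * age
print(price)  # 49.5 → roughly ₹50L, in the table's units
```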
Problem: Overfitting
Reena adds 20+ features like:
- Nearby restaurants, traffic data, number of windows, color of walls
Now, her formula becomes super-complicated:
Price = 0.82 × Size + 1.4 × Age − 12.8 × Crime + …
Why is this bad?
- It fits the past data perfectly (even noise).
- But fails terribly when used on new houses — overfit!
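This failure mode is easy to reproduce in code. The sketch below is illustrative only (synthetic data; the 30 features and λ = 10 are arbitrary choices): with few houses and many irrelevant features, the unpenalized fit hands large weights to pure-noise columns, while ridge keeps them small.

```python
import numpy as np

rng = np.random.default_rng(0)
n_houses, n_features = 40, 30                 # few houses, many features
X = rng.normal(size=(n_houses, n_features))   # 29 of 30 features are pure noise
y = 3 * X[:, 0] + rng.normal(size=n_houses)   # only the first feature matters

X_b = np.hstack([np.ones((n_houses, 1)), X])  # add bias column
I = np.eye(n_features + 1)
I[0, 0] = 0                                   # don't penalize the bias

w_plain = np.linalg.solve(X_b.T @ X_b + 1e-10 * I, X_b.T @ y)  # λ ≈ 0
w_ridge = np.linalg.solve(X_b.T @ X_b + 10.0 * I, X_b.T @ y)   # λ = 10

# The noise-only features typically get far larger weights without the penalty
print("largest noise-feature weight, λ ≈ 0:", np.abs(w_plain[2:]).max())
print("largest noise-feature weight, λ = 10:", np.abs(w_ridge[2:]).max())
```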
Step 3: Reena adds a Rule — Don’t trust crazy numbers!
She says: “I want to make accurate predictions, but I don’t want my formula to go crazy with huge numbers just to fit the past.”
So, she adds a penalty for large coefficients — like saying:
“Hey model, if you’re going to use ‘crime rate’ as a feature, fine — but don’t make it dominate everything unless it’s really important!”
This is Ridge Regression.
Step 4: Reena balances two things
1. Fitting the data → Get predictions close to actual house prices (like linear regression)
2. Keeping the formula simple → Don’t allow very large weights (penalize big numbers)
This new goal becomes:
Final Score = Error + λ × (Penalty for large numbers)
Why? She wants to find a middle ground — accurate yet stable predictions.
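In code, Reena's "Final Score" is exactly the ridge objective. A minimal sketch (final_score is a hypothetical helper name, not part of the script above):

```python
import numpy as np

def final_score(w, X, y, lam):
    """Reena's objective: prediction error plus λ times a penalty on big weights."""
    error = np.sum((X @ w - y) ** 2)  # how far predictions are from actual prices
    penalty = np.sum(w[1:] ** 2)      # squared size of the coefficients (bias excluded)
    return error + lam * penalty
```

Ridge Regression simply picks the weights w that make this score as small as possible.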
Step 5: What does lambda (λ) mean in Reena’s world?
- λ = 0 → “Fit the data as tightly as you want” (may overfit)
- λ = 10 → “Be careful — don’t stretch too much to fit outliers”
- λ = 100 → “Stick to the middle — don’t trust noisy data”
Why is this useful?
Reena now builds a model that works well for new houses — it’s not too clever, not too dumb — just right.
4. Summary of Learning Ridge Regression
| Concept | Real-Life Analogy | Why It’s Used |
|---|---|---|
| Linear Fit | Predict price based on past sales | To generalize patterns |
| Overfitting | Memorizing past houses too closely | Fails on new data |
| λ penalty | Don’t trust extreme behaviors | Keeps the model grounded |
| Ridge Formula | Balances accuracy + simplicity | Gives better predictions on new data |
Ridge Regression – Basic Math Concepts