Bayesian Regression Example with Simple Python
1. Goal:
We start with a belief about the slope (e.g., salary increase per year of experience), then update it with data.
Assumptions:
- Prior belief: slope is around ₹50,000 per year
- Observed data: interview a few people and get their salary vs experience
- Update the belief with the data (Bayesian update)
Code (Python Without Libraries):
# -------- Step 1: Prior belief about slope (salary per year of experience) -------- #
prior_mean = 50000          # ₹50,000 per year
prior_variance = 100000000  # High uncertainty (we're not confident yet)

# -------- Step 2: Observed data -------- #
# Let's say these are from a small survey (experience in years, salary in ₹)
data = [(1, 52000), (2, 98000), (3, 149000), (4, 202000)]

# Assume variance in observed salaries (noise)
likelihood_variance = 1000000  # Some random variation in salary

# -------- Step 3: Bayesian Update -------- #
# We'll update the slope only for simplicity
def bayesian_update(prior_mean, prior_var, data, likelihood_var):
    precision_prior = 1 / prior_var
    precision_likelihood = 0
    weighted_sum = 0
    for x, y in data:
        precision_likelihood += x**2 / likelihood_var
        weighted_sum += x * y / likelihood_var
    posterior_variance = 1 / (precision_prior + precision_likelihood)
    posterior_mean = posterior_variance * (precision_prior * prior_mean + weighted_sum)
    return posterior_mean, posterior_variance

updated_mean, updated_variance = bayesian_update(prior_mean, prior_variance, data, likelihood_variance)

# -------- Step 4: Predict using new belief -------- #
def predict_salary(experience, slope_mean):
    return experience * slope_mean

# -------- Demo -------- #
print("Prior belief: ₹{}/year".format(prior_mean))
print("Updated belief (slope): ₹{:.2f}/year".format(updated_mean))
print("Predict salary for 5 years experience: ₹{:.2f}".format(predict_salary(5, updated_mean)))

Output (Sample):
Prior belief: ₹50000/year
Updated belief (slope): ₹50099.97/year
Predict salary for 5 years experience: ₹250499.83
What Just Happened?
- We started with a guess (₹50K per year of experience).
- We collected some data (like real-world salaries).
- The model adjusted our belief (now slightly higher than ₹50K).
- Now we have a better estimate of salary for any years of experience.
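Beyond the point estimate, the same conjugate update also tracks how confident we are: every observation shrinks the posterior variance. A small self-contained sketch of this, using the same model and data as above but feeding in one point at a time:

```python
# Sequential version of the conjugate update above: feed in one data
# point at a time and watch the posterior uncertainty shrink.
prior_mean, prior_var = 50000, 1e8   # same prior as the main example
likelihood_var = 1e6                 # same salary noise
data = [(1, 52000), (2, 98000), (3, 149000), (4, 202000)]

mean, var = prior_mean, prior_var
for x, y in data:
    # Precisions (1/variance) add; the evidence term accumulates
    post_var = 1 / (1 / var + x**2 / likelihood_var)
    mean = post_var * (mean / var + x * y / likelihood_var)
    var = post_var
    print(f"after ({x}, {y}): mean={mean:.2f}, std={var ** 0.5:.2f}")
```

Point-by-point updating gives exactly the same posterior as a single batch update (that is what conjugacy buys us), and the standard deviation of our belief about the slope falls from 10,000 (the prior) to about 182 after just four survey points.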
2. Extending the Simulation to Include the Intercept: the base salary someone earns with zero years of experience (like a fresher's starting salary).
We now estimate two parameters:
- Slope (m) → How much salary increases per year of experience
- Intercept (b) → Starting salary with 0 years experience
And we’ll update both using Bayesian regression.
Update Strategy
We’ll treat it like linear regression:
salary = intercept + slope × experience
Now, we’ll do Bayesian updating for both slope and intercept.
Updated Python Code (No libraries, just math)
# -------- Step 1: Prior beliefs -------- #
prior_mean_m = 50000       # ₹50K per year of experience (slope)
prior_var_m = 1e8          # High uncertainty in slope
prior_mean_b = 20000       # ₹20K base salary (intercept)
prior_var_b = 1e8          # High uncertainty in intercept
likelihood_variance = 1e6  # Salary noise

# -------- Step 2: Data (experience, salary) -------- #
data = [(1, 52000), (2, 98000), (3, 149000), (4, 202000)]

# -------- Step 3: Bayesian Update for both slope and intercept -------- #
def bayesian_update_linear(prior_mean_m, prior_var_m, prior_mean_b, prior_var_b, data, likelihood_var):
    sum_x2 = sum(x**2 for x, y in data)
    sum_x = sum(x for x, y in data)
    sum_y = sum(y for x, y in data)
    sum_xy = sum(x * y for x, y in data)
    n = len(data)

    # Least-squares estimates (analytical form for linear regression)
    denom = sum_x2 * n - sum_x**2
    updated_m = (sum_xy * n - sum_x * sum_y) / denom
    updated_b = (sum_x2 * sum_y - sum_x * sum_xy) / denom

    # For a Bayesian touch, softly blend the data estimates with the priors.
    # The more uncertain the prior (large prior_var), the more weight the
    # data gets. (A simplification: an exact update would weight each
    # estimate by its full data precision.)
    weight_m = 1 / (1 + likelihood_var / prior_var_m)
    weight_b = 1 / (1 + likelihood_var / prior_var_b)
    posterior_mean_m = weight_m * updated_m + (1 - weight_m) * prior_mean_m
    posterior_mean_b = weight_b * updated_b + (1 - weight_b) * prior_mean_b
    return posterior_mean_m, posterior_mean_b

# -------- Step 4: Predict using the updated model -------- #
def predict_salary(x, m, b):
    return m * x + b

# -------- Run update -------- #
m, b = bayesian_update_linear(prior_mean_m, prior_var_m, prior_mean_b, prior_var_b, data, likelihood_variance)

# -------- Show results -------- #
print("Final Model:")
print(" Slope (₹ per year): {:.2f}".format(m))
print(" Intercept (₹ base salary): {:.2f}".format(b))
print("Prediction for 5 years experience: ₹{:.2f}".format(predict_salary(5, m, b)))

Sample Output:
Final Model:
 Slope (₹ per year): 50099.01
 Intercept (₹ base salary): 198.02
Prediction for 5 years experience: ₹250693.07
What We’ve Learned:
- Even with 0 years of experience, the model predicts a salary (the intercept).
- The slope adjusts to the trend in the data.
- Adding the intercept makes the regression's predictions more realistic.
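A quick way to judge the updated line is to print its residuals against the survey itself. A minimal sketch; the slope and intercept below are illustrative placeholders, so substitute whatever values your run prints:

```python
# Sanity-check a fitted line by printing residuals against the survey data.
# m and b are illustrative placeholders, not the values from the update above.
data = [(1, 52000), (2, 98000), (3, 149000), (4, 202000)]

def predict_salary(x, m, b):
    return m * x + b

m, b = 50000.0, 2000.0  # placeholder slope and intercept
for x, y in data:
    pred = predict_salary(x, m, b)
    print(f"{x} yr: observed ₹{y}, predicted ₹{pred:.0f}, residual ₹{y - pred:+.0f}")
```

Small, sign-balanced residuals suggest the line (and hence the posterior mean) is a reasonable summary of the survey.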
3. Upgrading to Multivariate Bayesian Linear Regression, where we include more than one feature.
Predict salary using:
1. experience (in years)
2. education_level (numeric score: 1 = High School, 2 = College, 3 = Postgrad)
3. city_index (e.g., 0 = Tier-3 city, 1 = Tier-2, 2 = Metro)
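To make the encoding concrete, here is one surveyed person mapped to the numeric feature tuple used below. The lookup tables and field names are an illustrative sketch, not part of the original example:

```python
# Map one survey response to the numeric feature tuple used below.
# The dictionary keys are illustrative; only the numeric codes matter.
EDUCATION = {"High School": 1, "College": 2, "Postgrad": 3}
CITY_TIER = {"Tier-3": 0, "Tier-2": 1, "Metro": 2}

person = {"experience": 3, "education": "College", "city": "Tier-2"}
features = (person["experience"],
            EDUCATION[person["education"]],
            CITY_TIER[person["city"]])
print(features)  # (3, 2, 1)
```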
Equation Format:
We’ll model it as:
salary = b0 + b1*experience + b2*education_level + b3*city_index
Where:
- b0 = intercept (base salary)
- b1 = slope for experience
- b2 = slope for education level
- b3 = slope for city index
Python Simulation (No Libraries):
This version simulates multivariate Bayesian regression with a simplified, per-feature updating rule (for illustration only; an exact treatment updates all the weights jointly).
# Step 1: Prior belief (initial guess) for 3 features + intercept
priors = {
    "intercept": {"mean": 10000, "var": 1e8},
    "experience": {"mean": 50000, "var": 1e8},
    "education_level": {"mean": 30000, "var": 1e8},
    "city_index": {"mean": 20000, "var": 1e8}
}

# Step 2: Observed data: (experience, education_level, city_index, salary)
# Example data points from a survey
data = [
    (1, 1, 0, 52000),
    (2, 2, 1, 95000),
    (3, 2, 1, 135000),
    (4, 3, 2, 190000),
    (5, 3, 2, 245000)
]

likelihood_variance = 1e6

# Step 3: Simplified per-feature Bayesian update for all weights
def bayesian_update_multivariate(priors, data, likelihood_var):
    feature_names = list(priors.keys())

    # Per-feature sums of squares and feature-times-salary products
    XTX = {k: 0 for k in feature_names}
    XTy = {k: 0 for k in feature_names}
    for experience, education_level, city_index, y in data:
        features = {
            "intercept": 1,
            "experience": experience,
            "education_level": education_level,
            "city_index": city_index
        }
        for k in feature_names:
            XTX[k] += features[k] ** 2
            XTy[k] += features[k] * y

    updated_weights = {}
    for k in feature_names:
        prior_mean = priors[k]["mean"]
        prior_var = priors[k]["var"]
        # Weighted average between prior and data-based estimate:
        # the more uncertain the prior, the more weight the data gets
        weight = 1 / (1 + likelihood_var / prior_var)
        data_estimate = XTy[k] / (XTX[k] + 1e-6)  # avoid divide-by-zero
        updated_weights[k] = weight * data_estimate + (1 - weight) * prior_mean
    return updated_weights

# Step 4: Predict using updated model
def predict(features, weights):
    feature_vector = {
        "intercept": 1,
        "experience": features[0],
        "education_level": features[1],
        "city_index": features[2]
    }
    return sum(weights[k] * feature_vector[k] for k in weights)

# Step 5: Run update
weights = bayesian_update_multivariate(priors, data, likelihood_variance)

# Predict for a new person: 3 years exp, college grad, Tier-2 city
new_input = (3, 2, 1)
predicted_salary = predict(new_input, weights)

# Output
print("Final Model Weights:")
for k, v in weights.items():
    print(f" {k}: ₹{v:.2f}")
print("\nPredicted salary for input {}: ₹{:.2f}".format(new_input, predicted_salary))

Sample Output:
Final Model Weights:
 intercept: ₹142079.18
 experience: ₹47875.79
 education_level: ₹66927.02
 city_index: ₹109108.90

Predicted salary for input (3, 2, 1): ₹528669.49

Note how badly the prediction overshoots the survey, which itself contains the point (3, 2, 1, 135000). Each weight here is estimated from its feature alone, and because experience, education level and city tier all rise together in this data, every per-feature estimate absorbs the same salary growth and the sum double-counts it. A faithful multivariate treatment must update all the weights jointly.
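For comparison, the exact conjugate update for this model (a Gaussian prior on each weight and known noise variance) estimates all the weights jointly by solving one small linear system, still with no libraries. A sketch under those stated assumptions, reusing the same priors and survey data, with a hand-rolled Gaussian-elimination solver:

```python
# Exact conjugate update for Bayesian linear regression:
#   A = prior_precision + (X^T X) / noise_var      (posterior precision)
#   A @ w = prior_precision * prior_mean + (X^T y) / noise_var
# Solving this system for w gives the joint posterior mean of all weights.
priors = {
    "intercept": {"mean": 10000, "var": 1e8},
    "experience": {"mean": 50000, "var": 1e8},
    "education_level": {"mean": 30000, "var": 1e8},
    "city_index": {"mean": 20000, "var": 1e8}
}
data = [(1, 1, 0, 52000), (2, 2, 1, 95000), (3, 2, 1, 135000),
        (4, 3, 2, 190000), (5, 3, 2, 245000)]
noise_var = 1e6

names = list(priors)
rows = [[1, e, ed, c] for e, ed, c, _ in data]  # prepend intercept column
ys = [y for *_, y in data]
k = len(names)

# Build A and b (diagonal Gaussian prior adds to the diagonal of X^T X)
A = [[sum(r[i] * r[j] for r in rows) / noise_var for j in range(k)] for i in range(k)]
b = [sum(r[i] * y for r, y in zip(rows, ys)) / noise_var for i in range(k)]
for i, n in enumerate(names):
    A[i][i] += 1 / priors[n]["var"]
    b[i] += priors[n]["mean"] / priors[n]["var"]

# Gaussian elimination with partial pivoting, then back-substitution
for col in range(k):
    piv = max(range(col, k), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    b[col], b[piv] = b[piv], b[col]
    for r in range(col + 1, k):
        f = A[r][col] / A[col][col]
        for c in range(col, k):
            A[r][c] -= f * A[col][c]
        b[r] -= f * b[col]
w = [0.0] * k
for i in range(k - 1, -1, -1):
    w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]

weights = dict(zip(names, w))
for n in names:
    print(f"{n}: ₹{weights[n]:,.2f}")

pred = sum(weights[n] * f for n, f in zip(names, [1, 3, 2, 1]))
print(f"Predicted salary for (3, 2, 1): ₹{pred:,.2f}")
```

Because city_index always equals education_level minus 1 in this survey, the features are collinear, so the individual weights can look surprising (that is where the prior earns its keep), but the joint posterior keeps predictions close to the data: the prediction for (3, 2, 1) lands near the surveyed ₹135,000 instead of overshooting.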
Bayesian Regression – Dataset Suitability Checklist