Bayesian Regression Example in Simple Python
1. Goal:
We start with a belief about the slope (e.g., salary increase per year of experience), then update it with data.
Assumptions:
- Prior belief: slope is around ₹50,000 per year
- Observed data: interview a few people and get their salary vs experience
- Update the belief with the data (Bayesian update)
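For a through-the-origin model salary = m * experience with Gaussian noise of variance \sigma^2 and a Gaussian prior m ~ N(\mu_0, \tau_0^2), the posterior over the slope is again Gaussian:

```latex
\tau_{\mathrm{post}}^2 = \left(\frac{1}{\tau_0^2} + \frac{\sum_i x_i^2}{\sigma^2}\right)^{-1},
\qquad
\mu_{\mathrm{post}} = \tau_{\mathrm{post}}^2 \left(\frac{\mu_0}{\tau_0^2} + \frac{\sum_i x_i y_i}{\sigma^2}\right)
```

This is exactly the update the code below computes: precisions add, and the posterior mean is a precision-weighted average of the prior belief and the data evidence.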
Code (Python Without Libraries):
# -------- Step 1: Prior belief about slope (salary per year of experience) -------- #
prior_mean = 50000          # ₹50,000 per year
prior_variance = 100000000  # High uncertainty (we're not confident yet)

# -------- Step 2: Observed data -------- #
# Let's say these are from a small survey (experience in years, salary in ₹)
data = [(1, 52000), (2, 98000), (3, 149000), (4, 202000)]

# Assume variance in observed salaries (noise)
likelihood_variance = 1000000  # Some random variation in salary

# -------- Step 3: Bayesian Update -------- #
# We'll update the slope only for simplicity
def bayesian_update(prior_mean, prior_var, data, likelihood_var):
    precision_prior = 1 / prior_var
    precision_likelihood = 0
    weighted_sum = 0
    for x, y in data:
        # Each observation adds precision x^2/noise and evidence x*y/noise
        precision_likelihood += x**2 / likelihood_var
        weighted_sum += x * y / likelihood_var
    posterior_variance = 1 / (precision_prior + precision_likelihood)
    posterior_mean = posterior_variance * (precision_prior * prior_mean + weighted_sum)
    return posterior_mean, posterior_variance

updated_mean, updated_variance = bayesian_update(prior_mean, prior_variance, data, likelihood_variance)

# -------- Step 4: Predict using new belief -------- #
def predict_salary(experience, slope_mean):
    return experience * slope_mean
# -------- Demo -------- #
print("Prior belief: ₹{}/year".format(prior_mean))
print("Updated belief (slope): ₹{:.2f}/year".format(updated_mean))
print("Predict salary for 5 years experience: ₹{:.2f}".format(predict_salary(5, updated_mean)))
Output (Sample):
Prior belief: ₹50000/year
Updated belief (slope): ₹50099.97/year
Predict salary for 5 years experience: ₹250499.83
What Just Happened?
- We started with a guess (₹50K per year of experience).
- We collected some data (like real-world salaries).
- The model adjusted our belief (now slightly higher than ₹50K).
- Now we have a better estimate of salary for any years of experience.
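A nice property of this update is that it can be applied one observation at a time: feeding the survey rows in sequentially gives exactly the same posterior as the batch update, and the posterior variance shrinks with every observation. A small self-contained sketch (restating the update function from above):

```python
def bayesian_update(prior_mean, prior_var, data, likelihood_var):
    precision_prior = 1 / prior_var
    precision_likelihood = sum(x**2 / likelihood_var for x, y in data)
    weighted_sum = sum(x * y / likelihood_var for x, y in data)
    posterior_var = 1 / (precision_prior + precision_likelihood)
    posterior_mean = posterior_var * (precision_prior * prior_mean + weighted_sum)
    return posterior_mean, posterior_var

mean, var = 50000, 1e8  # start from the same prior as before
for point in [(1, 52000), (2, 98000), (3, 149000), (4, 202000)]:
    new_mean, new_var = bayesian_update(mean, var, [point], 1e6)
    print(f"after {point}: mean={new_mean:.2f}, var={new_var:.2f}")
    assert new_var < var  # uncertainty never grows as data arrives
    mean, var = new_mean, new_var
# The final mean matches the batch result (₹50099.97/year), because
# Gaussian precisions simply add up regardless of the order of updates.
```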
2. Extending the Simulation to include the intercept — the base salary even if someone has zero years of experience (like a fresher’s starting salary).
We now estimate two parameters:
- Slope (m) → How much salary increases per year of experience
- Intercept (b) → Starting salary with 0 years experience
And we’ll update both using Bayesian regression.
Update Strategy
We’ll treat it like linear regression:
salary = intercept + slope × experience
Now, we’ll do Bayesian updating for both slope and intercept.
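The data-based estimates used below are the standard least-squares closed forms for slope and intercept:

```latex
\hat{m} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\,n\sum x_i^2 - \left(\sum x_i\right)^2\,},
\qquad
\hat{b} = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\,n\sum x_i^2 - \left(\sum x_i\right)^2\,}
```

The Bayesian step then blends each estimate with its prior, with the blend weight set by how vague the prior is.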
Updated Python Code (No libraries, just math)
# -------- Step 1: Prior beliefs -------- #
prior_mean_m = 50000       # ₹50K per year of experience (slope)
prior_var_m = 1e8          # High uncertainty in slope
prior_mean_b = 20000       # ₹20K base salary (intercept)
prior_var_b = 1e8          # High uncertainty in intercept
likelihood_variance = 1e6  # Salary noise

# -------- Step 2: Data (experience, salary) -------- #
data = [(1, 52000), (2, 98000), (3, 149000), (4, 202000)]

# -------- Step 3: Bayesian Update for both slope and intercept -------- #
def bayesian_update_linear(prior_mean_m, prior_var_m,
                           prior_mean_b, prior_var_b,
                           data, likelihood_var):
    sum_x2 = sum(x**2 for x, y in data)
    sum_x = sum(x for x, y in data)
    sum_y = sum(y for x, y in data)
    sum_xy = sum(x * y for x, y in data)
    n = len(data)
    # Data-based estimates (ordinary least-squares closed forms)
    denom = sum_x2 * n - sum_x**2
    estimate_m = (sum_xy * n - sum_x * sum_y) / denom
    estimate_b = (sum_x2 * sum_y - sum_x * sum_xy) / denom
    # For a Bayesian touch, softly combine with the priors.
    # The weight on the data approaches 1 as the prior grows vaguer
    # (prior_var much larger than likelihood_var).
    weight_m = 1 / (1 + likelihood_var / prior_var_m)
    weight_b = 1 / (1 + likelihood_var / prior_var_b)
    posterior_mean_m = weight_m * estimate_m + (1 - weight_m) * prior_mean_m
    posterior_mean_b = weight_b * estimate_b + (1 - weight_b) * prior_mean_b
    return posterior_mean_m, posterior_mean_b

# -------- Step 4: Predict using the updated model -------- #
def predict_salary(x, m, b):
    return m * x + b

# -------- Run update -------- #
m, b = bayesian_update_linear(prior_mean_m, prior_var_m,
                              prior_mean_b, prior_var_b,
                              data, likelihood_variance)
# -------- Show results -------- #
print("Final Model:")
print(" Slope (₹ per year): {:.2f}".format(m))
print(" Intercept (₹ base salary): {:.2f}".format(b))
print("Prediction for 5 years experience: ₹{:.2f}".format(predict_salary(5, m, b)))
Sample Output:
Final Model:
 Slope (₹ per year): 50099.01
 Intercept (₹ base salary): 198.02
Prediction for 5 years experience: ₹250693.07
What We’ve Learned:
- Even with 0 years of experience, the model gives a salary prediction (the intercept).
- The slope adjusts based on the trend in the data.
- The data trend passes close to the origin, so the weak prior on the intercept (₹20K) gets pulled down sharply; modelling the intercept explicitly makes the predictions more realistic.
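How much the data can move our belief is controlled by the prior variance. A quick illustrative sketch of that trade-off, using a precision-style weighted average (the weight on the data approaches 1 as the prior variance grows relative to the noise); the numbers here are illustrative:

```python
def blend(prior_mean, prior_var, data_estimate, likelihood_var):
    # Weight on the data estimate: approaches 1 as the prior
    # becomes vaguer (prior_var much larger than likelihood_var).
    w = 1 / (1 + likelihood_var / prior_var)
    return w * data_estimate + (1 - w) * prior_mean

data_slope = 50100  # least-squares slope from the survey data
noise = 1e6         # assumed salary noise (likelihood variance)

vague = blend(50000, 1e8, data_slope, noise)      # barely trusts the prior
confident = blend(50000, 1e4, data_slope, noise)  # barely trusts the data

print(f"vague prior     -> {vague:.2f}")      # close to 50100
print(f"confident prior -> {confident:.2f}")  # close to 50000
```

A confident (low-variance) prior resists the data; a vague one surrenders to it almost completely.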
3. Upgrade to Multivariate Bayesian Linear Regression, where we include more than one feature — for example:
Predict salary using:
1. experience (in years)
2. education_level (numeric score: 1 = High School, 2 = College, 3 = Postgrad)
3. city_index (e.g., 0 = Tier-3 city, 1 = Tier-2, 2 = Metro)
Equation Format:
We’ll model it as:
salary = b0 + b1*experience + b2*education_level + b3*city_index
Where:
- b0 = intercept (base salary)
- b1 = slope for experience
- b2 = slope for education level
- b3 = slope for city index
Python Simulation (No Libraries):
This version simulates multivariate Bayesian regression with a simplified, per-weight update: each weight is blended with its prior independently, for illustration, rather than computing the exact joint posterior.
# Step 1: Prior belief (initial guess) for 3 features + intercept
priors = {
    "intercept": {"mean": 10000, "var": 1e8},
    "experience": {"mean": 50000, "var": 1e8},
    "education_level": {"mean": 30000, "var": 1e8},
    "city_index": {"mean": 20000, "var": 1e8}
}

# Step 2: Observed data: (experience, education_level, city_index, salary)
# Example data points from a survey
data = [
    (1, 1, 0, 52000),
    (2, 2, 1, 95000),
    (3, 2, 1, 135000),
    (4, 3, 2, 190000),
    (5, 3, 2, 245000)
]

likelihood_variance = 1e6

# Step 3: Bayesian Update for all weights
def bayesian_update_multivariate(priors, data, likelihood_var):
    feature_names = list(priors.keys())
    # Per-feature sums of squares and feature-target products
    XTX = {k: 0 for k in feature_names}
    XTy = {k: 0 for k in feature_names}
    for experience, education_level, city_index, y in data:
        features = {
            "intercept": 1,
            "experience": experience,
            "education_level": education_level,
            "city_index": city_index
        }
        for k in feature_names:
            XTX[k] += features[k] ** 2
            XTy[k] += features[k] * y
    updated_weights = {}
    for k in feature_names:
        prior_mean = priors[k]["mean"]
        prior_var = priors[k]["var"]
        # Weighted average between prior and data-based estimate;
        # a vaguer prior (large prior_var) puts more weight on the data
        weight = 1 / (1 + likelihood_var / prior_var)
        data_estimate = XTy[k] / (XTX[k] + 1e-6)  # Avoid divide-by-zero
        updated_weights[k] = weight * data_estimate + (1 - weight) * prior_mean
    return updated_weights

# Step 4: Predict using updated model
def predict(features, weights):
    feature_vector = {
        "intercept": 1,
        "experience": features[0],
        "education_level": features[1],
        "city_index": features[2]
    }
    return sum(weights[k] * feature_vector[k] for k in weights)

# Step 5: Run update
weights = bayesian_update_multivariate(priors, data, likelihood_variance)

# Predict for a new person: 3 years exp, college grad, Tier-2 city
new_input = (3, 2, 1)
predicted_salary = predict(new_input, weights)

# Output
print("Final Model Weights:")
for k, v in weights.items():
    print(f" {k}: ₹{v:.2f}")
print("\nPredicted salary for input {}: ₹{:.2f}".format(new_input, predicted_salary))
Sample Output:
Final Model Weights:
 intercept: ₹142079.18
 experience: ₹47875.79
 education_level: ₹66927.02
 city_index: ₹109108.90

Predicted salary for input (3, 2, 1): ₹528669.49
Notice the overshoot: because each weight is updated independently, the correlated features (experience, education and city all rise together in this data) each absorb much of the same salary trend, and the summed prediction comes out far too high. That is the cost of the simplified per-weight update.
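The per-weight update above ignores correlations between features. With Gaussian noise (variance s2) and an independent Gaussian prior on each weight, the exact joint posterior mean instead solves the regularized normal equations (XᵀX/s2 + diag(1/prior_var)) w = Xᵀy/s2 + prior_mean/prior_var. A minimal sketch, still with no libraries, using Gaussian elimination; the data and priors match the example above:

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w

data = [(1, 1, 0, 52000), (2, 2, 1, 95000), (3, 2, 1, 135000),
        (4, 3, 2, 190000), (5, 3, 2, 245000)]
prior_mean = [10000, 50000, 30000, 20000]  # intercept, experience, education, city
prior_var = [1e8] * 4
s2 = 1e6  # salary noise (likelihood variance)

X = [[1, e, edu, city] for e, edu, city, _ in data]
y = [row[3] for row in data]

n = 4
A = [[sum(X[i][r] * X[i][c] for i in range(len(X))) / s2 for c in range(n)]
     for r in range(n)]
b = [sum(X[i][r] * y[i] for i in range(len(X))) / s2 for r in range(n)]
for r in range(n):
    A[r][r] += 1 / prior_var[r]        # add prior precision on the diagonal
    b[r] += prior_mean[r] / prior_var[r]

w = solve(A, b)
pred = w[0] + 3 * w[1] + 2 * w[2] + 1 * w[3]
print("joint MAP weights:", [round(v, 2) for v in w])
print("prediction for (3, 2, 1): ₹{:.2f}".format(pred))
```

Because this solve couples the weights, collinear features stop double-counting the trend: the joint prediction for (3, 2, 1) lands in roughly the ₹1.4–1.5 lakh range, close to the observed ₹135,000 for the similar profile in the data.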
Bayesian Regression – Dataset Suitability Checklist
