Bayesian Regression Example with Simple Python
1. Goal:
We start with a belief about the slope (e.g., salary increase per year of experience), then update it with data.
Assumptions:
- Prior belief: slope is around ₹50,000 per year
- Observed data: interview a few people and get their salary vs experience
- Update the belief with the data (Bayesian update)
Code (Python Without Libraries):
# -------- Step 1: Prior belief about slope (salary per year of experience) -------- #
prior_mean = 50000          # ₹50,000 per year
prior_variance = 100000000  # High uncertainty (we're not confident yet)

# -------- Step 2: Observed data -------- #
# Let's say these are from a small survey (experience in years, salary in ₹)
data = [(1, 52000), (2, 98000), (3, 149000), (4, 202000)]

# Assume variance in observed salaries (noise)
likelihood_variance = 1000000  # Some random variation in salary

# -------- Step 3: Bayesian Update -------- #
# We'll update the slope only for simplicity
def bayesian_update(prior_mean, prior_var, data, likelihood_var):
    precision_prior = 1 / prior_var
    precision_likelihood = 0
    weighted_sum = 0
    for x, y in data:
        precision_likelihood += x**2 / likelihood_var
        weighted_sum += x * y / likelihood_var
    posterior_variance = 1 / (precision_prior + precision_likelihood)
    posterior_mean = posterior_variance * (precision_prior * prior_mean + weighted_sum)
    return posterior_mean, posterior_variance

updated_mean, updated_variance = bayesian_update(prior_mean, prior_variance, data, likelihood_variance)

# -------- Step 4: Predict using new belief -------- #
def predict_salary(experience, slope_mean):
    return experience * slope_mean

# -------- Demo -------- #
print("Prior belief: ₹{}/year".format(prior_mean))
print("Updated belief (slope): ₹{:.2f}/year".format(updated_mean))
print("Predict salary for 5 years experience: ₹{:.2f}".format(predict_salary(5, updated_mean)))

Output (Sample):
Prior belief: ₹50000/year
Updated belief (slope): ₹50099.97/year
Predict salary for 5 years experience: ₹250499.83
What Just Happened?
- We started with a guess (₹50K per year of experience).
- We collected some data (like real-world salaries).
- The model adjusted our belief (now slightly higher than ₹50K).
- Now we have a better estimate of salary for any years of experience.
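Beyond the point estimate, the same conjugate update also tracks how confident we are: every observation shrinks the posterior variance. A small self-contained sketch of this, using the same model and data as above but feeding in one point at a time:

```python
# Sequential version of the conjugate update above: feed in one data
# point at a time and watch the posterior uncertainty shrink.
prior_mean, prior_var = 50000, 1e8   # same prior as the main example
likelihood_var = 1e6                 # same salary noise
data = [(1, 52000), (2, 98000), (3, 149000), (4, 202000)]

mean, var = prior_mean, prior_var
for x, y in data:
    # Precisions (1/variance) add; the evidence term accumulates
    post_var = 1 / (1 / var + x**2 / likelihood_var)
    mean = post_var * (mean / var + x * y / likelihood_var)
    var = post_var
    print(f"after ({x}, {y}): mean={mean:.2f}, std={var ** 0.5:.2f}")
```

Point-by-point updating gives exactly the same posterior as a single batch update (that is what conjugacy buys us), and the standard deviation of our belief about the slope falls from 10,000 (the prior) to about 182 after just four survey points.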
2. Extending the Simulation to Include the Intercept: the base salary someone earns with zero years of experience (like a fresher's starting salary).
We now estimate two parameters:
- Slope (m) → How much salary increases per year of experience
- Intercept (b) → Starting salary with 0 years experience
And we’ll update both using Bayesian regression.
Update Strategy
We’ll treat it like linear regression:
salary = intercept + slope × experience
Now, we’ll do Bayesian updating for both slope and intercept.
Updated Python Code (No libraries, just math)
# -------- Step 1: Prior beliefs -------- #
prior_mean_m = 50000       # ₹50K per year of experience (slope)
prior_var_m = 1e8          # High uncertainty in slope
prior_mean_b = 20000       # ₹20K base salary (intercept)
prior_var_b = 1e8          # High uncertainty in intercept
likelihood_variance = 1e6  # Salary noise

# -------- Step 2: Data (experience, salary) -------- #
data = [(1, 52000), (2, 98000), (3, 149000), (4, 202000)]

# -------- Step 3: Bayesian Update for both slope and intercept -------- #
def bayesian_update_linear(prior_mean_m, prior_var_m, prior_mean_b, prior_var_b, data, likelihood_var):
    sum_x2 = sum(x**2 for x, y in data)
    sum_x = sum(x for x, y in data)
    sum_y = sum(y for x, y in data)
    sum_xy = sum(x * y for x, y in data)
    n = len(data)

    # Least-squares estimates (analytical form for linear regression)
    denom = sum_x2 * n - sum_x**2
    updated_m = (sum_xy * n - sum_x * sum_y) / denom
    updated_b = (sum_x2 * sum_y - sum_x * sum_xy) / denom

    # For a Bayesian touch, softly blend the data estimates with the priors.
    # The more uncertain the prior (large prior_var), the more weight the
    # data gets. (A simplification: an exact update would weight each
    # estimate by its full data precision.)
    weight_m = 1 / (1 + likelihood_var / prior_var_m)
    weight_b = 1 / (1 + likelihood_var / prior_var_b)
    posterior_mean_m = weight_m * updated_m + (1 - weight_m) * prior_mean_m
    posterior_mean_b = weight_b * updated_b + (1 - weight_b) * prior_mean_b
    return posterior_mean_m, posterior_mean_b

# -------- Step 4: Predict using the updated model -------- #
def predict_salary(x, m, b):
    return m * x + b

# -------- Run update -------- #
m, b = bayesian_update_linear(prior_mean_m, prior_var_m, prior_mean_b, prior_var_b, data, likelihood_variance)

# -------- Show results -------- #
print("Final Model:")
print(" Slope (₹ per year): {:.2f}".format(m))
print(" Intercept (₹ base salary): {:.2f}".format(b))
print("Prediction for 5 years experience: ₹{:.2f}".format(predict_salary(5, m, b)))

Sample Output:
Final Model:
 Slope (₹ per year): 50099.01
 Intercept (₹ base salary): 198.02
Prediction for 5 years experience: ₹250693.07
What We’ve Learned:
- Even with 0 years of experience, the model predicts a salary (the intercept).
- The slope adjusts to the trend in the data.
- Adding the intercept makes the regression's predictions more realistic.
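A quick way to judge the updated line is to print its residuals against the survey itself. A minimal sketch; the slope and intercept below are illustrative placeholders, so substitute whatever values your run prints:

```python
# Sanity-check a fitted line by printing residuals against the survey data.
# m and b are illustrative placeholders, not the values from the update above.
data = [(1, 52000), (2, 98000), (3, 149000), (4, 202000)]

def predict_salary(x, m, b):
    return m * x + b

m, b = 50000.0, 2000.0  # placeholder slope and intercept
for x, y in data:
    pred = predict_salary(x, m, b)
    print(f"{x} yr: observed ₹{y}, predicted ₹{pred:.0f}, residual ₹{y - pred:+.0f}")
```

Small, sign-balanced residuals suggest the line (and hence the posterior mean) is a reasonable summary of the survey.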
3. Upgrading to Multivariate Bayesian Linear Regression, where we include more than one feature.
Predict salary using:
1. experience (in years)
2. education_level (numeric score: 1 = High School, 2 = College, 3 = Postgrad)
3. city_index (e.g., 0 = Tier-3 city, 1 = Tier-2, 2 = Metro)
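To make the encoding concrete, here is one surveyed person mapped to the numeric feature tuple used below. The lookup tables and field names are an illustrative sketch, not part of the original example:

```python
# Map one survey response to the numeric feature tuple used below.
# The dictionary keys are illustrative; only the numeric codes matter.
EDUCATION = {"High School": 1, "College": 2, "Postgrad": 3}
CITY_TIER = {"Tier-3": 0, "Tier-2": 1, "Metro": 2}

person = {"experience": 3, "education": "College", "city": "Tier-2"}
features = (person["experience"],
            EDUCATION[person["education"]],
            CITY_TIER[person["city"]])
print(features)  # (3, 2, 1)
```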
Equation Format:
We’ll model it as:
salary = b0 + b1*experience + b2*education_level + b3*city_index
Where:
- b0 = intercept (base salary)
- b1 = slope for experience
- b2 = slope for education level
- b3 = slope for city index
Python Simulation (No Libraries):
This version simulates multivariate Bayesian regression with a simplified, per-feature updating rule (for illustration only; an exact treatment updates all the weights jointly).
# Step 1: Prior belief (initial guess) for 3 features + intercept
priors = {
    "intercept": {"mean": 10000, "var": 1e8},
    "experience": {"mean": 50000, "var": 1e8},
    "education_level": {"mean": 30000, "var": 1e8},
    "city_index": {"mean": 20000, "var": 1e8}
}

# Step 2: Observed data: (experience, education_level, city_index, salary)
# Example data points from a survey
data = [
    (1, 1, 0, 52000),
    (2, 2, 1, 95000),
    (3, 2, 1, 135000),
    (4, 3, 2, 190000),
    (5, 3, 2, 245000)
]

likelihood_variance = 1e6

# Step 3: Simplified per-feature Bayesian update for all weights
def bayesian_update_multivariate(priors, data, likelihood_var):
    feature_names = list(priors.keys())

    # Per-feature sums of squares and feature-times-salary products
    XTX = {k: 0 for k in feature_names}
    XTy = {k: 0 for k in feature_names}
    for experience, education_level, city_index, y in data:
        features = {
            "intercept": 1,
            "experience": experience,
            "education_level": education_level,
            "city_index": city_index
        }
        for k in feature_names:
            XTX[k] += features[k] ** 2
            XTy[k] += features[k] * y

    updated_weights = {}
    for k in feature_names:
        prior_mean = priors[k]["mean"]
        prior_var = priors[k]["var"]
        # Weighted average between prior and data-based estimate:
        # the more uncertain the prior, the more weight the data gets
        weight = 1 / (1 + likelihood_var / prior_var)
        data_estimate = XTy[k] / (XTX[k] + 1e-6)  # avoid divide-by-zero
        updated_weights[k] = weight * data_estimate + (1 - weight) * prior_mean
    return updated_weights

# Step 4: Predict using updated model
def predict(features, weights):
    feature_vector = {
        "intercept": 1,
        "experience": features[0],
        "education_level": features[1],
        "city_index": features[2]
    }
    return sum(weights[k] * feature_vector[k] for k in weights)

# Step 5: Run update
weights = bayesian_update_multivariate(priors, data, likelihood_variance)

# Predict for a new person: 3 years exp, college grad, Tier-2 city
new_input = (3, 2, 1)
predicted_salary = predict(new_input, weights)

# Output
print("Final Model Weights:")
for k, v in weights.items():
    print(f" {k}: ₹{v:.2f}")
print("\nPredicted salary for input {}: ₹{:.2f}".format(new_input, predicted_salary))

Sample Output:
Final Model Weights:
 intercept: ₹142079.18
 experience: ₹47875.79
 education_level: ₹66927.02
 city_index: ₹109108.90

Predicted salary for input (3, 2, 1): ₹528669.49

Note how badly the prediction overshoots the survey, which itself contains the point (3, 2, 1, 135000). Each weight here is estimated from its feature alone, and because experience, education level and city tier all rise together in this data, every per-feature estimate absorbs the same salary growth and the sum double-counts it. A faithful multivariate treatment must update all the weights jointly.
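For comparison, the exact conjugate update for this model (a Gaussian prior on each weight and known noise variance) estimates all the weights jointly by solving one small linear system, still with no libraries. A sketch under those stated assumptions, reusing the same priors and survey data, with a hand-rolled Gaussian-elimination solver:

```python
# Exact conjugate update for Bayesian linear regression:
#   A = prior_precision + (X^T X) / noise_var      (posterior precision)
#   A @ w = prior_precision * prior_mean + (X^T y) / noise_var
# Solving this system for w gives the joint posterior mean of all weights.
priors = {
    "intercept": {"mean": 10000, "var": 1e8},
    "experience": {"mean": 50000, "var": 1e8},
    "education_level": {"mean": 30000, "var": 1e8},
    "city_index": {"mean": 20000, "var": 1e8}
}
data = [(1, 1, 0, 52000), (2, 2, 1, 95000), (3, 2, 1, 135000),
        (4, 3, 2, 190000), (5, 3, 2, 245000)]
noise_var = 1e6

names = list(priors)
rows = [[1, e, ed, c] for e, ed, c, _ in data]  # prepend intercept column
ys = [y for *_, y in data]
k = len(names)

# Build A and b (diagonal Gaussian prior adds to the diagonal of X^T X)
A = [[sum(r[i] * r[j] for r in rows) / noise_var for j in range(k)] for i in range(k)]
b = [sum(r[i] * y for r, y in zip(rows, ys)) / noise_var for i in range(k)]
for i, n in enumerate(names):
    A[i][i] += 1 / priors[n]["var"]
    b[i] += priors[n]["mean"] / priors[n]["var"]

# Gaussian elimination with partial pivoting, then back-substitution
for col in range(k):
    piv = max(range(col, k), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    b[col], b[piv] = b[piv], b[col]
    for r in range(col + 1, k):
        f = A[r][col] / A[col][col]
        for c in range(col, k):
            A[r][c] -= f * A[col][c]
        b[r] -= f * b[col]
w = [0.0] * k
for i in range(k - 1, -1, -1):
    w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]

weights = dict(zip(names, w))
for n in names:
    print(f"{n}: ₹{weights[n]:,.2f}")

pred = sum(weights[n] * f for n, f in zip(names, [1, 3, 2, 1]))
print(f"Predicted salary for (3, 2, 1): ₹{pred:,.2f}")
```

Because city_index always equals education_level minus 1 in this survey, the features are collinear, so the individual weights can look surprising (that is where the prior earns its keep), but the joint posterior keeps predictions close to the data: the prediction for (3, 2, 1) lands near the surveyed ₹135,000 instead of overshooting.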
Bayesian Regression – Dataset Suitability Checklist