Bayesian Regression Example with Simple Python

1. Goal:

We start with a belief about the slope (e.g., salary increase per year of experience), then update it with data.

Assumptions:

  • Prior belief: slope is around ₹50,000 per year
  • Observed data: interview a few people and get their salary vs experience
  • Update: combine the prior with the data to get a revised (posterior) belief about the slope
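
Under the hood, the update we are about to code is the standard normal-normal conjugate update for a line through the origin (salary ≈ slope × experience) with a known noise variance:

posterior_variance = 1 / (1/prior_variance + sum(x_i^2)/likelihood_variance)
posterior_mean = posterior_variance × (prior_mean/prior_variance + sum(x_i*y_i)/likelihood_variance)

where x_i is years of experience and y_i is the observed salary. The code below implements exactly these two lines.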

Code (Python Without Libraries):

# -------- Step 1: Prior belief about slope (salary per year experience) -------- #
prior_mean = 50000  # ₹50,000 per year
prior_variance = 100000000  # High uncertainty (we're not confident yet)

# -------- Step 2: Observed data -------- #
# Let's say these are from a small survey (experience in years, salary in ₹)
data = [(1, 52000), (2, 98000), (3, 149000), (4, 202000)]

# Assume variance in observed salaries (noise)
likelihood_variance = 1000000  # Some random variation in salary

# -------- Step 3: Bayesian Update -------- #
# We'll update the slope only for simplicity

def bayesian_update(prior_mean, prior_var, data, likelihood_var):
    precision_prior = 1 / prior_var
    precision_likelihood = 0
    weighted_sum = 0

    for x, y in data:
        # Each observation contributes x^2 / noise-variance worth of precision about the slope
        precision = x**2 / likelihood_var
        precision_likelihood += precision
        weighted_sum += (x * y / likelihood_var)

    # Normal-normal conjugate update: combine prior evidence with data evidence
    posterior_variance = 1 / (precision_prior + precision_likelihood)
    posterior_mean = posterior_variance * (precision_prior * prior_mean + weighted_sum)

    return posterior_mean, posterior_variance

updated_mean, updated_variance = bayesian_update(prior_mean, prior_variance, data, likelihood_variance)

# -------- Step 4: Predict using new belief -------- #
def predict_salary(experience, slope_mean):
    return experience * slope_mean

# -------- Demo -------- #
print("Prior belief: ₹{}/year".format(prior_mean))
print("Updated belief (slope): ₹{:.2f}/year".format(updated_mean))
print("Predict salary for 5 years experience: ₹{:.2f}".format(predict_salary(5, updated_mean)))

Output (Sample):

Prior belief: ₹50000/year
Updated belief (slope): ₹50099.97/year
Predict salary for 5 years experience: ₹250499.83

What Just Happened?

  • We started with a guess (₹50K per year of experience).
  • We collected some data (like real-world salaries).
  • The model adjusted our belief (now slightly higher than ₹50K).
  • Now we have a better estimate of salary for any years of experience.
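
The update also returns updated_variance, which the script computes but never prints. A small optional check (reusing the variables defined above) shows how much the data narrowed our uncertainty about the slope:

# Optional: compare uncertainty before and after seeing the data
print("Prior std dev of slope: ₹{:.2f}".format(prior_variance ** 0.5))
print("Posterior std dev of slope: ₹{:.2f}".format(updated_variance ** 0.5))

With the four survey points above, the standard deviation drops from ₹10,000 to under ₹200: the observations carry far more information than the vague prior.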

2. Extending the Simulation to Include the Intercept, i.e., the base salary someone earns even with zero years of experience (like a fresher's starting salary).

We now estimate two parameters:

  • Slope (m) → How much salary increases per year of experience
  • Intercept (b) → Starting salary with 0 years of experience

And we’ll update both using Bayesian regression.

Update Strategy

We’ll treat it like linear regression:

salary = intercept + slope × experience

Now, we’ll do Bayesian updating for both slope and intercept.
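
For the data-driven part of the update, the function below uses the standard least-squares formulas; these are exactly what the denom, updated_m, and updated_b lines compute:

slope m = (n*sum_xy - sum_x*sum_y) / (n*sum_x2 - sum_x*sum_x)
intercept b = (sum_x2*sum_y - sum_x*sum_xy) / (n*sum_x2 - sum_x*sum_x)

where sum_x, sum_y, sum_xy, and sum_x2 are sums over the observed (experience, salary) pairs and n is the number of observations. The Bayesian step then shrinks these estimates toward the priors.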

Updated Python Code (No libraries, just math)

# -------- Step 1: Prior beliefs -------- #
prior_mean_m = 50000    # ₹50K per year of experience (slope)
prior_var_m = 1e8       # High uncertainty in slope

prior_mean_b = 20000    # ₹20K base salary (intercept)
prior_var_b = 1e8       # High uncertainty in intercept

likelihood_variance = 1e6  # Salary noise
# -------- Step 2: Data (experience, salary) -------- #
data = [(1, 52000), (2, 98000), (3, 149000), (4, 202000)]

# -------- Step 3: Bayesian Update for both slope and intercept -------- #
def bayesian_update_linear(prior_mean_m, prior_var_m,
                           prior_mean_b, prior_var_b,
                           data, likelihood_var):

    sum_x2 = sum([x**2 for x, y in data])
    sum_x = sum([x for x, y in data])
    sum_y = sum([y for x, y in data])
    sum_xy = sum([x * y for x, y in data])
    n = len(data)

    # Data-only estimates: ordinary least-squares slope and intercept
    denom = (sum_x2 * n - sum_x**2)

    updated_m = (sum_xy * n - sum_x * sum_y) / denom
    updated_b = (sum_x2 * sum_y - sum_x * sum_xy) / denom

    # Bayesian-style shrinkage: the flatter the prior (relative to the data
    # noise), the more weight the data-driven estimate receives
    weight_m = prior_var_m / (prior_var_m + likelihood_var)
    weight_b = prior_var_b / (prior_var_b + likelihood_var)

    posterior_mean_m = weight_m * updated_m + (1 - weight_m) * prior_mean_m
    posterior_mean_b = weight_b * updated_b + (1 - weight_b) * prior_mean_b

    return posterior_mean_m, posterior_mean_b

# -------- Step 4: Predict using the updated model -------- #
def predict_salary(x, m, b):
    return m * x + b

# -------- Run update -------- #
m, b = bayesian_update_linear(prior_mean_m, prior_var_m, prior_mean_b, prior_var_b, data, likelihood_variance)

# -------- Show results -------- #
print("Final Model:")
print("  Slope (₹ per year): {:.2f}".format(m))
print("  Intercept (₹ base salary): {:.2f}".format(b))
print("Prediction for 5 years experience: ₹{:.2f}".format(predict_salary(5, m, b)))

Sample Output:

Final Model:
  Slope (₹ per year): 50099.01
  Intercept (₹ base salary): 198.02
Prediction for 5 years experience: ₹250693.07

What We’ve Learned:

  • Even with 0 years of experience, the model gives a base-salary prediction (the intercept). For this particular survey data the least-squares intercept happens to be zero, so the posterior stays small, pulled only slightly toward the ₹20,000 prior (see the quick check below).
  • The slope adjusts to the trend in the data.
  • Adding the intercept lets the Bayesian regression make more realistic predictions.
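
The intercept is easy to see in action by asking the model about a fresher; with 0 years of experience the prediction is simply b (this reuses m, b, and predict_salary from the script above):

# Prediction for a fresher (0 years of experience) is just the intercept
print("Fresher salary estimate: ₹{:.2f}".format(predict_salary(0, m, b)))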

3. Upgrading to Multivariate Bayesian Linear Regression, where we include more than one feature.

Predict salary using:

1. experience (in years)
2. education_level (numeric score: 1 = High School, 2 = College, 3 = Postgrad)
3. city_index (e.g., 0 = Tier-3 city, 1 = Tier-2, 2 = Metro)

Equation Format:

We’ll model it as:

salary = b0 + b1*experience + b2*education_level + b3*city_index

Where:

  • b0 = intercept (base salary)
  • b1 = slope for experience
  • b2 = slope for education level
  • b3 = slope for city index

Python Simulation (No Libraries):

This version keeps the Bayesian flavor but uses simplified updating logic for illustration: each weight is shrunk toward its prior independently, ignoring correlation between the features (an exact joint update is sketched after the sample output).

# Step 1: Prior belief (initial guess) for 3 features + intercept
priors = {
    "intercept": {"mean": 10000, "var": 1e8},
    "experience": {"mean": 50000, "var": 1e8},
    "education_level": {"mean": 30000, "var": 1e8},
    "city_index": {"mean": 20000, "var": 1e8}
}

# Step 2: Observed data: (experience, education_level, city_index, salary)
# Example data points from a survey
data = [
    (1, 1, 0, 52000),
    (2, 2, 1, 95000),
    (3, 2, 1, 135000),
    (4, 3, 2, 190000),
    (5, 3, 2, 245000)
]

likelihood_variance = 1e6

# Step 3: Bayesian Update for all weights
def bayesian_update_multivariate(priors, data, likelihood_var):
    feature_names = list(priors.keys())

    # Per-feature sums: the diagonal of X^T X and the entries of X^T y
    # (cross-terms between different features are ignored in this simplified version)
    XTX = {k: 0 for k in feature_names}
    XTy = {k: 0 for k in feature_names}
    n = len(data)

    for record in data:
        experience, education_level, city_index, y = record
        features = {
            "intercept": 1,
            "experience": experience,
            "education_level": education_level,
            "city_index": city_index
        }

        for k in feature_names:
            XTX[k] += features[k] ** 2
            XTy[k] += features[k] * y

    updated_weights = {}

    for k in feature_names:
        prior_mean = priors[k]["mean"]
        prior_var = priors[k]["var"]

        # Bayesian-style shrinkage: the flatter the prior (relative to the data
        # noise), the more weight the data-based estimate receives
        weight = prior_var / (prior_var + likelihood_var)
        data_estimate = XTy[k] / (XTX[k] + 1e-6)  # Avoid divide-by-zero

        posterior_mean = weight * data_estimate + (1 - weight) * prior_mean
        updated_weights[k] = posterior_mean

    return updated_weights
# Step 4: Predict using updated model
def predict(features, weights):
    salary = 0
    feature_vector = {
        "intercept": 1,
        "experience": features[0],
        "education_level": features[1],
        "city_index": features[2]
    }
    for k in weights:
        salary += weights[k] * feature_vector[k]
    return salary

# Step 5: Run update
weights = bayesian_update_multivariate(priors, data, likelihood_variance)

# Predict for a new person: 3 years exp, college grad, Tier-2 city
new_input = (3, 2, 1)
predicted_salary = predict(new_input, weights)

# Output
print("Final Model Weights:")
for k, v in weights.items():
    print(f"  {k}: ₹{v:.2f}")

print("\nPredicted salary for input {}: ₹{:.2f}".format(new_input, predicted_salary))

Sample Output:

Final Model Weights:
  intercept: ₹142079.18
  experience: ₹47875.79
  education_level: ₹66927.02
  city_index: ₹109108.90

Predicted salary for input (3, 2, 1): ₹528669.49
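
Two things stand out in this result. First, the prediction badly overshoots the observed salaries (the survey itself contains a person with profile (3, 2, 1) earning ₹135,000). That is a consequence of the simplification: each weight is estimated as if its feature alone explained the whole salary, and because experience, education_level, and city_index all rise together in this data, adding those per-feature estimates counts the same salary signal several times. Second, in this particular sample city_index is always education_level - 1, so the data on its own cannot separate those two effects.

The sketch below shows what the exact conjugate update looks like when all four weights are solved for jointly, which handles both issues. It reuses priors, data, likelihood_variance, new_input, and predict from the script above; solve_linear_system and exact_posterior_means are helper names introduced here (not part of the original code), and the whole thing assumes the same known-noise Gaussian model.

def solve_linear_system(A, b):
    # Gaussian elimination with partial pivoting (pure Python, no libraries)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def exact_posterior_means(priors, data, likelihood_var):
    # Posterior precision: A = diag(1/prior_var) + (X^T X) / likelihood_var
    # Posterior mean:      solve A * mu = prior_mean/prior_var + (X^T y) / likelihood_var
    names = ["intercept", "experience", "education_level", "city_index"]
    X = [[1, exp_years, edu, city] for exp_years, edu, city, _ in data]
    y = [salary for _, _, _, salary in data]
    k = len(names)
    A = [[0.0] * k for _ in range(k)]
    rhs = [0.0] * k
    for i, name in enumerate(names):
        A[i][i] += 1 / priors[name]["var"]
        rhs[i] += priors[name]["mean"] / priors[name]["var"]
    for row, salary in zip(X, y):
        for i in range(k):
            rhs[i] += row[i] * salary / likelihood_var
            for j in range(k):
                A[i][j] += row[i] * row[j] / likelihood_var
    return dict(zip(names, solve_linear_system(A, rhs)))

exact_weights = exact_posterior_means(priors, data, likelihood_variance)
print("Exact posterior means:")
for name, value in exact_weights.items():
    print("  {}: ₹{:.2f}".format(name, value))
print("Joint prediction for {}: ₹{:.2f}".format(new_input, predict(new_input, exact_weights)))

With very flat priors like the ones above, the data drives every combination of weights it can identify, while the one direction it cannot identify (trading education_level against city_index) stays anchored near the priors. Because the jointly estimated weights no longer over-count the shared salary signal, the prediction for (3, 2, 1) should come back in the neighborhood of the observed salaries rather than several times too high.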

Bayesian Regression Dataset Suitability Checklist