Linear Regression & Multivariate Linear Regression Examples in Plain Python

We’ll:

  1. Calculate the slope m and intercept c manually.
  2. Use those to predict future values.
  3. Use a small dataset for demonstration.

1. Step-by-step logic:

We use these formulas:

  • Slope m:

    m = (n∑(xy) − ∑x∑y) / (n∑(x²) − (∑x)²)

  • Intercept c:

    c = (∑y − m∑x) / n

Python Code (No Libraries)

# Sample data: (x = hours studied, y = exam scores)
x = [1, 2, 3, 4, 5]    # independent variable
y = [2, 4, 5, 4, 5]    # dependent variable

n = len(x)

# Step 1: Calculate the sums needed
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Step 2: Calculate slope (m) and intercept (c)
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
c = (sum_y - m * sum_x) / n

print(f"Slope (m): {m}")
print(f"Intercept (c): {c}")

# Step 3: Predict function
def predict(x_val):
    return m * x_val + c

# Step 4: Test predictions
test_hours = 6
predicted_score = predict(test_hours)
print(f"Predicted score for studying {test_hours} hours = {predicted_score:.2f}")

Output:
Slope (m): 0.6
Intercept (c): 2.2
Predicted score for studying 6 hours = 5.80

2. Goal:

We want to predict exam scores based on hours studied using linear regression.

Let’s say we have the following small dataset:

Hours Studied (x) | Exam Score (y)
1                 | 2
2                 | 4
3                 | 5
4                 | 4
5                 | 5

3. Step-by-Step Breakdown (Text-based)

1. Understanding our data

We observe that as hours of study increase, the score also tends to increase — though not perfectly. This hints at a relationship we can try to model with a straight line.

2. Basic math we need

We compute these values from the dataset:

  • ∑x = 1 + 2 + 3 + 4 + 5 = 15
  • ∑y = 2 + 4 + 5 + 4 + 5 = 20
  • ∑xy = (1×2) + (2×4) + (3×5) + (4×4) + (5×5) = 2 + 8 + 15 + 16 + 25 = 66
  • ∑x² = 1² + 2² + 3² + 4² + 5² = 1 + 4 + 9 + 16 + 25 = 55
  • n = 5 (number of data points)
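As a quick sanity check, these sums can be recomputed in a few lines of Python (the variable names mirror the code in the first section):

```python
# Recompute the hand-calculated sums for the study-hours dataset
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sum_x = sum(x)                              # 15
sum_y = sum(y)                              # 20
sum_xy = sum(a * b for a, b in zip(x, y))   # 66
sum_x2 = sum(a ** 2 for a in x)             # 55
print(sum_x, sum_y, sum_xy, sum_x2)
```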

3. Compute the slope (m)

m = (5·66 − 15·20) / (5·55 − 15²) = (330 − 300) / (275 − 225) = 30/50 = 0.6

This tells us: for every extra hour studied, the score increases by 0.6 on average.

4. Compute the intercept (c)

c = (20 − 0.6·15) / 5 = (20 − 9) / 5 = 11/5 = 2.2

This means: even with 0 hours of study, the model predicts a baseline score of 2.2 marks.

5. Build the final equation

We now have our line:

Score = 0.6 · Hours Studied + 2.2

6. Use it to predict

If someone studies 6 hours, their predicted score is:

0.6·6 + 2.2 = 3.6 + 2.2 = 5.8

So, we predict this student will score ~5.8.
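The whole worked example condenses to three lines of Python, using the sums computed earlier:

```python
# Reproduce the worked example end to end
m = (5 * 66 - 15 * 20) / (5 * 55 - 15 ** 2)   # slope: 30 / 50 = 0.6
c = (20 - m * 15) / 5                         # intercept: 11 / 5 = 2.2
predicted = m * 6 + c                         # 0.6 * 6 + 2.2 = 5.8
print(m, c, predicted)
```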

4. Final Takeaway:

Linear regression:

  • Finds a line like: score = 0.6 × hours + 2.2
  • Helps us predict values based on trends
  • Uses only basic math operations (multiplication, addition, division)

5. Sample Dataset for Multivariate Linear Regression:

Size (sqft) | Bedrooms | Price (₹ lakh)
1000        | 2        | 50
1200        | 3        | 60
1500        | 3        | 65
1800        | 4        | 80
2000        | 4        | 90

Python Code (No Libraries)

# Multivariate Linear Regression (2 features): Size and Bedrooms
x1 = [1000, 1200, 1500, 1800, 2000]  # Size in sqft
x2 = [2, 3, 3, 4, 4]                 # Bedrooms
y  = [50, 60, 65, 80, 90]            # Price in lakh

n = len(x1)

# Step 1: Compute sums
sum_x1 = sum(x1)
sum_x2 = sum(x2)
sum_y = sum(y)
sum_x1x1 = sum(i ** 2 for i in x1)
sum_x2x2 = sum(i ** 2 for i in x2)
sum_x1x2 = sum(a * b for a, b in zip(x1, x2))
sum_x1y = sum(a * b for a, b in zip(x1, y))
sum_x2y = sum(a * b for a, b in zip(x2, y))

# Step 2: Solve the normal equations (2 features + 1 bias)
# Matrix form: A * [m1, m2, c] = B, solved below with Cramer's rule

# A = [[Σx1², Σx1x2, Σx1],
#      [Σx1x2, Σx2², Σx2],
#      [Σx1, Σx2, n]]

# B = [[Σx1y],
#      [Σx2y],
#      [Σy]]

A = [
    [sum_x1x1, sum_x1x2, sum_x1],
    [sum_x1x2, sum_x2x2, sum_x2],
    [sum_x1,   sum_x2,   n]
]

B = [sum_x1y, sum_x2y, sum_y]

# Solve the 3x3 linear system with Cramer's rule:
# each unknown is a ratio of determinants

def determinant_3x3(m):
    return (
        m[0][0] * (m[1][1]*m[2][2] - m[1][2]*m[2][1]) -
        m[0][1] * (m[1][0]*m[2][2] - m[1][2]*m[2][0]) +
        m[0][2] * (m[1][0]*m[2][1] - m[1][1]*m[2][0])
    )

# Replace columns one by one for Cramer's Rule
def replace_column(matrix, col_idx, new_col):
    return [
        [new_col[i] if j == col_idx else matrix[i][j] for j in range(3)]
        for i in range(3)
    ]

D = determinant_3x3(A)
D1 = determinant_3x3(replace_column(A, 0, B))
D2 = determinant_3x3(replace_column(A, 1, B))
D3 = determinant_3x3(replace_column(A, 2, B))

m1 = D1 / D
m2 = D2 / D
c  = D3 / D

print(f"Model: price = {m1:.2f} * size + {m2:.2f} * bedrooms + {c:.2f}")

# Step 3: Predict a house price
def predict(size, bedrooms):
    return m1 * size + m2 * bedrooms + c

# Example prediction
test_size = 1600
test_bedrooms = 3
predicted_price = predict(test_size, test_bedrooms)
print(f"Predicted price for {test_size} sqft and {test_bedrooms} bedrooms = ₹{predicted_price:.2f} lakh")

Output:
The fitted model is:

price = 0.03 * size + 4.11 * bedrooms + 10.28

For 1600 sqft and 3 bedrooms, the predicted price is approximately ₹71.21 lakh.
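As a cross-check on the Cramer's-rule code above, the same normal equations can be solved with Gaussian elimination, still in pure Python. Here `solve3` is a hypothetical helper, and the matrix entries are hard-coded from the sums the code computes:

```python
# Cross-check: solve the same 3x3 normal equations by Gaussian elimination.
# Entries are the sums from the house-price dataset above.
A = [
    [11_930_000, 25_300, 7_500],   # [sum_x1x1, sum_x1x2, sum_x1]
    [25_300,     54,     16],      # [sum_x1x2, sum_x2x2, sum_x2]
    [7_500,      16,     5],       # [sum_x1,   sum_x2,   n]
]
B = [543_500, 1_155, 345]          # [sum_x1y, sum_x2y, sum_y]

def solve3(A, B):
    # Forward elimination with partial pivoting, then back-substitution
    M = [row[:] + [b] for row, b in zip(A, B)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x

m1, m2, c = solve3(A, B)
price = m1 * 1600 + m2 * 3 + c
print(f"m1={m1:.4f}, m2={m2:.4f}, c={c:.4f}, prediction={price:.2f}")
```

Both routes give the same coefficients, which is expected: Cramer's rule and Gaussian elimination are two ways of solving the same linear system.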

Linear Regression – Basic Math Concepts