Linear Regression & Multivariate Linear Regression in Plain Python (No Libraries)
We’ll:
- Calculate the slope m and intercept c manually.
- Use those to predict future values.
- Use a small dataset for demonstration.
1. Step-by-step logic:
We use these formulas:
- Slope m:
  m = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)
- Intercept c:
  c = (∑y − m∑x) / n
Python Code (No Libraries)
```python
# Sample data: x = hours studied, y = exam scores
x = [1, 2, 3, 4, 5]  # independent variable
y = [2, 4, 5, 4, 5]  # dependent variable
n = len(x)

# Step 1: Calculate the sums needed
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(x[i] * y[i] for i in range(n))
sum_x2 = sum(xi**2 for xi in x)

# Step 2: Calculate slope (m) and intercept (c)
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
c = (sum_y - m * sum_x) / n
print(f"Slope (m): {m}")
print(f"Intercept (c): {c}")

# Step 3: Prediction function
def predict(x_val):
    return m * x_val + c

# Step 4: Test a prediction
test_hours = 6
predicted_score = predict(test_hours)
print(f"Predicted score for studying {test_hours} hours = {predicted_score:.2f}")
```
Output:
Running this prints the slope (0.6), the intercept (2.2), and a predicted score of 5.80 for 6 hours of study.
2. Goal:
We want to predict exam scores based on hours studied using linear regression.
Let’s say we have the following small dataset:
| Hours Studied (x) | Exam Score (y) |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 3 | 5 |
| 4 | 4 |
| 5 | 5 |
3. Step-by-Step Breakdown (Text-based)
1. Understanding our data
We observe that as hours of study increase, the score also tends to increase — though not perfectly. This hints at a relationship we can try to model with a straight line.
2. Basic math we need
We compute these values from the dataset:
- ∑x = 1 + 2 + 3 + 4 + 5 = 15
- ∑y = 2 + 4 + 5 + 4 + 5 = 20
- ∑xy = (1×2) + (2×4) + (3×5) + (4×4) + (5×5) = 2 + 8 + 15 + 16 + 25 = 66
- ∑x² = 1² + 2² + 3² + 4² + 5² = 1 + 4 + 9 + 16 + 25 = 55
- n = 5 (number of data points)
3. Compute the slope (m)
m = (5⋅66 − 15⋅20) / (5⋅55 − 15²) = (330 − 300) / (275 − 225) = 30/50 = 0.6
This tells us: for every extra hour studied, the score increases by 0.6 on average.
4. Compute the intercept (c)
c = (20 − 0.6⋅15) / 5 = (20 − 9) / 5 = 11/5 = 2.2
This means: someone who studies 0 hours is still predicted to score about 2.2 marks.
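The arithmetic above can be replayed in a couple of lines (the sums are hard-coded from step 2):

```python
# Sums computed in step 2 of the walkthrough
n, sum_x, sum_y, sum_xy, sum_x2 = 5, 15, 20, 66, 55

# Slope and intercept from the closed-form formulas
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
c = (sum_y - m * sum_x) / n
print(m, c)  # 0.6 2.2
```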
5. Build the final equation
We now have our line:
Score = 0.6 ⋅ Hours Studied + 2.2
6. Use it to predict
If someone studies 6 hours, their predicted score is:
0.6 ⋅ 6 + 2.2 = 3.6 + 2.2 = 5.8
So, we predict this student will score ~5.8.
4. Final Takeaway:
Linear regression:
- Finds a line like: score = 0.6 × hours + 2.2
- Helps us predict values based on trends
- Uses only basic math operations (multiplication, addition, division)
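A quick way to judge how well the line fits (not part of the walkthrough above, but only a few extra lines) is to look at the residuals, the gap between each actual score and the line's prediction:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m, c = 0.6, 2.2  # coefficients derived above

# Residual = actual - predicted, for each training point
predictions = [m * xi + c for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predictions)]
sse = sum(r**2 for r in residuals)  # sum of squared errors

print(residuals)        # roughly [-0.8, 0.6, 1.0, -0.6, -0.2]
print(f"SSE = {sse:.2f}")  # SSE = 2.40
```

The residuals sum to (essentially) zero, which is a general property of the least-squares line.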
5. Sample Dataset for Multivariate Linear Regression:
| Size (sqft) | Bedrooms | Price (₹ lakh) |
|---|---|---|
| 1000 | 2 | 50 |
| 1200 | 3 | 60 |
| 1500 | 3 | 65 |
| 1800 | 4 | 80 |
| 2000 | 4 | 90 |
Python Code (No Libraries)
```python
# Multivariate Linear Regression (2 features): Size and Bedrooms
x1 = [1000, 1200, 1500, 1800, 2000]  # Size in sqft
x2 = [2, 3, 3, 4, 4]                 # Bedrooms
y = [50, 60, 65, 80, 90]             # Price in lakh
n = len(x1)

# Step 1: Compute sums
sum_x1 = sum(x1)
sum_x2 = sum(x2)
sum_y = sum(y)
sum_x1x1 = sum(i**2 for i in x1)
sum_x2x2 = sum(i**2 for i in x2)
sum_x1x2 = sum(x1[i] * x2[i] for i in range(n))
sum_x1y = sum(x1[i] * y[i] for i in range(n))
sum_x2y = sum(x2[i] * y[i] for i in range(n))

# Step 2: Set up the normal equations (2 coefficients + 1 intercept)
# Matrix form: A * [m1, m2, c] = B, where
# A = [[Σx1², Σx1x2, Σx1],
#      [Σx1x2, Σx2², Σx2],
#      [Σx1,   Σx2,   n ]]
# B = [Σx1y, Σx2y, Σy]
A = [
    [sum_x1x1, sum_x1x2, sum_x1],
    [sum_x1x2, sum_x2x2, sum_x2],
    [sum_x1, sum_x2, n],
]
B = [sum_x1y, sum_x2y, sum_y]

# Simple 3x3 linear system solver; we use Cramer's Rule for readability
def determinant_3x3(m):
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
        m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
        m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )

# Replace one column of the matrix with B (needed for Cramer's Rule)
def replace_column(matrix, col_idx, new_col):
    return [
        [new_col[i] if j == col_idx else matrix[i][j] for j in range(3)]
        for i in range(3)
    ]

D = determinant_3x3(A)
D1 = determinant_3x3(replace_column(A, 0, B))
D2 = determinant_3x3(replace_column(A, 1, B))
D3 = determinant_3x3(replace_column(A, 2, B))

m1 = D1 / D
m2 = D2 / D
c = D3 / D
print(f"Model: price = {m1:.2f} * size + {m2:.2f} * bedrooms + {c:.2f}")

# Step 3: Predict a house price
def predict(size, bedrooms):
    return m1 * size + m2 * bedrooms + c

# Example prediction
test_size = 1600
test_bedrooms = 3
predicted_price = predict(test_size, test_bedrooms)
print(f"Predicted price for {test_size} sqft and {test_bedrooms} bedrooms = ₹{predicted_price:.2f} lakh")
```
Output:
For this dataset, the fitted model comes out as:
price = 0.03 * size + 4.11 * bedrooms + 10.28
For 1600 sqft and 3 bedrooms, the predicted price is about ₹71.21 lakh.
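As a cross-check on the Cramer's Rule solution (this is an alternative solver, not part of the code above; the matrix entries are hard-coded from the dataset's sums), the same 3x3 normal-equation system can be solved by Gaussian elimination:

```python
def solve_3x3(A, B):
    """Solve A·w = B by Gaussian elimination with partial pivoting."""
    M = [row[:] + [b] for row, b in zip(A, B)]  # augmented matrix
    n = len(M)
    for col in range(n):
        # Pick the row with the largest pivot to reduce rounding error
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    # Back-substitution
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w

# Normal-equation sums computed from the house-price dataset
A = [
    [11930000, 25300, 7500],
    [25300, 54, 16],
    [7500, 16, 5],
]
B = [543500, 1155, 345]

m1, m2, c = solve_3x3(A, B)
print(f"price = {m1:.4f} * size + {m2:.4f} * bedrooms + {c:.4f}")
# ≈ price = 0.0304 * size + 4.1121 * bedrooms + 10.2804
```

Both solvers should agree to within floating-point precision; elimination scales better than Cramer's Rule once there are more than a handful of features.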