Linear Regression & Multivariate Linear Regression example with Simple Python
We’ll:
- Calculate the slope m and intercept c manually.
- Use those to predict future values.
- Use a small dataset for demonstration.
1. Step-by-step logic:
We use these formulas:
- Slope m:
m = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)
- Intercept c:
c = (∑y − m∑x) / n
Python Code (No Libraries)
```python
# Sample data: (x = hours studied, y = exam scores)
x = [1, 2, 3, 4, 5]  # independent variable
y = [2, 4, 5, 4, 5]  # dependent variable
n = len(x)

# Step 1: Calculate the sums needed
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum([x[i] * y[i] for i in range(n)])
sum_x2 = sum([x[i] ** 2 for i in range(n)])

# Step 2: Calculate slope (m) and intercept (c)
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
c = (sum_y - m * sum_x) / n

print(f"Slope (m): {m}")
print(f"Intercept (c): {c}")

# Step 3: Prediction function
def predict(x_val):
    return m * x_val + c

# Step 4: Test predictions
test_hours = 6
predicted_score = predict(test_hours)
print(f"Predicted score for studying {test_hours} hours = {predicted_score:.2f}")
```
Output:
We’ll get the slope, intercept, and a predicted score for 6 hours of study.
2. Goal:
We want to predict exam scores based on hours studied using linear regression.
Let’s say we have the following small dataset:
| Hours Studied (x) | Exam Score (y) |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 3 | 5 |
| 4 | 4 |
| 5 | 5 |
3. Step-by-Step Breakdown (Text-based)
1. Understanding our data
We observe that as hours of study increase, the score also tends to increase — though not perfectly. This hints at a relationship we can try to model with a straight line.
2. Basic math we need
We compute these values from the dataset:
- ∑x = 1 + 2 + 3 + 4 + 5 = 15
- ∑y = 2 + 4 + 5 + 4 + 5 = 20
- ∑xy = (1×2) + (2×4) + (3×5) + (4×4) + (5×5) = 2 + 8 + 15 + 16 + 25 = 66
- ∑x² = 1² + 2² + 3² + 4² + 5² = 1 + 4 + 9 + 16 + 25 = 55
- n = 5 (number of data points)
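These sums can be double-checked with a few lines of Python (a quick sketch mirroring the table above):

```python
# Dataset from the table above
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sum_x = sum(x)                             # 15
sum_y = sum(y)                             # 20
sum_xy = sum(a * b for a, b in zip(x, y))  # 66
sum_x2 = sum(a ** 2 for a in x)            # 55
print(sum_x, sum_y, sum_xy, sum_x2)        # 15 20 66 55
```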
3. Compute the slope (m)
m = (5·66 − 15·20) / (5·55 − 15²) = (330 − 300) / (275 − 225) = 30/50 = 0.6
This tells us: for every extra hour studied, the score increases by 0.6 on average.
4. Compute the intercept (c)
c = (20 − 0.6·15) / 5 = (20 − 9) / 5 = 11/5 = 2.2
This means: if someone studies 0 hours, the model still predicts a baseline score of 2.2.
5. Build the final equation
We now have our line:
Score = 0.6 × Hours Studied + 2.2
6. Use it to predict
If someone studies 6 hours, their predicted score is:
0.6 × 6 + 2.2 = 3.6 + 2.2 = 5.8
So, we predict this student will score ~5.8.
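The whole worked derivation can be replayed in a few lines of Python (same sums as above) to confirm the numbers:

```python
# Sums computed earlier from the study-hours dataset
n, sum_x, sum_y, sum_xy, sum_x2 = 5, 15, 20, 66, 55

m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # 30 / 50 = 0.6
c = (sum_y - m * sum_x) / n                                   # 11 / 5 = 2.2
predicted = m * 6 + c                                         # ≈ 5.8
print(m, c, predicted)
```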
4. Final Takeaway:
Linear regression:
- Finds a line like: score = 0.6 × hours + 2.2
- Helps us predict values based on trends
- Uses only basic math operations (multiplication, addition, division)
5. Sample Dataset for Multivariate Linear Regression:
| Size (sqft) | Bedrooms | Price (₹ lakh) |
|---|---|---|
| 1000 | 2 | 50 |
| 1200 | 3 | 60 |
| 1500 | 3 | 65 |
| 1800 | 4 | 80 |
| 2000 | 4 | 90 |
Python Code (No Libraries)
```python
# Multivariate Linear Regression (2 features): Size and Bedrooms
x1 = [1000, 1200, 1500, 1800, 2000]  # Size in sqft
x2 = [2, 3, 3, 4, 4]                 # Bedrooms
y = [50, 60, 65, 80, 90]             # Price in lakh
n = len(x1)

# Step 1: Compute sums
sum_x1 = sum(x1)
sum_x2 = sum(x2)
sum_y = sum(y)
sum_x1x1 = sum([i ** 2 for i in x1])
sum_x2x2 = sum([i ** 2 for i in x2])
sum_x1x2 = sum([x1[i] * x2[i] for i in range(n)])
sum_x1y = sum([x1[i] * y[i] for i in range(n)])
sum_x2y = sum([x2[i] * y[i] for i in range(n)])

# Step 2: Solve the normal equations (2 variables + 1 bias)
# Matrix form: A * [m1, m2, c] = B
# A = [[Σx1², Σx1x2, Σx1],
#      [Σx1x2, Σx2², Σx2],
#      [Σx1,   Σx2,   n ]]
# B = [Σx1y, Σx2y, Σy]
A = [
    [sum_x1x1, sum_x1x2, sum_x1],
    [sum_x1x2, sum_x2x2, sum_x2],
    [sum_x1, sum_x2, n]
]
B = [sum_x1y, sum_x2y, sum_y]

# Simple 3x3 linear system solver -- we'll use Cramer's Rule for readability
def determinant_3x3(m):
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )

# Replace columns one by one for Cramer's Rule
def replace_column(matrix, col_idx, new_col):
    return [
        [new_col[i] if j == col_idx else matrix[i][j] for j in range(3)]
        for i in range(3)
    ]

D = determinant_3x3(A)
D1 = determinant_3x3(replace_column(A, 0, B))
D2 = determinant_3x3(replace_column(A, 1, B))
D3 = determinant_3x3(replace_column(A, 2, B))

m1 = D1 / D
m2 = D2 / D
c = D3 / D

print(f"Model: price = {m1:.2f} * size + {m2:.2f} * bedrooms + {c:.2f}")

# Step 3: Predict a house price
def predict(size, bedrooms):
    return m1 * size + m2 * bedrooms + c

# Example prediction
test_size = 1600
test_bedrooms = 3
predicted_price = predict(test_size, test_bedrooms)
print(f"Predicted price for {test_size} sqft and {test_bedrooms} bedrooms = ₹{predicted_price:.2f} lakh")
```
Output:
The formula would look like:
price = 0.03 * size + 4.11 * bedrooms + 10.28
For 1600 sqft and 3 bedrooms, the predicted price works out to about ₹71.21 lakh.
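As a cross-check (a sketch, not part of the original walkthrough), the same normal-equation system can be solved with Gauss–Jordan elimination instead of Cramer's Rule; both approaches should agree on the coefficients and the ≈ ₹71.21 lakh prediction:

```python
# Housing data from the table above
x1 = [1000, 1200, 1500, 1800, 2000]
x2 = [2, 3, 3, 4, 4]
y = [50, 60, 65, 80, 90]
n = len(x1)

# Build the 3x3 normal-equation system A * [m1, m2, c] = B
A = [[sum(a * a for a in x1), sum(a * b for a, b in zip(x1, x2)), sum(x1)],
     [sum(a * b for a, b in zip(x1, x2)), sum(b * b for b in x2), sum(x2)],
     [sum(x1), sum(x2), n]]
B = [sum(a * t for a, t in zip(x1, y)),
     sum(b * t for b, t in zip(x2, y)),
     sum(y)]

# Gauss-Jordan elimination on the augmented matrix [A | B]
# (no pivoting needed here: the diagonal entries stay nonzero for this data)
M = [row[:] + [b] for row, b in zip(A, B)]
for i in range(3):
    pivot = M[i][i]
    M[i] = [v / pivot for v in M[i]]          # scale pivot row to 1
    for j in range(3):
        if j != i:                            # eliminate column i elsewhere
            factor = M[j][i]
            M[j] = [vj - factor * vi for vj, vi in zip(M[j], M[i])]
m1, m2, c = (M[k][3] for k in range(3))

predicted = m1 * 1600 + m2 * 3 + c
print(f"{predicted:.2f}")  # ≈ 71.21
```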
Linear Regression – Basic Math Concepts