Brainstorming Session – Unsupervised Learning

1. What is a Centroid?

Imagine a bunch of dots drawn on paper.

Now ask: “If I were to place a sticker right in the middle of these dots, where should I put it?” That “middle sticker” is called the centroid. It’s like the center of a group.

In Even Simpler Terms:

A centroid is just a fancy word for the average position of a group of things.

Kid-Friendly Analogy:

Let’s say 3 friends are sitting on a see-saw:

  • One is sitting at position 2
  • Another at position 4
  • The third at position 6

If we want the see-saw to balance, the balance point needs to be right at position 4, which is the average of 2, 4, and 6. That balanced spot is the centroid of our friends’ positions.

A Tiny Math Example (2D):

Suppose we have 3 points on a grid:

(2, 4), (4, 6), and (6, 8)

To find the centroid:

Average of X values = (2 + 4 + 6) / 3 = 4
Average of Y values = (4 + 6 + 8) / 3 = 6
So, the centroid is (4, 6)
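
Here is a tiny pure-Python version of that same calculation, using just the three example points above:

points = [(2, 4), (4, 6), (6, 8)]

# Centroid = (average of X values, average of Y values)
centroid_x = sum(p[0] for p in points) / len(points)
centroid_y = sum(p[1] for p in points) / len(points)

print((centroid_x, centroid_y))  # (4.0, 6.0)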

In Machine Learning:

When a computer is trying to group similar things, it finds the “middle” of each group — that middle point is the centroid. The computer keeps updating these centroids to make the groups better and better!

2. When Should We Stop Finding the Centroid?

In K-Means and similar unsupervised learning methods, we need a stopping rule; otherwise, the algorithm could (theoretically) keep updating forever.

Here are the Common Stopping Criteria:

1. Fixed Number of Iterations

If we say:

“Hey algorithm, run the centroid update process 10 times, and then stop.” Simple and safe, but not always optimal.

2. When Centroids Stop Moving (Convergence)

If we say:

“Keep running until the centroids don’t change much anymore (or not at all).” Smart! This means the clusters are stable, and further updates don’t improve the grouping.

This is usually measured with a small tolerance, for example a change of less than 0.001 in each centroid’s position.

3. When Cluster Assignments Don’t Change

If we say:

“If all points stay in the same group as last time, stop.” This means the grouping has stabilized.
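
To see how these three stopping rules can work together, here is a small, self-contained 1-D sketch. The data values, max_iterations, and tolerance below are made up purely for illustration; this is a rough sketch of the idea, not a full K-Means implementation.

import random

# Made-up 1-D data with three obvious groups
data = [2, 3, 4, 10, 11, 12, 25, 26, 27]
k = 3
max_iterations = 10   # Rule 1: never run more than a fixed number of iterations
tolerance = 0.001     # Rule 2: stop when centroids barely move

centroids = random.sample(data, k)
previous_assignment = None

for iteration in range(max_iterations):
    # Assign each point to its nearest centroid (in 1-D, distance is just abs())
    assignment = [min(range(k), key=lambda i: abs(x - centroids[i])) for x in data]

    # Rule 3: stop when no point changes its group
    if assignment == previous_assignment:
        print(f"Stopped at iteration {iteration}: assignments stable")
        break
    previous_assignment = assignment

    # Move each centroid to the average of its points (keep the old value if a group is empty)
    new_centroids = []
    for i in range(k):
        members = [x for x, a in zip(data, assignment) if a == i]
        new_centroids.append(sum(members) / len(members) if members else centroids[i])

    # Rule 2: stop when every centroid moved less than the tolerance
    if all(abs(new - old) < tolerance for new, old in zip(new_centroids, centroids)):
        centroids = new_centroids
        print(f"Stopped at iteration {iteration}: centroids converged")
        break
    centroids = new_centroids

print("Final centroids:", centroids)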

Bonus Control — How Many Clusters (K) Do We Want?

Now, this is very important and ties back to our main concern:

“How do we know how many clusters we should even look for?”

In K-Means: We must decide K (number of clusters) before we start.

This is often set based on:

  • Prior knowledge (e.g. “we know there are 3 types of customers”)
  • Business goals
  • Trial and error using techniques like:

The Elbow Method (for choosing K)
We try:

  • K = 1, 2, 3, …, 10
  • For each K, calculate how tightly points are grouped (called within-cluster sum of squares)
  • Plot it — the curve often bends like an elbow
  • Choose the K where the elbow bends — more clusters after that don’t give much improvement

This avoids over-grouping and keeps it practical.

Why Not Infinite Clusters?

We could keep making smaller and smaller groups, even one point per group (K = number of points). But that’s useless — it defeats the purpose of finding patterns or simplifying the data.

So, we strike a balance:

  • Not too few clusters (too general)
  • Not too many (too detailed or noisy)

We stop when the centroids stop changing, or when we’ve reached a maximum number of tries, and we choose the number of clusters based on what makes the grouping meaningful but simple — using methods like the elbow rule.

3. What is the Elbow Rule?

The Elbow Rule is a way to help us decide: How many clusters (K) should we use for grouping?

It’s like asking: “How many buckets should I use to sort these toys so they’re grouped nicely, but not so many that it gets silly?”

The Idea Behind the Elbow Rule:

When we increase the number of clusters (K), the computer groups things more accurately. But after a point, the improvement becomes very small, even if we add more clusters. This creates a graph that looks like a bent elbow — and that’s where we should stop!

Here’s What We Measure:

We look at a thing called “Within-Cluster Sum of Squares” (WCSS) — don’t worry about the name. It just means: “Overall, how far are the points from their group centers?”

Steps of the Elbow Rule

1. Try K = 1, K = 2, K = 3… up to K = 10 or so
2. For each K:

  • Group the data
  • Measure the “tightness” of each group (WCSS)

3. Plot K vs WCSS:

  • X-axis = K (number of clusters)
  • Y-axis = WCSS (how scattered the groups are)

4. Look at the graph:

  • It goes down steeply at first
  • Then slows down
  • At the turning point (the “elbow”) — that’s your best K!

What It Looks Like:

WCSS
 │ ●
 │
 │     ●
 │
 │         ●    ●    ●
 │
 ├────────────────────── K
   1    2    3    4    5 ...
           ↑
         Elbow!

Real-Life Analogy: Sorting Toys

Imagine we’re sorting 100 toys into bins:

  • With 1 bin, everything’s a mess
  • With 2 bins, it’s better
  • With 3 bins, we get cars, animals, and blocks
  • With 10 bins, we’re overdoing it (like separating green blocks from red ones)

The elbow is where adding more bins stops making big improvements.

So, the Elbow Rule helps us:

  • Avoid too few clusters (bad grouping)
  • Avoid too many clusters (overfitting)
  • Pick the sweet spot — just enough groups to make sense

Pure Python Elbow Rule Demo (No Libraries)

import random

# Step 1: Create 2D random points (simulate some data)
def generate_data(n_points=20, x_range=(0, 20), y_range=(0, 20)):
    return [(random.randint(*x_range), random.randint(*y_range)) for _ in range(n_points)]

# Step 2: Distance formula
def distance(p1, p2):
    return ((p1[0]-p2[0])**2 + (p1[1]-p2[1])**2) ** 0.5

# Step 3: Assign each point to the closest centroid
def assign_clusters(points, centroids):
    clusters = {i: [] for i in range(len(centroids))}
    for point in points:
        distances = [distance(point, c) for c in centroids]
        closest = distances.index(min(distances))
        clusters[closest].append(point)
    return clusters

# Step 4: Update centroids to average positions
def update_centroids(clusters, old_centroids):
    new_centroids = []
    for idx, group in clusters.items():
        if not group:
            # Keep the previous centroid if a cluster ends up empty,
            # so the number of centroids (and their order) never changes
            new_centroids.append(old_centroids[idx])
            continue
        avg_x = sum(p[0] for p in group) / len(group)
        avg_y = sum(p[1] for p in group) / len(group)
        new_centroids.append((avg_x, avg_y))
    return new_centroids

# Step 5: Calculate WCSS (total distance from points to their centroid)
def calculate_wcss(clusters, centroids):
    wcss = 0
    for idx, points in clusters.items():
        for p in points:
            wcss += distance(p, centroids[idx]) ** 2
    return wcss

# Step 6: Run K-means manually and simulate elbow rule
def elbow_method(points, max_k=5):
    print("\nElbow Method Results:\n")
    for k in range(1, max_k + 1):
        # Step A: Randomly pick k initial centroids
        centroids = random.sample(points, k)
        for _ in range(5):  # Run 5 iterations of assigning and updating
            clusters = assign_clusters(points, centroids)
            centroids = update_centroids(clusters, centroids)

        # Step B: Re-assign points to the final centroids, then calculate and print WCSS
        clusters = assign_clusters(points, centroids)
        wcss = calculate_wcss(clusters, centroids)
        print(f"K = {k} => WCSS = {round(wcss, 2)}")

# Run the demo
points = generate_data()
elbow_method(points)

What We’ll Get:

A simple printout like:

Elbow Method Results:

K = 1 => WCSS = 1298.23
K = 2 => WCSS = 740.11
K = 3 => WCSS = 450.67
K = 4 => WCSS = 440.12
K = 5 => WCSS = 438.05

We’ll see the WCSS dropping fast, then flattening. The “elbow” is where the drop starts slowing down — like at K = 3 above.
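
One simple way to turn that visual judgment into code is to look at how much WCSS improves from one K to the next and stop as soon as the improvement becomes small. This is only a rough heuristic: the wcss_values numbers below just reuse the example printout above, and the 20% threshold is an arbitrary choice.

# Rough sketch: pick the elbow where the relative drop in WCSS becomes small
wcss_values = {1: 1298.23, 2: 740.11, 3: 450.67, 4: 440.12, 5: 438.05}

chosen_k = max(wcss_values)  # fall back to the largest K we tried
for k in range(2, max(wcss_values) + 1):
    drop = (wcss_values[k - 1] - wcss_values[k]) / wcss_values[k - 1]
    if drop < 0.20:  # less than a 20% improvement => the previous K was the elbow
        chosen_k = k - 1
        break

print("Chosen K:", chosen_k)  # Chosen K: 3 for the numbers above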

4. What is WCSS?

WCSS stands for: Within-Cluster Sum of Squares

Imagine This:

We’ve grouped a bunch of data points into clusters (like groups of toys, or students). For each group, we have a center point (the centroid — remember, the “middle” of the group).

Now, look at how far each point in the group is from the center.

  • If all the points are close to the center, the group is tight and clean.
  • If the points are far from the center, the group is loose and messy.

What Does WCSS Do?

For each point, it calculates:

“Distance from point to its group center (centroid)”

Then it:

1. Squares that distance (so everything is positive and big distances count more)
2. Adds up all these squared distances

This gives us the WCSS.

Simple Example (1D):

Let’s say we have a group with 3 points:
[2, 4, 6]
The centroid (average) = 4

Now:

  • Distance from 2 → 4 = 2 → 2² = 4
  • Distance from 4 → 4 = 0 → 0² = 0
  • Distance from 6 → 4 = 2 → 2² = 4
  • So the WCSS = 4 + 0 + 4 = 8
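
The same calculation takes only a few lines of pure Python, using the three points from the example above:

# WCSS for a single 1-D group
points = [2, 4, 6]
centroid = sum(points) / len(points)  # 4.0

wcss = sum((p - centroid) ** 2 for p in points)
print(wcss)  # 8.0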

Why It Matters:

WCSS tells us how good our clusters are.

  • Small WCSS = points are close to their centers = tight groups = good!
  • Big WCSS = points are scattered = messy groups = bad

In Clustering (like K-Means):

  • We want to minimize WCSS
  • So we keep adjusting clusters until the total WCSS is as low as possible
  • In the Elbow Method, we see how WCSS changes as we try different numbers of groups

In Simple Words:

WCSS is a number that tells us how compact our groups are. Lower WCSS = better grouping.

5. Summary Table of Basic Math Concepts:

Math Concept          Why It’s Needed
Mean (Average)        To calculate centroids (group centers)
Distance Formula      To assign points to the nearest cluster
Squares & Roots       For measuring WCSS (tightness of a group)
X-Y Coordinates       To visualize and group points
Arithmetic Basics     For all calculations

6. Summary Table of Use Cases:

Use Case                    What’s Grouped?              Why It Helps
Customer Segmentation       Buyer behavior               Targeted marketing
Recommendations             Viewing/listening patterns   Personalized content
Fraud Detection             Transactions                 Spotting unusual activity
Genetic Research            DNA patterns                 Discovering hidden traits
Urban Planning              City data                    Smarter development
Image Segmentation          Pixel colors                 Compression or photo editing
Education Personalization   Student progress             Tailored support and content

Unsupervised Learning – Summary