Summary – Semi-Supervised Learning

Step 1: Counting and Comparing

Goal: Learn to count how many items (like words) match.

  • What to Practice:
    • Count items in two groups.
    • Compare if two words/sentences share the same items.
  • Real-World Example:
    • “I have 3 apples, you have 2 apples. How many do we both have in common?”
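A minimal Python sketch of this counting step. It assumes sentences are lowercased and split on whitespace; the helper names `count_words` and `count_shared` and the sample inputs are illustrative, not part of the original material.

```python
from collections import Counter

def count_words(sentence):
    """Count how often each lowercase word appears in a sentence."""
    return Counter(sentence.lower().split())

def count_shared(sentence_a, sentence_b):
    """Count the items both sentences have in common.

    Counter & Counter keeps, for each word, the smaller of the two counts,
    so repeated words are only counted as often as both sides have them.
    """
    shared = count_words(sentence_a) & count_words(sentence_b)
    return sum(shared.values())

print(count_shared("hi there", "there you"))                # 1
print(count_shared("apple apple apple", "apple apple"))     # 2
```

Under this reading, the apples question has the answer 2: both groups contain at least two apples.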

Step 2: Understanding Sets (Groups of Words)

Goal: Learn intersection and union of word sets.

  • Key Terms:
    • Intersection: What’s common in both?
    • Union: All unique items from both.
  • Practice:
    • Word set A = {hi, there}
    • Word set B = {there, you}
    • Intersection = {there}
    • Union = {hi, there, you}
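The practice example above can be checked with Python's built-in `set` type. This is only a small sketch; the variable names are chosen for illustration.

```python
set_a = {"hi", "there"}
set_b = {"there", "you"}

intersection = set_a & set_b   # words that appear in both sets
union = set_a | set_b          # every unique word from either set

# Sets are unordered, so sort before printing for a stable result.
print(sorted(intersection))    # ['there']
print(sorted(union))           # ['hi', 'there', 'you']
```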

Step 3: Simple Division and Ratios

Goal: Learn how to compare parts vs whole.

  • Concept:
    • If 2 words match out of 5 total unique words, that ratio is the confidence score:
      2 ÷ 5 = 0.4 (40% confident)

  • Practice:
    • Fractions (e.g., 3/4, 2/5)
    • Turn fractions into decimals and percentages (e.g., 2/5 = 0.4 → 40%)

Step 4: Text Cleaning and Matching

Goal: Learn how to simplify words (like “running” → “run”).

  • What to Learn:
    • Remove endings like ing, ed, es, s (and drop a doubled final letter, so “running” → “runn” → “run”)
    • Treat words that share a root as the same word (basic stemming)
  • Why It Matters: Helps match more words across sentences.
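A rough rule-based sketch of this kind of suffix stripping, in Python. The function name `simple_stem`, the minimum stem length of 3, and the doubled-consonant rule are assumptions made for illustration; this is not a full stemming algorithm, and it will mis-handle some words (for example, “makes” becomes “mak”).

```python
SUFFIXES = ("ing", "ed", "es", "s")  # endings listed in this step, longest first

def simple_stem(word):
    """Strip one common ending from a word (crude, illustrative rules only)."""
    word = word.lower()
    if word.endswith("ss"):          # assumption: leave words like "pass" alone
        return word
    for suffix in SUFFIXES:
        # Assumption: keep at least 3 letters so "is" or "red" stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Assumption: undo a doubled final consonant ("runn" -> "run"),
            # except l, s, z, so "falling" -> "fall" and "missing" -> "miss".
            if stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
                stem = stem[:-1]
            return stem
    return word

for w in ["running", "jumped", "boxes", "cats", "is"]:
    print(w, "->", simple_stem(w))
```

With these rules, “running” and “run” both reduce to “run”, so the two forms count as a match when sentences are compared.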

Step 5: Confidence Scoring

Goal: Use previous math to decide how sure we are.

  • How to Calculate:
    • Confidence = Common Words (intersection) / Total Unique Words (union)
    • Higher score = better match!
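The formula above can be sketched in Python as follows. This ratio of shared words to total unique words is commonly known as Jaccard similarity. The function name `confidence`, the whitespace tokenization, and returning 0.0 for two empty sentences are assumptions of this sketch.

```python
def confidence(sentence_a, sentence_b):
    """Return common words / total unique words for two sentences."""
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    union = words_a | words_b
    if not union:                    # both sentences empty: avoid dividing by zero
        return 0.0
    return len(words_a & words_b) / len(union)

score = confidence("hi there", "there you")   # 1 common word, 3 unique words
print(f"{score:.2f} ({score:.0%})")            # 0.33 (33%)
```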

Step 6: Label Guessing (Semi-Supervised Learning)

Goal: Learn how the model guesses labels from a few examples.

  • How It Works:
    • Compare new (unlabeled) sentence to known (labeled) ones.
    • Use similarity (confidence) to pick the best matching label.
    • Add that guessed label to the data.
  • Example Flow:
    • Known: “Hi there” → Greeting
    • Unknown: “Hello there” → shares “there” (1 of 3 unique words, confidence ≈ 0.33) → Guess = Greeting
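The flow above, often called self-training, might be sketched in Python like this. The function names, the sample sentences, the labels, and the 0.3 threshold are all hypothetical choices for illustration; a sentence whose best score falls below the threshold is left unlabeled rather than guessed.

```python
def confidence(sentence_a, sentence_b):
    """Common words / total unique words (0.0 if both sentences are empty)."""
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

def guess_label(sentence, labeled, threshold=0.3):
    """Return (label, score) of the most similar labeled sentence.

    The label is None when no labeled sentence reaches the threshold.
    """
    best_label, best_score = None, 0.0
    for known_sentence, label in labeled:
        score = confidence(sentence, known_sentence)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score

# A few labeled examples and some unlabeled sentences (made-up data).
labeled = [("hi there", "Greeting"), ("see you later", "Farewell")]
unlabeled = ["hello there", "see you soon", "what time is it"]

for sentence in unlabeled:
    label, score = guess_label(sentence, labeled)
    if label is not None:
        labeled.append((sentence, label))   # add the guessed label to the data
    print(f"{sentence!r}: {label} ({score:.2f})")
```

Because guessed labels are added back to the labeled data, later sentences can match against them too. That also means an early wrong guess can spread, which is one reason to use a threshold.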

Learning Path Summary Table

Step  Skill            Math Concept                Learning Outcome
1     Count words      Basic counting              Compare text parts
2     Group words      Sets (Union, Intersection)  Find overlaps
3     Score match      Division / Ratios           Confidence scores
4     Normalize words  Simple rules                Better matching
5     Rate match       Fraction → Percent          How sure are we?
6     Label guessing   Decision logic              Apply semi-supervised learning

Semi-supervised Learning – Visual Roadmap