Summary – Semi-Supervised Learning
Step 1: Counting and Comparing
Goal: Learn to count how many items (like words) match.
- What to Practice:
- Count items in two groups.
- Check whether two sentences share any of the same words.
- Real-World Example:
“I have 3 apples, you have 2 apples. How many do we both have in common?”
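The counting idea can be sketched in a few lines of Python. This is only an illustration: the function name `count_shared_words` and the sample sentences are made up for this example, and words are split on spaces only.

```python
def count_shared_words(sentence_a, sentence_b):
    """Count the distinct words that appear in both sentences."""
    words_a = sentence_a.lower().split()
    words_b = sentence_b.lower().split()
    already_counted = []
    shared = 0
    for word in words_a:
        # Count a word once, even if it repeats in the first sentence.
        if word in words_b and word not in already_counted:
            already_counted.append(word)
            shared += 1
    return shared


print(len("the cat sat on the mat".split()))                      # 6 words in total
print(count_shared_words("the cat sat on the mat", "the dog sat"))  # 2 ("the", "sat")
```

The loop is written out by hand on purpose, so the counting is visible before sets are introduced in Step 2.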
Step 2: Understanding Sets (Groups of Words)
Goal: Learn intersection and union of word sets.
- Key Terms:
- Intersection: What’s common in both?
- Union: All unique items from both.
- Practice:
- Word set A = {hi, there}
- Word set B = {there, you}
- Intersection = {there}
- Union = {hi, there, you}
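Python's built-in `set` type supports both operations directly, so the practice sets can be checked with a short sketch. The variable names `set_a` and `set_b` are chosen here for illustration.

```python
set_a = {"hi", "there"}
set_b = {"there", "you"}

# "&" keeps only the words found in both sets.
intersection = set_a & set_b
# "|" keeps every unique word from either set.
union = set_a | set_b

print(intersection)   # {'there'}
print(sorted(union))  # ['hi', 'there', 'you']
```

Sets have no fixed order, which is why the union is sorted before printing.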
Step 3: Simple Division and Ratios
Goal: Learn how to compare a part to the whole.
- Concept:
If 2 words match out of 5 total, that’s a confidence score:
2 ÷ 5 = 0.4 (40% confident)
- Practice:
- Fractions (e.g., 3/4, 2/5)
- Turn fractions into percentages (e.g., 2/5 = 0.4 → 40%)
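The same arithmetic in Python, as a minimal sketch. The helper name `match_ratio` is hypothetical, and returning 0.0 when the total is zero is an assumption made here to avoid dividing by zero.

```python
def match_ratio(matches, total):
    """Return matches / total as a number between 0 and 1."""
    if total == 0:
        # Assumption: nothing to compare means no confidence.
        return 0.0
    return matches / total


ratio = match_ratio(2, 5)
print(ratio)            # 0.4
print(f"{ratio:.0%}")   # 40%
```

The `.0%` format multiplies by 100 and adds the percent sign, so no separate conversion step is needed.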
Step 4: Text Cleaning and Matching
Goal: Learn how to simplify words (like “running” → “run”).
- What to Learn:
- Remove endings like ing, ed, es, s
- Treat words that share a root as the same word (basic stemming)
- Why It Matters: Helps match more words across sentences.
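A very rough version of this rule can be sketched as follows. The function `simple_stem`, the three-letter minimum, and the doubled-consonant fix are all assumptions added for this example. It is not a real stemming algorithm, and it gets some words wrong (for instance, "makes" becomes "mak").

```python
SUFFIXES = ("ing", "ed", "es", "s")


def simple_stem(word):
    """Strip one common ending from a word (crude, rule-based)."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        # Assumption: keep at least three letters so "is" or "red" survive.
        if word.endswith(suffix) and len(stem) >= 3:
            # Undo consonant doubling left by "-ing"/"-ed": "runn" -> "run".
            if (suffix in ("ing", "ed")
                    and stem[-1] == stem[-2]
                    and stem[-1] not in "aeiou"):
                stem = stem[:-1]
            return stem
    return word


for w in ["running", "jumped", "boxes", "cats", "is"]:
    print(w, "->", simple_stem(w))
# running -> run, jumped -> jump, boxes -> box, cats -> cat, is -> is
```

Applying `simple_stem` to every word before comparing sentences lets "run" and "running" count as a match.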
Step 5: Confidence Scoring
Goal: Use the sets from Step 2 and the ratios from Step 3 to decide how sure we are of a match.
- How to Calculate:
- Confidence = Common Words / Total Unique Words (size of the intersection ÷ size of the union)
- Higher score = better match!
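Putting the set operations and the division together gives the score above, which is also known as the Jaccard similarity. The sketch below assumes words are separated by spaces and that two empty sentences score 0.0; the function name `confidence` is chosen for illustration.

```python
def confidence(sentence_a, sentence_b):
    """Common words divided by total unique words (0.0 to 1.0)."""
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    total_unique = words_a | words_b
    if not total_unique:
        # Assumption: two empty sentences give no evidence of a match.
        return 0.0
    common = words_a & words_b
    return len(common) / len(total_unique)


score = confidence("hi there", "there you")
print(round(score, 2))  # 0.33 (1 common word out of 3 unique words)
```

Identical sentences score 1.0, and sentences with no words in common score 0.0.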
Step 6: Label Guessing (Semi-Supervised Learning)
Goal: Learn how the model guesses labels from a few examples.
- How It Works:
- Compare new (unlabeled) sentence to known (labeled) ones.
- Use similarity (confidence) to pick the best matching label.
- Add the sentence, with its guessed label, to the labeled data.
- Example Flow:
- Known: “Hi there” → Greeting
- Unknown: “Hello there” → shares “there” (1 of 3 unique words) → Guess = Greeting
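The whole flow can be sketched as a small loop. Everything specific here is an illustrative assumption: the labeled and unlabeled sentences, the helper names `confidence` and `guess_label`, and the 0.3 threshold below which no label is guessed.

```python
def confidence(sentence_a, sentence_b):
    """Common words divided by total unique words."""
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    total_unique = words_a | words_b
    if not total_unique:
        return 0.0
    return len(words_a & words_b) / len(total_unique)


def guess_label(sentence, labeled, threshold=0.3):
    """Return (label, score) of the best match, or (None, score) if too weak."""
    best_label, best_score = None, 0.0
    for known_sentence, label in labeled:
        score = confidence(sentence, known_sentence)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Hypothetical starting data: a few labeled sentences, more unlabeled ones.
labeled = [("hi there", "Greeting"), ("see you later", "Farewell")]
unlabeled = ["hello there", "see you soon", "what time is it"]

for sentence in unlabeled:
    label, score = guess_label(sentence, labeled)
    if label is not None:
        # The guessed label joins the labeled data and can help later guesses.
        labeled.append((sentence, label))
    print(f"{sentence!r}: {label} ({score:.2f})")
```

With this data, "hello there" is guessed as Greeting and "see you soon" as Farewell, while "what time is it" shares no words with any known sentence and stays unlabeled. The threshold matters because a wrong guess added to the labeled data can mislead later guesses.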
Learning Path Summary Table
Step | Skill | Math Concept | Learning Outcome |
---|---|---|---|
1 | Count words | Basic counting | Compare text parts |
2 | Group words | Sets (Union, Intersection) | Find overlaps |
3 | Score match | Division / Ratios | Confidence scores |
4 | Normalize words | Simple rules | Better matching |
5 | Rate match | Fraction → Percent | How sure are we? |
6 | Label guessing | Decision logic | Apply semi-supervised learning |
Semi-supervised Learning – Visual Roadmap