Activation Function Examples with Simple Python
Basic Python Code (No Libraries)
# Define activation functions
def relu(x):
    return x if x > 0 else 0

def sigmoid(x):
    return 1 / (1 + pow(2.71828, -x))  # approximate e

def tanh(x):
    e_pos = pow(2.71828, x)
    e_neg = pow(2.71828, -x)
    return (e_pos - e_neg) / (e_pos + e_neg)

# Sample inputs
inputs = [-2, -1, 0, 1, 2]

# Show how each activation behaves
print("Input\tReLU\tSigmoid\t\tTanh")
print("---------------------------------------")
for x in inputs:
    r = relu(x)
    s = sigmoid(x)
    t = tanh(x)
    print(f"{x}\t{r:.2f}\t{s:.4f}\t\t{t:.4f}")
Expected Output
Input   ReLU    Sigmoid     Tanh
---------------------------------------
-2      0.00    0.1192      -0.9640
-1      0.00    0.2689      -0.7616
0       0.00    0.5000      0.0000
1       1.00    0.7311      0.7616
2       2.00    0.8808      0.9640
Explanation:
- ReLU gives zero for all negatives, passes positives as-is.
- Sigmoid smoothly squashes values between 0 and 1.
- Tanh squashes values between -1 and 1, symmetric around zero.
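As a side note, here is a minimal cross-check sketch, assuming it is acceptable to use the standard-library math module (the example above deliberately avoids imports): the hand-rolled approximation with e ≈ 2.71828 matches math.exp and math.tanh closely.

import math

# Cross-check the approximate-e versions above against Python's math module
for x in [-2, -1, 0, 1, 2]:
    approx_sigmoid = 1 / (1 + pow(2.71828, -x))   # same approximation as above
    exact_sigmoid = 1 / (1 + math.exp(-x))
    print(f"x={x}: sigmoid approx={approx_sigmoid:.4f} exact={exact_sigmoid:.4f} "
          f"tanh exact={math.tanh(x):.4f}")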
Next, when and why to use each activation function, with real-life-inspired use cases and their strengths and weaknesses.
2. Sigmoid
Use When:
- We need probability-like output (e.g., binary classification).
- We’re working on the final layer of a network that must decide “yes/no” or “true/false”.
Real Use Case Example:
Email Spam Detection:
If our model should say “Spam” or “Not Spam”, Sigmoid is a natural fit: a result close to 1 means Spam, a result close to 0 means Not Spam (see the small sketch after the caution notes below).
Caution:
- In deep networks, it can cause the vanishing gradient problem (slows learning).
- Use only in output layer if classification is binary.
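A minimal sketch of the spam example, reusing the sigmoid defined earlier; the scores below are made up for illustration, not output from a trained model:

def sigmoid(x):
    return 1 / (1 + pow(2.71828, -x))  # same approximate-e sigmoid as above

# Hypothetical raw model scores (logits) for three emails
email_scores = {"email_1": 3.2, "email_2": -1.5, "email_3": 0.2}

for email, score in email_scores.items():
    prob = sigmoid(score)                       # squash the score into a 0-1 probability
    label = "Spam" if prob >= 0.5 else "Not Spam"
    print(f"{email}: P(spam) = {prob:.3f} -> {label}")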
3. Tanh (Hyperbolic Tangent)
Use When:
- Our data is centered around zero (positive & negative values).
- We need smoother gradients than ReLU.
- We’re dealing with hidden layers in a relatively shallow network.
Real Use Case Example:
Sentiment Analysis:
If input text can carry both positive and negative sentiment (e.g., movie reviews), tanh can express that full range, from strongly negative (-1) to strongly positive (+1), as sketched after the notes below.
Why?
- Tanh gives balanced output: negative to positive.
- Helps capture subtle opposites, unlike sigmoid, which is always positive.
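A minimal sketch of that idea, using the tanh defined earlier; the raw sentiment scores below are made up purely for illustration:

def tanh(x):
    e_pos = pow(2.71828, x)
    e_neg = pow(2.71828, -x)
    return (e_pos - e_neg) / (e_pos + e_neg)

# Hypothetical raw scores: negative = negative sentiment, positive = positive sentiment
reviews = {"terrible movie": -2.5, "it was okay": 0.3, "absolutely loved it": 3.0}

for text, score in reviews.items():
    print(f"{text!r}: tanh score = {tanh(score):+.3f}")  # squashed into -1 .. +1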
Quick Comparison Table
| Activation | Output Range | Best For | Commonly Used In | Weakness |
|---|---|---|---|---|
| ReLU | 0 to ∞ | Hidden layers | Image recognition, deep networks | Can “die” (outputs 0 for all negative inputs, so those neurons stop learning) |
| Sigmoid | 0 to 1 | Output layer (binary classification) | Logistic regression, spam detection | Vanishing gradients |
| Tanh | -1 to 1 | Hidden layers (balanced input) | Sentiment analysis, text signals | Can still suffer vanishing gradients |
Rule of Thumb
- Use ReLU for most hidden layers in deep networks.
- Use Sigmoid only in the output layer for binary classification.
- Use Tanh when your inputs are centered or involve positive and negative signals.
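Putting the rules together in one tiny forward pass, in the same pure-Python style as the example above; the network shape, weights, and inputs below are made up purely for illustration:

def relu(x):
    return x if x > 0 else 0

def sigmoid(x):
    return 1 / (1 + pow(2.71828, -x))

# Made-up 2-input -> 2-hidden -> 1-output network
inputs = [0.5, -1.2]
hidden_weights = [[0.4, -0.6], [0.9, 0.1]]   # one row of weights per hidden neuron
output_weights = [0.7, -0.3]

# Hidden layer: ReLU (rule of thumb for hidden layers)
hidden = [relu(sum(w * x for w, x in zip(row, inputs))) for row in hidden_weights]

# Output layer: Sigmoid (rule of thumb for binary classification)
output = sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))
print(f"Hidden activations: {hidden}, P(class=1) = {output:.3f}")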
Activation Function Relevance in Neural Networks – Visual Roadmap