Activation Function Examples with Simple Python
Basic Python Code (No Libraries)
```python
# Define activation functions
def relu(x):
    return x if x > 0 else 0

def sigmoid(x):
    return 1 / (1 + pow(2.71828, -x))  # 2.71828 approximates e

def tanh(x):
    e_pos = pow(2.71828, x)
    e_neg = pow(2.71828, -x)
    return (e_pos - e_neg) / (e_pos + e_neg)

# Sample inputs
inputs = [-2, -1, 0, 1, 2]

# Show how each activation behaves
print("Input\tReLU\tSigmoid\t\tTanh")
print("---------------------------------------")
for x in inputs:
    r = relu(x)
    s = sigmoid(x)
    t = tanh(x)
    print(f"{x}\t{r:.2f}\t{s:.4f}\t\t{t:.4f}")
```
Expected Output
```
Input   ReLU    Sigmoid         Tanh
---------------------------------------
-2      0.00    0.1192          -0.9640
-1      0.00    0.2689          -0.7616
0       0.00    0.5000          0.0000
1       1.00    0.7311          0.7616
2       2.00    0.8808          0.9640
```
Explanation:
- ReLU gives zero for all negatives, passes positives as-is.
- Sigmoid smoothly squashes values between 0 and 1.
- Tanh squashes values between -1 and 1, symmetric around zero.
Next, let's look at when and why to use each activation function, with real-life-inspired use cases and their strengths and weaknesses.
2. Sigmoid
Use When:
- We need probability-like output (e.g., binary classification).
- We’re working on the final layer of a network that decides “yes/no” or “true/false”.
Real Use Case Example:
Email Spam Detection:
If our model should say “Spam” or “Not Spam”, sigmoid is a natural fit: an output close to 1 means Spam, and an output close to 0 means Not Spam.
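As a minimal sketch (not a real spam model), the snippet below uses invented features and weights to compute a raw score, then applies the plain-Python sigmoid from above to turn that score into a probability-like value, with 0.5 as the decision threshold.

```python
# Sketch: sigmoid as the final "yes/no" layer of a spam detector.
# The features and weights below are made up purely for illustration.

def sigmoid(x):
    return 1 / (1 + pow(2.71828, -x))  # 2.71828 approximates e

# Hypothetical features: [ALL-CAPS word count, count of "free", sender known (0/1)]
features = [5, 3, 0]
weights  = [0.8, 1.2, -2.5]   # invented weights; a real model would learn these
bias     = -1.0

# Weighted sum (the pre-activation), then sigmoid squashes it into (0, 1)
score = sum(w * f for w, f in zip(weights, features)) + bias
prob_spam = sigmoid(score)

label = "Spam" if prob_spam > 0.5 else "Not Spam"
print(f"P(spam) = {prob_spam:.4f} -> {label}")
```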
Caution:
- In deep networks, it can cause the vanishing gradient problem, which slows learning (see the gradient sketch below).
- Use it only in the output layer, and only when the classification is binary.
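To see where the vanishing gradient caution comes from: the derivative of sigmoid is σ(x)·(1 − σ(x)), which peaks at 0.25 at x = 0 and shrinks toward zero as |x| grows. The short sketch below, reusing the plain-Python sigmoid, simply prints that derivative for a few inputs.

```python
# Sketch: why sigmoid gradients vanish for large |x|.
def sigmoid(x):
    return 1 / (1 + pow(2.71828, -x))  # 2.71828 approximates e

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)   # derivative of sigmoid

for x in [0, 2, 5, 10]:
    print(f"x = {x:>2}  gradient = {sigmoid_grad(x):.6f}")
# The gradient is at most 0.25 (at x = 0) and becomes tiny for large |x|,
# so stacked sigmoid layers pass back very small updates during training.
```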
3. Tanh (Hyperbolic Tangent)
Use When:
- Our data is centered around zero (positive & negative values).
- We need smoother gradients than ReLU.
- We’re dealing with hidden layers in a relatively shallow network.
Real Use Case Example:
Sentiment Analysis:
If input text has both positive and negative sentiments (e.g., movie reviews), tanh is useful to express that full range — from strongly negative (-1) to strongly positive (+1).
Why?
- Tanh gives balanced output: negative to positive.
- Helps capture subtle opposites, unlike sigmoid, whose output is always positive; the sketch below makes this concrete.
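As a minimal sketch (with invented word weights, not a real sentiment model), the snippet below sums crude per-word scores for a review and passes the total through the plain-Python tanh from earlier, so strongly negative reviews land near -1 and strongly positive ones near +1.

```python
# Sketch: tanh maps a raw sentiment score into the balanced range (-1, 1).
# The word scores are invented for illustration only.

def tanh(x):
    e_pos = pow(2.71828, x)
    e_neg = pow(2.71828, -x)
    return (e_pos - e_neg) / (e_pos + e_neg)

word_scores = {"great": 1.5, "good": 1.0, "boring": -1.0, "terrible": -2.0}

reviews = [
    "great acting and a good story",
    "boring plot and terrible pacing",
    "good but boring in places",
]

for review in reviews:
    raw = sum(word_scores.get(word, 0.0) for word in review.split())
    print(f"{review!r}: raw = {raw:+.1f}, tanh = {tanh(raw):+.4f}")
```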
Quick Comparison Table
| Activation | Output Range | Best For | Commonly Used In | Weakness |
|---|---|---|---|---|
| ReLU | 0 to ∞ | Hidden layers | Image recognition, deep networks | Can “die” for negatives (output = 0) |
| Sigmoid | 0 to 1 | Output layer (binary classification) | Logistic regression, spam detection | Vanishing gradients |
| Tanh | -1 to 1 | Hidden layers (balanced input) | Sentiment analysis, text signals | Can still suffer vanishing gradients |
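The “can die” weakness in the table refers to the fact that ReLU's gradient is exactly zero for negative inputs, so a neuron whose pre-activation stays negative stops receiving updates. A minimal sketch of that (sub)gradient:

```python
# Sketch: ReLU's gradient is 0 for negative inputs ("dying ReLU"),
# whereas sigmoid/tanh gradients merely shrink (vanishing gradients).

def relu_grad(x):
    return 1.0 if x > 0 else 0.0   # conventional (sub)gradient of ReLU

for x in [-3, -1, 0, 1, 3]:
    print(f"x = {x:>2}  d(ReLU)/dx = {relu_grad(x)}")
# A neuron whose pre-activation is always negative gets zero gradient
# and therefore never updates; it has "died".
```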
Rule of Thumb
- Use ReLU for most hidden layers in deep networks (see the combined sketch after this list).
- Use Sigmoid only in the output layer for binary classification.
- Use Tanh when your inputs are centered or involve positive and negative signals.
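Putting the rule of thumb together, here is a minimal, hand-wired sketch (made-up weights, no training) of a tiny network that uses ReLU in the hidden layer and sigmoid in the output layer, as suggested above.

```python
# Sketch: a tiny hand-wired network following the rule of thumb:
# ReLU in the hidden layer, sigmoid in the output layer.
# The weights are invented for illustration; a real network would learn them.

def relu(x):
    return x if x > 0 else 0

def sigmoid(x):
    return 1 / (1 + pow(2.71828, -x))  # 2.71828 approximates e

def predict(inputs, hidden_weights, output_weights):
    # Hidden layer: weighted sums passed through ReLU
    hidden = [relu(sum(w * x for w, x in zip(ws, inputs))) for ws in hidden_weights]
    # Output layer: weighted sum of hidden activations passed through sigmoid
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

hidden_weights = [[0.5, -0.3], [-0.8, 0.9]]  # two hidden neurons, two inputs
output_weights = [1.2, -0.7]

print(predict([1.0, 2.0], hidden_weights, output_weights))  # probability-like output
```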
Activation Function Relevancy in a Neural Network – Visual Roadmap