Activation Function Examples with Simple Python
Basic Python Code (No Libraries)
# Define activation functions
def relu(x):
    return x if x > 0 else 0

def sigmoid(x):
    return 1 / (1 + pow(2.71828, -x))  # approximate e

def tanh(x):
    e_pos = pow(2.71828, x)
    e_neg = pow(2.71828, -x)
    return (e_pos - e_neg) / (e_pos + e_neg)

# Sample inputs
inputs = [-2, -1, 0, 1, 2]

# Show how each activation behaves
print("Input\tReLU\tSigmoid\t\tTanh")
print("---------------------------------------")
for x in inputs:
    r = relu(x)
    s = sigmoid(x)
    t = tanh(x)
    print(f"{x}\t{r:.2f}\t{s:.4f}\t\t{t:.4f}")
Expected Output
Input   ReLU    Sigmoid     Tanh
---------------------------------------
-2      0.00    0.1192      -0.9640
-1      0.00    0.2689      -0.7616
0       0.00    0.5000      0.0000
1       1.00    0.7311      0.7616
2       2.00    0.8808      0.9640
Explanation:
- ReLU gives zero for all negatives, passes positives as-is.
- Sigmoid smoothly squashes values between 0 and 1.
- Tanh squashes values between -1 and 1, symmetric around zero.
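As a side note, here is a minimal cross-check sketch, assuming it is acceptable to use the standard-library math module (the example above deliberately avoids imports): the hand-rolled approximation with e ≈ 2.71828 matches math.exp and math.tanh closely.

import math

# Cross-check the approximate-e versions above against Python's math module
for x in [-2, -1, 0, 1, 2]:
    approx_sigmoid = 1 / (1 + pow(2.71828, -x))   # same approximation as above
    exact_sigmoid = 1 / (1 + math.exp(-x))
    print(f"x={x}: sigmoid approx={approx_sigmoid:.4f} exact={exact_sigmoid:.4f} "
          f"tanh exact={math.tanh(x):.4f}")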
Next, when and why to use each activation function, with real-life-inspired use cases and their strengths and weaknesses.
2. Sigmoid
Use When:
- We need probability-like output (e.g., binary classification).
- We’re working on the final layer of a network that must decide “yes/no” or “true/false”.
Real Use Case Example:
Email Spam Detection:
If our model should say “Spam” or “Not Spam”, Sigmoid is a natural fit: a result close to 1 means Spam, a result close to 0 means Not Spam (see the small sketch after the caution notes below).
Caution:
- In deep networks, it can cause the vanishing gradient problem (slows learning).
- Use only in output layer if classification is binary.
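A minimal sketch of the spam example, reusing the sigmoid defined earlier; the scores below are made up for illustration, not output from a trained model:

def sigmoid(x):
    return 1 / (1 + pow(2.71828, -x))  # same approximate-e sigmoid as above

# Hypothetical raw model scores (logits) for three emails
email_scores = {"email_1": 3.2, "email_2": -1.5, "email_3": 0.2}

for email, score in email_scores.items():
    prob = sigmoid(score)                       # squash the score into a 0-1 probability
    label = "Spam" if prob >= 0.5 else "Not Spam"
    print(f"{email}: P(spam) = {prob:.3f} -> {label}")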
3. Tanh (Hyperbolic Tangent)
Use When:
- Our data is centered around zero (positive & negative values).
- We need smoother gradients than ReLU.
- We’re dealing with hidden layers in a relatively shallow network.
Real Use Case Example:
Sentiment Analysis:
If input text can carry both positive and negative sentiment (e.g., movie reviews), tanh can express that full range, from strongly negative (-1) to strongly positive (+1), as sketched after the notes below.
Why?
- Tanh gives balanced output: negative to positive.
- Helps capture subtle opposites, unlike sigmoid, which is always positive.
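A minimal sketch of that idea, using the tanh defined earlier; the raw sentiment scores below are made up purely for illustration:

def tanh(x):
    e_pos = pow(2.71828, x)
    e_neg = pow(2.71828, -x)
    return (e_pos - e_neg) / (e_pos + e_neg)

# Hypothetical raw scores: negative = negative sentiment, positive = positive sentiment
reviews = {"terrible movie": -2.5, "it was okay": 0.3, "absolutely loved it": 3.0}

for text, score in reviews.items():
    print(f"{text!r}: tanh score = {tanh(score):+.3f}")  # squashed into -1 .. +1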
Quick Comparison Table
| Activation | Output Range | Best For | Commonly Used In | Weakness |
|---|---|---|---|---|
| ReLU | 0 to ∞ | Hidden layers | Image recognition, deep networks | Can “die” (outputs 0 for all negative inputs, so those neurons stop learning) |
| Sigmoid | 0 to 1 | Output layer (binary classification) | Logistic regression, spam detection | Vanishing gradients |
| Tanh | -1 to 1 | Hidden layers (balanced input) | Sentiment analysis, text signals | Can still suffer vanishing gradients |
Rule of Thumb
- Use ReLU for most hidden layers in deep networks.
- Use Sigmoid only in the output layer for binary classification.
- Use Tanh when your inputs are centered or involve positive and negative signals.
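Putting the rules together in one tiny forward pass, in the same pure-Python style as the example above; the network shape, weights, and inputs below are made up purely for illustration:

def relu(x):
    return x if x > 0 else 0

def sigmoid(x):
    return 1 / (1 + pow(2.71828, -x))

# Made-up 2-input -> 2-hidden -> 1-output network
inputs = [0.5, -1.2]
hidden_weights = [[0.4, -0.6], [0.9, 0.1]]   # one row of weights per hidden neuron
output_weights = [0.7, -0.3]

# Hidden layer: ReLU (rule of thumb for hidden layers)
hidden = [relu(sum(w * x for w, x in zip(row, inputs))) for row in hidden_weights]

# Output layer: Sigmoid (rule of thumb for binary classification)
output = sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))
print(f"Hidden activations: {hidden}, P(class=1) = {output:.3f}")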
Activation Function Relevance in Neural Networks – Visual Roadmap