Summary – Reinforcement Learning
Episode Walkthrough (Step-by-Step Example)
A. Agent starts at position 0.
- State: 0
- Chooses action: 'right'
- Moves to next_state: 1
- Reward: -0.1 (not the goal yet)
- Current Q-value: q_table[0]['right'] → 0.0
- Best Q-value in next_state (1): max(q_table[1].values()) → 0.0
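For concreteness, here is a minimal sketch of the setup this walkthrough assumes: three positions (0–2), the goal at position 2, a step reward of -0.1, and a goal reward of +1. The names step, ACTIONS, and GOAL are illustrative, not from any particular library.

ACTIONS = ['left', 'right']
GOAL = 2

# Q-table: one dict of action-values per state, all initialized to 0.0.
q_table = {state: {action: 0.0 for action in ACTIONS} for state in range(3)}

def step(state, action):
    # Move one position left or right, clamped to the 0–2 range.
    next_state = max(0, min(2, state + (1 if action == 'right' else -1)))
    # +1 for reaching the goal, a small penalty otherwise.
    reward = 1.0 if next_state == GOAL else -0.1
    return next_state, reward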
Let’s update the Q-value using the formula:
Q(state, action) = Q(state, action)
                 + learning_rate * (reward + discount * max(Q(next_state)) - Q(state, action))
Plug in the numbers:
old_value = 0.0
reward = -0.1
learning_rate = 0.1
discount = 0.9
next_max = 0.0
new_value = 0.0 + 0.1 * (-0.1 + 0.9 * 0.0 - 0.0) = 0.1 * (-0.1) = -0.01
So we update:
q_table[0]['right'] = -0.01
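The same update, written as a small Python function (a sketch; q_update is an illustrative name, operating on the q_table from the setup above):

def q_update(q_table, state, action, reward, next_state,
             learning_rate=0.1, discount=0.9):
    # One Q-learning step: nudge the old estimate toward
    # reward + discount * (best value reachable from next_state).
    old_value = q_table[state][action]
    next_max = max(q_table[next_state].values())
    q_table[state][action] = old_value + learning_rate * (reward + discount * next_max - old_value)

# Step A: state 0, action 'right', reward -0.1, next_state 1
q_update(q_table, 0, 'right', -0.1, 1)
print(q_table[0]['right'])  # ≈ -0.01 (up to float rounding)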
B. Agent's Next Move
- State: 1
- Chooses action 'right' → moves to 2 (the goal!)
- Reward: +1
- Q-table update:
old_value = 0.0
reward = 1
learning_rate = 0.1
discount = 0.9
next_max = max(q_table[2].values()) = 0.0
new_value = 0.0 + 0.1 * (1 + 0.9 * 0 - 0.0) = 0.1 * (1.0) = 0.1
So we update:
q_table[1]['right'] = 0.1
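Reusing the q_update sketch from step A, this second update is a single call:

# Step B: state 1, action 'right', reward +1, next_state 2 (the goal)
q_update(q_table, 1, 'right', 1.0, 2)
print(q_table[1]['right'])  # 0.1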
C. Final Q-Table After This Episode:
q_table = {
0: {'left': 0.0, 'right': -0.01},
1: {'left': 0.0, 'right': 0.1},
2: {'left': 0.0, 'right': 0.0},
}
D. Repeating Over Episodes
If the agent repeats this over many episodes (100 or more), it gradually learns:
- At position 0, going right eventually pays off, because the discounted value of position 1 propagates back into Q(0, 'right').
- At position 1, going right leads directly to the goal.
- So it builds up Q-values that reflect the best action in each state (see the training-loop sketch below).
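Putting the pieces together, a minimal training loop might look like the sketch below. It reuses the step and q_update sketches from above; the epsilon-greedy action selection and the episode count of 200 are illustrative choices, not prescribed by the walkthrough.

import random

def choose_action(q_table, state, epsilon=0.1):
    # Epsilon-greedy: explore a random action with probability epsilon,
    # otherwise exploit the action with the highest current Q-value.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(q_table[state], key=q_table[state].get)

for episode in range(200):
    state = 0
    while state != GOAL:
        action = choose_action(q_table, state)
        next_state, reward = step(state, action)
        q_update(q_table, state, action, reward, next_state)
        state = next_state

With these values, q_table[1]['right'] approaches 1.0, and q_table[0]['right'] approaches -0.1 + 0.9 * 1.0 = 0.8, the discounted value of reaching the goal one step later.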
Reinforcement Learning – Core Concepts in Reinforcement Learning