Brainstorming Session – Reinforcement Learning
1. What is a Q-Table?
A Q-table is just a big table (like a spreadsheet) that helps the agent remember:
“If I’m in this situation (state), and I do this action, how good is that?”
It tells the agent how much reward it can expect if it takes a certain action in a certain state.
Think of it like a cheat sheet:
| State | Action | Q-Value (Expected Reward) |
|---|---|---|
| At position 0 | move right | 0.6 |
| At position 0 | move left | -0.3 |
| At position 1 | move right | 0.9 |
| At position 1 | move left | -0.2 |
| … | … | … |
The higher the Q-value, the better the action is in that state.
In Code Terms
It’s usually written as a dictionary like this in Python:
```python
q_table = {
    state1: {action1: value1, action2: value2},
    state2: {action1: value1, action2: value2},
}
```
Each state is a key, and the value is another dictionary with actions and their Q-values.
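Picking the best action in a state is then just a dictionary lookup. Here’s a quick sketch using the cheat-sheet values from above (the numbers are illustrative, not learned):
```python
q_table = {
    0: {'left': -0.3, 'right': 0.6},
    1: {'left': -0.2, 'right': 0.9},
}

state = 1
# Greedy choice: the action with the highest Q-value in this state
best_action = max(q_table[state], key=q_table[state].get)
print(best_action)  # 'right', since 0.9 > -0.2
```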
2. How is the Q-table updated?
Every time the agent takes an action, it updates the Q-table using this formula:
```
Q(state, action) = Q(state, action) + learning_rate * (reward + discount * max(Q(next_state)) - Q(state, action))
```
Let’s break that down:
| Term | Meaning |
|---|---|
| Q(state, action) | Current guess of how good this action is in this state |
| reward | What we actually got |
| max(Q(next_state)) | What we expect to get from the best next move |
| learning_rate | How much we want to adjust the guess |
| discount | How much future rewards matter compared to now |
So over time, the Q-values get closer and closer to the true expected rewards.
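In Python, that update rule might look like the function below. It’s a minimal sketch assuming the nested-dictionary q_table from section 1; the name update_q and the default values for learning_rate and discount are just illustrative choices:
```python
def update_q(q_table, state, action, reward, next_state,
             learning_rate=0.1, discount=0.9):
    # The best value we expect to get from the next state
    best_next = max(q_table[next_state].values())
    # Nudge the current guess toward (reward + discounted future value)
    old_value = q_table[state][action]
    q_table[state][action] = old_value + learning_rate * (
        reward + discount * best_next - old_value
    )
```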
3. Real Life Analogy
Let’s say we’re learning which vending machine buttons give snacks:
- State = our current hunger level
- Action = pressing a button
- Reward = tasty snack or nothing
- Q-table = our memory of which button gave which snack when we were hungry.
We keep updating this “Q-table” in our brains based on which actions gave us the best results.
4. A small, real example, line by line, of how a Q-table is updated in a reinforcement learning setting.
Simple Game Setup
We’ll keep it very small:
- Positions: [0, 1, 2]
- Goal: reach position 2
- Actions: 'left' or 'right'
- Rewards:
- +1 if the agent reaches position 2
- -0.1 for any other move
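Before we touch the Q-table, here is a minimal sketch of this little environment; the step function name and signature are assumptions for illustration, not a standard API:
```python
def step(position, action):
    """Move the agent and return (new_position, reward, done)."""
    new_position = position + 1 if action == 'right' else position - 1
    new_position = max(0, min(2, new_position))  # stay inside positions [0, 2]
    if new_position == 2:
        return new_position, 1.0, True   # reached the goal
    return new_position, -0.1, False     # small penalty for any other move
```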
Initial Q-Table
```python
q_table = {
    0: {'left': 0.0, 'right': 0.0},
    1: {'left': 0.0, 'right': 0.0},
    2: {'left': 0.0, 'right': 0.0},
}
```
Each position (state) has two possible actions, and we start by assuming they’re all equally good (0.0).
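Now one update, line by line, using the formula from section 2 with assumed values learning_rate = 0.1 and discount = 0.9, and the step function sketched above. Say the agent is at position 1 and moves right, reaching the goal:
```python
learning_rate = 0.1
discount = 0.9

state, action = 1, 'right'
next_state, reward, done = step(state, action)  # -> (2, 1.0, True)

# Apply the update rule from section 2
best_next = max(q_table[next_state].values())   # 0.0, the goal row is still all zeros
q_table[state][action] += learning_rate * (reward + discount * best_next
                                           - q_table[state][action])

print(q_table[1]['right'])  # 0.1
```
That 0.1 is the first small nudge toward the true value of 1.0; repeating this over many episodes keeps pulling q_table[1]['right'] upward, which is exactly the “closer to the truth” behavior from section 2.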
Reinforcement Learning – Summary