I would like to implement the reinforcement learning algorithms SARSA and Q-Learning for the board game Connect Four.
I am familiar with the algorithms and know about their limitations for large state spaces, since every state-action value has to be stored in the Q-table. Connect Four has a large state space, estimated at around 7.1 * 10^13 (e.g. MIT Slide 7), which is why simply applying these algorithms won't work (though this thesis claims it did).
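As a rough back-of-the-envelope check of why a plain Q-table is infeasible (the state count is the cited estimate; the bytes-per-entry figure is my own assumption):

```python
# Rough memory estimate for a naive tabular Q-table on Connect Four.
# The state count is the cited estimate; 8 bytes per entry (one float64)
# is my own assumption.
num_states = 7.1e13        # estimated reachable Connect Four positions
num_actions = 7            # one action per column
bytes_per_entry = 8        # one float64 value per (state, action) pair

total_bytes = num_states * num_actions * bytes_per_entry
print(f"Q-table size: {total_bytes / 1e15:.0f} PB")  # on the order of petabytes
```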
However, I have found this answer to a similar post that proposes a possible solution to simplify the state space.
> For one thing, you could simplify the problem by separating the action space. If you consider the value of each column separately, based on the two columns next to it, you reduce N to 3 and the state space size to 10^6. Now, this is very manageable. You can create an array to represent this value function and update it using a simple RL algorithm, such as SARSA.
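To make my (quite possibly wrong) reading of this concrete, here is a minimal sketch of how I currently imagine the proposed value array, assuming the "state" for a column is simply the contents of that column and its two neighbours. Everything here (the `encode_local_state` function, the edge handling, the base-3 encoding) is my own guess, not something taken from the answer:

```python
from collections import defaultdict
import numpy as np

ROWS, COLS = 6, 7
EMPTY, ME, OPPONENT = 0, 1, 2  # cell contents

def encode_local_state(board, col):
    """Encode the 3-column window centred on `col` as a base-3 integer.
    `board` is a (ROWS, COLS) array of cell contents; edge columns just
    reuse the nearest existing neighbour (an assumption on my part)."""
    window = [max(0, col - 1), col, min(COLS - 1, col + 1)]
    code = 0
    for c in window:
        for r in range(ROWS):
            code = code * 3 + int(board[r, c])
    return code  # in [0, 3**18)

# One value per local state, filled lazily.  Note that this naive encoding
# gives ~3.9e8 possible states rather than the 10^6 mentioned in the answer,
# which is part of why I suspect I am misreading the proposal.
local_values = defaultdict(float)

# Example usage on an empty board:
board = np.zeros((ROWS, COLS), dtype=np.int8)
s = encode_local_state(board, col=3)
local_values[s] += 0.0  # would be updated by SARSA / Q-Learning
```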
Unfortunately, I don't understand the proposed simplification and would like to ask the following questions:
- The action space is separated from the state space by considering each column separately. However, if my understanding of SARSA and Q-Learning is correct, they use Q(S, A) to estimate the value function, i.e. a value is assigned to each state-action pair (my current understanding is sketched in the code after this list), so I don't see how the action space can be treated separately from the state.
- How does one calculate the value of a column based on the two columns next to it?
- Also, what does "next to it" mean in this context? If each column is grouped with its two adjacent columns, do we get five such groups of three (N = 5), or are the groups built from the inside out (e.g. the middle three columns, then the middle five, then all seven)?
- Is a state (of the entire board?) then mapped to the array containing the value function for each action/column?
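For reference, this is the standard tabular SARSA update I have in mind when I say a value is assigned to each state-action pair. The `env` object with `reset()`/`step()` is a hypothetical Connect Four environment I made up for illustration; it is not from the answer:

```python
from collections import defaultdict
import random

# My current understanding of tabular SARSA: the table is indexed by the
# FULL (state, action) pair, which is exactly what seems to clash with
# valuing each column separately.
alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = defaultdict(float)            # Q[(state, action)] -> estimated value
actions = range(7)                # one action per column

def epsilon_greedy(state):
    if random.random() < epsilon:
        return random.choice(list(actions))
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa_episode(env):
    # `env` is a hypothetical environment: reset() returns a hashable board
    # encoding, step(a) returns (next_state, reward, done).
    state = env.reset()
    action = epsilon_greedy(state)
    done = False
    while not done:
        next_state, reward, done = env.step(action)
        next_action = epsilon_greedy(next_state)
        target = reward + (0 if done else gamma * Q[(next_state, next_action)])
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state, action = next_state, next_action
```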
Any references to other literature or simplifications would be much appreciated!