In my understanding, DQN is useful because it utilises a neural network as a Q-value function approximator, which, after training, can generalise to unseen states.

I understand how that would work when the input is a vector of continuous values. However, I don't understand why DQN would be used with discrete state-spaces. If the input to the neural network is just an integer with no clear structure, how is it supposed to generalise?

If, instead of feeding the network a single integer, we fed it a vector of integers in which each element represents a characteristic of the state (separating things like speed, position, etc.) rather than collapsing everything into one number, would that generalise better?

1 Answer

An environment is said to have a discrete state-space when the number of all possible states of the environment is finite. For example, the $3\times3$ Tic-tac-toe game has a discrete state-space: there are only 9 cells on the board, so there are only finitely many ways to arrange Xs and Os.

A state-space can be discrete regardless of whether integers or non-integers are used to describe it. For example, consider an environment where a state is represented by a single number. If the set of all possible states is $\{0, 0.3, 0.5, 1\}$, your state-space is discrete, because there are only $4$ states. However, if the set of all possible states is the set of real numbers from $0$ to $1$, then it's not discrete anymore, because there are infinitely many of them. A state-space can still be discrete even if states are represented with multiple numbers. For example, our environment could be a $10\times10\times10$ cube where the agent is only allowed to stand on integer coordinates. In this scenario, there are $1000$ different places where the agent can be, and hence the state-space is discrete.
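
As a small illustration (plain Python; the helper name is just for this example), the same cube state can be described either by a single flat index in $[0, 999]$ or by the vector of coordinates $(x, y, z)$; both describe the same discrete state-space of $1000$ states, they just encode it differently:

```python
# Sketch: converting a flat state index into a coordinate vector for the
# 10x10x10 cube example. Either representation describes the same 1000 states.
def index_to_coords(index: int, size: int = 10) -> tuple[int, int, int]:
    x, rem = divmod(index, size * size)
    y, z = divmod(rem, size)
    return x, y, z

assert index_to_coords(0) == (0, 0, 0)
assert index_to_coords(999) == (9, 9, 9)
```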

The Deep Q-Network can be designed to accept any type of input; just like a regular ANN, it is not restricted to a single integer. The input to the DQN is the state of the environment, however it is represented. For the previous example, you can set up the DQN to have an input layer with $3$ neurons, each one accepting an integer that describes the agent's position along the $x$, $y$ and $z$ axes.
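
Here is a minimal sketch of such a network (using PyTorch; the layer sizes and the assumption of 6 actions, one move in each direction along each axis, are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q-network whose input layer takes the agent's (x, y, z) coordinates."""
    def __init__(self, n_actions: int = 6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 64),          # 3 input neurons: x, y, z
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),  # one Q-value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

q_net = QNetwork()
state = torch.tensor([[2.0, 7.0, 4.0]])  # integer coordinates, fed as floats
q_values = q_net(state)                  # shape: (1, n_actions)
```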

One downfall of (tabular) Q-learning is that, when the environment has a very large number of states and actions, storing a value for each state-action pair becomes impractical in terms of memory. Think of chess, where there are an enormous number of possible positions and multiple available moves in each of them. Moreover, for the agent to learn properly, every state-action pair must be visited (the agent needs to determine its Q-value), which can be impractical in terms of training time.
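
To see why, here is a sketch of the tabular update (plain Python; the learning rate and discount factor are assumed values): the table needs one entry per state-action pair, which is exactly what blows up when there are very many states and actions.

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.99
Q = defaultdict(float)  # maps (state, action) -> Q-value, one entry per pair

def q_learning_update(s, a, r, s_next, actions):
    """One tabular Q-learning step: move Q(s, a) towards the bootstrapped target."""
    best_next = max(Q[(s_next, a_next)] for a_next in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```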

The DQN algorithm takes care of these problems: it only needs to store the neural network (plus a few other things if you use a variation of DQN), and it doesn't need to visit every state-action pair to learn. The way it learns is by adjusting the weights and biases of the network so that it approximates the optimal action-value function, from which the policy is derived. Provided the algorithm is implemented correctly, the agent should be able to pick up useful patterns (or solve the environment).
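
Concretely, the weights are adjusted by regressing $Q(s, a; \theta)$ towards the bootstrapped target $r + \gamma \max_{a'} Q(s', a'; \theta^-)$, where $\theta^-$ are the parameters of a periodically-updated target network. A minimal sketch of that loss (assuming PyTorch, a `q_net` and `target_net` like the network above, and a sampled mini-batch of transitions):

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, states, actions, rewards, next_states, dones, gamma=0.99):
    # Q-values of the actions actually taken in the batch
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target: r + gamma * max_a' Q_target(s', a'), zeroed at episode end
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q
    return F.mse_loss(q_values, targets)
```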

I used this paper as a reference for one of my projects. It implements the DQN algorithm to learn to play Sungka (a game similar to Mancala), which has a finite number of possible states and actions.
