
I have created a game on an 8x8 grid with 4 pieces that move essentially like checkers pieces (forward-left or forward-right only). I have implemented a DQN to train an agent to play it.

Here is how I have mapped my moves:

self.actions = {"1fl": 0, "1fr": 1, "2fl": 2, "2fr": 3,
                "3fl": 4, "3fr": 5, "4fl": 6, "4fr": 7}

Essentially, I assigned each move (piece number plus direction) an integer value from 0-7 (8 total moves).
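
For reference, the reverse lookup from a predicted index back to a move string can be built from the same dict (index_to_action is just an illustrative name, not part of my actual code):

self.index_to_action = {v: k for k, v in self.actions.items()}
# e.g. self.index_to_action[3] == "2fr"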

My question is: during any given turn, not all 8 moves are valid. How do I make sure that when I call model.predict(state), the resulting prediction is a valid move? Here is how I am currently handling it:

import random
import numpy as np

def act(self, state, env):
    # Get the list of allowed actions from the environment
    actions_allowed = env.allowed_actions_for_agent()

    # Explore: take a random allowed move with probability epsilon
    if np.random.rand() <= self.epsilon:
        return random.choice(actions_allowed)

    # Exploit: get Q-value predictions from the model for the current board
    act_values = self.model.predict(state)
    best_action = np.argmax(act_values[0])

    # If the predicted best action is valid, return it
    if best_action in actions_allowed:
        return best_action

    # If the prediction is not valid, do a random move instead....
    return random.choice(actions_allowed)

I feel like if the agent predicts a move that is not in the actions_allowed set, I should punish the agent.

But because it doesn't pick a valid move, I make it do a random one instead, and I think this is a problem: its bad prediction may still end up winning the game, since the random move may have a positive outcome. I am at a total loss. The agent trains... but it doesn't seem to learn. I have been training it for over 100k games now, and it only seems to win 10% of its games... ugh.
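
For example, what I mean by punishing would be something like this sketch (remember() is my replay-memory method; INVALID_PENALTY is a made-up negative constant):

best_action = np.argmax(act_values[0])
if best_action not in actions_allowed:
    # Punish: store the invalid choice with a negative reward;
    # the board does not change, so next_state is the same state
    self.remember(state, best_action, INVALID_PENALTY, state, False)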

Other helpful information: I am utilizing experience replay for the DQN, which I have based on the code from here:
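
The replay step follows the usual Keras DQN pattern; a minimal sketch of that pattern (self.memory, self.gamma, and batch_size are assumptions here, my actual code may differ):

from collections import deque

self.memory = deque(maxlen=2000)

def remember(self, state, action, reward, next_state, done):
    # Store one transition for later replay
    self.memory.append((state, action, reward, next_state, done))

def replay(self, batch_size):
    # Train on a random minibatch of stored transitions
    minibatch = random.sample(self.memory, batch_size)
    for state, action, reward, next_state, done in minibatch:
        target = reward
        if not done:
            # Bootstrap from the best predicted Q-value of the next state
            target = reward + self.gamma * np.amax(self.model.predict(next_state)[0])
        target_f = self.model.predict(state)
        target_f[0][action] = target
        self.model.fit(state, target_f, epochs=1, verbose=0)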

Here is where I build my model as well:

from keras.models import Sequential
from keras.layers import Dense

self.action_size = 8
LENGTH = 8

def build_model(self):
    # Builds the NN for the Deep-Q model
    model = Sequential()  # a plain feed-forward NN
    model.add(Dense(64, input_shape=(LENGTH,), activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(self.action_size, activation='linear'))
    model.compile(loss='mse', optimizer='Adam')
    return model
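
Note: with input_shape=(LENGTH,), Keras expects a batch dimension at predict time, so the state needs to be reshaped before calling predict (assuming state is a length-8 NumPy array):

state = np.reshape(state, (1, LENGTH))  # shape (1, 8): a batch of one board encoding
act_values = self.model.predict(state)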

1 Answer


I think instead of:

if best_action in actions_allowed:
    return best_action

you can use something like:

# Pick the best action among the allowed ones only
best_allowed = max(actions_allowed, key=lambda a: act_values[0][a])
return best_allowed

This way you don't take a random action when the best action is not allowed; instead you choose the best allowed action. You can do that by checking, for each action, whether it is valid, and then taking the best value among the valid ones. Or you can change the invalid actions' values to a very small number before taking the argmax.
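
The second option could look like this (a sketch, assuming actions_allowed is a list of integer indices and act_values comes from model.predict as in your code):

# Q-values of invalid actions become -inf, so argmax ignores them
masked = np.full_like(act_values[0], -np.inf)
masked[actions_allowed] = act_values[0][actions_allowed]
return np.argmax(masked)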

malioboro