I have created a game on an 8x8 grid with 4 pieces that move essentially like checkers pieces (forward-left or forward-right only), and I have implemented a DQN agent to play it.
Here is how I have mapped my moves:
self.actions = {"1fl": 0, "1fr": 1, "2fl": 2, "2fr": 3,
                "3fl": 4, "3fr": 5, "4fl": 6, "4fr": 7}
Essentially, I assigned each move an integer value from 0 to 7 (8 total moves).
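(Just to illustrate the encoding: since the pieces alternate fl/fr, an action index can be decoded back into a piece and a direction. This is only a sketch of the mapping, not code from my project.)

def decode_action(index):
    # Invert the self.actions mapping: pieces 1-4, fl/fr alternating
    piece = index // 2 + 1
    direction = "fl" if index % 2 == 0 else "fr"
    return piece, direction

decode_action(5)  # -> (3, 'fr'), i.e. "3fr"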
My question is: on any given turn not all 8 moves are valid, so how do I make sure that when I call model.predict(state) the resulting prediction is a valid move? Here is how I am currently handling it:
def act(self, state, env):
    # Get the list of actions that are currently allowed
    actions_allowed = env.allowed_actions_for_agent()
    # Explore: with probability epsilon pick a random allowed action
    if np.random.rand() <= self.epsilon:
        return random.choice(actions_allowed)
    # Exploit: get the Q-value predictions from the model for the current board
    act_values = self.model.predict(state)
    # If the highest-valued action is in the list of valid moves, return it
    if np.argmax(act_values[0]) in actions_allowed:
        return np.argmax(act_values[0])
    # If the prediction is not valid, do a random allowed move instead...
    else:
        if len(actions_allowed) > 0:
            return random.choice(actions_allowed)
I feel like if the agent predicts a move that is not in the actions_allowed set, I should punish it.
But instead, when it doesn't pick a valid move I make it do a random one, and I think this is a problem: its bad prediction may still end up winning the game, since the random move may have a positive outcome. I am at a total loss. The agent trains, but it doesn't seem to learn. I have been training it for over 100k games now, and it only wins about 10% of its games... ugh.
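For concreteness, this is the kind of "only ever predict a valid move" behaviour I am asking about (just a sketch of the idea, assuming actions_allowed is a list of integer action indices matching self.actions):

import numpy as np

def masked_argmax(q_values, actions_allowed):
    # Set the Q-values of illegal actions to -inf so argmax can only
    # return one of the allowed action indices
    masked = np.full_like(q_values, -np.inf)
    masked[actions_allowed] = q_values[actions_allowed]
    return int(np.argmax(masked))

# usage: action = masked_argmax(self.model.predict(state)[0], actions_allowed)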
Other helpful information: I am using experience replay for the DQN, which I have based on the code from here:
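My replay memory follows the usual pattern: store (state, action, reward, next_state, done) tuples in a deque and sample random minibatches for training. Roughly like this (a sketch of the general pattern, not my exact code):

import random
from collections import deque

memory = deque(maxlen=2000)

def remember(state, action, reward, next_state, done):
    # Store the transition for later replay
    memory.append((state, action, reward, next_state, done))

def sample_minibatch(batch_size=32):
    # Sample a random minibatch of stored transitions for a training step
    return random.sample(memory, min(batch_size, len(memory)))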
Here is where I build my model as well:
from keras.models import Sequential
from keras.layers import Dense

self.action_size = 8
LENGTH = 8

def build_model(self):
    # Builds the NN for the Deep-Q model
    model = Sequential()  # establishes a feed-forward NN
    model.add(Dense(64, input_shape=(LENGTH,), activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(self.action_size, activation='linear'))
    model.compile(loss='mse', optimizer='Adam')
    return model
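For reference, with input_shape=(LENGTH,) Keras expects the state passed to model.predict to have a leading batch dimension, so it would need to be shaped like this (a sketch, assuming the state is already a flat vector of length LENGTH):

import numpy as np

state = np.reshape(state, [1, LENGTH])  # add the batch dimension
act_values = model.predict(state)       # shape (1, action_size)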