I am using Q-learning in Julia.
Because of the solver's configuration, the action space has to be defined as the full set of actions, and impossible actions have to be included as well. This means I can't use a function that, given a state, returns only the valid actions. To work around this, I send the agent to a dummy terminal state with a bad reward whenever it takes an impossible action.
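For concreteness, here is a minimal sketch of what I mean in plain Julia (not tied to any particular solver package; `valid`, `env_step`, `DUMMY`, and `PENALTY` are made-up names and values, just for illustration):

```julia
# Minimal tabular setup: real states 1..n_states plus one extra dummy state.
n_states  = 5
n_actions = 4
DUMMY     = n_states + 1          # dummy terminal state (made-up index)
PENALTY   = -100.0                # made-up bad reward for impossible actions

# valid(s, a) stands in for the problem's real constraints.
valid(s, a) = !(s == 1 && a == 4) # e.g. action 4 is impossible in state 1

isterminal(s) = s == DUMMY

# Transition wrapper: impossible actions jump to the dummy terminal state.
function env_step(s, a)
    valid(s, a) || return DUMMY, PENALTY  # episode ends here
    s′ = mod1(s + a, n_states)            # placeholder dynamics
    r  = s′ == n_states ? 1.0 : 0.0       # placeholder reward
    return s′, r
end

# Q-table includes the dummy row; it stays zero because DUMMY is terminal.
Q = zeros(n_states + 1, n_actions)
α, γ = 0.1, 0.95

# One Q-learning update; a terminal successor bootstraps to zero.
function update!(Q, s, a, r, s′)
    target = isterminal(s′) ? r : r + γ * maximum(Q[s′, :])
    Q[s, a] += α * (target - Q[s, a])
end
```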
When the agent takes an impossible action, what is the difference between transitioning to a dummy terminal state (with a bad reward) and having the agent remain in the same state (also with a bad reward) until the end of the episode? Are there other possible solutions?
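The alternative I am asking about would look like this instead (same placeholder names as above), with the agent staying in place while the episode continues:

```julia
# Variant: on an impossible action, stay in the same state with the penalty
# and keep the episode running.
function env_step_stay(s, a)
    valid(s, a) || return s, PENALTY      # same state, bad reward, no termination
    s′ = mod1(s + a, n_states)            # same placeholder dynamics as above
    r  = s′ == n_states ? 1.0 : 0.0
    return s′, r
end
```

As far as I can tell, the difference shows up in the bootstrap term: with the terminal dummy state, the update target for an impossible action is just `PENALTY`, whereas here it is `PENALTY + γ * maximum(Q[s, :])`, so the penalty has to dominate the discounted value of staying put. I am not sure which is preferable in practice.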
Specifically, how can I either avoid defining an impossible action in the first place, or explicitly mark an action as impossible?