I'm thinking of implementing the "Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation" paper. In this paper the authors used some custom object detector for entity detection (e.g. key, rope, ladder, etc.), but they did not give any information about this custom detector. Can you please give me a suggestion on how to implement this object detector?
The authors of the paper did release the code for the paper on github. You can check it for reference: https://github.com/mrkulk/hierarchical-deep-RL/tree/master/dqn – João Schapke Jun 12 '21 at 21:00
@JoãoSchapke I had a scan through the code for the custom object detector, but could not find it. I suspect from the inclusion of template images that it is based on template match, which would indeed be very simple as I suggest in my answer. I'd like to extend my answer to confirm this, but sadly not able to. – Neil Slater Jun 13 '21 at 21:30
1 Answer
The quote from the paper is:
In this work, we built a custom object detector that provides plausible object candidates.
And in their related submission to NeurIPS:
In this work, we built a custom pipeline to provide plausible object candidates. Note that the agent is still required to learn which of these candidates are worth pursuing as goals.
I think that this detector finds and identifies "screen locations of interest" to create parameterised sub-goals - e.g. one goal might be to make objects A and B coincide. It is also not clear whether static objects such as the rope and ladders are included, or whether the detector is tuned more narrowly to "active" entities such as the player, keys, doors, enemies etc.
This also gives a clue:
The internal critic is defined in the space of ⟨entity1, relation, entity2⟩, where relation is a function over configurations of the entities. In our experiments, the agent learns to choose entity2. For instance, the agent is deemed to have completed a goal (and only then receives a reward) if the agent entity reaches another entity such as the door
This implies that the entity detector does identify the player-controlled entity and sets this as entity1 for all the top-level goals. All the high-level goals are stated in terms of "player coincides with B" or "player avoids C".
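Under that reading, the internal critic reduces to a simple geometric test on detected entity positions. A minimal sketch in Python (the box format and the overlap criterion for "reaches" are my assumptions, not something the paper specifies):

```python
def goal_reached(agent_box, target_box):
    """Internal-critic sketch for the goal <agent, reaches, target>.

    Boxes are (x, y, width, height) in screen pixels. Treating "reaches"
    as bounding-box overlap is an assumption, not the authors' exact test.
    """
    ax, ay, aw, ah = agent_box
    tx, ty, tw, th = target_box
    # Two boxes overlap iff each starts before the other ends on both axes
    return ax < tx + tw and tx < ax + aw and ay < ty + th and ty < ay + ah
```

An "avoids" relation would simply negate this test, and distance-based relations could be built from the box centres in the same way.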
Can you please give me a suggestion on how to implement this object detector?
The implication is that they created something simple and fast that would specifically detect important objects in the game, so that it would find and identify all the objects that could reasonably be used in sub-goals. Objects in old Atari games are visually distinct and simple, so it is likely to be something quite basic. The reference code linked by João Schapke suggests template matching, because there are suitable images in a "template" directory, but I was unable to find the object detector functions when scanning the code quickly.
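If it is indeed template matching, the core operation is easy to reproduce. Here is a naive pure-NumPy sketch of normalised cross-correlation matching; in practice you would use OpenCV's `cv2.matchTemplate` with `TM_CCOEFF_NORMED`, which does the same thing far faster, and the threshold value here is a guess, not anything from the paper:

```python
import numpy as np

def match_template(frame, template, threshold=0.9):
    """Naive normalised cross-correlation template matching.

    Returns a list of (x, y) top-left positions in the grayscale `frame`
    where `template` matches with score >= threshold (1.0 = exact match).
    """
    fh, fw = frame.shape
    th, tw = template.shape
    t = template.astype(np.float64)
    t = (t - t.mean()) / (t.std() + 1e-8)  # zero-mean, unit-variance template
    hits = []
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            patch = frame[y:y + th, x:x + tw].astype(np.float64)
            p = (patch - patch.mean()) / (patch.std() + 1e-8)
            score = (p * t).mean()  # normalised correlation in [-1, 1]
            if score >= threshold:
                hits.append((x, y))
    return hits
```

One template per entity type (player, key, door, skull, ...) cropped from a game frame would then give you labelled candidate locations per frame, which matches the "plausible object candidates" wording in the paper.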
It is also important to note that this custom detector was only used to build the set of sub-goals and to detect whether any of them had been achieved. It was not used as input to the neural networks, which had to learn their own representations of the same objects in order to make decisions. The top-level policy selected amongst the sub-goals by processing the video frames, and this selection determined the lower-level policy's reward: the algorithm used the detector only to assess whether the chosen sub-goal had actually been achieved, at which point the intrinsic reward for achieving it would be granted to the lower-level policy. The upper-level policy, meanwhile, was granted rewards as scored by the original game.
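That division of labour can be sketched as a control loop (this is my paraphrase of the h-DQN scheme with invented names; the real implementation trains Q-networks at both levels rather than taking fixed policy functions):

```python
def run_episode(env, meta_policy, controller_policy, critic, goals):
    """Sketch of the two-level h-DQN control flow (names are mine).

    - meta_policy picks a sub-goal (an entity to reach) from the state
    - controller_policy picks low-level actions conditioned on that goal
    - critic is the object-detector-based check for goal completion; it
      gates the controller's intrinsic reward but is never fed to the
      networks as an input feature
    """
    state, done = env.reset(), False
    extrinsic_total = 0.0  # game score, credited to the meta-controller
    while not done:
        goal = meta_policy(state, goals)
        reached = False
        while not done and not reached:
            action = controller_policy(state, goal)
            state, ext_reward, done = env.step(action)
            extrinsic_total += ext_reward
            reached = critic(state, goal)        # detector-based check
            intrinsic = 1.0 if reached else 0.0  # controller's reward signal
        # (Q-learning updates for both levels would go here)
    return extrinsic_total
```

The key point the sketch illustrates is that `critic` only fires the intrinsic reward; everything the networks see is raw state.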
