Augmenting an image with other data when training a CNN

Question

In the typical RL/MDP framework, I have offline data consisting of $(s,a,r,s')$ tuples from expert Atari gameplay.

I'm looking to train a CNN to predict $r$ based on $(s, a)$.

Each state is a $4 \times 84 \times 84$ stack of Atari screen images: 4 sequential frames, each of size $84 \times 84$. The action is an integer from 0 to 3.

I'm not sure how best to merge these two inputs $(s, a)$ together. How should I incorporate the action into the CNN?

So, are you trying to create some kind of inverse RL algorithm? — nbro, Jan 02 '22 at 12:43
The conventional way is to have a network with an output of size 1x4 (for the 4 actions) that predicts the output corresponding to each action. Since the action space is small, this would be a good bet. If you really want to merge S and A, you might want to encode A as a one-hot vector and concatenate it with the 1D feature vector from the CNN. — Sooryakiran Pallikulathil, Jan 02 '22 at 18:38
My question can be taken out of the RL context for simplicity. Imagine I had just images (states in the RL example) and some additional information, say integers (actions in the RL example), giving a slight hint as to how "good" these images are (rewards in the RL example). How would you add the "additional information" to the CNN? Currently, I just add it in the last linear layer as an additional feature. Are there any obvious pitfalls with this approach? — Snowball, Jan 02 '22 at 21:52
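
The two options raised in the comments can be sketched in PyTorch roughly as follows. This is only a minimal illustration under stated assumptions, not a tuned or reference architecture: the class names (`ConvTrunk`, `PerActionRewardNet`, `ConcatRewardNet`), the helper `training_step`, the convolution sizes, the hidden width of 256, and the MSE loss are all hypothetical choices made for the example. Only the $4 \times 84 \times 84$ input shape and the 4 discrete actions come from the question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_ACTIONS = 4  # actions are integers 0..3, as stated in the question


class ConvTrunk(nn.Module):
    """Feature extractor for 4x84x84 inputs (layer sizes are illustrative assumptions)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),   # -> 32 x 20 x 20
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # -> 64 x 9 x 9
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),  # -> 64 x 7 x 7
            nn.Flatten(),                                           # -> 3136
        )
        self.out_dim = 64 * 7 * 7

    def forward(self, s):
        return self.net(s)


class PerActionRewardNet(nn.Module):
    """Option 1: one predicted reward per action; select the entry for the action taken."""

    def __init__(self):
        super().__init__()
        self.trunk = ConvTrunk()
        self.head = nn.Sequential(
            nn.Linear(self.trunk.out_dim, 256), nn.ReLU(),
            nn.Linear(256, NUM_ACTIONS),
        )

    def forward(self, s, a):
        all_rewards = self.head(self.trunk(s))                    # shape (B, 4)
        return all_rewards.gather(1, a.unsqueeze(1)).squeeze(1)   # shape (B,)


class ConcatRewardNet(nn.Module):
    """Option 2: concatenate a one-hot action with the flattened CNN features."""

    def __init__(self):
        super().__init__()
        self.trunk = ConvTrunk()
        self.head = nn.Sequential(
            nn.Linear(self.trunk.out_dim + NUM_ACTIONS, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, s, a):
        a_onehot = F.one_hot(a, num_classes=NUM_ACTIONS).float()  # shape (B, 4)
        x = torch.cat([self.trunk(s), a_onehot], dim=1)           # shape (B, 3140)
        return self.head(x).squeeze(1)                            # shape (B,)


def training_step(model, optimizer, s, a, r):
    """One regression step on a batch of (s, a, r); MSE is an assumed loss choice."""
    optimizer.zero_grad()
    loss = F.mse_loss(model(s, a), r)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Either model takes a float tensor `s` of shape `(B, 4, 84, 84)` and an integer (`torch.long`) tensor `a` of shape `(B,)`, and returns a predicted reward of shape `(B,)`. One thing to watch with the "extra feature in the last linear layer" approach: if the action is fed in as a raw integer, the network is implicitly told that action 3 is "three times" action 1, which is usually not meaningful for categorical actions. A one-hot encoding avoids that, and placing at least one hidden layer after the concatenation (as in the sketch) lets the network model interactions between the image features and the action rather than a purely additive offset.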

0 Answers