I am attempting to implement an agent that learns to play Pong. The environment was created in PyGame and returns the pixel data and score at each frame. I use a CNN that takes a stack of the last 4 frames as input and predicts the best action to take. At each timestep I also train on a minibatch of experiences sampled from an experience replay buffer.
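As a rough illustration of the setup described above, here is a minimal sketch of the frame stacking and experience replay parts. The names `FrameStack` and `ReplayBuffer`, the buffer capacity, and the choice to fill the stack with the first frame on reset are all assumptions made for illustration, not details of the actual code. The CNN itself is omitted.

```python
import random
from collections import deque


class FrameStack:
    """Keeps the most recent `size` frames; the stack is the agent's state.

    Hypothetical helper: frames can be any object (e.g. a pixel array).
    """

    def __init__(self, size=4):
        self.size = size
        self.frames = deque(maxlen=size)

    def reset(self, first_frame):
        # Assumption: repeat the first frame so the state has full depth
        # from the very first step of an episode.
        self.frames.clear()
        for _ in range(self.size):
            self.frames.append(first_frame)
        return self.state()

    def push(self, frame):
        # The deque drops the oldest frame automatically once it is full.
        self.frames.append(frame)
        return self.state()

    def state(self):
        # Oldest frame first, newest frame last.
        return tuple(self.frames)


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done)."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement; returns fewer items
        # if the buffer does not yet hold `batch_size` experiences.
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

At each timestep the new frame would be pushed onto the stack, the resulting transition added to the buffer, and a minibatch drawn with `sample` for the training step.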
I have seen an implementation where the game returned a reward of 10 for each time the bot returns the ball and -10 for each time the bot misses the ball.
My question is whether it would be better to reward the bot significantly for managing to get the ball past the opponent, which ends the episode. I was thinking of rewarding 10 for winning the episode, -10 for missing the ball and 5 for returning the ball.
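To make the proposed scheme concrete, here is a small sketch of how the reward could be computed per frame. The function name `shaped_reward` and its three boolean flags are hypothetical, and it assumes that missing the ball also ends the episode and that all other frames give a reward of 0.

```python
# Reward values from the proposed scheme.
WIN_REWARD = 10.0      # ball gets past the opponent
MISS_PENALTY = -10.0   # bot misses the ball
RETURN_REWARD = 5.0    # bot returns the ball


def shaped_reward(returned_ball, won_point, missed_ball):
    """Return (reward, done) for one frame.

    Hypothetical signature: the three flags are assumed to be reported
    by the environment for the current frame.
    """
    if missed_ball:
        # Assumption: a miss ends the episode, just like a win.
        return MISS_PENALTY, True
    if won_point:
        return WIN_REWARD, True
    if returned_ball:
        return RETURN_REWARD, False
    # Assumption: frames where nothing happens give no reward.
    return 0.0, False
```

Keeping the three values as named constants makes it easy to switch to the 10 / -10 scheme from the other implementation and compare the two.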
Please let me know whether my approach is sensible or has any glaring problems, or if I need to provide more information.
Thank you!