As I understand from the literature, the final activation in the actor (policy) network in TD3 and SAC is normally a Tanh function, scaled by the action limit.
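For reference, here is a minimal sketch of the actor head I mean (PyTorch, TD3-style deterministic policy; `state_dim`, `action_dim`, and `max_action` are placeholder names I chose for illustration):

```python
import torch
import torch.nn as nn

class TanhActor(nn.Module):
    """TD3-style actor: Tanh output scaled to [-max_action, max_action]."""
    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )
        self.max_action = max_action

    def forward(self, state):
        # Tanh bounds each component to (-1, 1); scaling maps it to the action limits.
        return self.max_action * torch.tanh(self.net(state))
```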
My action is a vector whose entries all lie between 0 and 1 and sum to 1, which is exactly what a Softmax produces. But these values are not probabilities of discrete actions: each entry of the action vector is the fraction of the whole portfolio to be invested in a particular stock.
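So what I have in mind is replacing the Tanh head with a Softmax, so that the output is always a valid allocation. A sketch of what I mean, using the same hypothetical network body as above:

```python
import torch
import torch.nn as nn

class SoftmaxActor(nn.Module):
    """Actor whose output is a portfolio-weight vector: entries in (0, 1), summing to 1."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, state):
        # Softmax maps the logits onto the probability simplex, but here each
        # entry is read as a portfolio fraction, not a class probability.
        return torch.softmax(self.net(state), dim=-1)
```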
But I cannot figure out whether it would be mathematically sound to use a Softmax as the output activation in TD3 or SAC.