
As I understand from the literature, the last activation in an actor (policy) network in the TD3 and SAC algorithms is normally a tanh function, scaled by a certain limit.

My action vector is naturally a vector whose values all lie between 0 and 1 and sum to 1. This is a perfect fit for a softmax function. But these values are not probabilities of discrete actions: each value in the action vector is the percentage of the whole portfolio to be invested in a certain stock.

But I cannot figure out whether it is mathematically sound to use a softmax as the final activation layer in TD3 or SAC.
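For concreteness, here is a minimal sketch, assuming PyTorch and hypothetical `state_dim`/`n_assets` parameters (it is not based on any particular TD3 implementation), of a deterministic actor whose final layer is a softmax instead of a scaled tanh:

```python
import torch
import torch.nn as nn

class PortfolioActor(nn.Module):
    """Deterministic TD3-style actor: maps a state to portfolio weights
    that are each in [0, 1] and sum to 1 (a point on the simplex)."""

    def __init__(self, state_dim: int, n_assets: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_assets),  # unbounded "logits", one per asset
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        logits = self.net(state)
        # Softmax replaces the usual scaled tanh: each weight lies in [0, 1]
        # and the weights sum to 1 along the asset dimension.
        return torch.softmax(logits, dim=-1)
```

One practical point (my assumption, not part of the question): TD3 perturbs actions with exploration and target-policy noise, and adding that noise to the logits before the softmax, rather than to the weights afterwards, keeps the perturbed action on the simplex.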

Bi0max
  • You should check out discrete SAC. Original SAC was designed for continuous action spaces (the code was, at least — I believe theoretically the paper works with both but was motivated by continuous action spaces) but I believe they developed a discrete version. I don’t have the reference to hand but a Google search should turn up results. – David Aug 09 '21 at 23:12
  • @DavidIreland, my action space is not discrete, actually. I did not mean that these values are probabilities. Each value in the action vector is actually the percentage of the whole portfolio to be invested in a certain stock. – Bi0max Aug 10 '21 at 07:53
  • Ah, okay, I apologise. In that case you can just use SAC with a softmax activation (see the sketch after these comments). I believe tanh is usually used because the action spaces have symmetric intervals; it is not something the algorithm relies on theoretically. – David Aug 10 '21 at 12:31
  • @DavidIreland, thank you for the answer. I also assumed so, but I was not sure whether it's mathematically fine or not. – Bi0max Aug 11 '21 at 13:08
  • Yes, it should be fine :) – David Aug 11 '21 at 16:15
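Following up on the comment above, here is a rough sketch (again assuming PyTorch, with a hypothetical `SoftmaxGaussianHead` module; it is not the reference SAC implementation) of a SAC-style stochastic head in which the usual tanh squashing is swapped for a softmax. Note that SAC's tanh log-probability correction does not carry over directly, since softmax is not an elementwise invertible transform:

```python
import torch
import torch.nn as nn

class SoftmaxGaussianHead(nn.Module):
    """Sketch of a SAC-style stochastic head: sample pre-activation values
    from a Gaussian, then squash them onto the simplex with a softmax."""

    def __init__(self, hidden: int, n_assets: int):
        super().__init__()
        self.mean = nn.Linear(hidden, n_assets)
        self.log_std = nn.Linear(hidden, n_assets)

    def forward(self, h: torch.Tensor):
        mean = self.mean(h)
        std = self.log_std(h).clamp(-20, 2).exp()
        dist = torch.distributions.Normal(mean, std)
        z = dist.rsample()                  # reparameterised sample
        weights = torch.softmax(z, dim=-1)  # portfolio weights on the simplex
        # Caveat: SAC's usual log-prob correction is derived for the
        # elementwise, invertible tanh squashing; softmax is neither, so the
        # entropy term needs separate treatment.
        return weights, dist.log_prob(z).sum(-1)
```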
