Questions tagged [td3]
For questions related to the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm.
8 questions
3
votes
0 answers
Optimal episode length in reinforcement learning
I have a custom environment for stock trading where an episode can be as long as 2000-3000 steps. I've run several experiments with the TD3 and SAC algorithms, and the average reward per episode flattens after a few episodes. I believe the average reward per episode…

Mika
- 331
- 1
- 8
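A minimal sketch for the question above of how one might cap the episode length and measure the average reward per episode, assuming Stable Baselines3 and Gymnasium, with Pendulum-v1 standing in for the custom trading environment (the environment name, timestep budget, and hyperparameters are placeholders, not taken from the question):

```python
import gymnasium as gym
from stable_baselines3 import TD3
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.evaluation import evaluate_policy

# Cap the episode length so different horizons (e.g. 2000 vs 3000 steps)
# can be compared; Monitor records per-episode rewards.
env = Monitor(gym.make("Pendulum-v1", max_episode_steps=2000))

model = TD3("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=50_000)

# Average reward per episode over a fixed number of evaluation episodes.
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean episode reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```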
1
vote
1 answer
Can the action be dominated by state features in actor-critic algorithms?
I have a case where my state consists of a relatively large number of features, e.g. 50, whereas my action size is 1. I wonder whether the state features dominate the action in my critic network. I believe that in theory it eventually shouldn't matter, but…

Mika
- 331
- 1
- 8
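For the question above, a minimal PyTorch sketch (the class name and layer sizes are illustrative, not from the question) of a TD3-style critic in which the single action is simply concatenated to the 50 state features; whether it is "dominated" then depends on the learned weights rather than on the input count:

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Q(s, a) network: a 1-D action concatenated with 50 state features."""

    def __init__(self, state_dim: int = 50, action_dim: int = 1, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # The action is 1 of 51 inputs to the first layer; its influence is set
        # by the learned first-layer weights, not by how many features there are.
        return self.net(torch.cat([state, action], dim=-1))

q = Critic()
print(q(torch.randn(4, 50), torch.rand(4, 1)).shape)  # torch.Size([4, 1])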
1
vote
0 answers
If we have a working reward function, would adding another action have a significant effect on the agent's performance if the task remains the same?
If we have a working reward function, one that produces the desired behavior and an optimal policy in a continuous action/state-space problem, would adding another action significantly affect the possible optimal policy?
For example, assume you have an RL…

Philori
- 11
- 1
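For the question above, a small illustrative sketch (using Gymnasium spaces; the bounds are arbitrary) of what "adding another action" usually means in a continuous setting. If the reward ignores the new dimension, a previously optimal policy stays optimal by outputting anything in that dimension; in practice the extra dimension mainly affects exploration and training speed:

```python
import numpy as np
from gymnasium import spaces

# Original single-dimensional continuous action space.
original = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)

# Extended space: the first dimension keeps its old meaning, the second is new.
extended = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)

print(original.shape, extended.shape)  # (1,) (2,)
```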
0
votes
0 answers
What is the meaning of $\alpha$ in the TD3 algorithm?
I am studying the TD3 paper.
I am curious about the meaning of $\alpha$ in the part where the paper proves that overestimation happens in a certain setting.
The mathematical proof reads something like ...
$\exists \epsilon_1 \ s.t…

jackson
- 1
- 2
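For the question above: in the TD3 paper (Fujimoto et al., 2018), $\alpha$ is the step size of the policy-gradient update used in the overestimation argument. A sketch of that argument as I recall it (notation may differ slightly from the paper): the policy parameters are updated once with the approximate critic $Q_\theta$ and once with the true value $Q^\pi$,

$$
\phi_{\text{approx}} = \phi + \frac{\alpha}{Z_1}\,\mathbb{E}_s\!\left[\nabla_\phi \pi_\phi(s)\,\nabla_a Q_\theta(s,a)\big|_{a=\pi_\phi(s)}\right],
\qquad
\phi_{\text{true}} = \phi + \frac{\alpha}{Z_2}\,\mathbb{E}_s\!\left[\nabla_\phi \pi_\phi(s)\,\nabla_a Q^\pi(s,a)\big|_{a=\pi_\phi(s)}\right],
$$

and the paper then shows that there exist $\epsilon_1, \epsilon_2$ such that $\alpha \le \epsilon_1$ implies $\mathbb{E}[Q_\theta(s, \pi_{\text{approx}}(s))] \ge \mathbb{E}[Q_\theta(s, \pi_{\text{true}}(s))]$, and $\alpha \le \epsilon_2$ implies $\mathbb{E}[Q^\pi(s, \pi_{\text{true}}(s))] \ge \mathbb{E}[Q^\pi(s, \pi_{\text{approx}}(s))]$, so for $\alpha < \min(\epsilon_1, \epsilon_2)$ the critic's estimate of the updated policy's value is an overestimate. In other words, $\alpha$ is the actor's learning rate, assumed small only so that the two one-step updates are comparable; it is not a separate TD3 hyperparameter.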
0
votes
0 answers
Training an RL agent using different data at each episode
I am training an RL agent whose state is composed of two numbers, ranging between 4~16 and 0~360. The action is continuous and between 0~90. I am training a TD3 agent using the Stable Baselines library. In real life, the states can be any…

Leibniz
- 69
- 4
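For the question above, a minimal sketch (hypothetical names; only the bounds come from the question) of defining the observation and action spaces with Gymnasium and rescaling observations so that features with very different ranges (4~16 vs 0~360) enter the network on a comparable scale:

```python
import numpy as np
from gymnasium import spaces

# Hypothetical spaces matching the ranges described in the question.
obs_space = spaces.Box(low=np.array([4.0, 0.0], dtype=np.float32),
                       high=np.array([16.0, 360.0], dtype=np.float32))
act_space = spaces.Box(low=0.0, high=90.0, shape=(1,), dtype=np.float32)

def normalize_obs(obs: np.ndarray) -> np.ndarray:
    # Scale each component to [-1, 1] before feeding it to the agent.
    return 2.0 * (obs - obs_space.low) / (obs_space.high - obs_space.low) - 1.0

print(normalize_obs(np.array([10.0, 180.0], dtype=np.float32)))  # [0. 0.]
```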
0
votes
0 answers
Is it possible to use softmax as the activation function for the actor (policy) network in the TD3 or SAC reinforcement learning algorithms?
As I understand from the literature, the last activation in an actor (policy) network in the TD3 and SAC algorithms is normally a tanh function, scaled by a certain limit.
My action is perfectly described as a vector where all values are…

Bi0max
- 101
- 1
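For the question above, a minimal PyTorch sketch (class name and sizes are illustrative) of an actor whose last layer is a softmax instead of a scaled tanh, so the action components are non-negative and sum to 1 (e.g. allocation weights). One caveat: TD3's Gaussian exploration noise, if added to such an action, pushes it off the simplex, so the noisy action would need to be renormalized or the noise applied to the logits instead.

```python
import torch
import torch.nn as nn

class SimplexActor(nn.Module):
    """Deterministic actor whose output lies on the probability simplex."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Softmax instead of tanh: components are >= 0 and sum to 1.
        return torch.softmax(self.net(state), dim=-1)

actor = SimplexActor(state_dim=10, action_dim=4)
print(actor(torch.randn(2, 10)).sum(dim=-1))  # ~[1., 1.]
```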
0
votes
0 answers
Which is the best RL algorithm for a problem with continuous states but a discrete action space?
I am trying to train an AI in an environment where the states are continuous but the actions are discrete, which means I cannot apply DDPG or TD3.
Can someone please let me know what the best algorithm for discrete action spaces would be, and…

user2783767
- 121
- 2
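For the question above, a minimal Stable Baselines3 sketch (CartPole-v1 is just a stand-in environment) of a value-based method such as DQN, which handles continuous observations with a discrete action space directly:

```python
import gymnasium as gym
from stable_baselines3 import DQN

# CartPole-v1: continuous observations, discrete (2-way) action space.
env = gym.make("CartPole-v1")

model = DQN("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=20_000)

obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)
print(action)  # a discrete action index, e.g. 0 or 1
```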
-1
votes
1 answer
TD3 sticking to end values
I am using TD3 on a custom gym environment, but the problem is that the action values stick to the ends of the action range. Sticking to the end values makes the reward negative; to be positive, the agent must find action values somewhere in the middle. But the agent doesn't learn…

K_197
- 1
- 3
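For the question above, a hedged Stable Baselines3 sketch (Pendulum-v1 stands in for the custom environment; the noise scale and learning_starts values are guesses, not a fix guaranteed to work) of two common remedies when TD3 actions saturate at the bounds of the tanh-squashed actor: wider exploration noise and a longer random warm-up before updates begin.

```python
import numpy as np
import gymnasium as gym
from stable_baselines3 import TD3
from stable_baselines3.common.noise import NormalActionNoise

env = gym.make("Pendulum-v1")  # stand-in for the custom gym environment
n_actions = env.action_space.shape[0]

# Wider Gaussian action noise keeps the agent from latching onto the
# saturated end values of the actor early in training.
action_noise = NormalActionNoise(mean=np.zeros(n_actions),
                                 sigma=0.2 * np.ones(n_actions))

model = TD3("MlpPolicy", env,
            action_noise=action_noise,
            learning_starts=10_000,  # collect random actions before updating
            verbose=0)
model.learn(total_timesteps=50_000)
```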