Questions tagged [td3]
For questions related to the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm.
8 questions
3
votes
0 answers
Optimal episode length in reinforcement learning
I have a custom environment for stock trading where an episode can be as long as 2000-3000 steps. I've run several experiments with the TD3 and SAC algorithms, and the average reward per episode flattens after a few episodes. I believe the average reward per episode…

Mika
- 331
- 1
- 8
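A minimal sketch for the question above of how one might cap the episode length and measure the average reward per episode, assuming Stable Baselines3 and Gymnasium, with Pendulum-v1 standing in for the custom trading environment (the environment name, timestep budget, and hyperparameters are placeholders, not taken from the question):

```python
import gymnasium as gym
from stable_baselines3 import TD3
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.evaluation import evaluate_policy

# Cap the episode length so different horizons (e.g. 2000 vs 3000 steps)
# can be compared; Monitor records per-episode rewards.
env = Monitor(gym.make("Pendulum-v1", max_episode_steps=2000))

model = TD3("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=50_000)

# Average reward per episode over a fixed number of evaluation episodes.
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean episode reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```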
1
vote
1 answer
Can the action be dominated by state features in actor-critic algorithms?
I have a case where my state consists of a relatively large number of features, e.g. 50, whereas my action size is 1. I wonder whether the state features dominate the action in my critic network. I believe that in theory it eventually shouldn't matter, but…

Mika
- 331
- 1
- 8
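For the question above, a minimal PyTorch sketch (the class name and layer sizes are illustrative, not from the question) of a TD3-style critic in which the single action is simply concatenated to the 50 state features; whether it is "dominated" then depends on the learned weights rather than on the input count:

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Q(s, a) network: a 1-D action concatenated with 50 state features."""

    def __init__(self, state_dim: int = 50, action_dim: int = 1, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # The action is 1 of 51 inputs to the first layer; its influence is set
        # by the learned first-layer weights, not by how many features there are.
        return self.net(torch.cat([state, action], dim=-1))

q = Critic()
print(q(torch.randn(4, 50), torch.rand(4, 1)).shape)  # torch.Size([4, 1])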
1
vote
0 answers
If we have a working reward function, would adding another action have a significant effect on the agent's performance if the task remains the same?
If we have a working reward function, one that produces the desired behavior and an optimal policy in a continuous action/state-space problem, would adding another action significantly affect the possible optimal policy?
For example, assume you have an RL…

Philori
- 11
- 1
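For the question above, a small illustrative sketch (using Gymnasium spaces; the bounds are arbitrary) of what "adding another action" usually means in a continuous setting. If the reward ignores the new dimension, a previously optimal policy stays optimal by outputting anything in that dimension; in practice the extra dimension mainly affects exploration and training speed:

```python
import numpy as np
from gymnasium import spaces

# Original single-dimensional continuous action space.
original = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)

# Extended space: the first dimension keeps its old meaning, the second is new.
extended = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)

print(original.shape, extended.shape)  # (1,) (2,)
```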
0
votes
0 answers
What is the meaning of $\alpha$ in the TD3 algorithm?
I am studying the TD3 paper.
I am curious about the meaning of $\alpha$ in the part where the paper proves that overestimation happens in a certain setting.
The mathematical proof reads something like ...
$\exists \epsilon_1 \ s.t…

jackson
- 1
- 2
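For the question above: in the TD3 paper (Fujimoto et al., 2018), $\alpha$ is the step size of the policy-gradient update used in the overestimation argument. A sketch of that argument as I recall it (notation may differ slightly from the paper): the policy parameters are updated once with the approximate critic $Q_\theta$ and once with the true value $Q^\pi$,

$$
\phi_{\text{approx}} = \phi + \frac{\alpha}{Z_1}\,\mathbb{E}_s\!\left[\nabla_\phi \pi_\phi(s)\,\nabla_a Q_\theta(s,a)\big|_{a=\pi_\phi(s)}\right],
\qquad
\phi_{\text{true}} = \phi + \frac{\alpha}{Z_2}\,\mathbb{E}_s\!\left[\nabla_\phi \pi_\phi(s)\,\nabla_a Q^\pi(s,a)\big|_{a=\pi_\phi(s)}\right],
$$

and the paper then shows that there exist $\epsilon_1, \epsilon_2$ such that $\alpha \le \epsilon_1$ implies $\mathbb{E}[Q_\theta(s, \pi_{\text{approx}}(s))] \ge \mathbb{E}[Q_\theta(s, \pi_{\text{true}}(s))]$, and $\alpha \le \epsilon_2$ implies $\mathbb{E}[Q^\pi(s, \pi_{\text{true}}(s))] \ge \mathbb{E}[Q^\pi(s, \pi_{\text{approx}}(s))]$, so for $\alpha < \min(\epsilon_1, \epsilon_2)$ the critic's estimate of the updated policy's value is an overestimate. In other words, $\alpha$ is the actor's learning rate, assumed small only so that the two one-step updates are comparable; it is not a separate TD3 hyperparameter.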
0
votes
0 answers
Training an RL agent using different data at each episode
I am training an RL agent whose state is composed of two numbers, ranging between 4~16 and 0~360. The action is continuous and between 0~90. I am training a TD3 agent using the Stable Baselines library. In real life, the states can be any…

Leibniz
- 69
- 4
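For the question above, a minimal sketch (hypothetical names; only the bounds come from the question) of defining the observation and action spaces with Gymnasium and rescaling observations so that features with very different ranges (4~16 vs 0~360) enter the network on a comparable scale:

```python
import numpy as np
from gymnasium import spaces

# Hypothetical spaces matching the ranges described in the question.
obs_space = spaces.Box(low=np.array([4.0, 0.0], dtype=np.float32),
                       high=np.array([16.0, 360.0], dtype=np.float32))
act_space = spaces.Box(low=0.0, high=90.0, shape=(1,), dtype=np.float32)

def normalize_obs(obs: np.ndarray) -> np.ndarray:
    # Scale each component to [-1, 1] before feeding it to the agent.
    return 2.0 * (obs - obs_space.low) / (obs_space.high - obs_space.low) - 1.0

print(normalize_obs(np.array([10.0, 180.0], dtype=np.float32)))  # [0. 0.]
```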
0
votes
0 answers
Is it possible to use softmax as the activation function for the actor (policy) network in the TD3 or SAC reinforcement learning algorithms?
As I understand from the literature, the last activation in an actor (policy) network in the TD3 and SAC algorithms is normally a tanh function, scaled by a certain limit.
My action is perfectly described as a vector where all values are…

Bi0max
- 101
- 1
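For the question above, a minimal PyTorch sketch (class name and sizes are illustrative) of an actor whose last layer is a softmax instead of a scaled tanh, so the action components are non-negative and sum to 1 (e.g. allocation weights). One caveat: TD3's Gaussian exploration noise, if added to such an action, pushes it off the simplex, so the noisy action would need to be renormalized or the noise applied to the logits instead.

```python
import torch
import torch.nn as nn

class SimplexActor(nn.Module):
    """Deterministic actor whose output lies on the probability simplex."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Softmax instead of tanh: components are >= 0 and sum to 1.
        return torch.softmax(self.net(state), dim=-1)

actor = SimplexActor(state_dim=10, action_dim=4)
print(actor(torch.randn(2, 10)).sum(dim=-1))  # ~[1., 1.]
```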
0
votes
0 answers
Which is the best RL algorithm for a problem with continuous states but a discrete action space?
I am trying to train an AI in an environment where the states are continuous but the actions are discrete, which means I cannot apply DDPG or TD3.
Can someone please let me know what the best algorithm for discrete action spaces would be, and…

user2783767
- 121
- 2
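For the question above, a minimal Stable Baselines3 sketch (CartPole-v1 is just a stand-in environment) of a value-based method such as DQN, which handles continuous observations with a discrete action space directly:

```python
import gymnasium as gym
from stable_baselines3 import DQN

# CartPole-v1: continuous observations, discrete (2-way) action space.
env = gym.make("CartPole-v1")

model = DQN("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=20_000)

obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)
print(action)  # a discrete action index, e.g. 0 or 1
```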
-1
votes
1 answer
TD3 sticking to end values
I am using TD3 on a custom gym environment, but the problem is that the action values stick to the ends of the action range. Sticking to the end values makes the reward negative; to be positive, the agent must find action values somewhere in the middle. But the agent doesn't learn…

K_197
- 1
- 3
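For the question above, a hedged Stable Baselines3 sketch (Pendulum-v1 stands in for the custom environment; the noise scale and learning_starts values are guesses, not a fix guaranteed to work) of two common remedies when TD3 actions saturate at the bounds of the tanh-squashed actor: wider exploration noise and a longer random warm-up before updates begin.

```python
import numpy as np
import gymnasium as gym
from stable_baselines3 import TD3
from stable_baselines3.common.noise import NormalActionNoise

env = gym.make("Pendulum-v1")  # stand-in for the custom gym environment
n_actions = env.action_space.shape[0]

# Wider Gaussian action noise keeps the agent from latching onto the
# saturated end values of the actor early in training.
action_noise = NormalActionNoise(mean=np.zeros(n_actions),
                                 sigma=0.2 * np.ones(n_actions))

model = TD3("MlpPolicy", env,
            action_noise=action_noise,
            learning_starts=10_000,  # collect random actions before updating
            verbose=0)
model.learn(total_timesteps=50_000)
```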