Questions tagged [continuous-tasks]

For questions about continuing (also called continuous) tasks, problems, or environments in reinforcement learning.

6 questions
6 votes, 1 answer

What are the advantages of RL with actor-critic methods over actor-only methods?

In general, what are the advantages of RL with actor-critic methods over actor-only (or policy-based) methods? I am not asking for a comparison with the Q-learning family, but with methods that learn the game using only the actor. I think it's…
2 votes, 4 answers

How can the Cart Pole problem be a continuing task?

In Reinforcement Learning: An Introduction (2nd edition) by Sutton and Barto, there is an example of the pole-balancing problem (Example 3.4). In this example, they write that this problem can be treated as an episodic task or a continuing task. I…
2 votes, 0 answers

Why are agents trained in episodes, even in non-episodic tasks?

Let's consider some non-episodic problem. Maybe a game which can go on forever. My question is: Why are agents still trained in episodes? My understanding is that the agent's neural network is updated in batches depending on the batch size (so every…
2 votes, 1 answer

For continuing tasks, is the choice of episode length completely arbitrary?

Let's say I'm training a reinforcement learning agent to act in some environment that perpetually continues to give the agent opportunities to earn rewards, and there is no cap on the score and there is no way to "win". That is, there is no natural…
1 vote, 0 answers

Knowing the futility of discounting in continuing problems, how can we say discounting has no role in control problems with function approximation?

Sutton-Barto (Section 10.4, page 254): Based on the futility of discounting in continuing problems, how can we conclude that discounting has no role to play in control problems with function approximation?
1 vote, 1 answer

Predicting a continuous value with a CNN (prediction of fruit maturity)

I want to train an AI algorithm to evaluate the maturity of a fruit (say, measured as the number of days before it rots) based on an image of the fruit. My first instinct is to go with a convolutional neural network (CNN), since those have…