Questions tagged [continuous-state-spaces]

For questions about continuous state spaces, in the context of reinforcement learning or other AI sub-fields.

12 questions
18
votes
2 answers

Can Q-learning be used for continuous (state or action) spaces?

Many examples work with a table-based method for Q-learning. This may be suitable for a discrete state (observation) or action space, like a robot in a grid world, but is there a way to use Q-learning for continuous spaces like the control of a…
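One common workaround is to keep the table but discretize the continuous state into bins. A minimal sketch of that route, with all bounds, bin counts, and the action count illustrative:

```python
import numpy as np

# Minimal sketch: tabular Q-learning on a continuous 2-D state, made
# possible by binning each state dimension. All sizes are illustrative.
N_BINS, N_ACTIONS = 20, 4
STATE_LOW, STATE_HIGH = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
Q = np.zeros((N_BINS, N_BINS, N_ACTIONS))

def to_cell(s):
    """Map a continuous state to its discrete grid cell."""
    frac = (s - STATE_LOW) / (STATE_HIGH - STATE_LOW)
    return tuple(np.clip((frac * N_BINS).astype(int), 0, N_BINS - 1))

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard Q-learning update, applied to the binned states."""
    c, c_next = to_cell(s), to_cell(s_next)
    Q[c + (a,)] += alpha * (r + gamma * Q[c_next].max() - Q[c + (a,)])
```

For continuous actions, the max over actions in the update is the sticking point, which is why actor-critic methods such as DDPG are usually preferred there.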
2
votes
1 answer

Model-based RL algorithms for continuous state space and finite action space

Suppose that, at the outset, I have a complete model $p(s' \mid s, a)$ (an assumed true model that describes the environment well enough) and the reward function $r(s,a,s')$. How can I exploit the model and learn a good policy in this situation? Assume that…
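With a known model and a finite action set, one textbook route is approximate (fitted) value iteration: back up sampled states through the model and regress the targets. A sketch under those assumptions, where `sample_next` and `reward` stand in for the given $p(s' \mid s, a)$ and $r(s,a,s')$:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_value_iteration(states, actions, sample_next, reward,
                           gamma=0.99, iters=50, n_samples=10):
    """Fitted value iteration over sampled support states (N, d).
    sample_next(s, a, n) -> (n, d) draws from the given p(s'|s,a);
    reward(s, a, s2)     -> (n,) rewards from the given r(s,a,s')."""
    V = lambda X: np.zeros(len(X))                      # V_0 = 0 everywhere
    for _ in range(iters):
        targets = np.empty(len(states))
        for i, s in enumerate(states):
            backups = []
            for a in actions:                           # finite action set
                s2 = sample_next(s, a, n_samples)
                backups.append(np.mean(reward(s, a, s2) + gamma * V(s2)))
            targets[i] = max(backups)                   # Bellman optimality backup
        V = ExtraTreesRegressor(n_estimators=50).fit(states, targets).predict
    return V          # greedy policy: argmax_a of the one-step backup under V
```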
2
votes
1 answer

Can neural networks have continuous inputs and outputs, or do they have to be discrete?

In general, can ANNs have continuous inputs and outputs, or do they have to be discrete? So, basically, I would like to have a mapping of continuous inputs to continuous outputs. Is this possible? Does this depend on the type of ANN? More…
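Short answer: yes, both can be continuous; a linear (identity) output layer gives unbounded real-valued outputs. A minimal forward pass, with random untrained weights purely to show the shapes involved:

```python
import numpy as np

# Minimal sketch: a feed-forward network mapping R^3 -> R^2.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 2)), np.zeros(2)

def forward(x):
    h = np.tanh(x @ W1 + b1)      # hidden layer: smooth, continuous
    return h @ W2 + b2            # linear output: unbounded continuous values

print(forward(np.array([0.3, -1.2, 0.7])))   # two real-valued outputs
```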
1
vote
0 answers

How to generalize finite MDP to general MDP?

Suppose, for simplicity's sake, that we are in a discrete-time domain with the action set being the same for all states $s \in \mathcal{S}$. Thus, in a finite Markov Decision Process, the sets $\mathcal{A}$, $\mathcal{S}$, and $\mathcal{R}$ have a finite…
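Sketching the usual generalization (under standard measurability assumptions): the sums over the finite sets $\mathcal{A}$, $\mathcal{S}$, $\mathcal{R}$ become integrals against policy and transition kernels, e.g. for the state-value function:

$$v_\pi(s) = \int_{\mathcal{A}} \pi(\mathrm{d}a \mid s) \int_{\mathcal{S} \times \mathbb{R}} p(\mathrm{d}s', \mathrm{d}r \mid s, a)\, \big[ r + \gamma v_\pi(s') \big].$$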
1
vote
1 answer

Model-based learning in continuous state and action spaces

I am interested in learning how transition probabilities/MDPs are constructed in a model-based learning setting with continuous state and action spaces. There is some literature available on this matter, but it does not explicitly construct the model to…
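One explicit construction that appears in the model-based literature is a Gaussian transition model, $p(s' \mid s, a) = \mathcal{N}(f(s,a), \Sigma)$, with the mean function $f$ fit by regression on logged transitions. A sketch (linear $f$ for brevity; all names illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_gaussian_model(S, A, S2):
    """S: (N, ds) states, A: (N, da) actions, S2: (N, ds) next states."""
    X = np.hstack([S, A])
    f = LinearRegression().fit(X, S2)        # mean function f(s, a)
    resid = S2 - f.predict(X)
    Sigma = np.cov(resid, rowvar=False)      # residual covariance
    return f, Sigma

def sample_next(f, Sigma, s, a, rng):
    """Draw s' ~ N(f(s, a), Sigma) from the learned model."""
    mean = f.predict(np.hstack([s, a])[None])[0]
    return rng.multivariate_normal(mean, np.atleast_2d(Sigma))
```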
1
vote
1 answer

Variable observation space at each episode

I have an environment with continuous actions and state variables. Every time I reset my env, between 2 and 5 balls spawn randomly in a 100x100 box. One of those balls (the red one) will receive an action (direction of movement) and will move…
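A common fix for the varying ball count is to pad the observation up to the maximum number of balls and append a presence mask, so the agent always sees a fixed-length vector. A sketch, with the sizes taken from the question and everything else illustrative:

```python
import numpy as np

MAX_BALLS, FEATS = 5, 2          # up to 5 balls, (x, y) per ball

def fixed_size_obs(ball_positions):
    """ball_positions: (k, 2) array with 2 <= k <= MAX_BALLS."""
    k = len(ball_positions)
    obs = np.zeros((MAX_BALLS, FEATS + 1), dtype=np.float32)
    obs[:k, :FEATS] = ball_positions / 100.0     # normalize by box size
    obs[:k, FEATS] = 1.0                         # mask: 1 = ball present
    return obs.ravel()                           # flat, constant-length vector
```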
1
vote
1 answer

Can RL be applied to problems where the next state is not the next observation?

I'm quite new to the study of reinforcement learning, and I'm working on a communication problem with a large continuous action range for my final graduation project. I'm trying to use a Gaussian policy and policy gradient methods for that implementation.…
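For reference, the core of that setup is the Gaussian policy's score function. A minimal REINFORCE sketch with a linear mean $\mu(s) = w^\top s$ and fixed $\sigma$ (all names illustrative):

```python
import numpy as np

def grad_log_pi(w, sigma, s, a):
    """Score function of a linear-mean Gaussian policy:
    d/dw log N(a; w^T s, sigma^2) = (a - w^T s) / sigma^2 * s."""
    return (a - w @ s) / sigma**2 * s

def reinforce_update(w, sigma, episode, alpha=1e-3, gamma=0.99):
    """episode: list of (s, a, r) tuples. Monte Carlo policy-gradient step."""
    G = 0.0
    for s, a, r in reversed(episode):
        G = r + gamma * G                      # return from this step onward
        w = w + alpha * G * grad_log_pi(w, sigma, s, a)
    return w
```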
1
vote
1 answer

What would be the Bellman optimality equation for $q_*(s, a)$ for an MDP with continuous states and actions?

I'm currently studying reinforcement learning and I'd like to know what the Bellman optimality equation for action values $q_*(s, a)$ would be for an MDP with continuous states and actions, written out using explicit integration (no expectation…
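For comparison with the finite-MDP version in Sutton & Barto, and assuming a transition density $p(s', r \mid s, a)$ exists, the sums become integrals and the max becomes a supremum over the continuous action set:

$$q_*(s, a) = \int_{\mathcal{S}} \int_{\mathbb{R}} p(s', r \mid s, a) \Big[ r + \gamma \sup_{a' \in \mathcal{A}} q_*(s', a') \Big] \, \mathrm{d}r \, \mathrm{d}s'.$$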
1
vote
1 answer

Reinforcement learning algorithms for large problems that are not based on a neural network

I have a large control problem with multidimensional continuous inputs (13) and outputs (3). I have tried several reinforcement learning algorithms, such as Deep Q-Networks (DQN), Proximal Policy Optimization (PPO), and Advantage Actor-Critic (A2C).…
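One classical non-neural option for the 13 continuous inputs is tile coding with a linear learner (the 3 continuous outputs would need to be discretized into a finite action set). A simplified hash-based sketch, assuming states scaled to $[0,1]^d$; real implementations such as Sutton's tiles3 handle offsets and hashing more carefully:

```python
import numpy as np

N_TILINGS, TABLE_SIZE, N_ACTIONS = 8, 2**18, 4   # sizes are illustrative

def active_tiles(s):
    """Hashed tile coding for a continuous state s in [0, 1]^d:
    each tiling is offset, then its integer grid cell is hashed."""
    idx = []
    for t in range(N_TILINGS):
        cell = tuple(((s + t / N_TILINGS) * 10).astype(int))
        idx.append(hash((t,) + cell) % TABLE_SIZE)
    return idx

w = np.zeros((TABLE_SIZE, N_ACTIONS))

def q(s, a):
    return w[active_tiles(s), a].sum()           # linear in binary features

def sarsa_update(s, a, r, s2, a2, alpha=0.1 / N_TILINGS, gamma=0.99):
    td = r + gamma * q(s2, a2) - q(s, a)         # Sarsa(0) TD error
    w[active_tiles(s), a] += alpha * td          # semi-gradient step
```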
1
vote
0 answers

Is there a gentle introduction to reinforcement learning applied to MDPs with continuous state spaces?

I am looking for a gentle introduction (videos, lecture notes, tutorials, books) to reinforcement learning (MDPs) involving continuous states (or a state space of very large cardinality). In particular, I am looking for ways to deal with them,…
0
votes
0 answers

Training an RL agent using different data at each episode

I am training an RL agent whose state is composed of two numbers, ranging between 4 ~ 16 and 0 ~ 360. The action is continuous and between 0 ~ 90. I am training a TD3 agent using the Stable Baselines library. In real life, the states can be any…
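A detail that often matters here (a sketch assuming the gymnasium API, with the ranges taken from the question): rescale both state variables to a common [0, 1] range before handing them to TD3, e.g. with an observation wrapper:

```python
import numpy as np
import gymnasium as gym

class ScaledObs(gym.ObservationWrapper):
    """Rescale the two state variables, with physical ranges roughly
    [4, 16] and [0, 360], into [0, 1] per dimension."""
    LOW = np.array([4.0, 0.0], dtype=np.float32)
    HIGH = np.array([16.0, 360.0], dtype=np.float32)

    def __init__(self, env):
        super().__init__(env)
        self.observation_space = gym.spaces.Box(0.0, 1.0, shape=(2,),
                                                dtype=np.float32)

    def observation(self, obs):
        return (obs - self.LOW) / (self.HIGH - self.LOW)
```

With Stable-Baselines3, the wrapper would be applied to the environment before it is passed to TD3.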
0
votes
0 answers

What do we actually 'approximate' when dealing with large state spaces in Q-learning?

I realized that my state space is very large. I had planned to use tabular Q-learning (the Bellman equation to update $Q(s, a)$ after each action taken), but this 'large space' realization has disappointed me, and I have read a lot of stuff on…
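What gets approximated is $Q(s, a)$ itself: the table is replaced by a parameterized function $\hat{q}(s, a; \mathbf{w})$ (linear features, a neural network, ...), and the tabular update becomes the semi-gradient Q-learning step

$$\mathbf{w} \leftarrow \mathbf{w} + \alpha \Big[ R_{t+1} + \gamma \max_{a'} \hat{q}(S_{t+1}, a'; \mathbf{w}) - \hat{q}(S_t, A_t; \mathbf{w}) \Big] \nabla_{\mathbf{w}}\, \hat{q}(S_t, A_t; \mathbf{w}).$$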