Questions tagged [environment]
For questions related to the concept of an environment in reinforcement learning and other AI subfields.
67 questions
16 votes, 3 answers
Is the optimal policy always stochastic if the environment is also stochastic?
Is the optimal policy always stochastic (that is, a map from states to a probability distribution over actions) if the environment is also stochastic?
Intuitively, if the environment is deterministic (that is, if the agent is in a state $s$ and…
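A useful anchor for this question: in any finite MDP, however stochastic the transitions, there is always at least one deterministic optimal policy, obtained by acting greedily with respect to the optimal action-value function:
$$\pi^*(s) \in \arg\max_{a} Q^*(s, a)$$
Stochastic optimal policies become necessary in other settings, e.g. under partial observability or against an adversary (as in matching pennies).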

nbro (39,006 rep)

12 votes, 2 answers
Is there a fundamental difference between an environment being stochastic and being partially observable?
In the AI literature, deterministic vs. stochastic and fully observable vs. partially observable are usually treated as two distinct properties of the environment.
I'm confused about this because what appears random can be described by hidden…

martinkunev (233 rep)

10 votes, 3 answers
What do the different actions of OpenAI Gym's 'Pong-v0' environment represent?
Printing action_space for Pong-v0 gives Discrete(6) as output, i.e. $0, 1, 2, 3, 4, 5$ are actions defined in the environment as per the documentation. However, the game needs only 2 controls. Why do we have this discrepancy? Further, is that…
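For anyone checking this locally, Gym's Atari environments expose the action labels directly. A minimal sketch, assuming the classic Gym Atari build in which "Pong-v0" is registered:

import gym

env = gym.make("Pong-v0")
print(env.action_space)                     # Discrete(6)
print(env.unwrapped.get_action_meanings())  # expected: ['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']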

cur10us (211 rep)

9 votes, 1 answer
How does Q-learning work in stochastic environments?
The Q function uses the (current and future) states to determine the action that gets the highest reward.
However, in a stochastic environment, the current action (at the current state) does not determine the next state.
How does Q-learning handle…
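For reference, tabular Q-learning only ever uses the sampled next state $s_{t+1}$, and the learning rate $\alpha$ averages over the transition noise:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$
Under standard conditions (sufficient exploration and suitably decaying step sizes), this converges to $Q^*$ in finite MDPs even though individual transitions are random.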

redlum (91 rep)

7 votes, 1 answer
Are all fully observable environments episodic?
According to the definition of a fully observable environment in Russell & Norvig, AIMA (2nd ed), pages 41-44, an environment is only fully observable if it requires zero memory for an agent to perform optimally, that is, all relevant information is…

Francis M. Bacon (171 rep)

6 votes, 3 answers
What exactly are partially observable environments?
I have trouble understanding the meaning of partially observable environments. Here's my doubt.
According to what I understand, the state of the environment is what precisely determines the next state and reward for any particular action taken. So,…
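For context, the standard formalization: in a POMDP the agent does not observe the state itself; after a transition $s' \sim P(s' \mid s, a)$ it receives only an observation $o \sim O(o \mid s', a)$, and anything not recoverable from $o$ acts as hidden state.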
6 votes, 1 answer
Interesting examples of discrete stochastic games
Stochastic games (SGs) are a generalization of MDPs to multiple agents. Like this previous question on MDPs, are there any interesting examples of zero-sum, discrete SGs, preferably with small state and action spaces? I'm hoping to use such examples as benchmarks, but…

user76284 (347 rep)

6 votes, 1 answer
Benchmarks for reinforcement learning in discrete MDPs
To compare the performance of various algorithms for perfect information games, reasonable benchmarks include reversi and m,n,k-games (generalized tic-tac-toe). For imperfect information games, something like simplified poker is a reasonable…

user76284 (347 rep)

5 votes, 1 answer
How to create a custom environment for reinforcement learning
I am a newbie in reinforcement learning working on a college project. The project involves optimizing hardware power. I am running proprietary software on a Linux distribution (16.04). The goal is to use reinforcement learning and optimize…
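For newcomers hitting this question, here is a minimal sketch of a custom environment in the classic Gym API (pre-0.26: reset() returns an observation, step() returns a 4-tuple). The class name, spaces, dynamics and reward are placeholders, not the question's actual power-tuning setup:

import gym
import numpy as np
from gym import spaces

class PowerEnv(gym.Env):
    def __init__(self):
        super().__init__()
        # Placeholder spaces: 4 continuous sensor readings, 3 discrete control settings.
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(3)

    def reset(self):
        self.state = self.observation_space.sample()
        return self.state

    def step(self, action):
        # A real version would apply `action` to the system under test and
        # measure power; here both the transition and reward are stubbed out.
        self.state = self.observation_space.sample()
        reward = 0.0
        done = False
        return self.state, reward, done, {}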

NewToCoding (51 rep)

4 votes, 2 answers
How can a neural network work with continuous time?
I have an ANN model that receives an input and produces an output. The output is an action that interacts with the environment and changes the input accordingly. The network has a desired environment state which, in any turn, decides the desired…

Emad (183 rep)

4 votes, 1 answer
How should I generate datasets for a SARSA agent when the environment is not simple?
I am currently working on my master's thesis and am going to apply Deep-SARSA as my DRL algorithm. The problem is that there are no datasets available, and I guess that I should generate them somehow. Dataset generation seems a common feature in this…
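Worth noting for this kind of question: SARSA is on-policy, so its "dataset" is generated by running the current policy in the environment rather than prepared in advance. A rough sketch, where env, policy and agent.update are hypothetical placeholders:

s = env.reset()
a = policy(s)
done = False
while not done:
    s_next, r, done, _ = env.step(a)       # interact to obtain the transition
    a_next = policy(s_next)                # on-policy: next action from the same policy
    agent.update(s, a, r, s_next, a_next)  # consume the (s, a, r, s', a') tuple
    s, a = s_next, a_next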

Shahin (153 rep)

4 votes, 2 answers
Why do all states appear identical under the function approximation in the Short Corridor task?
This is the Short Corridor problem from the Sutton & Barto book, where it is written:
The problem is difficult because all the states appear identical under the function approximation
But this doesn't make much sense as we can always choose…
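For reference, in that example (Sutton & Barto, Example 13.1) the policy is parameterized with state-independent features, e.g. $x(s, \text{right}) = [1, 0]^\top$ and $x(s, \text{left}) = [0, 1]^\top$ for all $s$, so every state necessarily receives the same action distribution; that is the sense in which the states "appear identical" to the approximator.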

ZERO NULLS (147 rep)

4 votes, 1 answer
What is the advantage of using more than one environment with the advantage actor-critic?
make_env = lambda: ptan.common.wrappers.wrap_dqn(gym.make("PongNoFrameskip-v4"))
envs = [make_env() for _ in range(NUM_ENVS)]
Here is some code you can look at.
The two lines above create multiple environments for the Atari game Pong with…
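For intuition, synchronous advantage actor-critic steps all environments in lockstep, so each update batch mixes transitions from NUM_ENVS independent episodes, which decorrelates the on-policy samples. A rough sketch, where agent.act and ROLLOUT_STEPS are hypothetical placeholders:

obs = [env.reset() for env in envs]
for _ in range(ROLLOUT_STEPS):
    actions = [agent.act(o) for o in obs]
    results = [env.step(a) for env, a in zip(envs, actions)]
    # Restart any environment that finished, so the batch stays full.
    obs = [env.reset() if done else o
           for env, (o, r, done, info) in zip(envs, results)]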

jgauth (161 rep)

4 votes, 1 answer
How to assign rewards in a non-Markovian environment?
I am quite new to reinforcement learning and I am curious about something. It seems to be the case that the majority of current research assumes Markovian environments, that is, that future states of the process depend only upon the present…
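A common workaround, assuming a bounded window of history is enough to explain the reward: wrap the environment so the agent conditions on the last k observations, which can restore approximate Markov structure. The wrapper below is a hypothetical sketch in the classic Gym API:

from collections import deque

class HistoryWrapper:
    def __init__(self, env, k=4):
        self.env = env
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        for _ in range(self.frames.maxlen):
            self.frames.append(obs)  # pad the window with the first observation
        return tuple(self.frames)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)      # the agent now sees the last k observations
        return tuple(self.frames), reward, done, info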

thulungair (43 rep)

4 votes, 1 answer
How to represent players in a multi-agent environment so each model can distinguish its own player
So I have 2 models trained with the DQN algorithm that I want to train in a multi-agent environment to see how they react to each other. The models were trained in an environment consisting of 0's and 1's (-1's for the other model), where 1 means…
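One common trick for this kind of symmetric encoding, sketched under the assumption that the board is a NumPy array of -1/0/1 as described: give each model the board multiplied by its own sign, so both models always see their own pieces as 1. The helper below is hypothetical:

import numpy as np

def observation_for(board, player):
    # `player` is +1 for one model and -1 for the other (hypothetical convention).
    return np.asarray(board) * player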

Milky (41 rep)