Questions tagged [environment]

For questions related to the concept of environment in reinforcement learning and other AI sub-fields.

67 questions
16 votes · 3 answers

Is the optimal policy always stochastic if the environment is also stochastic?

Is the optimal policy always stochastic (that is, a map from states to a probability distribution over actions) if the environment is also stochastic? Intuitively, if the environment is deterministic (that is, if the agent is in a state $s$ and…
12 votes · 2 answers

Is there a fundamental difference between an environment being stochastic and being partially observable?

In the AI literature, deterministic vs stochastic and fully observable vs partially observable are usually considered two distinct properties of the environment. I'm confused about this because what appears random can be described by hidden…
10 votes · 3 answers

What do the different actions of the OpenAI gym's environment of 'Pong-v0' represent?

Printing action_space for Pong-v0 gives Discrete(6) as output, i.e. $0, 1, 2, 3, 4, 5$ are actions defined in the environment as per the documentation. However, the game needs only 2 controls. Why do we have this discrepancy? Further, is that…
asked by cur10us
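
A quick way to see what those six actions stand for, assuming the classic Gym Atari interface (exact labels can vary with the Gym/ALE version):

    import gym

    env = gym.make("Pong-v0")
    print(env.action_space)  # Discrete(6)
    # Atari environments expose human-readable labels via the underlying ALE:
    print(env.unwrapped.get_action_meanings())
    # typically ['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']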
9 votes · 1 answer

How does Q-learning work in stochastic environments?

The Q function uses the (current and future) states to determine the action that gets the highest reward. However, in a stochastic environment, the current action (at the current state) does not determine the next state. How does Q-learning handle…
asked by redlum
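
For intuition: tabular Q-learning copes with stochastic transitions because it updates from sampled transitions, and the learning-rate-weighted average converges toward the expectation over next states. A minimal sketch, assuming a Gym-style `env` with discrete states and actions:

    import numpy as np

    def q_learning(env, n_states, n_actions, episodes=5000,
                   alpha=0.1, gamma=0.99, eps=0.1):
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                # epsilon-greedy exploration
                a = env.action_space.sample() if np.random.rand() < eps \
                    else int(np.argmax(Q[s]))
                s2, r, done, _ = env.step(a)  # s2 is a sample from p(s'|s,a)
                # sample-based update: over repeated visits this averages
                # across the stochastic next states
                target = r + gamma * np.max(Q[s2]) * (not done)
                Q[s, a] += alpha * (target - Q[s, a])
                s = s2
        return Q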
7 votes · 1 answer

Are all fully observable environments episodic?

According to the definition of a fully observable environment in Russell & Norvig, AIMA (2nd ed.), pages 41-44, an environment is fully observable only if it requires zero memory for an agent to perform optimally, that is, all relevant information is…
6 votes · 3 answers

What exactly are partially observable environments?

I have trouble understanding the meaning of partially observable environments. Here's my confusion: as I understand it, the state of the environment is what precisely determines the next state and reward for any particular action taken. So,…
6 votes · 1 answer

Interesting examples of discrete stochastic games

Stochastic games (SGs) are a generalization of MDPs to multiple agents. Like this previous question on MDPs, are there any interesting examples of zero-sum, discrete SGs—preferably with small state and action spaces? I'm hoping to use such examples as benchmarks, but…
6 votes · 1 answer

Benchmarks for reinforcement learning in discrete MDPs

To compare the performance of various algorithms for perfect information games, reasonable benchmarks include reversi and m,n,k-games (generalized tic-tac-toe). For imperfect information games, something like simplified poker is a reasonable…
5 votes · 1 answer

How to create a custom environment for reinforcement learning

I am a newbie in reinforcement learning working on a college project related to optimizing hardware power. I am running proprietary software on a Linux distribution (16.04). The goal is to use reinforcement learning and optimize…
asked by NewToCoding
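
The usual route is to subclass `gym.Env` and define an observation space, an action space, `reset()`, and `step()`. A minimal sketch under the classic Gym API; the power reading, actions, and reward below are placeholders, not the asker's actual setup:

    import gym
    import numpy as np
    from gym import spaces

    class PowerEnv(gym.Env):
        """Toy sketch: the observation is one normalized power reading,
        the actions are three hypothetical power-management settings."""

        def __init__(self):
            self.observation_space = spaces.Box(0.0, 1.0, shape=(1,), dtype=np.float32)
            self.action_space = spaces.Discrete(3)

        def reset(self):
            self.state = np.array([0.5], dtype=np.float32)  # placeholder reading
            return self.state

        def step(self, action):
            # Placeholder dynamics: a real version would apply the chosen
            # setting to the system and read back the measured power.
            drift = np.random.uniform(-0.1, 0.1, size=1).astype(np.float32)
            self.state = np.clip(self.state + drift, 0.0, 1.0)
            reward = -float(self.state[0])  # lower power draw, higher reward
            return self.state, reward, False, {}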
4 votes · 2 answers

How can a neural network work with continuous time?

I have an ANN model that receives an input and produces an output. The output is an action that interacts with the environment and changes the input accordingly. The network has a desired environment state which, at each turn, decides the desired…
4 votes · 1 answer

How should I generate datasets for a SARSA agent when the environment is not simple?

I am currently working on my master's thesis and am going to apply Deep-SARSA as my DRL algorithm. The problem is that there are no datasets available, and I guess that I should generate them somehow. Dataset generation seems a common feature in this…
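
Worth noting: SARSA is on-policy, so instead of a fixed dataset it consumes (s, a, r, s', a') transitions gathered while the agent acts. A sketch of such a collection loop under the Gym `step()` convention, where `policy` is any state-to-action callable:

    def collect_sarsa_transitions(env, policy, n_episodes):
        """Generate (s, a, r, s2, a2) tuples by interacting with the env."""
        transitions = []
        for _ in range(n_episodes):
            s = env.reset()
            a = policy(s)
            done = False
            while not done:
                s2, r, done, _ = env.step(a)
                a2 = policy(s2)
                transitions.append((s, a, r, s2, a2))
                s, a = s2, a2
        return transitions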
4 votes · 2 answers

Why do all states appear identical under the function approximation in the Short Corridor task?

This is the Short Corridor problem taken from the Sutton & Barto book. There it is written: "The problem is difficult because all the states appear identical under the function approximation." But this doesn't make much sense, as we can always choose…
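
For context, in the book's construction the feature vector depends only on the action, not on the state, which is exactly why every state "looks the same" to the approximator. A sketch of that parameterization (softmax over action preferences):

    import numpy as np

    # Features as in Sutton & Barto's Short Corridor example: x(s, a)
    # ignores s, so a softmax policy must assign the same action
    # probabilities in every state.
    def x(s, a):
        return np.array([1.0, 0.0]) if a == "right" else np.array([0.0, 1.0])

    def pi(theta, s):
        prefs = np.array([theta @ x(s, a) for a in ("right", "left")])
        e = np.exp(prefs - prefs.max())
        return e / e.sum()  # identical for every s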
4 votes · 1 answer

What is the advantage of using more than one environment with the advantage actor-critic?

make_env = lambda: ptan.common.wrappers.wrap_dqn(gym.make("PongNoFrameskip-v4"))
envs = [make_env() for _ in range(NUM_ENVS)]
Here is some code you can look at. The two lines above create multiple environments for the game of Atari Pong with…
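
The usual motivation is that an on-policy method like A2C wants decorrelated batches, and stepping several environments in parallel and interleaving their transitions gives exactly that. A dependency-free sketch of the idea (`wrap_dqn` above is ptan's Atari preprocessing wrapper, omitted here):

    import gym

    NUM_ENVS = 8
    envs = [gym.make("PongNoFrameskip-v4") for _ in range(NUM_ENVS)]
    obs = [env.reset() for env in envs]

    def step_all(envs, obs, actions):
        """Advance every environment one step; a batch drawn across
        environments mixes independent episodes, decorrelating samples."""
        batch = []
        for i, (env, a) in enumerate(zip(envs, actions)):
            o2, r, done, _ = env.step(a)
            batch.append((obs[i], a, r, o2, done))
            obs[i] = env.reset() if done else o2
        return batch, obs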
4 votes · 1 answer

How to assign rewards in a non-Markovian environment?

I am quite new to the reinforcement learning domain and I am curious about something. It seems that the majority of current research assumes Markovian environments, that is, future states of the process depend only upon the present…
4 votes · 1 answer

How to represent players in a multi-agent environment so each model can distinguish its own player

So I have 2 models trained with the DQN algorithm that I want to train in a multi-agent environment to see how they react with each other. The models were trained in an environment consisting of 0's and 1's (-1's for the other model), where 1 means…
asked by Milky
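
One common trick here (a sketch, not necessarily how either model was trained) is to present the board from the acting model's own perspective, so each network always sees its own pieces as 1 and the opponent's as -1:

    import numpy as np

    def observation_for(board, player):
        """board holds 0s, 1s and -1s; player is +1 or -1. Multiplying by
        `player` makes the acting model's own pieces appear as +1
        regardless of which side it is playing."""
        return board * player

    board = np.array([0, 1, -1, 0])
    print(observation_for(board, +1))  # [ 0  1 -1  0]
    print(observation_for(board, -1))  # [ 0 -1  1  0]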