Questions tagged [environment]
For questions related to the concept of an environment in reinforcement learning and other AI subfields.
67 questions
16 votes, 3 answers
Is the optimal policy always stochastic if the environment is also stochastic?
Is the optimal policy always stochastic (that is, a map from states to a probability distribution over actions) if the environment is also stochastic?
Intuitively, if the environment is deterministic (that is, if the agent is in a state $s$ and…
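A useful anchor for this question: in any finite MDP, however stochastic the transitions, there is always at least one deterministic optimal policy, obtained by acting greedily with respect to the optimal action-value function:
$$\pi^*(s) \in \arg\max_{a} Q^*(s, a)$$
Stochastic optimal policies become necessary in other settings, e.g. under partial observability or against an adversary (as in matching pennies).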

nbro (39,006 rep)

12 votes, 2 answers
Is there a fundamental difference between an environment being stochastic and being partially observable?
In the AI literature, deterministic vs. stochastic and fully observable vs. partially observable are usually treated as two distinct properties of the environment.
I'm confused about this because what appears random can be described by hidden…

martinkunev (233 rep)

10 votes, 3 answers
What do the different actions of OpenAI Gym's 'Pong-v0' environment represent?
Printing action_space for Pong-v0 gives Discrete(6) as output, i.e. $0, 1, 2, 3, 4, 5$ are actions defined in the environment as per the documentation. However, the game needs only 2 controls. Why do we have this discrepancy? Further, is that…
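For anyone checking this locally, Gym's Atari environments expose the action labels directly. A minimal sketch, assuming the classic Gym Atari build in which "Pong-v0" is registered:

import gym

env = gym.make("Pong-v0")
print(env.action_space)                     # Discrete(6)
print(env.unwrapped.get_action_meanings())  # expected: ['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']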

cur10us (211 rep)

9 votes, 1 answer
How does Q-learning work in stochastic environments?
The Q function uses the (current and future) states to determine the action that gets the highest reward.
However, in a stochastic environment, the current action (at the current state) does not determine the next state.
How does Q-learning handle…
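For reference, tabular Q-learning only ever uses the sampled next state $s_{t+1}$, and the learning rate $\alpha$ averages over the transition noise:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$
Under standard conditions (sufficient exploration and suitably decaying step sizes), this converges to $Q^*$ in finite MDPs even though individual transitions are random.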

redlum (91 rep)

7 votes, 1 answer
Are all fully observable environments episodic?
According to the definition of a fully observable environment in Russell & Norvig, AIMA (2nd ed), pages 41-44, an environment is only fully observable if it requires zero memory for an agent to perform optimally, that is, all relevant information is…

Francis M. Bacon (171 rep)

6 votes, 3 answers
What exactly are partially observable environments?
I have trouble understanding the meaning of partially observable environments. Here's my doubt.
According to what I understand, the state of the environment is what precisely determines the next state and reward for any particular action taken. So,…
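For context, the standard formalization: in a POMDP the agent does not observe the state itself; after a transition $s' \sim P(s' \mid s, a)$ it receives only an observation $o \sim O(o \mid s', a)$, and anything not recoverable from $o$ acts as hidden state.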
6 votes, 1 answer
Interesting examples of discrete stochastic games
Stochastic games (SGs) are a generalization of MDPs to multiple agents. Like this previous question on MDPs, are there any interesting examples of zero-sum, discrete SGs, preferably with small state and action spaces? I'm hoping to use such examples as benchmarks, but…

user76284 (347 rep)

6 votes, 1 answer
Benchmarks for reinforcement learning in discrete MDPs
To compare the performance of various algorithms for perfect information games, reasonable benchmarks include reversi and m,n,k-games (generalized tic-tac-toe). For imperfect information games, something like simplified poker is a reasonable…

user76284 (347 rep)

5 votes, 1 answer
How to create a custom environment for reinforcement learning
I am a newbie in reinforcement learning working on a college project. The project involves optimizing hardware power. I am running proprietary software on a Linux distribution (16.04). The goal is to use reinforcement learning and optimize…
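For newcomers hitting this question, here is a minimal sketch of a custom environment in the classic Gym API (pre-0.26: reset() returns an observation, step() returns a 4-tuple). The class name, spaces, dynamics and reward are placeholders, not the question's actual power-tuning setup:

import gym
import numpy as np
from gym import spaces

class PowerEnv(gym.Env):
    def __init__(self):
        super().__init__()
        # Placeholder spaces: 4 continuous sensor readings, 3 discrete control settings.
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(3)

    def reset(self):
        self.state = self.observation_space.sample()
        return self.state

    def step(self, action):
        # A real version would apply `action` to the system under test and
        # measure power; here both the transition and reward are stubbed out.
        self.state = self.observation_space.sample()
        reward = 0.0
        done = False
        return self.state, reward, done, {}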

NewToCoding (51 rep)

4 votes, 2 answers
How can a neural network work with continuous time?
I have an ANN model that receives an input and produces an output. The output is an action that interacts with the environment and changes the input accordingly. The network has a desired environment state which, in any turn, decides the desired…

Emad (183 rep)

4 votes, 1 answer
How should I generate datasets for a SARSA agent when the environment is not simple?
I am currently working on my master's thesis and am going to apply Deep-SARSA as my DRL algorithm. The problem is that there are no datasets available, and I guess that I should generate them somehow. Dataset generation seems a common feature in this…
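Worth noting for this kind of question: SARSA is on-policy, so its "dataset" is generated by running the current policy in the environment rather than prepared in advance. A rough sketch, where env, policy and agent.update are hypothetical placeholders:

s = env.reset()
a = policy(s)
done = False
while not done:
    s_next, r, done, _ = env.step(a)       # interact to obtain the transition
    a_next = policy(s_next)                # on-policy: next action from the same policy
    agent.update(s, a, r, s_next, a_next)  # consume the (s, a, r, s', a') tuple
    s, a = s_next, a_next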

Shahin (153 rep)

4 votes, 2 answers
Why do all states appear identical under the function approximation in the Short Corridor task?
This is the Short Corridor problem from the Sutton & Barto book, where it is written:
The problem is difficult because all the states appear identical under the function approximation
But this doesn't make much sense as we can always choose…
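For reference, in that example (Sutton & Barto, Example 13.1) the policy is parameterized with state-independent features, e.g. $x(s, \text{right}) = [1, 0]^\top$ and $x(s, \text{left}) = [0, 1]^\top$ for all $s$, so every state necessarily receives the same action distribution; that is the sense in which the states "appear identical" to the approximator.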

ZERO NULLS (147 rep)

4 votes, 1 answer
What is the advantage of using more than one environment with the advantage actor-critic?
make_env = lambda: ptan.common.wrappers.wrap_dqn(gym.make("PongNoFrameskip-v4"))
envs = [make_env() for _ in range(NUM_ENVS)]
Here is some code you can look at.
The two lines above create multiple environments for the Atari game Pong with…
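For intuition, synchronous advantage actor-critic steps all environments in lockstep, so each update batch mixes transitions from NUM_ENVS independent episodes, which decorrelates the on-policy samples. A rough sketch, where agent.act and ROLLOUT_STEPS are hypothetical placeholders:

obs = [env.reset() for env in envs]
for _ in range(ROLLOUT_STEPS):
    actions = [agent.act(o) for o in obs]
    results = [env.step(a) for env, a in zip(envs, actions)]
    # Restart any environment that finished, so the batch stays full.
    obs = [env.reset() if done else o
           for env, (o, r, done, info) in zip(envs, results)]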

jgauth (161 rep)

4 votes, 1 answer
How to assign rewards in a non-Markovian environment?
I am quite new to reinforcement learning and I am curious about something. It seems to be the case that the majority of current research assumes Markovian environments, that is, that future states of the process depend only upon the present…
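A common workaround, assuming a bounded window of history is enough to explain the reward: wrap the environment so the agent conditions on the last k observations, which can restore approximate Markov structure. The wrapper below is a hypothetical sketch in the classic Gym API:

from collections import deque

class HistoryWrapper:
    def __init__(self, env, k=4):
        self.env = env
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        for _ in range(self.frames.maxlen):
            self.frames.append(obs)  # pad the window with the first observation
        return tuple(self.frames)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)      # the agent now sees the last k observations
        return tuple(self.frames), reward, done, info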

thulungair (43 rep)

4 votes, 1 answer
How to represent players in a multi-agent environment so each model can distinguish its own player
So I have 2 models trained with the DQN algorithm that I want to train in a multi-agent environment to see how they react to each other. The models were trained in an environment consisting of 0's and 1's (-1's for the other model), where 1 means…
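One common trick for this kind of symmetric encoding, sketched under the assumption that the board is a NumPy array of -1/0/1 as described: give each model the board multiplied by its own sign, so both models always see their own pieces as 1. The helper below is hypothetical:

import numpy as np

def observation_for(board, player):
    # `player` is +1 for one model and -1 for the other (hypothetical convention).
    return np.asarray(board) * player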

Milky (41 rep)