Questions tagged [monte-carlo-methods]
For questions related to Monte Carlo methods in reinforcement learning and other AI sub-fields. ("Monte Carlo" refers to random sampling of the search space.)
78 questions
20
votes
2 answers
What is the difference between First-Visit Monte-Carlo and Every-Visit Monte-Carlo Policy Evaluation?
I came across these two algorithms, but I cannot understand the difference between them, both in terms of implementation and intuition.
So, what difference does the second point in both slides refer to?
user9947
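A minimal tabular sketch of the distinction this question asks about, assuming episodes are given as lists of (state, reward) pairs and a discount factor gamma (these conventions are illustrative, not from the post); the only difference between the two variants is whether a state's return is recorded at its first occurrence in an episode or at every occurrence.

```python
from collections import defaultdict

def mc_policy_evaluation(episodes, gamma=1.0, first_visit=True):
    """Tabular Monte Carlo policy evaluation.

    episodes: list of episodes, each a list of (state, reward) pairs,
              where reward is the reward received after leaving that state.
    Returns a dict mapping state -> estimated value.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)

    for episode in episodes:
        # Compute the return G_t for every time step, working backwards.
        G = 0.0
        returns = [0.0] * len(episode)
        for t in reversed(range(len(episode))):
            _, reward = episode[t]
            G = reward + gamma * G
            returns[t] = G

        seen = set()
        for t, (state, _) in enumerate(episode):
            if first_visit and state in seen:
                continue  # first-visit: only the first occurrence counts
            seen.add(state)
            returns_sum[state] += returns[t]
            returns_count[state] += 1

    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```

Both variants converge to $v_\pi$; with first_visit=False, a state visited several times in one episode contributes several (correlated) return samples instead of one.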
17
votes
1 answer
How does "Monte-Carlo search" work?
I have heard about this concept in a Reddit post about AlphaGo. I have tried to go through the paper and the article, but could not really make sense of the algorithm.
So, can someone give an easy-to-understand explanation of how the Monte-Carlo…

Dawny33
- 1,371
- 13
- 29
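AlphaGo uses full Monte-Carlo tree search; the sketch below shows only the simplest "flat" Monte-Carlo search (random playouts per candidate move, no tree) on an invented Nim-style toy game, to convey the core idea of evaluating moves by sampling.

```python
import random

# Invented Nim-style toy game: a pile of stones, each player removes 1-3,
# and whoever takes the last stone wins.
def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def random_playout(stones, my_turn):
    """Play uniformly random moves to the end; return 1 if 'we' win, else 0."""
    while stones > 0:
        stones -= random.choice(legal_moves(stones))
        if stones == 0:
            return 1 if my_turn else 0
        my_turn = not my_turn
    return 0

def flat_monte_carlo_move(stones, playouts_per_move=5000):
    """Score each legal move by the average result of random playouts from it."""
    best_move, best_score = None, -1.0
    for move in legal_moves(stones):
        if move == stones:
            return move  # taking the last stone wins immediately
        wins = sum(random_playout(stones - move, my_turn=False)
                   for _ in range(playouts_per_move))
        score = wins / playouts_per_move
        if score > best_score:
            best_move, best_score = move, score
    return best_move

print(flat_monte_carlo_move(10))  # usually 2, leaving the opponent a multiple of 4
```

Full MCTS adds a tree over the first few moves and a selection rule such as UCT, so playouts concentrate on promising lines instead of being spread uniformly over the candidate moves.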
9
votes
2 answers
What is the intuition behind TD($\lambda$)?
I'd like to better understand temporal-difference learning. In particular, I'm wondering if it is prudent to think about TD($\lambda$) as a type of "truncated" Monte Carlo learning?

Nick Kunz
- 145
- 1
- 5
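One standard way to make the "truncated Monte Carlo" intuition precise is the forward-view $\lambda$-return (standard notation, as in Sutton & Barto), which averages the n-step returns with geometrically decaying weights:

$$G_t^{(n)} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^{n} V(S_{t+n}), \qquad G_t^{\lambda} = (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}$$

At $\lambda = 0$ this reduces to the one-step TD target, and as $\lambda \to 1$ it approaches the full Monte Carlo return, so TD($\lambda$) interpolates between the two rather than being a literal truncation.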
8
votes
1 answer
How to fill in missing transitions when sampling an MDP transition table?
I have a simulator modelling a relatively complex scenario. I extract ~12 discrete features from the simulator state, which form the basis for my MDP state space.
Suppose I am estimating the transition table for an MDP by running a large number of…

Brendan Hill
- 263
- 1
- 6
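A minimal sketch of one common way to handle unseen transitions when estimating such a table: count-based estimation with additive (Laplace) smoothing, so transitions never observed in the data get a small nonzero probability instead of zero. The alpha parameter and dict layout below are illustrative choices, not from the post.

```python
from collections import defaultdict

def estimate_transition_table(transitions, states, alpha=0.1):
    """Estimate P(s' | s, a) from observed (s, a, s') triples.

    transitions: iterable of (state, action, next_state) samples.
    states: the full list of discrete states, so next-states never observed
            for a given (s, a) still receive a small smoothed probability.
    alpha: additive smoothing pseudo-count (alpha=0 gives plain frequencies).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s, a, s_next in transitions:
        counts[(s, a)][s_next] += 1.0

    table = {}
    n_states = len(states)
    for (s, a), next_counts in counts.items():
        total = sum(next_counts.values()) + alpha * n_states
        table[(s, a)] = {s2: (next_counts.get(s2, 0.0) + alpha) / total
                         for s2 in states}
    # Note: (state, action) pairs never sampled at all are absent from `table`;
    # a uniform distribution is one possible fallback for those.
    return table
```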
8
votes
1 answer
MCTS: How to choose the final action from the root
When the time allotted to Monte Carlo tree search runs out, what action should be chosen from the root?
The original UCT paper (2006) says bestAction in its algorithm.
Monte-Carlo Tree Search: A New Framework for Game AI (2008) says
The game…

user76284
- 347
- 1
- 14
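The two rules most often cited in the MCTS literature are "robust child" (pick the most-visited action) and "max child" (pick the action with the highest mean value). A short sketch, assuming a hypothetical node interface with `visits` and `total_value` fields (not from either paper):

```python
def choose_final_action(root, rule="robust"):
    """Pick the move to actually play once search time runs out.

    root.children is assumed to map actions to child nodes carrying
    `visits` and `total_value` statistics (hypothetical interface).
    """
    def mean_value(node):
        return node.total_value / node.visits if node.visits else float("-inf")

    if rule == "robust":
        # "Robust child": most-visited action; visit counts are the least
        # noisy statistic, so this is the most common choice.
        return max(root.children, key=lambda a: root.children[a].visits)
    if rule == "max":
        # "Max child": action with the highest mean value.
        return max(root.children, key=lambda a: mean_value(root.children[a]))
    raise ValueError(rule)
```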
5
votes
1 answer
In MCTS, what to do if I do not want to simulate till the end of the game?
I'm trying to implement MCTS with UCT for a board game and I'm kinda stuck. The state space is quite large (3e15), and I'd like to compute a good move in less than 2 seconds. I already have MCTS implemented in Java from here, and I noticed that it…

Sami
- 53
- 4
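One common answer is to cut rollouts off at a fixed depth and back up a heuristic evaluation of the reached position instead of the game result. A sketch with a hypothetical caller-supplied interface (the function names are placeholders, not from the post):

```python
import random

def truncated_rollout(state, legal_moves, apply_move, evaluate,
                      is_terminal, max_depth=20):
    """Random playout cut off after max_depth moves.

    If a terminal state is reached, its exact outcome is returned via
    evaluate; otherwise a heuristic evaluation of the reached non-terminal
    state is returned. legal_moves, apply_move, evaluate and is_terminal
    are caller-supplied functions (hypothetical interface for this sketch).
    """
    for _ in range(max_depth):
        if is_terminal(state):
            return evaluate(state)   # exact outcome at a terminal state
        state = apply_move(state, random.choice(legal_moves(state)))
    return evaluate(state)           # heuristic value of a non-terminal state
```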
5
votes
1 answer
Why do we need importance sampling?
I was studying the off-policy policy improvement method. Then I encountered importance sampling. I completely understood the mathematics behind the calculation, but I am wondering what a practical example of importance sampling would be.
For instance,…

Alireza Hosseini
- 51
- 2
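A minimal numeric sketch (invented distributions, not from the post) of why the ratio is needed: we only have samples from a behaviour distribution b, yet want an expectation under a target distribution p.

```python
import random

# Target distribution p and behaviour distribution b over three outcomes.
p = {0: 0.7, 1: 0.2, 2: 0.1}   # what we care about, but cannot sample here
b = {0: 0.2, 1: 0.3, 2: 0.5}   # what we can actually sample from
f = {0: 1.0, 1: 5.0, 2: 10.0}  # function whose expectation under p we want

samples = random.choices(list(b), weights=list(b.values()), k=100_000)

naive = sum(f[x] for x in samples) / len(samples)                   # estimates E_b[f]: wrong target
weighted = sum(p[x] / b[x] * f[x] for x in samples) / len(samples)  # estimates E_p[f]

true_value = sum(p[x] * f[x] for x in f)
print(naive, weighted, true_value)  # weighted is close to true_value = 2.7
```

Reweighting each sample by p(x)/b(x) corrects for the fact that b over-samples some outcomes and under-samples others relative to p.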
5
votes
1 answer
Why does TD Learning require Markovian domains?
One of my friends and I were discussing the differences between Dynamic Programming, Monte Carlo, and Temporal Difference (TD) learning as policy evaluation methods, and we agreed that Dynamic Programming requires the Markov assumption…

stoic-santiago
- 1,121
- 5
- 18
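One way to see where the Markov assumption enters: the TD target bootstraps via the Bellman expectation identity

$$v_\pi(s) = \mathbb{E}_\pi\left[R_{t+1} + \gamma\, v_\pi(S_{t+1}) \mid S_t = s\right]$$

which treats $S_{t+1}$ as a sufficient summary of everything that matters for the rest of the trajectory. The Monte Carlo target $G_t$ never conditions on an intermediate state, so it does not rely on that property.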
5
votes
2 answers
How can we compute the ratio between the distributions if we don't know one of the distributions?
Here is my understanding of importance sampling. If we have two distributions $p(x)$ and $q(x)$, where we have a way of sampling from $p(x)$ but not from $q(x)$, but we want to compute the expectation wrt $q(x)$, then we use importance sampling.…

pecey
- 313
- 2
- 9
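In the off-policy RL setting that usually motivates this question, the two distributions are over trajectories, and the unknown environment dynamics cancel out of the ratio, leaving only the (known) policies:

$$\frac{\Pr[\tau \mid \pi]}{\Pr[\tau \mid b]} = \frac{\prod_{k=t}^{T-1} \pi(A_k \mid S_k)\, p(S_{k+1} \mid S_k, A_k)}{\prod_{k=t}^{T-1} b(A_k \mid S_k)\, p(S_{k+1} \mid S_k, A_k)} = \prod_{k=t}^{T-1} \frac{\pi(A_k \mid S_k)}{b(A_k \mid S_k)}$$

So only the ratio of action probabilities is required, not the transition model.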
4
votes
1 answer
Why is GLIE Monte-Carlo control an on-policy control?
In slide 16 of his lecture 5 of the course "Reinforcement Learning", David Silver introduced GLIE Monte-Carlo Control.
But why is it an on-policy control? The sampling follows a policy $\pi$ while improvement follows an $\epsilon$-greedy policy, so…

fish_tree
- 247
- 1
- 6
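A minimal sketch in the spirit of the slide the question cites: the same $\epsilon$-greedy policy (with $\epsilon_k = 1/k$) both generates the episodes and is the policy being improved, which is what makes the method on-policy. The tiny chain MDP and its interface below are invented for illustration, not from the slides.

```python
import random
from collections import defaultdict

def step(state, action):
    """Toy deterministic chain: action 1 moves right and earns 1 on
    reaching state 2 (terminal); action 0 resets to state 0 with reward 0."""
    if action == 1:
        nxt = state + 1
        return (nxt, 1.0, True) if nxt == 2 else (nxt, 0.0, False)
    return 0, 0.0, False

def epsilon_greedy(Q, state, epsilon, n_actions=2):
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(state, a)])

def glie_mc_control(n_episodes=5000, gamma=0.9):
    Q, N = defaultdict(float), defaultdict(int)
    for k in range(1, n_episodes + 1):
        epsilon = 1.0 / k                              # GLIE: exploration decays to zero
        state, episode, done, t = 0, [], False, 0
        while not done and t < 20:
            action = epsilon_greedy(Q, state, epsilon)  # behaviour policy == policy being improved
            next_state, reward, done = step(state, action)
            episode.append((state, action, reward))
            state, t = next_state, t + 1
        G = 0.0
        for state, action, reward in reversed(episode):  # every-visit MC update of Q
            G = reward + gamma * G
            N[(state, action)] += 1
            Q[(state, action)] += (G - Q[(state, action)]) / N[(state, action)]
    return Q

Q = glie_mc_control()
print(max(range(2), key=lambda a: Q[(0, a)]))  # greedy action in state 0; typically 1 (toward the reward)
```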
4
votes
2 answers
Why is the target called "target" in Monte Carlo and TD learning if it is not the true target?
I was going through Sutton's book; for sample-based learning that estimates the expectations, we have this formula
$$
\text{new estimate} = \text{old estimate} + \alpha(\text{target} - \text{old estimate})
$$
What I don't quite understand is…

Chukwudi Ogbonna
- 125
- 4
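A minimal sketch contrasting the two "targets" plugged into the same incremental-mean update; both are noisy or biased stand-ins for $v_\pi(s)$ rather than the true value, which is what the question is getting at. The helper names below are illustrative.

```python
def incremental_update(old_estimate, target, alpha):
    """new_estimate = old_estimate + alpha * (target - old_estimate)"""
    return old_estimate + alpha * (target - old_estimate)

def mc_target(rewards, gamma):
    """Monte Carlo target: the full sampled return from this visit onward."""
    G = 0.0
    for r in reversed(rewards):
        G = r + gamma * G
    return G

def td_target(reward, gamma, value_of_next_state):
    """TD(0) target: one sampled reward plus the current estimate of the next state."""
    return reward + gamma * value_of_next_state

V_s = 0.0
V_s = incremental_update(V_s, mc_target([0.0, 1.0, 2.0], gamma=0.9), alpha=0.1)
V_s = incremental_update(V_s, td_target(0.0, 0.9, value_of_next_state=2.5), alpha=0.1)
print(V_s)
```

Each update drags the estimate a fraction $\alpha$ of the way toward whatever target was sampled, so the estimate converges to the mean of the targets, not to any single one of them.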
4
votes
1 answer
Why are state-values alone not sufficient in determining a policy (without a model)?
"If a model is not available, then it is particularly useful to estimate action values (the
values of state-action pairs) rather than state values. With a model, state values alone are
sufficient to determine a policy; one simply looks ahead one…

stoic-santiago
- 1,121
- 5
- 18
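A short sketch of the contrast in that passage: acting greedily on action values needs no model, whereas acting greedily on state values requires the transition probabilities and rewards for the one-step lookahead (the P and R arguments below stand in for the model; their layout is illustrative).

```python
def greedy_from_q(Q, state, actions):
    """Model-free: pick the action with the largest estimated action value."""
    return max(actions, key=lambda a: Q[(state, a)])

def greedy_from_v(V, state, actions, P, R, gamma=0.99):
    """Model-based one-step lookahead: needs P[s][a] = {s': prob} and R[s][a]."""
    def lookahead(a):
        return R[state][a] + gamma * sum(p * V[s2] for s2, p in P[state][a].items())
    return max(actions, key=lookahead)
```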
4
votes
1 answer
What does the term $|\mathcal{A}(s)|$ mean in the $\epsilon$-greedy policy?
I've been looking online for a while for a source that explains these computations, but I can't find anywhere what $|\mathcal{A}(s)|$ means. I guess $\mathcal{A}$ is the action set, but I'm not sure about that notation:
$$\frac{\varepsilon}{|\mathcal{A}(s)|}…

Metrician
- 95
- 5
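For reference, $\mathcal{A}(s)$ is the set of actions available in state $s$ and $|\mathcal{A}(s)|$ its size; the full $\epsilon$-greedy policy (as in Sutton & Barto) assigns

$$\pi(a \mid s) = \begin{cases} 1 - \varepsilon + \dfrac{\varepsilon}{|\mathcal{A}(s)|} & \text{if } a = \arg\max_{a'} Q(s, a') \\[4pt] \dfrac{\varepsilon}{|\mathcal{A}(s)|} & \text{otherwise} \end{cases}$$

so the probabilities sum to 1: the greedy action gets the leftover $1-\varepsilon$ plus an equal share of the exploration mass.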
4
votes
1 answer
How does policy evaluation work for continuous state space model-free approaches?
How does policy evaluation work for continuous state space model-free approaches?
Theoretically, a model-based approach for discrete state and action spaces can be computed via dynamic programming by solving the Bellman equation.
Let's say you…

calveeen
- 1,251
- 7
- 17
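A minimal sketch of one common model-free answer for continuous states: represent $v(s) \approx \mathbf{w}^\top \phi(s)$ and update the weights with semi-gradient TD(0). The feature map below is a made-up illustration, not from the post.

```python
import numpy as np

def phi(state):
    """Hypothetical feature map for a 1-D continuous state (illustrative only)."""
    return np.array([1.0, state, state ** 2])

def semi_gradient_td0(transitions, alpha=0.01, gamma=0.99):
    """Policy evaluation with linear function approximation.

    transitions: iterable of (state, reward, next_state, done) samples
    generated by following the policy being evaluated.
    """
    w = np.zeros(3)
    for s, r, s_next, done in transitions:
        v_s = w @ phi(s)
        v_next = 0.0 if done else w @ phi(s_next)
        td_error = r + gamma * v_next - v_s
        w += alpha * td_error * phi(s)  # semi-gradient: no gradient flows through v_next
    return w
```

Monte Carlo evaluation works the same way, with the sampled return $G_t$ in place of the bootstrapped target.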
4
votes
1 answer
How does Monte Carlo have high variance?
I was going through David Silver's lecture on reinforcement learning (lecture 4). At 51:22 he says that Monte Carlo (MC) methods have high variance and zero bias. I understand the zero bias part. It is because it is using the true value of value…

Bhuwan Bhatt
- 394
- 1
- 11
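One way to see the variance claim: the Monte Carlo target is the full return, a sum of many random rewards (and random actions and transitions) over the rest of the episode, whereas the TD target involves only one of each:

$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \qquad \text{vs.} \qquad R_{t+1} + \gamma V(S_{t+1})$$

Every term in $G_t$ contributes randomness, so its variance grows with the effective episode length; the TD target has much lower variance but is biased whenever $V$ is inaccurate.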