Highest Voted Questions - Artificial Intelligence Stack Exchange

5

votes

1 answer

If the current state is $S_t$ and the actions are chosen according to $\pi$, what is the expectation of $R_{t+1}$ in terms of $\pi$ and $p$?

I'm trying to solve Exercise 3.11 from Sutton and Barto's book (2nd edition). Exercise 3.11: If the current state is $S_t$, and actions are selected according to a stochastic policy $\pi$, then what is the expectation of $R_{t+1}$ in terms…

reinforcement-learning rewards sutton-barto expectation transition-model

asked Jun 05 '20 at 12:58

tmaric

382
2
8
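
For the Exercise 3.11 question above, a hedged sketch of the usual answer, assuming the book's four-argument dynamics function $p(s', r \mid s, a)$ and writing $s$ for the value taken by $S_t$ (the summation indices are notation chosen here, not quoted from the question): average the reward over the action picked by $\pi$ and then over the next state and reward produced by $p$.

$$\mathbb{E}_\pi\left[R_{t+1} \mid S_t = s\right] = \sum_{a} \pi(a \mid s) \sum_{s',\, r} p(s', r \mid s, a)\, r$$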

5

votes

1 answer

How do I convert an MDP with the reward function in the form $R(s,a,s')$ to an MDP with a reward function in the form $R(s,a)$?

The AIMA book has an exercise about showing that an MDP with rewards of the form $r(s, a, s')$ can be converted to an MDP with rewards $r(s, a)$, and to an MDP with rewards $r(s)$ with equivalent optimal policies. In the case of converting to $r(s)$…

reinforcement-learning markov-decision-process proofs reward-functions

asked May 25 '20 at 11:19

Asher

436
3
8
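
For the $R(s,a,s')$ to $R(s,a)$ question above, one common construction is to take the expectation over the next state, $R(s,a) = \sum_{s'} P(s' \mid s, a)\, R(s,a,s')$. The Bellman equations depend on the reward only through this expectation, so the optimal policies are unchanged. Below is a minimal sketch under that assumption; the function name `expected_reward` and the tiny two-state MDP are hypothetical and used only for illustration. It does not cover the conversion to $r(s)$.

```python
def expected_reward(P, R):
    """Collapse R[s][a][s'] into R_sa[s][a] = sum over s' of P[s][a][s'] * R[s][a][s'].

    P and R are nested lists indexed as [state][action][next_state].
    """
    return [
        [sum(p * r for p, r in zip(P[s][a], R[s][a])) for a in range(len(P[s]))]
        for s in range(len(P))
    ]


# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
P = [
    [[0.5, 0.5], [1.0, 0.0]],    # transitions from state 0 under actions 0 and 1
    [[0.0, 1.0], [0.25, 0.75]],  # transitions from state 1 under actions 0 and 1
]
R = [
    [[0.0, 10.0], [1.0, 0.0]],   # rewards R(0, a, s')
    [[0.0, 2.0], [4.0, 0.0]],    # rewards R(1, a, s')
]

R_sa = expected_reward(P, R)
print(R_sa)
```

The transition model `P` is left untouched; only the reward table loses its dependence on the next state.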

5

votes

2 answers

How can we compute the ratio between the distributions if we don't know one of the distributions?

Here is my understanding of importance sampling. If we have two distributions $p(x)$ and $q(x)$, where we have a way of sampling from $p(x)$ but not from $q(x)$, but we want to compute the expectation wrt $q(x)$, then we use importance sampling.…

reinforcement-learning monte-carlo-methods importance-sampling

asked May 20 '20 at 21:48

pecey

313
2
9
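
For the importance sampling question above, a minimal sketch of the basic estimator may help: samples come from $p(x)$, and each one is reweighted by $q(x)/p(x)$ to estimate an expectation under $q(x)$. The choice of $p = \mathcal{N}(0, 1)$ and $q = \mathcal{N}(1, 1)$, and the names `normal_pdf` and `importance_sampling_mean`, are assumptions made for illustration. The sketch also assumes that both densities can be evaluated pointwise, which is exactly the assumption the question is probing.

```python
import math
import random


def normal_pdf(x, mean, std=1.0):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_mean(n_samples, seed=0):
    """Estimate E_q[x] for q = N(1, 1) using samples drawn from p = N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)                      # sample from p
        w = normal_pdf(x, 1.0) / normal_pdf(x, 0.0)  # importance weight q(x) / p(x)
        total += w * x
    return total / n_samples


estimate = importance_sampling_mean(200_000)
print(estimate)  # should be close to the true mean of q, which is 1.0
```

The ratio is computed from the two density functions evaluated at each sample, so sampling from $q$ is never required, only evaluating it.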

5

votes

1 answer

How does being on-policy prevent us from using the replay buffer with the policy gradients?

One of the approaches to improving the stability of the Policy Gradient family of methods is to use multiple environments in parallel. The reason behind this is the fundamental problem we discussed in Chapter 6, Deep Q-Network, when we talked about…

reinforcement-learning policy-gradients actor-critic-methods experience-replay a3c

asked May 12 '20 at 15:17

jgauth

161
10

5

votes

1 answer

What are the techniques for detecting and preventing overfitting?

I'm worried that my neural network has become too complex. I don't want to end up with half of the neural network doing nothing but taking up space and resources. So, what are the techniques for detecting and preventing overfitting, to avoid…

reference-request optimization deep-neural-networks overfitting generalization

asked Aug 02 '16 at 15:55

kenorb

10,423
3
43
91

5

votes

2 answers

What are the common pitfalls that we could face when training neural networks?

Apart from the vanishing or exploding gradient problems, what are other problems or pitfalls that we could face when training neural networks?

neural-networks vanishing-gradient-problem exploding-gradient-problem

asked May 04 '20 at 14:29

pjoter

51
1

5

votes

1 answer

What happens if the opponent doesn't play optimally in minimax?

I just read an article about the minimax algorithm. When you design the algorithm, you assume that your opponent is a perfect player, i.e. it plays optimally. Let's consider the game of chess. What happens if the opponent plays irrationally or…

reinforcement-learning minimax

asked May 04 '20 at 13:18

dato nefaridze

862
6
20

5

votes

2 answers

Are bandits considered an RL approach?

If a research paper uses multi-armed bandits (either in their standard or contextual form) to solve a particular task, can we say that they solved this task using a reinforcement learning approach? Or should we distinguish between the two and use…

reinforcement-learning terminology multi-armed-bandits contextual-bandits

asked May 02 '20 at 14:42

user5093249

722
4
8

5

votes

3 answers

Is it possible to separately evolve a part of the population?

In a classic example of a genetic algorithm, you would have a population and a certain amount of simulation time to evaluate and breed it, and then proceed to the next generation. Is it possible, during the simulation process, to have an isolated and…

genetic-algorithms evolutionary-algorithms genetic-programming island-models

asked Oct 06 '16 at 10:26

mikerson

53
2

5

votes

1 answer

Loss function for choosing a subset of objects

I'm trying to train a neural net to choose a subset from some list of objects. The input is a list of objects $(a,b,c,d,e,f)$ and for each list of objects the label is a list composed of 0/1 - 1 for every object that is in the subset, for example…

neural-networks deep-learning objective-functions

asked Apr 21 '20 at 19:17

Gilad Deutsch

629
5
12

5

votes

6 answers

What jobs cannot be automated by AI in the future?

AI is progressing drastically, and imagine they tell you you're fired because a robot will take your place. What are some jobs that can never be automated?

philosophy robots

asked Sep 30 '16 at 19:24

bmwalide

399
2
6

5

votes

2 answers

Is it possible to guide a reinforcement learning algorithm?

I have just started to study reinforcement learning and, as far as I understand, existing algorithms search for the optimal solution/policy, but do not allow the possibility for the programmer to suggest a way to find the solution (to guide their…

reinforcement-learning deep-rl supervised-learning efficiency active-learning

asked Apr 18 '20 at 12:42

Cristian M

249
2
6

5

votes

4 answers

How does the size of the dataset depend on the VC dimension of the hypothesis class?

This might be a somewhat broad question, but I have been watching the Caltech YouTube videos on machine learning, and in this video the professor is trying to explain how we should interpret the VC dimension in layman's terms, and why do…

computational-learning-theory vc-dimension vc-theory sample-complexity hypothesis-class

asked Apr 16 '20 at 22:33

Stefan Radonjic

187
5

5

votes

1 answer

What is the equivalent of calcium's role in neural networks?

I understand that neural networks model biological neurons. Each node in the network represents a neuron cell and the connections between nodes represent the connections between cells. As in nature, a neuron fires an electrical signal to connected…

philosophy unsupervised-learning neurons

asked Sep 27 '16 at 18:11

k rey

163
4

5

votes

1 answer

What are some resources for coding some artificial intelligence techniques in the context of games?

I know the most basic rudimentary theory on AI, and I want to delve into actual practical coding with AI and machine learning. I already know a decent bit of coding in C++ and I'm learning Python syntax now. I think I want to start implementing…

machine-learning game-ai reference-request chess

asked Apr 10 '20 at 09:46

Tarun

53
4
