Most Popular
1500 questions
5
votes
1 answer
If the current state is $S_t$ and the actions are chosen according to $\pi$, what is the expectation of $R_{t+1}$ in terms of $\pi$ and $p$?
I'm trying to solve exercise 3.11 from Sutton and Barto's book (2nd edition)
Exercise 3.11 If the current state is $S_t$, and actions are selected according to a stochastic policy $\pi$, then what is the expectation of $R_{t+1}$ in terms…

tmaric
- 382
- 2
- 8
5
votes
1 answer
How do I convert an MDP with the reward function in the form $R(s,a,s')$ to an MDP with a reward function in the form $R(s,a)$?
The AIMA book has an exercise about showing that an MDP with rewards of the form $r(s, a, s')$ can be converted to an MDP with rewards $r(s, a)$, and to an MDP with rewards $r(s)$ with equivalent optimal policies.
In the case of converting to $r(s)$…

Asher
- 436
- 3
- 8
5
votes
2 answers
How can we compute the ratio between the distributions if we don't know one of the distributions?
Here is my understanding of importance sampling. If we have two distributions $p(x)$ and $q(x)$, where we have a way of sampling from $p(x)$ but not from $q(x)$, but we want to compute the expectation wrt $q(x)$, then we use importance sampling.…

pecey
- 313
- 2
- 9
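For the importance-sampling question above, a minimal sketch may help make the setup concrete. It assumes both densities can be evaluated pointwise (here $p = \mathcal{N}(0, 1)$ and $q = \mathcal{N}(1, 1)$, chosen purely for illustration) even though samples are drawn only from $p$. The helper names `normal_pdf` and `importance_sampling_mean` are hypothetical and not taken from the question.

```python
import math
import random


def normal_pdf(x, mean, std=1.0):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_mean(f, n_samples=200_000, seed=0):
    """Estimate E_q[f(x)] using samples drawn only from p.

    Illustrative assumption: p = N(0, 1) is the sampling distribution
    and q = N(1, 1) is the target distribution.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)                      # sample from p
        w = normal_pdf(x, 1.0) / normal_pdf(x, 0.0)  # weight q(x) / p(x)
        total += w * f(x)
    return total / n_samples


# The true value of E_q[x] is 1.0, the mean of q.
estimate = importance_sampling_mean(lambda x: x)
print(estimate)
```

The sketch computes the ratio $q(x)/p(x)$ directly, so it only applies when both densities are known at the sampled points; it does not cover the case where one density is unavailable.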
5
votes
1 answer
How does being on-policy prevent us from using the replay buffer with the policy gradients?
One of the approaches to improving the stability of the Policy Gradient family of methods is to use multiple environments in parallel. The reason behind this is the fundamental problem we discussed in Chapter 6, Deep Q-Network, when we talked about…

jgauth
- 161
- 10
5
votes
1 answer
What are the techniques for detecting and preventing overfitting?
I'm worried that my neural network has become too complex. I don't want to end up with half of the neural network doing nothing but taking up space and resources.
So, what are the techniques for detecting and preventing overfitting, to avoid…

kenorb
- 10,423
- 3
- 43
- 91
5
votes
2 answers
What are the common pitfalls that we could face when training neural networks?
Apart from the vanishing or exploding gradient problems, what are other problems or pitfalls that we could face when training neural networks?

pjoter
- 51
- 1
5
votes
1 answer
What happens if the opponent doesn't play optimally in minimax?
I just read an article about the minimax algorithm. When you design the algorithm, you assume that your opponent is a perfect player, i.e. it plays optimally.
Let's consider the game of chess. What happens if the opponent plays irrationally or…

dato nefaridze
- 862
- 6
- 20
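For the minimax question above, a small sketch on a toy game tree may illustrate the perfect-opponent assumption. The tree, its payoffs, and the names `minimax`, `guaranteed`, and `best_move` are hypothetical, chosen only for illustration. Leaves are payoffs for the maximizing player, and inner lists are positions where the next player moves.

```python
def minimax(node, maximizing):
    """Return the minimax value of a game tree given as nested lists."""
    if isinstance(node, (int, float)):
        return node  # leaf: payoff for the maximizing player
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Hypothetical two-ply game: the maximizer picks a branch, then the opponent picks a leaf.
tree = [[3, 5], [2, 9]]

# Value the maximizer can guarantee against a perfect opponent.
guaranteed = minimax(tree, True)

# Branch chosen by minimax at the root.
best_move = max(range(len(tree)), key=lambda i: minimax(tree[i], False))

# Every payoff the opponent could hand over after that move, optimal or not.
outcomes = list(tree[best_move])
print(guaranteed, best_move, outcomes)
```

In this toy tree, every reply available to the opponent after the minimax move pays at least the guaranteed value, so a suboptimal opponent can only improve the maximizer's result. The other branch contains the larger payoff 9, which minimax never reaches because it does not gamble on the opponent making a mistake.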
5
votes
2 answers
Are bandits considered an RL approach?
If a research paper uses multi-armed bandits (either in their standard or contextual form) to solve a particular task, can we say that they solved this task using a reinforcement learning approach? Or should we distinguish between the two and use…

user5093249
- 722
- 4
- 8
5
votes
3 answers
Is it possible to separately evolve a part of the population?
In a classic example of a genetic algorithm, you would have a population and a certain amount of simulation time to evaluate and breed it, and then proceed to the next generation.
Is it possible, during the simulation process, to have an isolated and…

mikerson
- 53
- 2
5
votes
1 answer
Loss function for choosing a subset of objects
I'm trying to train a neural net to choose a subset from some list of objects.
The input is a list of objects $(a,b,c,d,e,f)$, and for each list of objects the label is a list of 0s and 1s, with 1 for every object that is in the subset, for example…

Gilad Deutsch
- 629
- 5
- 12
5
votes
6 answers
What jobs cannot be automated by AI in the future?
AI is progressing rapidly. Imagine being told that you're fired because a robot will take your place. What are some jobs that can never be automated?

bmwalide
- 399
- 2
- 6
5
votes
2 answers
Is it possible to guide a reinforcement learning algorithm?
I have just started to study reinforcement learning and, as far as I understand, existing algorithms search for the optimal solution/policy, but do not allow the possibility for the programmer to suggest a way to find the solution (to guide their…

Cristian M
- 249
- 2
- 6
5
votes
4 answers
How does the size of the dataset depend on the VC dimension of the hypothesis class?
This might be a somewhat broad question, but I have been watching Caltech YouTube videos on Machine Learning, and in this video the professor is trying to explain how we should interpret the VC dimension in terms of what it means in layman's terms, and why do…

Stefan Radonjic
- 187
- 5
5
votes
1 answer
What is the equivalent of calcium's role in neural networks?
I understand that neural networks model biological neurons. Each node in the network represents a neuron cell and the connections between nodes represent the connections between cells. As in nature, a neuron fires an electrical signal to connected…

k rey
- 163
- 4
5
votes
1 answer
What are some resources for coding some artificial intelligence techniques in the context of games?
I know only the most rudimentary theory of AI, and I want to delve into actual practical coding with AI and machine learning. I already know a decent bit of coding in C++ and I'm learning Python syntax now.
I think I want to start implementing…

Tarun
- 53
- 4