Most Popular
1500 questions
22 votes, 3 answers
Why doesn't Q-learning converge when using function approximation?
The tabular Q-learning algorithm is guaranteed to find the optimal $Q$ function, $Q^*$, provided the following conditions (the Robbins-Monro conditions) regarding the learning rate are satisfied
$\sum_{t} \alpha_t(s, a) = \infty$
$\sum_{t}…

nbro
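As a rough illustration of the tabular setting this question's excerpt describes, here is a minimal sketch of Q-learning with the step size $\alpha_t(s, a) = 1/n(s, a)$, where $n(s, a)$ counts visits to the pair. That schedule is one standard choice satisfying the Robbins-Monro conditions (its sum diverges while the sum of its squares converges). The three-state chain, the reward of 1 on reaching the terminal state, the discount of 0.5, and the function name `q_learning` are all assumptions made for the example; none of them come from the question.

```python
import random


def q_learning(episodes=2000, gamma=0.5, seed=0):
    """Tabular Q-learning on a hypothetical 3-state chain (state 2 is terminal)."""
    rng = random.Random(seed)
    n_states, n_actions, terminal = 3, 2, 2
    q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != terminal:
            # Uniformly random behaviour policy: every pair keeps being visited.
            a = rng.randrange(n_actions)
            # Action 1 moves right, action 0 moves left (bounded at state 0).
            s_next = s + 1 if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == terminal else 0.0

            # Step size 1/n(s, a): sum diverges, sum of squares converges.
            visits[s][a] += 1
            alpha = 1.0 / visits[s][a]

            # Q-learning update; the terminal row stays at zero.
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    for state, row in enumerate(q_learning()[:2]):
        print(state, row)
```

In this toy chain the optimal values are $Q^*(1, \text{right}) = 1$ and $Q^*(0, \text{right}) = 0.5$, so the learned table can be checked by hand. The sketch says nothing about the function-approximation case the question asks about, where this kind of guarantee no longer holds in general.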
22 votes, 2 answers
What is the difference between reinforcement learning and optimal control?
Coming from a process (optimal) control background, I have begun studying the field of deep reinforcement learning.
Sutton & Barto (2015) state that
particularly important (to the writing of the text) have been the contributions establishing and…

Bionic Buffulo
21 votes, 2 answers
How to define states in reinforcement learning?
I am studying reinforcement learning and the variants of it. I am starting to get an understanding of how the algorithms work and how they apply to an MDP.
What I don't understand is the process of defining the states of the MDP. In most examples…

Andy
21 votes, 3 answers
Is a dystopian surveillance state computationally possible?
This isn't really a conspiracy theory question. It is more of an inquiry into global computational power and data storage logistics.
Most recording instruments such as cameras and microphones are typically voluntary opt-in devices, in that,…

Harrison Tran
20 votes, 3 answers
How are Artificial Neural Networks and the Biological Neural Networks similar and different?
I've heard multiple times that "Neural Networks are the best approximation we have to model the human brain", and I think it is commonly known that Neural Networks are modelled after our brain.
I strongly suspect that this model has been simplified,…

Andreas Storvik Strauman
20 votes, 3 answers
How can we process the data from both the true distribution and the generator?
I'm struggling to understand the GAN loss function as provided in Understanding Generative Adversarial Networks (a blog post written by Daniel Seita).
In the standard cross-entropy loss, we have an output that has been run through a sigmoid function…

tryingtolearn
20 votes, 2 answers
How do neural networks play chess?
I have been spending a few days trying to wrap my head around how and why neural networks are used to play chess.
Although I know very little about how the game of chess works, I can understand the following idea. Theoretically, we could make a…

stats_noob
20 votes, 5 answers
Why does Batch Normalization work?
Adding BatchNorm layers improves training time and makes the whole deep model more stable. That's an experimental fact that is widely used in machine learning practice.
My question is - why does it work?
The original (2015) paper motivated the…

Kostya
20 votes, 2 answers
What is the "Hello World" problem of Reinforcement Learning?
As we all know, "Hello World" is usually the first program that any programmer learns or implements in any language or framework.
Aurélien Géron mentions in his book that MNIST is often called the Hello World of Machine Learning. Is there any "Hello…

Arpit-Gole
20 votes, 1 answer
Would Google's self-driving car stop when it sees somebody with a T-shirt with a stop sign printed on it?
In the article "Hidden Obstacles for Google’s Self-Driving Cars", we can read that:
Google’s cars can detect and respond to stop signs that aren’t on its map, a feature that was introduced to deal with temporary signs used at construction sites.
Google…

kenorb
20 votes, 3 answers
Are Asimov's Laws flawed by design, or are they feasible in practice?
Isaac Asimov's famous Three Laws of Robotics originated in the context of Asimov's science fiction stories. In those stories, the three laws serve as a safety measure, in order to avoid untimely or manipulated situations from exploding in…

3442
20 votes, 2 answers
Problems that only humans will ever be able to solve
With the increasing complexity of reCAPTCHA, I wondered whether there is some problem that only a human will ever be able to solve (or that AI won't be able to solve as long as it doesn't exactly reproduce the human brain).
For instance, the…

Marc Perlade
20 votes, 2 answers
What is the difference between First-Visit Monte-Carlo and Every-Visit Monte-Carlo Policy Evaluation?
I came across these two algorithms, but I cannot understand the difference between them, either in terms of implementation or intuitively.
So, what difference does the second point in both the slides refer to?
user9947
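For readers without the slides the question refers to, the usual distinction can be sketched as follows. This is a minimal illustration, not the slides' own pseudocode. The function name `mc_evaluate`, the episode format (a list of `(state, reward)` pairs, where the reward is the one received after leaving that state), and the toy episode are all assumptions made for the example. First-visit averages only the return following the first occurrence of a state in each episode; every-visit averages the return following every occurrence.

```python
from collections import defaultdict


def mc_evaluate(episodes, gamma=1.0, first_visit=True):
    """Monte Carlo policy evaluation from sampled episodes.

    episodes: list of episodes, each a list of (state, reward) pairs.
    Returns a dict mapping each state to its average sampled return.
    """
    returns = defaultdict(list)
    for episode in episodes:
        # Compute the return-to-go for each time step, working backwards.
        g = 0.0
        returns_to_go = []
        for state, reward in reversed(episode):
            g = reward + gamma * g
            returns_to_go.append((state, g))
        returns_to_go.reverse()

        seen = set()
        for state, g in returns_to_go:
            # First-visit: skip later occurrences of a state in this episode.
            if first_visit and state in seen:
                continue
            seen.add(state)
            returns[state].append(g)
    return {state: sum(gs) / len(gs) for state, gs in returns.items()}


if __name__ == "__main__":
    episode = [("A", 1.0), ("A", 1.0), ("B", 1.0)]
    print(mc_evaluate([episode], first_visit=True))   # {'A': 3.0, 'B': 1.0}
    print(mc_evaluate([episode], first_visit=False))  # {'A': 2.5, 'B': 1.0}
```

In the toy episode, state A is visited twice with returns 3 and 2. First-visit keeps only the 3, while every-visit averages both to 2.5; state B is visited once, so the two estimates agree.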
19 votes, 2 answers
Are Modular Neural Networks more effective than large, monolithic networks at any tasks?
Modular/Multiple Neural networks (MNNs) revolve around training smaller, independent networks that can feed into each other or another higher network.
In principle, the hierarchical organization could allow us to make sense of more complex problem…

Harsh Sikka
19 votes, 3 answers
What are the mathematical prerequisites for an AI researcher?
What are the mathematical prerequisites for understanding the core part of various algorithms involved in artificial intelligence and developing one's own algorithms?
Please refer to some specific books.

Surya Bhusal