For questions related to artificial intelligence research papers. Use this tag if you want someone to clarify something in a research paper.
Questions tagged [papers]
327 questions
27 votes, 2 answers
What is sample efficiency, and how can importance sampling be used to achieve it?
For instance, the title of this paper reads: "Sample Efficient Actor-Critic with Experience Replay".
What is sample efficiency, and how can importance sampling be used to achieve it?

Gokul NC
24 votes, 3 answers
Why do most deep learning papers not include an implementation?
I'm a novice researcher, and as I started to read papers in the area of deep learning, I noticed that the implementation is usually not included and has to be searched for elsewhere. My question is: why is that the case? The paper's authors…

Gilad Deutsch
20 votes, 5 answers
Why does Batch Normalization work?
Adding BatchNorm layers improves training time and makes the whole deep model more stable. That's an experimental fact that is widely used in machine learning practice.
My question is - why does it work?
The original (2015) paper motivated the…

Kostya
18 votes, 4 answers
Where can I find the original paper that introduced RNNs?
I was able to find the original paper on LSTM, but I was not able to find the paper that introduced "vanilla" RNNs. Where can I find it?

Ahsan Tarique
17 votes, 1 answer
What is the intuition behind the dot product attention?
I am watching the video Attention Is All You Need by Yannic Kilcher.
My question is: what is the intuition behind the dot product attention?
$$A(q, K, V) = \sum_i \frac{e^{q \cdot k_i}}{\sum_j e^{q \cdot k_j}} v_i$$
becomes:
$$A(Q,K, V) = \text{softmax}(QK^T)V$$
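The step from the per-query sum to the matrix form can be checked numerically. Below is a minimal NumPy sketch, assuming the rows of `Q`, `K`, and `V` are the individual queries, keys, and values; the function names `attend_one` and `attend_all` and the array shapes are illustrative choices, not taken from the paper or the video. Like the formulas above, it omits the $1/\sqrt{d_k}$ scaling used in the Transformer paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability; the result is unchanged.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_one(q, K, V):
    # Per-query form: a weighted sum of the value rows v_i, with weights
    # given by softmax over the dot products q . k_i.
    weights = softmax(K @ q)          # shape (n_keys,)
    return weights @ V                # shape (d_v,)

def attend_all(Q, K, V):
    # Matrix form: softmax(Q K^T) V, with softmax applied to each row.
    return softmax(Q @ K.T, axis=-1) @ V

# Hypothetical sizes: 3 queries, 5 key/value pairs, d_k = 4, d_v = 2.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 2))

row_by_row = np.stack([attend_one(q, K, V) for q in Q])
batched = attend_all(Q, K, V)
print(np.allclose(row_by_row, batched))
```

Each row of `softmax(Q @ K.T)` holds the attention weights of one query over all keys, so multiplying by `V` performs every per-query weighted sum in a single matrix product.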

DRV
14 votes, 4 answers
Can someone help me understand this paragraph from Nvidia's progressive GAN paper?
In the paper Progressive Growing of GANs for Improved Quality, Stability, and Variation (ICLR, 2018) by Nvidia researchers, the authors write:
Furthermore, we observe that mode collapses traditionally plaguing GANs tend to happen very quickly, over…

Inkplay_
13 votes, 1 answer
How would DeepMind's new differentiable neural computer scale?
DeepMind just published a paper about a differentiable neural computer, which basically combines a neural network with a memory.
The idea is to teach the neural network to create and recall useful explicit memories for a certain task. This…

BlindKungFuMaster
9 votes, 1 answer
How does weight normalization work?
I was reading the paper Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks about improving the learning of an ANN using weight normalization.
They consider standard artificial neural networks where the…

Mike AI
9 votes, 2 answers
What is different in each head of a multi-head attention mechanism?
I have a difficult time understanding the "multi-head" notion in the original transformer paper. What makes the learning in each head unique? Why doesn't the neural network learn the same set of parameters for each attention head? Is it because we…

mhsnk
8 votes, 2 answers
Where to publish a first article in Deep Reinforcement Learning?
What would be examples of journals that are good for a first publication in the field of Deep Reinforcement Learning?
I am in the process of writing about the research results of DQN-related algorithms.
I have 3 requirements - it should be indexed…

Evalds Urtans
8 votes, 1 answer
What is the difference between logic-based and rule-based AI?
I always thought rule-based was synonymous with logic-based AI. Logic has axioms and rules of inference, whereas rule-based AI has a knowledge base (essentially, axioms) and if-then rules to create new knowledge (essentially inference rules).
But in…

samlaf
8 votes, 1 answer
What are some resources on computational learning theory?
Pretty soon I will be finishing up Understanding Machine Learning: From Theory to Algorithms by Shai Ben-David and Shai Shalev-Shwartz. I absolutely love the subject and want to learn more. The only issue is that I'm having trouble finding a book that…

PMaynard
7 votes, 2 answers
How can a neural network distinguish rotated 6 and 9 digits?
Rotated MNIST is a popular dataset for benchmarking models equivariant to rotations on $\mathbb{R}^2$, described by the $SO(2)$ group or its discrete subgroups such as $\mathbb{Z}_n$:
Group equivariant convolutional networks
Harmonic networks
It…

spiridon_the_sun_rotator
7 votes, 2 answers
Why are reinforcement learning methods sample inefficient?
Reinforcement learning methods are considered to be extremely sample inefficient.
For example, in a recent DeepMind paper by Hessel et al., they showed that in order to reach human-level performance on an Atari game running at 60 frames per second…

rrz0
7 votes, 1 answer
How does the network know which objects to track in the paper "Label-Free Supervision of Neural Networks with Physics and Domain Knowledge"?
I was reading the paper Label-Free Supervision of Neural Networks with Physics and Domain Knowledge, published at AAAI 2017, which won the best paper award.
I understand the math and it makes sense. Consider the first application shown in the paper…

sanjeev mk