Questions tagged [papers]

For questions about artificial intelligence research papers. Use this tag when you want someone to clarify something in a research paper.

327 questions
27
votes
2 answers

What is sample efficiency, and how can importance sampling be used to achieve it?

For instance, the title of this paper reads: "Sample Efficient Actor-Critic with Experience Replay". What is sample efficiency, and how can importance sampling be used to achieve it?
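As a rough illustration of the idea behind the question, here is a minimal sketch of plain importance sampling: actions collected under an old (behavior) policy are reweighted by the ratio of target to behavior probabilities, so old samples can be reused to evaluate a new policy. The two-action policies, the rewards, and the function name `importance_sampling_estimate` are assumptions made up for this example; this is not the algorithm from the paper mentioned above.

```python
import random


def importance_sampling_estimate(samples, target, behavior, reward):
    """Average of reward[a] * target[a] / behavior[a] over actions drawn from `behavior`."""
    total = 0.0
    for a in samples:
        weight = target[a] / behavior[a]  # importance weight for this sample
        total += weight * reward[a]
    return total / len(samples)


# Hypothetical two-action problem (all numbers are illustrative assumptions).
behavior = {"left": 0.5, "right": 0.5}  # policy that generated the stored samples
target = {"left": 0.1, "right": 0.9}    # policy we actually want to evaluate
reward = {"left": 0.0, "right": 1.0}

rng = random.Random(0)  # fixed seed so the run is reproducible
actions = list(behavior)
samples = rng.choices(actions, weights=[behavior[a] for a in actions], k=20000)

estimate = importance_sampling_estimate(samples, target, behavior, reward)
exact = sum(target[a] * reward[a] for a in actions)  # expected reward under the target policy
print(estimate, exact)
```

The estimate approaches the exact expected reward under the target policy even though no sample was drawn from it, which is why reusing stored experience in this way can reduce the number of new environment interactions needed.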
24
votes
3 answers

Why do most deep learning papers not include an implementation?

I'm a novice researcher, and as I started to read papers in the area of deep learning, I noticed that the implementation is usually not included and has to be found elsewhere. Why is that the case? The paper's authors…
Gilad Deutsch
20
votes
5 answers

Why does Batch Normalization work?

Adding BatchNorm layers improves training time and makes the whole deep model more stable. That's an experimental fact that is widely used in machine learning practice. My question is - why does it work? The original (2015) paper motivated the…
18
votes
4 answers

Where can I find the original paper that introduced RNNs?

I was able to find the original paper on LSTM, but I was not able to find the paper that introduced "vanilla" RNNs. Where can I find it?
17
votes
1 answer

What is the intuition behind the dot product attention?

I am watching the video "Attention Is All You Need" by Yannic Kilcher. My question is: what is the intuition behind dot product attention? $$A(q, K, V) = \sum_i \frac{e^{q \cdot k_i}}{\sum_j e^{q \cdot k_j}} v_i$$ becomes: $$A(Q, K, V) = \text{softmax}(QK^T)V$$
DRV
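The two formulas in the question above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the helper names (`softmax`, `dot`, `attend`, `attend_batch`) and the toy keys and values are assumptions, and, following the formulas as written in the question, no scaling of the scores is applied.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(q, keys, values):
    """Single query: A(q, K, V) = sum_i softmax_i(q . k_i) * v_i."""
    weights = softmax([dot(q, k) for k in keys])
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return out, weights


def attend_batch(Q, keys, values):
    """Matrix form softmax(Q K^T) V: the single-query rule applied to each row of Q."""
    return [attend(q, keys, values)[0] for q in Q]


# Toy data (assumed for illustration): two orthogonal keys, each with its own value.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]

out, weights = attend([5.0, 0.0], keys, values)
print(weights)  # most of the weight goes to the first key
print(out)      # so the output is close to the first value
```

The dot product acts as a similarity score: a query that points in the same direction as a key gets a large score, the softmax turns the scores into weights that sum to one, and the output is the corresponding weighted average of the values.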
14
votes
4 answers

Can someone help me understand this paragraph from Nvidia's progressive GAN paper?

In the paper Progressive Growing of GANs for Improved Quality, Stability, and Variation (ICLR, 2018) by Nvidia researchers, the authors write: Furthermore, we observe that mode collapses traditionally plaguing GANs tend to happen very quickly, over…
13
votes
1 answer

How would DeepMind's new differentiable neural computer scale?

DeepMind just published a paper about a differentiable neural computer, which basically combines a neural network with a memory. The idea is to teach the neural network to create and recall useful explicit memories for a certain task. This…
9
votes
1 answer

How does weight normalization work?

I was reading the paper Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks about improving the learning of an ANN using weight normalization. They consider standard artificial neural networks where the…
9
votes
2 answers

What is different in each head of a multi-head attention mechanism?

I have a difficult time understanding the "multi-head" notion in the original transformer paper. What makes the learning in each head unique? Why doesn't the neural network learn the same set of parameters for each attention head? Is it because we…
8
votes
2 answers

Where to publish a first article in Deep Reinforcement Learning?

What would be examples of journals that are good for a first publication in the field of Deep Reinforcement Learning? I am in the process of writing about the research results of DQN-related algorithms. I have 3 requirements - it should be indexed…
Evalds Urtans
8
votes
1 answer

What is the difference between logic-based and rule-based AI?

I always thought rule-based was synonymous with logic-based AI. Logic has axioms and rules of inference, whereas rule-based AI has a knowledge base (essentially, axioms) and if-then rules to create new knowledge (essentially inference rules). But in…
8
votes
1 answer

What are some resources on computational learning theory?

Pretty soon I will be finishing up Understanding Machine Learning: From Theory to Algorithms by Shai Ben-David and Shai Shalev-Shwartz. I absolutely love the subject and want to learn more. The only issue is that I'm having trouble finding a book that…
7
votes
2 answers

How can a neural network distinguish rotated 6 and 9 digits?

Rotated MNIST is a popular dataset for benchmarking models equivariant to rotations on $\mathbb{R}^2$, described by the $SO(2)$ group or its discrete subgroups like $\mathbb{Z}_n$: Group equivariant convolutional networks, Harmonic networks. It…
7
votes
2 answers

Why are reinforcement learning methods sample inefficient?

Reinforcement learning methods are considered to be extremely sample inefficient. For example, in a recent DeepMind paper by Hessel et al., they showed that in order to reach human-level performance on an Atari game running at 60 frames per second…
rrz0
7
votes
1 answer

How does the network know which objects to track in the paper "Label-Free Supervision of Neural Networks with Physics and Domain Knowledge"?

I was reading the paper Label-Free Supervision of Neural Networks with Physics and Domain Knowledge, published at AAAI 2017, which won the best paper award. I understand the math and it makes sense. Consider the first application shown in the paper…