Most Popular
1500 questions
5
votes
0 answers
Wasserstein GAN: Implementation of Critic Loss Correct?
The WGAN paper concretely proposes Algorithm 1 (cf. page 8). Now, they also state what their loss for the critic and the generator is.
When implementing the critic loss (lines 5 and 6 of Algorithm 1), they maximize with respect to the parameters $w$ (instead of…

Anonymous5638
- 171
- 5
5
votes
2 answers
What's the difference between architectures and backbones?
In the paper "ForestNet: Classifying Drivers of Deforestation in Indonesia using Deep Learning on Satellite Imagery", the authors talk about using:
Feature Pyramid Networks (as the architecture)
EfficientNet-B2 (as the backbone)
Performance…

codinggirl123
- 51
- 1
5
votes
1 answer
Multi-Armed Bandits with a large number of arms
I'm dealing with a (stochastic) multi-armed bandit (MAB) with a large number of arms.
Consider a pizza machine that produces a pizza depending on an input $i$ (equivalent to an arm). The (finite) set of arms $K$ is given by $K=X_1\times X_2 \times…

D. B.
- 101
- 6
5
votes
2 answers
Transformers: how does the decoder final layer output the desired token?
In the paper Attention Is All You Need, this section confuses me:
In our model, we share the same weight matrix between the two embedding layers [in the encoding section] and the pre-softmax linear transformation [output of the decoding…

user3667125
- 1,500
- 5
- 13
5
votes
1 answer
Can AlphaFold predict proteins with metals well?
Certain proteins, known as metalloproteins, contain metal components. Commonly, the metal is at the active site, which needs the most prediction precision. Typically, there is only one metal (or a few) in a protein, which contains far…

jw_
- 199
- 1
- 5
5
votes
1 answer
What difference does it make whether Actor and Critic share the same network or not?
I'm learning about Actor-Critic reinforcement learning algorithms. One source I encountered mentioned that Actor and Critic can either share one network (but use different output layers) or they can use two completely separate networks. In this…

mark mark
- 753
- 4
- 23
5
votes
2 answers
How to detect a full-fledged self-aware AI?
The premise: A full-fledged self-aware artificial intelligence may have come to exist in a distributed environment like the internet. The possible A.I. in question may be quite unwilling to reveal itself.
The question: Given a first initial…

user4327
- 61
- 5
5
votes
1 answer
Why does off-policy learning outperform on-policy learning?
I am self-studying reinforcement learning using different online resources. I now have a basic understanding of how RL works.
I saw this in a book:
Q-learning is an off-policy learner. An off-policy learner learns the value of an optimal…

Exploring
- 223
- 6
- 16
5
votes
2 answers
Given two optimal policies, is an affine combination of them also optimal?
If there are two different optimal policies $\pi_1, \pi_2$ in a reinforcement learning task, will the linear combination (or affine combination) of the two policies $\alpha \pi_1 + \beta \pi_2, \alpha + \beta = 1$ also be an optimal policy?
Here I…

yang liu
- 53
- 3
5
votes
1 answer
Why are "Transformers" called this way?
What is the reason behind the name "Transformers" for the multi-head self-attention-based neural networks from Attention Is All You Need?
I have been googling this question for a long time, and I cannot find an explanation anywhere.

Leevo
- 285
- 1
- 9
5
votes
0 answers
What is the justification for Kaiming He initialization?
I've been trying to understand where the formulas for Xavier and Kaiming He initialization come from. My understanding is that these initialization schemes come from a desire to keep the gradients stable during back-propagation (avoiding…

Jack M
- 242
- 1
- 8
5
votes
1 answer
Is reinforcement learning only about determining the value function?
I started reading some reinforcement learning literature, and it seems to me that all approaches to solving reinforcement learning problems are about finding the value function (state-value function or state-action value function).
Are there any…

Felix P.
- 287
- 1
- 6
5
votes
1 answer
Is there an efficient way to implement a random crossover of individuals stored in a matrix?
I am using a GA to optimise an ANN in MATLAB. This ANN is pretty basic (input, hidden, output), but the input size is quite large (10,000) and the output size is 2, since I have two classes of images to be classified.
The weights are in the form of 2…

user3952
- 51
- 2
5
votes
3 answers
Has an AI ever solved a detective mystery?
In detective novels, the point is often that the reader gets enough information to solve the crime themselves. This "puzzle" aspect of detective novels is part of the attraction.
Often the difficulty for humans is to keep track of all the variables…

S.L. Barth is on codidact.com
- 1,204
- 1
- 10
- 21
5
votes
2 answers
What is the advantage of using cross entropy loss & softmax?
I am trying to do the standard MNIST image recognition test with a standard feedforward NN, but my network failed pretty badly. I have now debugged it quite a lot and found & fixed some errors, but I had a few more ideas. For one, I am…

Ben
- 425
- 3
- 10