Most Popular

1500 questions
5 votes · 1 answer

What is the loss for policy gradients with continuous actions?

I know that policy gradients used in an environment with a discrete action space are updated with $$ \Delta \theta_{t}=\alpha \nabla_{\theta} \log \pi_{\theta}\left(a_{t} \mid s_{t}\right) v_{t} $$ where $v_t$ could be many things that represent how…
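
For context, a common way to carry this same update over to continuous actions is to parameterize a Gaussian policy and take the gradient of its log-density. A minimal numpy sketch, assuming a 1-D action, a linear mean $\mu_\theta(s) = \theta^\top s$, and a fixed standard deviation (all illustrative assumptions, not from the question):

```python
import numpy as np

def gaussian_log_prob_grad(theta, state, action, sigma=0.5):
    """Gradient of log N(action; theta^T state, sigma^2) w.r.t. theta.

    For a Gaussian policy, d/d_theta log pi(a|s) = (a - mu) / sigma^2 * s,
    where mu = theta^T state (a linear mean, chosen for illustration).
    """
    mu = theta @ state
    return (action - mu) / sigma**2 * state

def reinforce_update(theta, state, action, v_t, alpha=0.01, sigma=0.5):
    """One REINFORCE-style step: theta += alpha * grad log pi(a|s) * v_t."""
    return theta + alpha * gaussian_log_prob_grad(theta, state, action, sigma) * v_t

# toy usage
theta = np.zeros(3)
state = np.array([0.1, -0.2, 0.3])
action = 0.4   # in a real rollout, sampled from N(theta^T state, sigma^2)
v_t = 1.5      # e.g. the return from time t
theta = reinforce_update(theta, state, action, v_t)
```
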
5 votes · 3 answers

Why do neural networks tend to be trained to recognize multiple things instead of just one?

I was watching this series: https://www.youtube.com/watch?v=aircAruvnKk The series demonstrates neural networks by building a simple number-recognizing network. It got me thinking: why do neural networks try to recognize multiple labels instead of just…
Ville • 151 • 2
5 votes · 3 answers

How do weak learners become strong in boosting?

Boosting refers to a family of algorithms that convert weak learners into strong learners. How does this happen?
Legend • 103 • 3
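
As a concrete illustration of weak learners becoming strong, here is a minimal sketch of the AdaBoost reweighting loop, using scikit-learn decision stumps as the weak learners; the dataset, number of rounds, and choice of stump are illustrative, not prescribed by the question.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    """Minimal AdaBoost sketch; labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                        # start with uniform example weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)           # weak learner trained on weighted data
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)  # weighted training error
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)      # low-error learners get a larger vote
        w *= np.exp(-alpha * y * pred)             # up-weight the examples this stump got wrong
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def predict(stumps, alphas, X):
    """Strong learner = sign of the alpha-weighted vote of all weak learners."""
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(votes)
```
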
5 votes · 1 answer

Using ConceptNet5 to find similar systems to solve specific problems?

I installed a locally running instance of the ConceptNet5 knowledge base in an Elasticsearch server. I used this data to implement the so-called "Analogietechnik" (a creativity technique for solving a problem from the perspective of another system) as…
hardking • 59 • 2
5 votes · 1 answer

Are there any microchips specifically designed to run ANNs?

I'm interested in hardware implementations of ANNs (artificial neural networks). Are there any popular existing technology implementations in the form of microchips that are purpose-designed to run artificial neural networks? For example, a chip which…
kenorb • 10,423 • 3 • 43 • 91
5 votes · 4 answers

How can an artificial general intelligence determine which information is true?

After the explosion of fake news during the US election, and following the question about whether AIs can educate themselves via the internet, it is clear to me that any newly-launched AI will have a serious problem knowing what to believe (that is,…
Jnani Jenny Hale • 521 • 2 • 10
5 votes · 2 answers

What is the weight matrix in self-attention?

I've been looking into self-attention lately, and the articles I've been reading all talk about "weights" in attention. My understanding is that the weights in self-attention are not the same as the weights in a neural network. From…
Mark • 233 • 1 • 6
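
For reference, "weights" in self-attention usually means two different things: the learned projection matrices $W_Q$, $W_K$, $W_V$ (ordinary trainable parameters) and the attention weights $\mathrm{softmax}(QK^\top/\sqrt{d_k})$, which are computed on the fly from the inputs rather than learned directly. A minimal numpy sketch of single-head self-attention, with randomly initialized projections standing in for trained ones:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention.

    X: (seq_len, d_model) token embeddings.
    W_q, W_k, W_v: learned projection matrices (trainable parameters).
    The attention weights themselves are computed, not stored as parameters.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    attn_weights = softmax(Q @ K.T / np.sqrt(d_k))   # (seq_len, seq_len)
    return attn_weights @ V, attn_weights

# toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 tokens, d_model = 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(X, W_q, W_k, W_v)
```
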
5 votes · 1 answer

What's the optimal policy in the rock-paper-scissors game?

A deterministic policy in the rock-paper-scissors game can be easily exploited by the opponent - by doing just the right sequence of moves to defeat the agent. More often than not, I've heard that a random policy is the optimal policy in this case -…
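
For intuition, the uniformly random mixed strategy $(1/3, 1/3, 1/3)$ is the Nash equilibrium of this game: its expected payoff is zero against every opponent strategy, so there is no pattern left to exploit. A small numpy check of that claim (the opponent strategies are just random examples):

```python
import numpy as np

# payoff matrix for the row player: rows/cols = (rock, paper, scissors)
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

uniform = np.ones(3) / 3

rng = np.random.default_rng(0)
for _ in range(5):
    opponent = rng.dirichlet(np.ones(3))    # an arbitrary opponent mixed strategy
    expected = uniform @ PAYOFF @ opponent  # expected payoff of the uniform policy
    print(round(expected, 10))              # always 0: the uniform policy can't be exploited
```
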
5 votes · 1 answer

What does the notation $\mathcal{N}(z; \mu, \sigma)$ stand for in statistics?

I know that the notation $\mathcal{N}(\mu, \sigma)$ stands for a normal distribution. But I'm reading the book "An Introduction to Variational Autoencoders" and in it, there is this notation: $$\mathcal{N}(z; 0, I)$$ What does it mean? picture of…
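
For reference, the semicolon separates the point at which the density is evaluated from the distribution's parameters: $\mathcal{N}(z; \mu, \Sigma)$ denotes the Gaussian density with mean $\mu$ and covariance $\Sigma$, evaluated at $z$. In the VAE prior, with $z \in \mathbb{R}^d$,

$$\mathcal{N}(z; 0, I) = p(z) = (2\pi)^{-d/2} \exp\!\left(-\tfrac{1}{2}\, z^{\top} z\right),$$

i.e. the standard multivariate normal (zero mean, identity covariance).
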
5 votes · 1 answer

How does the Ornstein-Uhlenbeck process work, and how is it used in DDPG?

In section 3 of the paper Continuous control with deep reinforcement learning, the authors write As detailed in the supplementary materials we used an Ornstein-Uhlenbeck process (Uhlenbeck & Ornstein, 1930) to generate temporally correlated…
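
For context, the OU process is a mean-reverting random walk, and DDPG adds its samples to the actor's deterministic action to get temporally correlated exploration noise. A minimal discretized sketch; $\theta = 0.15$ and $\sigma = 0.2$ are common defaults, and the other constants are illustrative choices:

```python
import numpy as np

class OUNoise:
    """Discretized Ornstein-Uhlenbeck process:
    x_{t+1} = x_t + theta * (mu - x_t) * dt + sigma * sqrt(dt) * N(0, 1).

    theta pulls the state back toward mu (mean reversion); sigma scales the
    random kicks. Successive samples are temporally correlated, which is why
    DDPG adds them to the actor's output for exploration.
    """
    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=0):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.size = size
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.x = np.full(self.size, self.mu, dtype=float)

    def sample(self):
        dx = self.theta * (self.mu - self.x) * self.dt \
             + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.size)
        self.x = self.x + dx
        return self.x

# usage: exploratory action = actor(state) + noise.sample()
noise = OUNoise(size=2)
correlated_samples = [noise.sample() for _ in range(3)]
```
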
5 votes · 1 answer

Why is the mean used to compute the expectation in the GAN loss?

From Goodfellow et al. (2014), we have the adversarial loss: $$ \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)} [\log D(x)] + \mathbb{E}_{z \sim p_z(z)} [\log (1 - D(G(z)))]. $$ In practice, the expectation is…
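
For context, each expectation is estimated by Monte Carlo: the sample mean over a minibatch of real examples and a minibatch of generated ones is an unbiased estimate of the corresponding expectation. A minimal numpy sketch, where `D` and `G` are placeholder callables standing in for trained networks:

```python
import numpy as np

def gan_value_estimate(D, G, x_real, z):
    """Minibatch Monte Carlo estimate of V(D, G):
    mean over real x of log D(x)  +  mean over noise z of log(1 - D(G(z)))."""
    return np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(G(z))))

# toy placeholders (not real networks)
D = lambda x: 1.0 / (1.0 + np.exp(-x.sum(axis=1)))      # fake "discriminator" -> (0, 1)
G = lambda z: z * 0.5                                    # fake "generator"
x_real = np.random.default_rng(0).normal(size=(64, 2))  # minibatch of data
z = np.random.default_rng(1).normal(size=(64, 2))       # minibatch of noise
value = gan_value_estimate(D, G, x_real, z)
```
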
5 votes · 1 answer

Can you convert an MDP problem to a Contextual Multi-Arm Bandits problem?

I'm trying to get a better understanding of Multi-Arm Bandits, Contextual Multi-Arm Bandits and Markov Decision Processes. Basically, Multi-Arm Bandits is a special case of Contextual Multi-Arm Bandits where there is no state (features/context). And…
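
One way to see the relationship: a contextual bandit is an MDP whose episodes last a single step (equivalently, $\gamma = 0$), so treating a general MDP as a contextual bandit discards the effect of actions on future states. A small sketch of that reduction, assuming a hypothetical gym-style `env` interface:

```python
def run_as_contextual_bandit(env, bandit_policy, n_rounds=1000):
    """Treat each MDP step as an independent contextual-bandit round.

    context = current state, arm = action, feedback = immediate reward only.
    The reduction is exact when episodes last one step (gamma = 0); for a
    general MDP it throws away how actions influence future states.
    `env` is a hypothetical gym-style environment: reset() -> state,
    step(a) -> (next_state, reward, done, info).
    """
    state = env.reset()
    history = []
    for _ in range(n_rounds):
        action = bandit_policy(state)                  # pick an arm given the context
        next_state, reward, done, _ = env.step(action)
        history.append((state, action, reward))        # bandit data: no next state recorded
        state = env.reset() if done else next_state
    return history
```
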
5 votes · 2 answers

Why are policy iteration and value iteration studied as separate algorithms?

In Sutton and Barto's book about reinforcement learning, policy iteration and value iteration are presented as separate/different algorithms. This is very confusing because policy iteration includes an update/change of value and value iteration…
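
For contrast, here is a minimal sketch of both loops on a tabular MDP; `P[s][a]` as a list of `(prob, next_state, reward)` triples is a made-up representation for illustration. Policy iteration alternates a full policy evaluation with a greedy improvement step, while value iteration folds the max over actions directly into the value update.

```python
import numpy as np

def q_value(P, V, s, a, gamma):
    """One-step lookahead: expected reward plus discounted next-state value."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    V = np.zeros(n_states)
    while True:
        V_new = np.array([max(q_value(P, V, s, a, gamma) for a in range(n_actions))
                          for s in range(n_states)])       # max folded into the update
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def policy_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    policy = np.zeros(n_states, dtype=int)
    while True:
        # policy evaluation: sweep until V^pi converges for the current policy
        V = np.zeros(n_states)
        while True:
            V_new = np.array([q_value(P, V, s, policy[s], gamma) for s in range(n_states)])
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        # policy improvement: act greedily with respect to V^pi
        new_policy = np.array([np.argmax([q_value(P, V, s, a, gamma) for a in range(n_actions)])
                               for s in range(n_states)])
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```
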
5 votes · 1 answer

Why does TD Learning require Markovian domains?

One of my friends and I were discussing the differences between Dynamic Programming, Monte-Carlo, and Temporal Difference (TD) Learning as policy evaluation methods - and we agreed on the fact that Dynamic Programming requires the Markov assumption…
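
For reference, the TD(0) update bootstraps from the value of the very next state, $V(s_t) \leftarrow V(s_t) + \alpha\,[r_{t+1} + \gamma V(s_{t+1}) - V(s_t)]$, and that target is only a sound stand-in for the return if $s_{t+1}$ summarizes everything relevant about the future (the Markov property); Monte Carlo waits for the full return and so does not lean on that assumption. A one-function sketch of the update:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """TD(0): bootstrap from V(s_next) instead of waiting for the full return.

    The target r + gamma * V(s_next) is only a reliable stand-in for the return
    if s_next captures everything needed to predict the future (Markov property).
    """
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V
```
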
5 votes · 1 answer

How can I find a specific word in an audio file?

I'm trying to train and use a neural network to detect a specific word in an audio file. The input of the neural network is an audio clip of 2-3 seconds duration, and the neural network must determine whether the input audio (the voice of a person)…
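
One common recipe for this kind of keyword spotting (by no means the only one) is to convert each clip to a log-mel spectrogram and train a small CNN as a binary "word present / absent" classifier. A minimal PyTorch sketch with made-up input dimensions (40 mel bands, 200 frames, roughly 2 seconds of audio):

```python
import torch
import torch.nn as nn

class KeywordSpotter(nn.Module):
    """Tiny CNN over a log-mel spectrogram (1 x n_mels x n_frames) that
    outputs the probability that the target word occurs in the clip."""
    def __init__(self, n_mels=40, n_frames=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * (n_mels // 4) * (n_frames // 4), 1),
        )

    def forward(self, x):
        return torch.sigmoid(self.net(x)).squeeze(-1)

# usage: spec is a (batch, 1, 40, 200) log-mel spectrogram tensor,
# trained with nn.BCELoss against 0/1 "word present" labels
model = KeywordSpotter()
dummy = torch.randn(8, 1, 40, 200)
probs = model(dummy)   # (8,) probabilities, one per clip
```
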