Most Popular

1500 questions
5 votes · 1 answer

What is the loss for policy gradients with continuous actions?

I know that policy gradients used in an environment with a discrete action space are updated with $$ \Delta \theta_{t}=\alpha \nabla_{\theta} \log \pi_{\theta}\left(a_{t} \mid s_{t}\right) v_{t} $$ where $v_t$ could be many things that represent how…
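
For context, a common way to carry this same update over to continuous actions is to parameterize a Gaussian policy and take the gradient of its log-density. A minimal numpy sketch, assuming a 1-D action, a linear mean $\mu_\theta(s) = \theta^\top s$, and a fixed standard deviation (all illustrative assumptions, not from the question):

```python
import numpy as np

def gaussian_log_prob_grad(theta, state, action, sigma=0.5):
    """Gradient of log N(action; theta^T state, sigma^2) w.r.t. theta.

    For a Gaussian policy, d/d_theta log pi(a|s) = (a - mu) / sigma^2 * s,
    where mu = theta^T state (a linear mean, chosen for illustration).
    """
    mu = theta @ state
    return (action - mu) / sigma**2 * state

def reinforce_update(theta, state, action, v_t, alpha=0.01, sigma=0.5):
    """One REINFORCE-style step: theta += alpha * grad log pi(a|s) * v_t."""
    return theta + alpha * gaussian_log_prob_grad(theta, state, action, sigma) * v_t

# toy usage
theta = np.zeros(3)
state = np.array([0.1, -0.2, 0.3])
action = 0.4   # in a real rollout, sampled from N(theta^T state, sigma^2)
v_t = 1.5      # e.g. the return from time t
theta = reinforce_update(theta, state, action, v_t)
```
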
5 votes · 3 answers

Why do neural networks tend to be trained to recognize multiple things instead of just one?

I was watching this series: https://www.youtube.com/watch?v=aircAruvnKk The series demonstrates neural networks by building a simple number-recognizing network. It got me thinking: why do neural networks try to recognize multiple labels instead of just…
Ville • 151 • 2
5 votes · 3 answers

How do weak learners become strong in boosting?

Boosting refers to a family of algorithms that convert weak learners into strong learners. How does this happen?
Legend • 103 • 3
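
As a concrete illustration of weak learners becoming strong, here is a minimal sketch of the AdaBoost reweighting loop, using scikit-learn decision stumps as the weak learners; the dataset, number of rounds, and choice of stump are illustrative, not prescribed by the question.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    """Minimal AdaBoost sketch; labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                        # start with uniform example weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)           # weak learner trained on weighted data
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)  # weighted training error
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)      # low-error learners get a larger vote
        w *= np.exp(-alpha * y * pred)             # up-weight the examples this stump got wrong
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def predict(stumps, alphas, X):
    """Strong learner = sign of the alpha-weighted vote of all weak learners."""
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(votes)
```
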
5 votes · 1 answer

Using ConceptNet5 to find similar systems to solve specific problems?

I installed a locally running instance of the ConceptNet5 knowledge base in an Elasticsearch server. I used this data to implement the so-called "Analogietechnik" (a creativity technique for solving a problem from the perspective of another system) as…
hardking • 59 • 2
5 votes · 1 answer

Are there any microchips specifically designed to run ANNs?

I'm interested in hardware implementations of ANNs (artificial neural networks). Are there any popular existing technology implementations in the form of microchips that are purpose-designed to run artificial neural networks? For example, a chip which…
kenorb • 10,423 • 3 • 43 • 91
5 votes · 4 answers

How can an artificial general intelligence determine which information is true?

After the explosion of fake news during the US election, and following the question about whether AIs can educate themselves via the internet, it is clear to me that any newly-launched AI will have a serious problem knowing what to believe (that is,…
Jnani Jenny Hale • 521 • 2 • 10
5 votes · 2 answers

What is the weight matrix in self-attention?

I've been looking into self-attention lately, and the articles I've been reading all talk about "weights" in attention. My understanding is that the weights in self-attention are not the same as the weights in a neural network. From…
Mark • 233 • 1 • 6
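
For reference, "weights" in self-attention usually means two different things: the learned projection matrices $W_Q$, $W_K$, $W_V$ (ordinary trainable parameters) and the attention weights $\mathrm{softmax}(QK^\top/\sqrt{d_k})$, which are computed on the fly from the inputs rather than learned directly. A minimal numpy sketch of single-head self-attention, with randomly initialized projections standing in for trained ones:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention.

    X: (seq_len, d_model) token embeddings.
    W_q, W_k, W_v: learned projection matrices (trainable parameters).
    The attention weights themselves are computed, not stored as parameters.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    attn_weights = softmax(Q @ K.T / np.sqrt(d_k))   # (seq_len, seq_len)
    return attn_weights @ V, attn_weights

# toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 tokens, d_model = 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(X, W_q, W_k, W_v)
```
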
5 votes · 1 answer

What's the optimal policy in the rock-paper-scissors game?

A deterministic policy in the rock-paper-scissors game can be easily exploited by the opponent - by doing just the right sequence of moves to defeat the agent. More often than not, I've heard that a random policy is the optimal policy in this case -…
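
For intuition, the uniformly random mixed strategy $(1/3, 1/3, 1/3)$ is the Nash equilibrium of this game: its expected payoff is zero against every opponent strategy, so there is no pattern left to exploit. A small numpy check of that claim (the opponent strategies are just random examples):

```python
import numpy as np

# payoff matrix for the row player: rows/cols = (rock, paper, scissors)
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

uniform = np.ones(3) / 3

rng = np.random.default_rng(0)
for _ in range(5):
    opponent = rng.dirichlet(np.ones(3))    # an arbitrary opponent mixed strategy
    expected = uniform @ PAYOFF @ opponent  # expected payoff of the uniform policy
    print(round(expected, 10))              # always 0: the uniform policy can't be exploited
```
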
5 votes · 1 answer

What does the notation $\mathcal{N}(z; \mu, \sigma)$ stand for in statistics?

I know that the notation $\mathcal{N}(\mu, \sigma)$ stands for a normal distribution. But I'm reading the book "An Introduction to Variational Autoencoders" and in it, there is this notation: $$\mathcal{N}(z; 0, I)$$ What does it mean? picture of…
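
For reference, the semicolon separates the point at which the density is evaluated from the distribution's parameters: $\mathcal{N}(z; \mu, \Sigma)$ denotes the Gaussian density with mean $\mu$ and covariance $\Sigma$, evaluated at $z$. In the VAE prior, with $z \in \mathbb{R}^d$,

$$\mathcal{N}(z; 0, I) = p(z) = (2\pi)^{-d/2} \exp\!\left(-\tfrac{1}{2}\, z^{\top} z\right),$$

i.e. the standard multivariate normal (zero mean, identity covariance).
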
5 votes · 1 answer

How does the Ornstein-Uhlenbeck process work, and how is it used in DDPG?

In section 3 of the paper Continuous control with deep reinforcement learning, the authors write As detailed in the supplementary materials we used an Ornstein-Uhlenbeck process (Uhlenbeck & Ornstein, 1930) to generate temporally correlated…
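
For context, the OU process is a mean-reverting random walk, and DDPG adds its samples to the actor's deterministic action to get temporally correlated exploration noise. A minimal discretized sketch; $\theta = 0.15$ and $\sigma = 0.2$ are common defaults, and the other constants are illustrative choices:

```python
import numpy as np

class OUNoise:
    """Discretized Ornstein-Uhlenbeck process:
    x_{t+1} = x_t + theta * (mu - x_t) * dt + sigma * sqrt(dt) * N(0, 1).

    theta pulls the state back toward mu (mean reversion); sigma scales the
    random kicks. Successive samples are temporally correlated, which is why
    DDPG adds them to the actor's output for exploration.
    """
    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=0):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.size = size
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.x = np.full(self.size, self.mu, dtype=float)

    def sample(self):
        dx = self.theta * (self.mu - self.x) * self.dt \
             + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.size)
        self.x = self.x + dx
        return self.x

# usage: exploratory action = actor(state) + noise.sample()
noise = OUNoise(size=2)
correlated_samples = [noise.sample() for _ in range(3)]
```
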
5 votes · 1 answer

Why is the mean used to compute the expectation in the GAN loss?

From Goodfellow et al. (2014), we have the adversarial loss: $$ \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)} [\log D(x)] + \mathbb{E}_{z \sim p_z(z)} [\log (1 - D(G(z)))]. $$ In practice, the expectation is…
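
For context, each expectation is estimated by Monte Carlo: the sample mean over a minibatch of real examples and a minibatch of generated ones is an unbiased estimate of the corresponding expectation. A minimal numpy sketch, where `D` and `G` are placeholder callables standing in for trained networks:

```python
import numpy as np

def gan_value_estimate(D, G, x_real, z):
    """Minibatch Monte Carlo estimate of V(D, G):
    mean over real x of log D(x)  +  mean over noise z of log(1 - D(G(z)))."""
    return np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(G(z))))

# toy placeholders (not real networks)
D = lambda x: 1.0 / (1.0 + np.exp(-x.sum(axis=1)))      # fake "discriminator" -> (0, 1)
G = lambda z: z * 0.5                                    # fake "generator"
x_real = np.random.default_rng(0).normal(size=(64, 2))  # minibatch of data
z = np.random.default_rng(1).normal(size=(64, 2))       # minibatch of noise
value = gan_value_estimate(D, G, x_real, z)
```
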
5 votes · 1 answer

Can you convert an MDP problem to a Contextual Multi-Arm Bandits problem?

I'm trying to get a better understanding of Multi-Arm Bandits, Contextual Multi-Arm Bandits and Markov Decision Processes. Basically, Multi-Arm Bandits is a special case of Contextual Multi-Arm Bandits where there is no state (features/context). And…
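
One way to see the relationship: a contextual bandit is an MDP whose episodes last a single step (equivalently, $\gamma = 0$), so treating a general MDP as a contextual bandit discards the effect of actions on future states. A small sketch of that reduction, assuming a hypothetical gym-style `env` interface:

```python
def run_as_contextual_bandit(env, bandit_policy, n_rounds=1000):
    """Treat each MDP step as an independent contextual-bandit round.

    context = current state, arm = action, feedback = immediate reward only.
    The reduction is exact when episodes last one step (gamma = 0); for a
    general MDP it throws away how actions influence future states.
    `env` is a hypothetical gym-style environment: reset() -> state,
    step(a) -> (next_state, reward, done, info).
    """
    state = env.reset()
    history = []
    for _ in range(n_rounds):
        action = bandit_policy(state)                  # pick an arm given the context
        next_state, reward, done, _ = env.step(action)
        history.append((state, action, reward))        # bandit data: no next state recorded
        state = env.reset() if done else next_state
    return history
```
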
5 votes · 2 answers

Why are policy iteration and value iteration studied as separate algorithms?

In Sutton and Barto's book about reinforcement learning, policy iteration and value iteration are presented as separate/different algorithms. This is very confusing because policy iteration includes an update/change of value and value iteration…
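
For contrast, here is a minimal sketch of both loops on a tabular MDP; `P[s][a]` as a list of `(prob, next_state, reward)` triples is a made-up representation for illustration. Policy iteration alternates a full policy evaluation with a greedy improvement step, while value iteration folds the max over actions directly into the value update.

```python
import numpy as np

def q_value(P, V, s, a, gamma):
    """One-step lookahead: expected reward plus discounted next-state value."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    V = np.zeros(n_states)
    while True:
        V_new = np.array([max(q_value(P, V, s, a, gamma) for a in range(n_actions))
                          for s in range(n_states)])       # max folded into the update
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def policy_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    policy = np.zeros(n_states, dtype=int)
    while True:
        # policy evaluation: sweep until V^pi converges for the current policy
        V = np.zeros(n_states)
        while True:
            V_new = np.array([q_value(P, V, s, policy[s], gamma) for s in range(n_states)])
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        # policy improvement: act greedily with respect to V^pi
        new_policy = np.array([np.argmax([q_value(P, V, s, a, gamma) for a in range(n_actions)])
                               for s in range(n_states)])
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```
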
5 votes · 1 answer

Why does TD Learning require Markovian domains?

One of my friends and I were discussing the differences between Dynamic Programming, Monte-Carlo, and Temporal Difference (TD) Learning as policy evaluation methods - and we agreed on the fact that Dynamic Programming requires the Markov assumption…
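
For reference, the TD(0) update bootstraps from the value of the very next state, $V(s_t) \leftarrow V(s_t) + \alpha\,[r_{t+1} + \gamma V(s_{t+1}) - V(s_t)]$, and that target is only a sound stand-in for the return if $s_{t+1}$ summarizes everything relevant about the future (the Markov property); Monte Carlo waits for the full return and so does not lean on that assumption. A one-function sketch of the update:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """TD(0): bootstrap from V(s_next) instead of waiting for the full return.

    The target r + gamma * V(s_next) is only a reliable stand-in for the return
    if s_next captures everything needed to predict the future (Markov property).
    """
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V
```
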
5 votes · 1 answer

How can I find a specific word in an audio file?

I'm trying to train and use a neural network to detect a specific word in an audio file. The input of the neural network is an audio clip of 2-3 seconds duration, and the neural network must determine whether the input audio (the voice of a person)…
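
One common recipe for this kind of keyword spotting (by no means the only one) is to convert each clip to a log-mel spectrogram and train a small CNN as a binary "word present / absent" classifier. A minimal PyTorch sketch with made-up input dimensions (40 mel bands, 200 frames, roughly 2 seconds of audio):

```python
import torch
import torch.nn as nn

class KeywordSpotter(nn.Module):
    """Tiny CNN over a log-mel spectrogram (1 x n_mels x n_frames) that
    outputs the probability that the target word occurs in the clip."""
    def __init__(self, n_mels=40, n_frames=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * (n_mels // 4) * (n_frames // 4), 1),
        )

    def forward(self, x):
        return torch.sigmoid(self.net(x)).squeeze(-1)

# usage: spec is a (batch, 1, 40, 200) log-mel spectrogram tensor,
# trained with nn.BCELoss against 0/1 "word present" labels
model = KeywordSpotter()
dummy = torch.randn(8, 1, 40, 200)
probs = model(dummy)   # (8,) probabilities, one per clip
```
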