Most Popular
1500 questions
5 votes · 2 answers
Why am I getting the incorrect value of lambda?
I am trying to solve for $\lambda$ using temporal-difference learning. More specifically, I am trying to figure out what $\lambda$ I need, such that $\text{TD}(\lambda)=\text{TD}(1)$, after one iteration. But I get the incorrect value of…

Amanda
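For reference, the forward-view λ-return that TD(λ) targets is $G_t^\lambda = (1-\lambda)\sum_{n=1}^{T-t-1}\lambda^{n-1}G_{t:t+n} + \lambda^{T-t-1}G_t$. Here is a minimal sketch, assuming an episodic task, Sutton & Barto's indexing (`rewards[k]` holds $R_{k+1}$, `values[k]` holds $V(S_k)$), and hypothetical helper names. It shows that $\lambda = 1$ recovers the full Monte Carlo return (the TD(1) target) and $\lambda = 0$ the one-step TD target.

```python
def n_step_return(rewards, values, t, n, gamma):
    """G_{t:t+n}: up to n discounted rewards, then bootstrap from V(S_{t+n}) if the episode is still running."""
    T = len(rewards)  # the episode terminates at time T
    g = 0.0
    for k in range(t, min(t + n, T)):
        g += gamma ** (k - t) * rewards[k]
    if t + n < T:
        g += gamma ** n * values[t + n]
    return g


def lambda_return(rewards, values, t, lam, gamma):
    """Forward-view lambda-return: (1 - lam) * lam**(n - 1) weights on the n-step returns,
    with the remaining weight lam**(T - t - 1) on the full Monte Carlo return."""
    horizon = len(rewards) - t
    g = 0.0
    for n in range(1, horizon):
        g += (1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, t, n, gamma)
    g += lam ** (horizon - 1) * n_step_return(rewards, values, t, horizon, gamma)
    return g
```

With `rewards = [1.0, 0.0, 2.0]`, `values = [0.5, 0.4, 0.3]` and `gamma = 0.9`, `lambda_return(..., t=0, lam=1.0, ...)` gives $1 + 0.81 \cdot 2 = 2.62$ and `lam=0.0` gives $1 + 0.9 \cdot 0.4 = 1.36$, so one can compare intermediate λ values against the TD(1) target directly.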
5 votes · 1 answer
How to define a reward function for a humanoid agent whose goal is to stand up from the ground?
I'm trying to teach a humanoid agent how to stand up after falling. The episode starts with the agent lying on the floor with its back touching the ground, and its goal is to stand up in the shortest amount of time.
But I'm having trouble in regards…

Tirafesi
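A common pattern for "reach a goal as fast as possible" tasks combines a per-step time penalty with a dense progress term and a one-off success bonus. The sketch below is a hypothetical reward for illustration only. The state fields (`head_height`, `torso_up_cos`, `joint_torques`), the target height, and all weights are assumptions, not values from the question or any simulator.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HumanoidState:
    """Hypothetical per-step quantities; a real simulator exposes its own observation layout."""
    head_height: float              # metres above the floor
    torso_up_cos: float             # cos(angle between torso axis and world up), in [-1, 1]
    joint_torques: Sequence[float]  # actions applied this step


def stand_up_reward(prev, cur, target_height=1.4, w_progress=10.0,
                    w_ctrl=1e-3, time_penalty=0.1, success_bonus=10.0):
    """Return (reward, done) for one transition. All weights and thresholds are illustrative."""
    # Undiscounted sum over an episode is just (final - initial) head height, so it cannot be farmed.
    progress = cur.head_height - prev.head_height
    ctrl_cost = sum(t * t for t in cur.joint_torques)  # discourage violent, jerky motions
    done = cur.head_height >= target_height and cur.torso_up_cos > 0.9
    reward = w_progress * progress - w_ctrl * ctrl_cost - time_penalty  # penalty favours speed
    if done:
        reward += success_bonus  # one-off bonus for actually standing up
    return reward, done
```

Apart from the bounded progress term, every per-step term is non-positive. Hovering just below the standing threshold therefore earns nothing, and the quickest route to `done` is also the highest-return one. The progress term alone has the form of potential-based shaping with $\gamma = 1$ (potential = head height).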
5 votes · 1 answer
A3C fails to solve the MountainCar-v0 environment (OpenAI Gym implementation)
While I've been able to solve MountainCar-v0 using deep Q-learning, no matter what I try I can't solve this environment using policy-gradient approaches. From what I've learned searching the web, this is a really hard environment to solve, mainly because…

Scorpio76
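The difficulty is commonly attributed to the sparse reward. Every step yields -1 until the car reaches the flag, so a policy that rarely gets there receives almost no signal about which actions helped. The sketch below re-implements the MountainCar-v0 dynamics in plain Python to estimate how often a uniformly random policy reaches the goal within the 200-step limit. The constants follow Gym's classic-control implementation, so check them against the version you have installed. The function names are hypothetical.

```python
import math
import random

# Dynamics constants following Gym's classic-control MountainCar (re-implemented, not imported).
MIN_POS, MAX_POS, GOAL_POS = -1.2, 0.6, 0.5
MAX_SPEED, FORCE, GRAVITY = 0.07, 0.001, 0.0025
TIME_LIMIT = 200  # MountainCar-v0 episodes are truncated after 200 steps


def step(position, velocity, action):
    """One transition; action is 0 (push left), 1 (no push) or 2 (push right)."""
    velocity += (action - 1) * FORCE + math.cos(3 * position) * (-GRAVITY)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POS, min(MAX_POS, position))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # the car stops dead against the left wall
    return position, velocity, position >= GOAL_POS and velocity >= 0


def run_episode(policy, rng):
    """Return (episode return, reached_goal); the reward is -1 on every step."""
    position, velocity = rng.uniform(-0.6, -0.4), 0.0
    for t in range(TIME_LIMIT):
        position, velocity, done = step(position, velocity, policy(position, velocity, rng))
        if done:
            return -(t + 1), True
    return -TIME_LIMIT, False


def random_policy(position, velocity, rng):
    return rng.randrange(3)


if __name__ == "__main__":
    rng = random.Random(0)
    results = [run_episode(random_policy, rng) for _ in range(1000)]
    success_rate = sum(reached for _, reached in results) / len(results)
    print(f"random policy reached the goal in {success_rate:.1%} of episodes")
```

A hand-written policy that pushes in the direction of the current velocity, `lambda p, v, rng: 2 if v >= 0 else 0`, builds up momentum instead. Comparing its returns with the random policy's makes the exploration gap concrete.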
5 votes · 1 answer
Is there any paper, article or book that analyzes the feasibility of achieving AGI through brain simulation?
In my understanding, the mind arises from a physical system, the brain. I see that there is a large body of research on simulating physical systems efficiently (especially in quantum computing). Hence, in theory, we could achieve AGI by…

olinarr
5 votes · 2 answers
Reinforcement learning with uniformly random dynamics
Suppose I have an MDP $(S, A, p, R)$ where $p(s_j \mid s_i, a_i)$ is uniform, i.e. given a state $s_i$ and an action $a_i$, all states $s_j$ are equally probable.
Now I want to find an optimal policy for this MDP. Can I just apply the usual methods…

grok
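Assume the reward can be summarized as an expected reward $R(s,a)$. Uniform dynamics then make the future term of the optimal action-value identical for every state-action pair: $Q^*(s,a) = R(s,a) + \gamma \frac{1}{|S|}\sum_{s'} V^*(s')$. The greedy policy therefore reduces to $\arg\max_a R(s,a)$. The value-iteration sketch below checks this numerically; the sizes and random rewards are hypothetical.

```python
import numpy as np


def value_iteration(R, P, gamma=0.9, tol=1e-10):
    """R: (S, A) expected rewards; P: (S, A, S) transition probabilities."""
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * (P @ V)  # (S, A, S) @ (S,) -> (S, A): expected next-state value
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q, V_new
        V = V_new


n_states, n_actions = 6, 3  # hypothetical sizes
rng = np.random.default_rng(0)
R = rng.normal(size=(n_states, n_actions))
P = np.full((n_states, n_actions, n_states), 1.0 / n_states)  # uniform dynamics

Q, V = value_iteration(R, P)
greedy = Q.argmax(axis=1)
print(greedy, R.argmax(axis=1))  # the two policies coincide
```

The usual methods still work in this setting. They are simply more machinery than needed, because the dynamics contribute only a constant offset to every $Q(s,a)$.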
5 votes · 2 answers
Do self-driving cars use a single frame or multiple frames to make decisions?
This might be a trivial question but I couldn't find any reliable answers on the internet.
Almost all the neural network architectures for self-driving cars that I have seen on the internet use a feedforward network, so previous frames will not help…
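The excerpt does not establish what any particular production system does. Still, a feedforward network can see motion if several recent frames are stacked into a single input, as DQN did with four stacked Atari frames. Recurrent layers and 3D convolutions are the other common route. The following is a minimal NumPy sketch; the class name is hypothetical.

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keep the k most recent frames and expose them as one (k, ...) input array."""

    def __init__(self, k):
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # Fill the history with the first frame so the input shape is fixed from step 0.
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)
        return np.stack(list(self.frames), axis=0)

    def push(self, frame):
        self.frames.append(frame)  # with maxlen=k the oldest frame is dropped automatically
        return np.stack(list(self.frames), axis=0)
```

The network's first layer then takes `k` times as many input channels, and the difference between consecutive frames carries the velocity information that a single frame lacks.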
5 votes · 1 answer
Name of paper for encoding/representing XY coordinates in deep learning
In this podcast between Oriol Vinyals and Lex Fridman (https://youtu.be/Kedt2or9xlo?t=1769), at 29:29, Oriol Vinyals refers to a paper:
If you look at research in computer vision where it makes a lot of sense to treat images as two dimensional…

Benjamin Crouzier
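One well-known technique of this kind, known as CoordConv, appends channels holding each pixel's normalized x and y coordinates so that convolutions can use absolute position directly. Whether it is the paper mentioned in the podcast is not established here. The following is a minimal NumPy sketch, assuming channel-first `(C, H, W)` input; the function name is hypothetical.

```python
import numpy as np


def add_coord_channels(x):
    """Append normalized x and y coordinate channels to a (C, H, W) array.

    Both coordinates run from -1 to 1, so a following convolution can read
    absolute position instead of having to infer it from image content.
    """
    _, h, w = x.shape
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")  # both have shape (H, W)
    return np.concatenate([x, xx[None], yy[None]], axis=0)
```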
5 votes · 2 answers
Neural network with varying inputs (for a game AI)
I want to create a simple game which basically consists of 2D circles shooting smaller circles at each other (to make hitbox detection easier at the start). My goal is to create an AI which adapts its own behaviour to the player's. For that, I want…

Cr3ative
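A common way to feed a variable number of objects into a fixed-size network works in three steps:

1. Keep only the `k` most relevant objects (here, the nearest).
2. Express their positions relative to the agent.
3. Zero-pad the remaining slots, with a flag marking each slot as real or empty.

Here is a plain-Python sketch with a hypothetical function name and an assumed `(x, y, vx, vy)` tuple layout:

```python
import math


def encode_projectiles(agent_xy, projectiles, k=3):
    """Turn a variable-length list of (x, y, vx, vy) projectiles into 5 * k features."""
    ax, ay = agent_xy
    nearest = sorted(projectiles, key=lambda p: math.hypot(p[0] - ax, p[1] - ay))[:k]
    features = []
    for x, y, vx, vy in nearest:
        features.extend([x - ax, y - ay, vx, vy, 1.0])  # trailing 1.0 = "slot in use"
    features.extend([0.0] * 5 * (k - len(nearest)))     # padding slots, flag 0.0
    return features
```

Permutation-invariant architectures (embed each object, then pool with a sum or max) and recurrent networks avoid the fixed `k`, at the cost of a more complex model.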
5 votes · 2 answers
How does Lucas's argument work?
In Minds, Machines and Gödel (1959), J. R. Lucas shows that any human mathematician cannot be represented by an algorithmic automaton (a Turing machine, but any computer is equivalent to it by the Church-Turing thesis), using Gödel's incompleteness…

wythagoras
5 votes · 4 answers
How to stop DQN Q function from increasing during learning?
Following the DQN algorithm with experience replay:
Store transition $(\phi_t, a_t, r_t, \phi_{t+1})$ in $D$
Sample random minibatch of transitions $(\phi_j, a_j, r_j, \phi_{j+1})$ from $D$…

BestR
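Two details of the target computation often come up when Q-values keep growing:

- not bootstrapping from terminal transitions;
- taking the max over a separate, periodically copied target network (introduced in the later 2015 DQN paper) rather than over the network being trained.

Whether either applies to this question's code is not known. The NumPy sketch below just shows the target $y_j$ with both in place, using hypothetical argument names.

```python
import numpy as np


def dqn_targets(rewards, next_q_target, terminal, gamma=0.99):
    """y_j = r_j                                        if phi_{j+1} is terminal
       y_j = r_j + gamma * max_a Q_target(phi_{j+1}, a)  otherwise

    rewards:       (B,) rewards r_j from the sampled minibatch
    next_q_target: (B, n_actions) Q-values of phi_{j+1} from the frozen target network
    terminal:      (B,) booleans, True where phi_{j+1} ends the episode
    """
    bootstrap = next_q_target.max(axis=1)
    not_done = 1.0 - terminal.astype(np.float64)  # zero out the bootstrap term at episode ends
    return rewards + gamma * not_done * bootstrap
```

Without the terminal mask, the value of the last state keeps feeding back into itself. With the online network in place of `next_q_target`, every update also moves the target it is chasing.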
5 votes · 1 answer
Is it possible to make a 'forked path' neural network?
I want to make a network, specifically a CNN for image recognition, that takes an input, processes it the same way for several layers, and then at some point splits before coming to two different outputs. Is it possible to create a network such as…

Fred E
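Networks with a shared trunk and several output heads are common (multi-task learning). Frameworks such as the Keras functional API, or a PyTorch module with two output layers, express them directly. To keep the sketch framework-free, it uses plain NumPy fully connected layers instead of convolutions. The layer sizes and the two tasks are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(n_in, n_out):
    """Randomly initialized weights and zero biases for one fully connected layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def relu(z):
    return np.maximum(z, 0.0)


trunk = [dense(16, 32), dense(32, 32)]  # shared layers, applied to every input
head_a = dense(32, 10)                  # e.g. 10 class logits (hypothetical task)
head_b = dense(32, 4)                   # e.g. 4 regression outputs (hypothetical task)


def forward(x):
    h = x
    for W, b in trunk:
        h = relu(h @ W + b)
    # The network forks here: both heads read the same shared features.
    return h @ head_a[0] + head_a[1], h @ head_b[0] + head_b[1]
```

During training, the two heads' losses are typically summed (possibly with weights), so gradients from both flow back into the shared trunk.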
5 votes · 1 answer
Understanding the n-step off-policy SARSA update
On page 149 of Sutton & Barto's book (2nd ed.), there is equation 7.11.
I am having a hard time understanding this equation.
I would have thought that we should be moving $Q$ towards $G$, where $G$ would be corrected by importance sampling, but only…

Antoine Savine
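In the 2nd edition, the off-policy n-step Sarsa update has the form $Q_{t+n}(S_t,A_t) = Q_{t+n-1}(S_t,A_t) + \alpha\,\rho_{t+1:t+n}\,[G_{t:t+n} - Q_{t+n-1}(S_t,A_t)]$, with $\rho_{t:h} = \prod_{k=t}^{\min(h,T-1)} \pi(A_k \mid S_k)/b(A_k \mid S_k)$ (worth checking against the page). The ratio starts at $t+1$ because $A_t$ is already given. It extends to $t+n$ because the Sarsa return bootstraps from $Q(S_{t+n}, A_{t+n})$. The sketch below uses hypothetical function names and made-up probabilities.

```python
def importance_ratio(pi_probs, b_probs, start, end, T):
    """rho_{start:end}: product of pi(A_k|S_k) / b(A_k|S_k) for k = start .. min(end, T - 1)."""
    rho = 1.0
    for k in range(start, min(end, T - 1) + 1):
        rho *= pi_probs[k] / b_probs[k]
    return rho


def off_policy_nstep_sarsa_update(q_sa, n_step_return, rho, alpha):
    """Move Q(S_t, A_t) towards G_{t:t+n}, with the step size scaled by rho_{t+1:t+n}."""
    return q_sa + alpha * rho * (n_step_return - q_sa)


# Hypothetical probabilities of the actions actually taken at k = 0, 1, 2, 3.
pi_probs = [0.2, 1.0, 1.0, 0.0]
b_probs = [0.5, 0.5, 0.25, 0.5]
t, n, T = 0, 2, 10
rho = importance_ratio(pi_probs, b_probs, t + 1, t + n, T)  # steps 1 and 2 only; A_0 is not reweighted
q_new = off_policy_nstep_sarsa_update(q_sa=1.0, n_step_return=2.0, rho=rho, alpha=0.1)
```

Here `rho` is $(1/0.5)(1/0.25) = 8$, and the probability of $A_0$ plays no part. If the target policy would never take one of the later actions, `rho` becomes 0 and the update is skipped.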
5 votes · 1 answer
Does backpropagation update weights one layer at a time?
I am new to Deep Learning.
Suppose that we have a neural network with one input layer, one output layer, and one hidden layer. Let's refer to the weights from input to hidden as $W$ and the weights from hidden to output as $V$. Suppose that we have…

Joshua Jones
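In standard gradient descent, backpropagation computes gradients layer by layer, from the output back towards the input. The update itself is then applied to all layers at once, using gradients evaluated at the old weights. The NumPy sketch below uses the question's names ($W$: input-to-hidden, $V$: hidden-to-output). It assumes a sigmoid hidden layer, a linear output and squared-error loss, none of which the excerpt specifies.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_and_grads(W, V, x, target):
    """h = sigmoid(W x), y = V h, loss = 0.5 * ||y - target||^2."""
    h = sigmoid(W @ x)
    y = V @ h
    loss = 0.5 * np.sum((y - target) ** 2)
    dy = y - target                       # dLoss/dy
    dV = np.outer(dy, h)                  # gradient for the hidden -> output weights
    dh = V.T @ dy                         # error sent back through the current (old) V
    dW = np.outer(dh * h * (1.0 - h), x)  # gradient for the input -> hidden weights
    return loss, dW, dV


def sgd_step(W, V, x, target, lr=0.1):
    _, dW, dV = loss_and_grads(W, V, x, target)  # backward pass: output layer first, then hidden
    return W - lr * dW, V - lr * dV              # both layers updated together from the old weights
```

Updating $V$ first and then computing $dW$ with the new $V$ would mix gradients from two different points in weight space, which is why both gradients are computed before any weight changes.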
5 votes · 1 answer
Cold start collaborative filtering with NLP
I’m looking to match two pieces of text - e.g. IMDb movie descriptions and each person’s description of the type of movies they like. I have an existing set of ~5000 matches between the two. I particularly want to overcome the cold-start problem:…

Derek Hans
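A brand-new user has text but no interaction history, so a content-based score between the user's description and each movie description needs no interaction data at all. The sketch below is a deliberately simple baseline using bag-of-words cosine similarity in plain Python; the function names are hypothetical. In practice one would likely swap in TF-IDF weighting or sentence embeddings, and use the ~5000 known matches to tune or evaluate the scorer.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; a deliberately crude text representation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_movies(user_text, movie_texts):
    """Indices of movie_texts, most similar to the new user's self-description first."""
    user = bag_of_words(user_text)
    scores = [cosine(user, bag_of_words(m)) for m in movie_texts]
    return sorted(range(len(movie_texts)), key=lambda i: scores[i], reverse=True)
```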
5 votes · 2 answers
How to detect fraud in the advertising business using machine learning?
I am a complete beginner in this field. I am still learning the basics of machine learning and AI, but I have a problem at hand and I am not sure which technique or algorithm can be applied to it.
I am working on click-fraud detection in advertising. I need…

Mirza
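As an unsupervised starting point that assumes no labelled fraud data, one can flag traffic sources whose click volume is far out of line with the rest. The sketch below uses the modified z-score based on the median absolute deviation (MAD), which a single extreme source cannot distort the way it distorts a mean and standard deviation. The function name and the per-source counts are hypothetical. Real click-fraud systems typically combine many more signals, such as click timing, conversion rates and device or user-agent patterns.

```python
from statistics import median


def flag_suspicious_sources(clicks_per_source, threshold=3.5):
    """Return sources (e.g. IP addresses) whose click count is an extreme high outlier.

    Modified z-score: 0.6745 * (x - median) / MAD, where MAD is the median absolute
    deviation; 3.5 is a commonly used cutoff for this score.
    """
    counts = list(clicks_per_source.values())
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        return []  # no spread to measure against; another rule is needed in this case
    return [src for src, c in clicks_per_source.items()
            if 0.6745 * (c - med) / mad > threshold]
```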