Questions tagged [convergence]
For questions related to the convergence of AI algorithms.
91 questions
8
votes
2 answers
What is convergence in machine learning?
I came across this answer on Quora, but it was pretty sparse. I'm looking for specific meanings in the context of machine learning, but also mathematical and economic notions of the term in general.

DukeZhou · 6,237
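In the optimization sense that most machine-learning usage inherits, training has converged when further iterations stop changing the loss (or the parameters) meaningfully. A minimal sketch of that stopping rule, with `loss_history` as a hypothetical list of per-epoch losses:

```python
def has_converged(loss_history, tol=1e-4, patience=5):
    """Declare convergence when the loss has changed by less than `tol`
    for `patience` consecutive epochs (one common practical criterion,
    not a formal definition)."""
    if len(loss_history) <= patience:
        return False
    recent = loss_history[-(patience + 1):]
    deltas = [abs(a - b) for a, b in zip(recent, recent[1:])]
    return all(d < tol for d in deltas)
```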
7
votes
0 answers
Is the Bellman equation that uses sampling weighted by the Q values (instead of max) a contraction?
It is proved that the Bellman update is a contraction (1).
Here is the Bellman update that is used for Q-Learning:
$$Q_{t+1}(s, a) = Q_t(s, a) + \alpha \big( r(s, a, s') + \gamma \max_{a^*} Q_t(s', a^*) - Q_t(s, a) \big) \tag{1} \label{1}$$
The proof…

sirfroggy · 71
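As an illustration of the kind of operator the question describes (not necessarily the poster's exact construction), one common way to replace the max in \eqref{1} with Q-weighted sampling is a softmax-weighted backup; the weights $w$ and temperature $\tau$ here are assumptions:
$$(\mathcal{T}_w Q)(s,a) = \mathbb{E}_{s'}\!\left[ r(s,a,s') + \gamma \sum_{a'} w(a' \mid s')\, Q(s',a') \right], \qquad w(a' \mid s') = \frac{\exp(Q(s',a')/\tau)}{\sum_{b} \exp(Q(s',b)/\tau)}$$
As $\tau \to 0$ the weights concentrate on the greedy action and the standard $\gamma$-contraction argument applies; whether the weighted operator remains a contraction for $\tau > 0$ is exactly what the question asks.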
7
votes
1 answer
Why does reinforcement learning using a non-linear function approximator diverge when using strongly correlated data as input?
While reading the DQN paper, I found that randomly sampling stored experiences for training (experience replay) reduced divergence in RL with a non-linear function approximator (e.g., a neural network).
So, why does Reinforcement Learning using a non-linear function approximator…

강문주 · 71
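A minimal sketch of the experience-replay mechanism the DQN paper uses to break those correlations; the capacity and batch size are illustrative assumptions:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of transitions that returns uniformly random
    minibatches, decorrelating consecutive experiences."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the strong temporal correlation
        # between consecutive transitions that destabilizes training.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```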
6
votes
1 answer
Deep Q-Learning poor convergence on Stochastic Environment
I'm trying to implement a Deep Q-network in Keras/TF that learns to play Minesweeper (a stochastic environment). I have noticed that the agent learns to play the game pretty well with both small and large board sizes. However, it only…

Sanavesa · 153
6
votes
1 answer
How to create and train (with mutation and selection) a neural network to predict the next state of a board?
I'm aiming to create a neural network that can learn to predict the next state of a board using the rules of Conway's Game of Life.
Technically, I have three questions, but I felt that they needed to be together to get the full picture.
My network…

Aric · 275
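For reference, the target function such a network would have to learn; a minimal NumPy sketch of one Game of Life step on a 2D boolean board (toroidal wrap-around boundaries are an assumption):

```python
import numpy as np

def life_step(board):
    """One Game of Life update on a 2D boolean array, assuming
    toroidal (wrap-around) boundaries via np.roll."""
    # Count the eight neighbors of every cell.
    neighbors = sum(
        np.roll(np.roll(board, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell lives next step if it has exactly 3 neighbors,
    # or if it is alive now and has exactly 2.
    return (neighbors == 3) | (board & (neighbors == 2))
```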
6
votes
1 answer
What are the conditions of convergence of temporal-difference learning?
In reinforcement learning, temporal-difference methods seem to update the value function with each new iteration of experience absorbed from the environment.
What would be the conditions for temporal-difference learning to converge in the end? How is it…

MJeremy · 163
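For context, the standard tabular answer: TD(0) converges to $v_\pi$ with probability 1 when every state is visited infinitely often and the step sizes $\alpha_t$ satisfy the usual stochastic-approximation (Robbins-Monro) conditions:
$$\sum_{t=1}^{\infty} \alpha_t = \infty, \qquad \sum_{t=1}^{\infty} \alpha_t^2 < \infty$$
With function approximation the guarantees weaken: convergence is established for on-policy TD(0) with linear approximation, while combining bootstrapping, off-policy data, and non-linear approximation (the "deadly triad") can diverge.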
6
votes
1 answer
Convergence of semi-gradient TD(0) with non-linear function approximation
I am looking for a result that shows the convergence of semi-gradient TD(0) algorithm with non-linear function approximation for on-policy prediction. Specifically, the update equation is given by (borrowing notation from Sutton and Barto…

srinivas tunuguntla · 61
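For reference, the semi-gradient TD(0) update in Sutton and Barto's notation (the excerpt above is truncated): with a differentiable approximation $\hat{v}(s, \mathbf{w})$, the weights are adjusted by
$$\mathbf{w}_{t+1} = \mathbf{w}_t + \alpha \left[ R_{t+1} + \gamma\, \hat{v}(S_{t+1}, \mathbf{w}_t) - \hat{v}(S_t, \mathbf{w}_t) \right] \nabla \hat{v}(S_t, \mathbf{w}_t)$$
It is "semi-gradient" because the target is treated as a constant when differentiating, which is also why convergence results beyond the linear case are scarce.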
6
votes
1 answer
How to show temporal difference methods converge to MLE?
In chapter 6 of Sutton and Barto (p. 128), they claim that temporal-difference learning converges to the maximum likelihood estimate (MLE). How can this be shown formally?

user · 203
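A sketch of the construction the claim is usually phrased through (Sutton and Barto's certainty-equivalence estimate), with illustrative notation: from the batch of experience, form the maximum-likelihood model
$$\hat{P}(s' \mid s) = \frac{N(s \to s')}{N(s)}, \qquad \hat{r}(s) = \text{mean reward observed on leaving } s,$$
and batch TD(0) converges to the value function that is exactly correct for that model,
$$\hat{v}(s) = \hat{r}(s) + \gamma \sum_{s'} \hat{P}(s' \mid s)\, \hat{v}(s').$$
Showing this formally is what the question asks.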
5
votes
2 answers
What is curriculum learning in reinforcement learning?
I recently came across the term "curriculum learning" in the context of DRL and was intrigued by its potential to improve the learning process. As such, what is curriculum learning? And how can it be helpful for the convergence of RL algorithms?

Robin van Hoorn · 1,810
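A minimal sketch of the idea, assuming the tasks can be ordered by some difficulty score and that a generic `train` routine exists (both names are hypothetical placeholders):

```python
def curriculum_train(agent, tasks, train, difficulty):
    """Curriculum learning in one line of control flow: present tasks
    from easiest to hardest, so early, easy tasks shape the policy
    before hard ones are attempted."""
    for task in sorted(tasks, key=difficulty):
        train(agent, task)
    return agent
```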
5
votes
2 answers
How to check whether my loss function is convex or not?
Loss functions quantify the error that is used to update the weights of a neural network, and are thus central to training.
Consider the following excerpt from this answer
In principle, differentiability is…

hanugm · 3,571
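A quick numerical necessary-condition check: convexity requires $f\big(\tfrac{x+y}{2}\big) \le \tfrac{1}{2}f(x) + \tfrac{1}{2}f(y)$ for all $x, y$, so a single random violation proves non-convexity, while passing the test is only evidence, not proof. The test functions, sampling range, and trial count below are illustrative assumptions:

```python
import numpy as np

def probably_convex(f, dim, trials=10_000, scale=10.0, tol=1e-9):
    """Randomized midpoint test. Returns False with a certificate of
    non-convexity if any sampled pair violates midpoint convexity;
    True only means no violation was found."""
    rng = np.random.default_rng(0)
    for _ in range(trials):
        x = rng.uniform(-scale, scale, dim)
        y = rng.uniform(-scale, scale, dim)
        if f((x + y) / 2) > (f(x) + f(y)) / 2 + tol:
            return False
    return True

# Squared loss in the prediction is convex; a sine-based one is not.
print(probably_convex(lambda z: float(np.sum(z ** 2)), dim=3))   # True
print(probably_convex(lambda z: float(np.sin(z).sum()), dim=3))  # False
```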
5
votes
1 answer
Does the policy iteration convergence hold for finite-horizon MDP?
Most RL books (Sutton & Barto, Bertsekas, etc.) talk about policy iteration for infinite-horizon MDPs. Does the policy iteration convergence hold for finite-horizon MDP? If yes, how can we derive the algorithm?

user529295 · 359
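For reference, in the finite-horizon setting with horizon $H$, the analogue of the infinite-horizon fixed-point iteration is exact backward induction over time-indexed value functions (notation illustrative):
$$V_H(s) = 0, \qquad V_k(s) = \max_{a}\left[ r(s,a) + \sum_{s'} P(s' \mid s, a)\, V_{k+1}(s') \right], \quad k = H-1, \dots, 0,$$
so the optimal policy is in general non-stationary (a separate $\pi_k$ per stage), and "convergence" becomes termination of the recursion in exactly $H$ sweeps.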
5
votes
1 answer
Why does Q-learning converge under 100% exploration rate?
I am working on this assignment where I made the agent learn state-action values (Q-values) with Q-learning and 100% exploration rate. The environment is the classic gridworld as shown in the following picture.
Here are the values of my…

Rim Sleimi · 215
5
votes
1 answer
When exactly is a model considered over-parameterized?
When exactly is a model considered over-parameterized?
There is some recent research in deep learning on the role of over-parameterization in generalization, so it would be nice to know what exactly counts as such.
A…

Phúc Lê · 161
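One common operational convention (used in the interpolation literature, and only one of several): call a model over-parameterized when its trainable parameter count exceeds the number of training examples. A minimal check for a dense network, with the layer sizes as illustrative assumptions:

```python
def is_overparameterized(layer_sizes, n_train):
    """Compare the parameter count of a fully-connected network
    against the dataset size (weights + biases per dense layer)."""
    n_params = sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )
    return n_params, n_params > n_train

# Example: a 784-1000-10 network against 60,000 training examples.
print(is_overparameterized([784, 1000, 10], n_train=60_000))
# -> (795010, True)
```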
5
votes
1 answer
How can we conclude that an optimization algorithm is better than another one?
When we test a new optimization algorithm, what is the process we need to follow? For example, do we need to run the algorithm several times and pick the best performance (e.g., in terms of accuracy, F1 score, etc.), and do the same for an old optimization…

user29902 · 51
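A minimal sketch of the usual protocol: run each algorithm across several random seeds and compare the score distributions rather than single best runs. The `train_and_score` function is a hypothetical placeholder, and Welch's t-test is one common (assumed) choice of significance test:

```python
import numpy as np
from scipy import stats

def compare(algo_a, algo_b, train_and_score, seeds=range(10)):
    """Run both algorithms over several seeds and compare score
    distributions instead of cherry-picking each one's best run."""
    a = np.array([train_and_score(algo_a, seed=s) for s in seeds])
    b = np.array([train_and_score(algo_b, seed=s) for s in seeds])
    print(f"A: {a.mean():.4f} +/- {a.std(ddof=1):.4f}")
    print(f"B: {b.mean():.4f} +/- {b.std(ddof=1):.4f}")
    # Welch's t-test: is the mean gap larger than seed-to-seed noise?
    return stats.ttest_ind(a, b, equal_var=False)
```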
4
votes
1 answer
How can I ensure convergence of DDQN, if the true Q-values for different actions in the same state are very close?
I am applying a Double DQN algorithm to a highly stochastic environment where some of the actions in the agent's action space have very similar "true" Q-values (i.e. the expected future reward from either of these actions in the current state is…

apitsch · 93