Questions tagged [convergence]

For questions related to the convergence of AI algorithms.

91 questions
8 votes · 2 answers

What is convergence in machine learning?

I came across this answer on Quora, but it was pretty sparse. I'm looking for specific meanings in the context of machine learning, but also mathematical and economic notions of the term in general.
asked by DukeZhou
7 votes · 0 answers

Is the Bellman equation that uses sampling weighted by the Q values (instead of max) a contraction?

It is proved that the Bellman update is a contraction (1). Here is the Bellman update that is used for Q-learning: $$Q_{t+1}(s, a) = Q_{t}(s, a) + \alpha \left( r(s, a, s') + \gamma \max_{a^*} Q_{t}(s', a^*) - Q_t(s, a) \right) \tag{1} \label{1}$$ The proof…
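A minimal sketch of the two backups the question contrasts; the toy Q-values, reward, discount, and temperature below are arbitrary:

```python
import numpy as np

def max_backup(q_next, reward, gamma):
    # Standard Q-learning target: r + gamma * max_a' Q(s', a')
    return reward + gamma * np.max(q_next)

def softmax_weighted_backup(q_next, reward, gamma, tau=1.0):
    # "Soft" target: next-state Q-values averaged under a softmax
    # (Boltzmann) distribution over those same Q-values.
    w = np.exp((q_next - q_next.max()) / tau)  # shift for numerical stability
    w /= w.sum()
    return reward + gamma * np.dot(w, q_next)

q_next = np.array([1.0, 2.0, 0.5])
print(max_backup(q_next, reward=0.0, gamma=0.9))               # 1.8
print(softmax_weighted_backup(q_next, reward=0.0, gamma=0.9))  # <= 1.8
```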
7 votes · 1 answer

Why does reinforcement learning using a non-linear function approximator diverge when using strongly correlated data as input?

While reading the DQN paper, I found that randomly selecting samples and learning from them reduced divergence in RL using a non-linear function approximator (e.g. a neural network). So, why does reinforcement learning using a non-linear function approximator…
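A minimal sketch of the mechanism the excerpt refers to, i.e. uniform sampling from a replay buffer; the capacity and batch size are arbitrary placeholder values:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions and samples them uniformly at random, which breaks
    the strong temporal correlation of consecutive experience."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniformly sampled minibatches are far less correlated than
        # consecutive transitions from the same trajectory.
        return random.sample(self.buffer, batch_size)
```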
6 votes · 1 answer

Poor convergence of Deep Q-Learning in a stochastic environment

I'm trying to implement a Deep Q-network in Keras/TF that learns to play Minesweeper (our stochastic environment). I have noticed that the agent learns to play the game pretty well with both small and large board sizes. However, it only…
6 votes · 1 answer

How to create and train (with mutation and selection) a neural network to predict the next state of a board?

I'm aiming to create a neural network that can learn to predict the next state of a board using the rules of Conway's Game of Life. Technically, I have three questions, but I felt that they needed to be together to get the full picture. My network…
asked by Aric
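For reference, the target function such a network would have to learn is one step of Conway's Game of Life; a minimal sketch, assuming a toroidal 0/1 board since the excerpt does not specify boundary handling:

```python
import numpy as np

def life_step(board):
    """One step of Conway's Game of Life on a 0/1 array with wrap-around."""
    neighbours = sum(
        np.roll(np.roll(board, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A dead cell with exactly 3 neighbours becomes alive; a live cell
    # survives with 2 or 3 neighbours; everything else dies.
    return ((neighbours == 3) | ((board == 1) & (neighbours == 2))).astype(int)
```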
6 votes · 1 answer

What are the conditions of convergence of temporal-difference learning?

In reinforcement learning, temporal-difference learning seems to update the value function at each new iteration of experience absorbed from the environment. What would be the conditions for temporal-difference learning to converge in the end? How is it…
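One standard requirement in the tabular case, under the usual stochastic-approximation analysis (alongside conditions such as sufficient visitation of all states), is the Robbins-Monro step-size condition:
$$\sum_{t=0}^{\infty} \alpha_t = \infty, \qquad \sum_{t=0}^{\infty} \alpha_t^2 < \infty.$$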
6 votes · 1 answer

Convergence of semi-gradient TD(0) with non-linear function approximation

I am looking for a result that shows the convergence of semi-gradient TD(0) algorithm with non-linear function approximation for on-policy prediction. Specifically, the update equation is given by (borrowing notation from Sutton and Barto…
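For context, the semi-gradient TD(0) update presumably being referred to, in Sutton and Barto's notation with weight vector $\mathbf{w}_t$ and approximate value function $\hat{v}$, is
$$\mathbf{w}_{t+1} = \mathbf{w}_t + \alpha \left[ R_{t+1} + \gamma \hat{v}(S_{t+1}, \mathbf{w}_t) - \hat{v}(S_t, \mathbf{w}_t) \right] \nabla \hat{v}(S_t, \mathbf{w}_t).$$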
6 votes · 1 answer

How to show temporal difference methods converge to MLE?

In chapter 6 of Sutton and Barto (p. 128), they claim temporal difference converges to the maximum likelihood estimate (MLE). How can this be shown formally?
5 votes · 2 answers

What is curriculum learning in reinforcement learning?

I recently came across the term "curriculum learning" in the context of DRL and was intrigued by its potential to improve the learning process. As such, what is curriculum learning? And how can it be helpful for the convergence of RL algorithms?
5 votes · 2 answers

How to check whether my loss function is convex or not?

Loss functions are used to calculate the loss, which is then used to update the weights of a neural network; the loss function is thus central to training neural networks. Consider the following excerpt from this answer: In principle, differentiability is…
asked by hanugm
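One practical (necessary but not sufficient) check, sketched below under the assumption that the loss can be evaluated pointwise: sample points and test whether the numerically estimated Hessian has a negative eigenvalue. A single negative eigenvalue anywhere proves non-convexity, while finding none is only evidence, not a proof. The toy losses below are illustrative.

```python
import numpy as np

def numerical_hessian(f, x, eps=1e-4):
    """Central-difference estimate of the Hessian of f at x."""
    n = x.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            x_pp = x.copy(); x_pp[i] += eps; x_pp[j] += eps
            x_pm = x.copy(); x_pm[i] += eps; x_pm[j] -= eps
            x_mp = x.copy(); x_mp[i] -= eps; x_mp[j] += eps
            x_mm = x.copy(); x_mm[i] -= eps; x_mm[j] -= eps
            hess[i, j] = (f(x_pp) - f(x_pm) - f(x_mp) + f(x_mm)) / (4 * eps ** 2)
    return hess

def looks_convex(f, dim, n_samples=100, tol=-1e-6):
    """Returns False if a sampled point shows negative curvature (a proof of
    non-convexity); True only means no counterexample was found."""
    for _ in range(n_samples):
        x = np.random.randn(dim)
        if np.linalg.eigvalsh(numerical_hessian(f, x)).min() < tol:
            return False
    return True

print(looks_convex(lambda x: np.sum(x ** 2), dim=3))   # True  (convex bowl)
print(looks_convex(lambda x: -np.sum(x ** 2), dim=3))  # False (concave)
```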
5 votes · 1 answer

Does the policy iteration convergence hold for finite-horizon MDP?

Most RL books (Sutton & Barto, Bertsekas, etc.) talk about policy iteration for infinite-horizon MDPs. Does the policy iteration convergence hold for finite-horizon MDP? If yes, how can we derive the algorithm?
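For reference, the finite-horizon dynamic-programming recursion (backward induction) that such a derivation would build on, with horizon $T$ and discount $\gamma \le 1$ (often taken to be $1$ in the finite-horizon setting):
$$V_T(s) = 0, \qquad V_t(s) = \max_{a} \left[ r(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_{t+1}(s') \right], \quad t = T-1, \dots, 0.$$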
5 votes · 1 answer

Why does Q-learning converge under 100% exploration rate?

I am working on this assignment where I made the agent learn state-action values (Q-values) with Q-learning and a 100% exploration rate. The environment is the classic gridworld as shown in the following picture. Here are the values of my…
5 votes · 1 answer

When exactly is a model considered over-parameterized?

When exactly is a model considered over-parameterized? There is some recent research in deep learning on the role of over-parameterization in generalization, so it would be nice to know what exactly can be considered as such. A…
5 votes · 1 answer

How can we conclude that an optimization algorithm is better than another one?

When we test a new optimization algorithm, what is the process we need to follow? For example, do we need to run the algorithm several times and pick the best performance, i.e., in terms of accuracy, F1 score, etc., and do the same for an old optimization…
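One common protocol, sketched below: run each optimizer over several random seeds and compare the mean and spread of the final metric rather than a single best run. `train_and_evaluate` here is a hypothetical stand-in for an actual training pipeline.

```python
import random
import statistics

def train_and_evaluate(optimizer_name, seed):
    # Hypothetical stand-in for a real training pipeline that returns a
    # final validation metric (e.g. accuracy or F1) for the given seed.
    random.seed(f"{optimizer_name}-{seed}")
    return random.uniform(0.7, 0.9)

def compare(optimizer_names, seeds=range(5)):
    for name in optimizer_names:
        scores = [train_and_evaluate(name, s) for s in seeds]
        print(f"{name}: mean={statistics.mean(scores):.4f}, "
              f"std={statistics.stdev(scores):.4f} over {len(scores)} runs")

compare(["new_optimizer", "old_optimizer"])
```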
4 votes · 1 answer

How can I ensure convergence of DDQN, if the true Q-values for different actions in the same state are very close?

I am applying a Double DQN algorithm to a highly stochastic environment where some of the actions in the agent's action space have very similar "true" Q-values (i.e. the expected future reward from either of these actions in the current state is…