Questions tagged [learning-rate]

For questions related to the concept of learning rate (of an optimization algorithm, such as gradient descent) in machine learning.

32 questions
9
votes
2 answers

Is there an ideal range of learning rate which always gives a good result in almost all problems?

I once read somewhere that there is a range of learning rate within which learning is optimal in almost all cases, but I can't find any literature about it. All I could find is the following graph from the paper: The need for small learning rates…
9
votes
1 answer

What causes a model to require a low learning rate?

I've pondered this for a while without developing an intuition for the math behind it. So what causes a model to need a low learning rate?
8
votes
1 answer

Why is the learning rate generally below 1?

In all examples I've ever seen, the learning rate of an optimisation method is always less than $1$. However, I've never found an explanation as to why this is. In addition to that, there are some cases where having a learning rate bigger than 1 is…
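A standard way to see where the threshold comes from (a worked sketch, not part of the question itself): apply gradient descent with step size $\eta$ to the one-dimensional quadratic $f(w) = \frac{1}{2}\lambda w^2$.

```latex
\[
w_{t+1} = w_t - \eta f'(w_t) = (1 - \eta\lambda)\, w_t
\quad\Longrightarrow\quad
w_t = (1 - \eta\lambda)^t\, w_0 ,
\]
\[
\text{so the iterates converge iff } |1 - \eta\lambda| < 1,
\text{ i.e. } 0 < \eta < \frac{2}{\lambda}.
\]
```

Whenever the curvature $\lambda$ exceeds $2$, stability alone already forces $\eta < 1$, which is one reason practical learning rates sit well below $1$.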
6
votes
1 answer

Should I be decaying the learning rate and the exploration rate in the same manner?

Should I be decaying the learning rate and the exploration rate in the same manner? What counts as too slow or too fast a decay for exploration and for the learning rate? Or is it specific to each model?
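A minimal sketch of what decoupled schedules might look like; the decay constants and horizons below are illustrative assumptions, not recommendations:

```python
import math

def exp_decay(start, end, decay_steps, step):
    """Exponentially anneal from `start` to `end` over `decay_steps` steps."""
    frac = min(step / decay_steps, 1.0)
    return end + (start - end) * math.exp(-5.0 * frac)

for episode in range(10_000):
    # Exploration often decays faster than the learning rate: once the
    # agent has seen enough of the state space it should mostly exploit,
    # but it may still need small updates to keep refining its values.
    epsilon = exp_decay(start=1.0, end=0.05, decay_steps=3_000, step=episode)
    alpha   = exp_decay(start=0.5, end=0.01, decay_steps=8_000, step=episode)
```

Keeping the two schedules as separate functions makes it cheap to experiment with decaying them at different speeds.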
4
votes
2 answers

Is there a way to translate the concept of batch size into reinforcement learning?

I am using a neural network as my function approximator for reinforcement learning. In order to get it to train well, I need to choose a good learning rate. Hand-picking one is difficult, so I read up on methods of programmatically choosing a…
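One widely cited programmatic method is the learning-rate range test from Smith's "Cyclical Learning Rates for Training Neural Networks" (2017): sweep the learning rate geometrically over a few hundred mini-batches and pick a value just below where the loss starts to blow up. A hedged PyTorch-style sketch, where `model`, `loader`, and `loss_fn` are placeholders:

```python
import torch

def lr_range_test(model, loader, loss_fn, lr_min=1e-6, lr_max=1.0, steps=200):
    """Sweep the LR geometrically and record the loss at each step."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr_min)
    gamma = (lr_max / lr_min) ** (1.0 / steps)
    history = []
    for step, (x, y) in zip(range(steps), loader):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        history.append((optimizer.param_groups[0]["lr"], loss.item()))
        for group in optimizer.param_groups:
            group["lr"] *= gamma            # geometric ramp-up
    return history  # choose an LR just below where the loss diverges
```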
4
votes
2 answers

Is stable learning preferable to jumps in accuracy/loss?

A stable/smooth validation curve often seems to keep improving over more epochs than an unstable one. My intuition is that dropping the learning rate and increasing the patience of a model that produces a stable learning curve…
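The "drop the learning rate with patience" idea in the excerpt maps directly onto a plateau scheduler. A minimal PyTorch sketch, where `model`, `num_epochs`, `train_one_epoch`, and `evaluate` are placeholders and the factor and patience values are illustrative:

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Halve the LR whenever validation loss fails to improve for 5 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5
)

for epoch in range(num_epochs):
    train_one_epoch(model, optimizer)      # placeholder training loop
    val_loss = evaluate(model)             # placeholder validation pass
    scheduler.step(val_loss)               # LR drops only on plateaus
```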
3
votes
0 answers

Would a different learning rate for every neuron and layer mitigate or solve the vanishing gradient problem?

I'm interested in using the sigmoid (or tanh) activation function instead of ReLU. I'm aware of ReLU's advantages: faster computation and no vanishing gradient problem. But regarding the vanishing gradient, the main problem is the backpropagation…
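Whatever its effect on the vanishing gradient problem, a per-layer learning rate is at least easy to express. A minimal PyTorch sketch that gives the early sigmoid layers, whose gradients arrive attenuated, a larger step; the architecture and scaling factors are illustrative assumptions:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 64), nn.Sigmoid(),   # early layer: smallest gradients
    nn.Linear(64, 64), nn.Sigmoid(),
    nn.Linear(64, 10),                 # output layer: largest gradients
)

# Compensate for attenuated gradients by scaling the LR with depth.
optimizer = torch.optim.SGD([
    {"params": model[0].parameters(), "lr": 0.4},
    {"params": model[2].parameters(), "lr": 0.2},
    {"params": model[4].parameters(), "lr": 0.1},
], lr=0.1)
```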
3
votes
1 answer

In Q-learning, shouldn't the learning rate change dynamically during the learning phase?

I have the following code (below), where an agent uses Q-learning (RL) to play a simple game. What appears questionable to me in that code is the fixed learning rate. When it's set low, it always favours the old Q-value over the…
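A common alternative to a fixed rate (a standard textbook technique, not the question's own code) is to decay $\alpha$ per state-action pair with its visit count, so early updates move the Q-value a lot and later ones only refine it:

```python
from collections import defaultdict

Q = defaultdict(float)      # Q[(state, action)] -> value estimate
visits = defaultdict(int)   # visit counts per (state, action)
gamma = 0.99

def update(state, action, reward, next_state, actions):
    visits[(state, action)] += 1
    alpha = 1.0 / visits[(state, action)]   # per-pair decaying learning rate
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
```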
2
votes
1 answer

Why can the learning rate make the loss increase in stochastic gradient descent?

In Deep Learning by Goodfellow et al., I came across the following line in the chapter on Stochastic Gradient Descent (p. 287): The main question is how to set $\epsilon_0$. If it is too large, the learning curve will show violent oscillations,…
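The oscillation is easy to reproduce on a toy quadratic, where a step size above $2/\lambda$ makes every update overshoot the minimum by more than it corrects. A self-contained illustration, not from the book:

```python
def gradient_descent(lr, curvature=1.0, w0=1.0, steps=10):
    """Minimise f(w) = 0.5 * curvature * w**2 starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * curvature * w      # gradient step on f'(w) = curvature * w
    return w

print(gradient_descent(lr=0.5))   # converges:  |1 - 0.5| < 1
print(gradient_descent(lr=2.5))   # diverges:   |1 - 2.5| = 1.5 > 1
```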
2
votes
0 answers

Do learning rate schedulers conflict with or prevent convergence of the Adam optimiser?

An article on https://spell.ml says: "Because Adam manages learning rates internally, it's incompatible with most learning rate schedulers. Anything more complicated than simple learning warmup and/or decay will put the Adam optimizer to 'complete'…"
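Whatever one makes of the article's claim, the two mechanisms do compose mechanically: a scheduler rescales Adam's global step size while Adam's per-parameter moment estimates are untouched. A minimal PyTorch sketch with warmup-then-decay, where `model` is a placeholder and the schedule shape is an illustrative assumption:

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

# Linear warmup for 500 steps, then cosine decay. The scheduler only
# rescales the global lr; Adam's per-parameter adaptation is untouched.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.01, total_iters=500
)
decay = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10_000)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, decay], milestones=[500]
)
```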
2
votes
2 answers

How does $\alpha$ affect the convergence of the TD algorithm?

In Temporal-Difference Learning, we update our value function by $V\left(S_{t}\right) \leftarrow V\left(S_{t}\right)+\alpha\left(R_{t+1}+\gamma V\left(S_{t+1}\right)-V\left(S_{t}\right)\right)$ If we choose a constant $\alpha$, will the algorithm…
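The standard stochastic-approximation answer (a fact from the literature, not from the question itself) is that a constant $\alpha$ need not converge; convergence with probability 1 requires a decaying sequence satisfying the Robbins-Monro conditions:

```latex
\[
\sum_{n=1}^{\infty} \alpha_n = \infty
\qquad\text{and}\qquad
\sum_{n=1}^{\infty} \alpha_n^2 < \infty ,
\]
```

for example $\alpha_n = 1/n$. A constant $\alpha$ violates the second condition, so $V$ keeps fluctuating around the true value instead of settling.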
2
votes
0 answers

Has the idea of using different learning rates for different layers been explored in the literature?

I wonder whether there are heuristic rules for the optimal selection of learning rates for different layers. I expect that there is no general recipe, but probably there are some choices that may be beneficial. The common strategy uses the same…
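One heuristic that has appeared in the literature is discriminative fine-tuning (Howard and Ruder, 2018), which decreases the learning rate geometrically from the top layer down. A PyTorch-style sketch; the architecture, base LR, and decay factor are illustrative:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                      nn.Linear(256, 64), nn.ReLU(),
                      nn.Linear(64, 10))

layers = [m for m in model if isinstance(m, nn.Linear)]
base_lr, decay = 1e-3, 2.6   # lr shrinks by `decay` per layer of depth
groups = [
    {"params": layer.parameters(),
     "lr": base_lr / decay ** (len(layers) - 1 - i)}
    for i, layer in enumerate(layers)
]
optimizer = torch.optim.Adam(groups)
```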
2
votes
1 answer

How does the learning rate $\alpha$ vary in stationary and non-stationary environments?

In Sutton and Barto's book (2nd edition, Chapter 6: TD learning), the authors mention two ways of updating the value function: Monte Carlo method: $V(S_t) \leftarrow V(S_t) + \alpha[G_t - V(S_t)]$. TD(0) method: $V(S_t) \leftarrow V(S_t) + \alpha[R_{t+1} +…
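The trade-off behind that chapter can be seen in a few lines: a $1/n$ step size is ideal when the target is stationary, while a constant $\alpha$ keeps tracking a drifting target at the cost of never fully converging. A self-contained sketch; the drift and step sizes are illustrative:

```python
import random

def track(alpha_fn, target_drift=0.0, steps=5_000):
    """Incrementally estimate the mean of a (possibly drifting) signal."""
    true_mean, estimate = 0.0, 0.0
    for n in range(1, steps + 1):
        true_mean += target_drift                 # non-stationary if > 0
        sample = true_mean + random.gauss(0, 1)
        alpha = alpha_fn(n)
        estimate += alpha * (sample - estimate)   # same update form as TD
    return true_mean, estimate

print(track(lambda n: 1 / n))                   # stationary: 1/n converges
print(track(lambda n: 0.1, target_drift=0.01))  # drifting: constant alpha tracks
```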
2
votes
0 answers

Why does a lower learning rate reduce the train-test generalization gap?

In this blog post: http://www.argmin.net/2016/04/18/bottoming-out/ Prof. Recht shows two plots. He says one of the reasons the plot below has a lower train-test gap is that the model was trained with a lower learning rate (and he also manually…
2
votes
1 answer

Autoencoder network for feature selection not converging

I am training an undercomplete autoencoder network for feature selection. I am using one hidden layer in each of the encoder and decoder networks. The ELU activation function is used for each layer. For optimization, I am using the Adam optimizer.…
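For reference, a minimal version of the described setup: one hidden layer in each of encoder and decoder, ELU activations, and Adam. The layer sizes and the reduced learning rate are assumptions, since the excerpt truncates before giving them:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_features=100, n_bottleneck=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ELU(),
            nn.Linear(64, n_bottleneck),
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_bottleneck, 64), nn.ELU(),
            nn.Linear(64, n_features),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
# A first thing to try when such a network fails to converge is an LR
# below Adam's default of 1e-3.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
```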