For questions related to stochastic gradient descent (SGD), which is gradient descent that uses stochastic (or noisy) estimates of the gradient.
Questions tagged [stochastic-gradient-descent]
33 questions
23
votes
3 answers
How do I choose the optimal batch size?
Batch size is a term used in machine learning and refers to the number of training examples utilised in one iteration. The batch size
can be one of three options:
batch mode: where the batch size is equal to the total dataset thus making the…

Sebastian Nielsen · 363 · 1 · 2 · 10
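A minimal sketch of the three regimes the excerpt alludes to (batch, mini-batch, and stochastic), written as a single training-epoch helper; `grad_fn`, the data arrays, and all parameter names here are illustrative, not taken from the question:

```python
import numpy as np

def gd_epoch(params, X, y, grad_fn, lr=0.01, batch_size=None):
    """One epoch of gradient descent.
    batch_size=None -> batch mode (whole dataset, one update per epoch),
    batch_size=1    -> stochastic mode (one update per example),
    otherwise       -> mini-batch mode."""
    n = len(X)
    batch_size = n if batch_size is None else batch_size
    order = np.random.permutation(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        # grad_fn is assumed to return the average gradient over the chosen examples
        params = params - lr * grad_fn(params, X[idx], y[idx])
    return params
```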
10
votes
2 answers
How do I handle negative rewards in policy gradients with the cross-entropy loss function?
I am using policy gradients in my reinforcement learning algorithm, and occasionally my environment provides a severe penalty (i.e. negative reward) when a wrong move is made. I'm using a neural network with stochastic gradient descent to learn the…

jstaker7 · 209 · 1 · 2 · 5
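For context, a minimal REINFORCE-style surrogate loss (a sketch, not the asker's network or loss function) shows that a negative reward simply flips the sign of the log-probability term, so the update pushes the penalised action's probability down without any special handling:

```python
import numpy as np

def policy_gradient_loss(log_probs, rewards):
    """Surrogate loss -mean(reward * log pi(a|s)); minimising it performs
    the policy-gradient update. A negative reward just reverses the sign."""
    return -np.mean(np.asarray(rewards) * np.asarray(log_probs))

# A severe penalty (-10) on the second action enlarges that action's loss term,
# so gradient descent lowers its probability.
loss = policy_gradient_loss(log_probs=[-0.2, -1.5], rewards=[1.0, -10.0])
```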
10
votes
1 answer
What is the relationship between gradient accumulation and batch size?
I am currently training some models using gradient accumulation since the model batches do not fit in GPU memory. Since I am using gradient accumulation, I had to tweak the training configuration a bit. There are two parameters that I tweaked: the…

JVGD · 1,088 · 1 · 6 · 14
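A small self-contained sketch of gradient accumulation with PyTorch (toy model and data, not the asker's configuration): gradients from several micro-batches are summed before a single optimiser step, so the effective batch size is `micro_batch * accum_steps`:

```python
import torch
from torch import nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
X, y = torch.randn(64, 10), torch.randn(64, 1)

micro_batch, accum_steps = 8, 4              # behaves like a batch of 32
optimizer.zero_grad()
for step, start in enumerate(range(0, len(X), micro_batch)):
    xb, yb = X[start:start + micro_batch], y[start:start + micro_batch]
    loss = loss_fn(model(xb), yb) / accum_steps  # rescale so the sum is an average
    loss.backward()                              # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()                         # one update per accumulated batch
        optimizer.zero_grad()
```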
10
votes
2 answers
Is neural network training done one-by-one?
I'm trying to learn neural networks by watching this series of videos and implementing a simple neural network in Python.
Here's one of the things I'm wondering about: I'm training the neural network on sample data, and I've got 1,000 samples. The…

Ram Rachum · 261 · 1 · 9
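A quick back-of-the-envelope sketch for the 1,000-sample case (the numbers are illustrative): every sample is seen either way, but the number of weight updates per epoch depends on the batch size:

```python
n_samples = 1000
for batch_size in (1, 10, n_samples):          # one-by-one, mini-batch, full batch
    updates_per_epoch = n_samples // batch_size
    print(f"batch_size={batch_size:>4}: {updates_per_epoch} updates per epoch")
```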
9
votes
2 answers
What exactly is averaged when doing batch gradient descent?
I have a question about how the averaging works when doing mini-batch gradient descent.
I think I now understood the general gradient descent algorithm, but only for online learning. When doing mini-batch gradient descent, do I have to:
forward…

Ben · 425 · 3 · 10
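A toy sketch of the averaging step (a linear model with squared error, not the asker's network): each example in the mini-batch contributes one gradient, those gradients are averaged, and only then is a single weight update applied:

```python
import numpy as np

def per_example_grad(w, x, y):
    """Gradient of 0.5 * (w·x - y)^2 with respect to w, for one example."""
    return (w @ x - y) * x

rng = np.random.default_rng(0)
w = np.zeros(3)
X, y = rng.normal(size=(8, 3)), rng.normal(size=8)    # one mini-batch of 8 examples
grads = np.stack([per_example_grad(w, xi, yi) for xi, yi in zip(X, y)])
w -= 0.1 * grads.mean(axis=0)                          # average the gradients, update once
```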
9
votes
1 answer
Is back-propagation applied for each data point or for a batch of data points?
I am new to deep learning and trying to understand the concept of back-propagation. I am unsure about when back-propagation is applied. Assume that I have a training data set of 1000 images of handwritten letters.
Is back-propagation…

Maanu · 235 · 2 · 6
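In the usual mini-batch convention (assumed here, since the excerpt is cut off), back-propagation produces a per-example gradient, but the weights are updated once per batch with the average:

$$
w \leftarrow w - \frac{\eta}{|B|} \sum_{i \in B} \nabla_w L(x_i, y_i; w),
$$

so with 1000 images and a batch size of 100 there are 10 weight updates per epoch, even though every image passes through back-propagation.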
8
votes
1 answer
Why is the learning rate generally beneath 1?
In all examples I've ever seen, the learning rate of an optimisation method is always less than $1$. However, I've never found an explanation as to why this is. In addition to that, there are some cases where having a learning rate bigger than 1 is…

Recessive · 1,346 · 8 · 21
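A one-dimensional example (illustrative, not from the question) shows where the "less than $1$" rule of thumb comes from. For a quadratic loss with curvature $L$,

$$
f(x) = \tfrac{L}{2}x^{2}, \qquad x_{t+1} = x_t - \alpha \nabla f(x_t) = (1 - \alpha L)\,x_t,
$$

the iterates shrink only when $|1 - \alpha L| < 1$, i.e. $\alpha < 2/L$; for sharply curved losses (large $L$) that bound sits well below $1$, while for flat losses a learning rate above $1$ can still converge.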
4
votes
0 answers
How does SGD escape local minima?
SGD is able to jump out of local minima that would otherwise trap BGD
I don't really understand the above statement. Could someone please provide a mathematical explanation for why SGD (Stochastic Gradient Descent) is able to escape local minima,…

stoic-santiago · 1,121 · 5 · 18
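One common way to write the intuition (a sketch of the standard argument, not tied to any particular source): the mini-batch gradient is the full-batch gradient plus zero-mean noise,

$$
x_{t+1} = x_t - \alpha\big(\nabla F(x_t) + \xi_t\big), \qquad \mathbb{E}[\xi_t] = 0,
$$

so at a local minimum of $F$, where $\nabla F(x_t) = 0$ and batch gradient descent stops, the noise term $\xi_t$ is generally non-zero and can carry the iterate over shallow barriers.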
3
votes
1 answer
How are these equations of SGD with momentum equivalent?
I know this question may be silly, but I cannot prove it.
In the Stanford slides (page 17), they define the formula of SGD with momentum like this:
$$
v_{t}=\rho v_{t-1}+\nabla f(x_{t-1})
\\
x_{t}=x_{t-1}-\alpha v_{t},
$$
where:
$v_{t}$ is the…

CuCaRot · 892 · 3 · 15
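For reference, the other widely used form absorbs the learning rate into the velocity (assuming that is the variant being compared, since the excerpt is cut off):

$$
\tilde v_{t} = \rho\,\tilde v_{t-1} + \alpha \nabla f(x_{t-1}), \qquad x_{t} = x_{t-1} - \tilde v_{t},
$$

which reproduces the slide's updates under the substitution $\tilde v_t = \alpha v_t$, provided $\alpha$ is held constant.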
3
votes
1 answer
Should we also shuffle the test dataset when training with SGD?
When training machine learning models (e.g. neural networks) with stochastic gradient descent, it is common practice to (uniformly) shuffle the training data into batches/sets of different samples from different classes. Should we also shuffle the…

SpiderRico · 960 · 8 · 18
3
votes
2 answers
What is the difference between batch and mini-batch gradient descent?
I am learning deep learning from Andrew Ng's tutorial Mini-batch Gradient Descent.
Can anyone explain the similarities and dissimilarities between batch GD and mini-batch GD?

DRV · 1,573 · 2 · 11 · 18
2
votes
2 answers
What's the rationale behind mini-batch gradient descent?
I am reading a book that states
As the mini-batch size increases, the gradient computed is closer to the 'true' gradient
So, I assume that they are saying that mini-batch training only focuses on decreasing the cost function in a certain 'plane',…

ngc1300 · 133 · 5
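A small numerical sketch of the quoted claim (toy linear-regression data, illustrative only): mini-batch gradients are unbiased estimates of the full-batch gradient, and their spread around it shrinks as the batch size grows:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5])
X, y = rng.normal(size=(10_000, 3)), rng.normal(size=10_000)

def batch_grad(idx):
    """Average squared-error gradient over the rows selected by idx."""
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

full = batch_grad(np.arange(len(X)))            # the "true" full-batch gradient
for m in (1, 10, 100, 1000):
    estimates = [batch_grad(rng.choice(len(X), m, replace=False)) for _ in range(200)]
    spread = np.mean([np.linalg.norm(g - full) for g in estimates])
    print(f"batch size {m:>5}: mean distance to full gradient = {spread:.3f}")
```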
2
votes
1 answer
Is there any way to train a neural network without using gradients?
The only algorithm I know for updating the weights of a neural network is based on gradients. The update equation can be roughly written as
$$w \leftarrow w - \nabla_{w}L$$
where $\nabla_{w}L$ is the gradient of loss function with respect to…

hanugm · 3,571 · 3 · 18 · 50
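One gradient-free family that answers the title, sketched minimally (simple random search over the weight vector; `loss_fn` and all names are illustrative, and this is just one alternative among evolutionary and other zeroth-order methods):

```python
import numpy as np

def random_search_step(w, loss_fn, sigma=0.1, n_candidates=20, rng=None):
    """Try noisy copies of w and keep the best one; no gradient of loss_fn is needed."""
    rng = rng or np.random.default_rng()
    candidates = [w + sigma * rng.normal(size=w.shape) for _ in range(n_candidates)]
    best = min(candidates, key=loss_fn)
    return best if loss_fn(best) < loss_fn(w) else w

# Toy usage: minimise a quadratic "loss" over a 5-dimensional weight vector.
w = np.zeros(5)
for _ in range(100):
    w = random_search_step(w, loss_fn=lambda v: float(np.sum((v - 3.0) ** 2)))
```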
2
votes
1 answer
Why is it called "batch" gradient descent if it consumes the full dataset before calculating the gradient?
While training a neural network, we can follow three methods: batch gradient descent, mini-batch gradient descent and stochastic gradient descent.
For this question, assume that your dataset has $n$ training samples and we divided it into $k$…

hanugm · 3,571 · 3 · 18 · 50
2
votes
0 answers
Methodologies for passing the best samples for a neural network to learn
Just an idea I am sure I read in a book some time ago, but I can't remember the name.
Given a very large dataset and a neural network (or anything that can learn via something like stochastic gradient descent, passing a subset of samples to modify…

user4052054 · 121 · 1