For questions about mini-batch (or batch) gradient descent, i.e. gradient descent in which each parameter update is computed from more than one input-label pair at a time.
Questions tagged [mini-batch-gradient-descent]
22 questions
10
votes
2 answers
Is neural network training done one-by-one?
I'm trying to learn neural networks by watching this series of videos and implementing a simple neural network in Python.
Here's one of the things I'm wondering about: I'm training the neural network on sample data, and I've got 1,000 samples. The…

Ram Rachum
- 261
- 1
- 9
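
Whether the weights are updated one sample at a time or once per group of samples is easiest to see in a toy training loop. Below is a minimal NumPy sketch contrasting the two schedules; the linear model, learning rate, and data are illustrative assumptions, not details from the question above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))          # 1,000 samples, 3 features
y = X @ np.array([1.0, -2.0, 0.5])      # targets from a known linear rule

def grad(w, xb, yb):
    """Gradient of the mean squared error over the rows in xb."""
    return 2 * xb.T @ (xb @ w - yb) / len(xb)

lr = 0.1

# One-by-one (online / stochastic): one update per sample.
w = np.zeros(3)
for i in range(len(X)):
    w -= lr * grad(w, X[i:i+1], y[i:i+1])

# Mini-batch: one update per group of 32 samples.
w_mb = np.zeros(3)
for start in range(0, len(X), 32):
    xb, yb = X[start:start+32], y[start:start+32]
    w_mb -= lr * grad(w_mb, xb, yb)

print(w, w_mb)   # both approach [1.0, -2.0, 0.5]
```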
9
votes
2 answers
What exactly is averaged when doing batch gradient descent?
I have a question about how the averaging works when doing mini-batch gradient descent.
I think I now understand the general gradient descent algorithm, but only for online learning. When doing mini-batch gradient descent, do I have to:
forward…

Ben
- 425
- 3
- 10
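
What gets averaged is the per-sample gradients; equivalently, one can take the gradient of the averaged loss, and by linearity the two are the same. A small NumPy check with an illustrative squared-error model (the shapes and data are assumptions, not taken from the question):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 4))              # a mini-batch of 8 samples
y = rng.normal(size=8)
w = rng.normal(size=4)

def sample_grad(w, x, t):
    """Gradient of the squared error (x.w - t)^2 for a single sample."""
    return 2 * (x @ w - t) * x

# Option 1: average the per-sample gradients.
g_avg = np.mean([sample_grad(w, X[i], y[i]) for i in range(len(X))], axis=0)

# Option 2: take the gradient of the mean loss directly.
g_mean_loss = 2 * X.T @ (X @ w - y) / len(X)

print(np.allclose(g_avg, g_mean_loss))   # True: the two are identical
```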
9
votes
1 answer
Is back-propagation applied for each data point or for a batch of data points?
I am new to deep learning and am trying to understand the concept of back-propagation. I have a question about when back-propagation is applied. Assume that I have a training data set of 1,000 images of handwritten letters.
Is back-propagation…

Maanu
- 235
- 2
- 6
3
votes
1 answer
When using experience replay, do we update the parameters for all samples of the mini-batch or for each sample in the mini-batch separately?
I've been reading Google's DeepMind Atari paper and I'm trying to understand how to implement experience replay.
Do we update the parameters $\theta$ of function $Q$ once for all the samples of the minibatch, or do we do that for each sample of the…

user491626
- 241
- 1
- 4
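
In the DQN setting the usual reading is a single update of $\theta$ per sampled mini-batch: compute the TD targets for every transition in the batch, form one (mean) loss, and take one gradient step. The sketch below uses a linear Q-function and a hand-written gradient purely for illustration; the replay buffer contents, shapes, and discount factor are assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
state_dim, n_actions, gamma, lr = 4, 2, 0.99, 0.01

W = rng.normal(size=(n_actions, state_dim)) * 0.1   # online Q-network (linear, for illustration)
W_target = W.copy()                                 # target network

# A toy replay buffer of (state, action, reward, next_state, done) tuples.
buffer = [(rng.normal(size=state_dim), rng.integers(n_actions),
           rng.normal(), rng.normal(size=state_dim), False)
          for _ in range(500)]

# Sample one mini-batch and perform ONE update of W for the whole batch.
idx = rng.choice(len(buffer), size=32, replace=False)
grad = np.zeros_like(W)
for j in idx:
    s, a, r, s_next, done = buffer[j]
    target = r if done else r + gamma * np.max(W_target @ s_next)
    td_error = W[a] @ s - target                    # Q(s, a; W) - y_j
    grad[a] += 2 * td_error * s                     # d/dW[a] of the squared TD error
grad /= len(idx)                                    # mean over the mini-batch
W -= lr * grad                                      # single parameter update for all samples
```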
3
votes
2 answers
What is the difference between batch and mini-batch gradient descent?
I am learning deep learning from Andrew Ng's tutorial Mini-batch Gradient Descent.
Can anyone explain the similarities and differences between batch GD and mini-batch GD?

DRV
- 1,573
- 2
- 11
- 18
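
One way to see the relationship: both follow the same update rule and differ only in how much data feeds each update. Batch GD uses the whole training set per update (one update per epoch), mini-batch GD uses a small slice (many updates per epoch), and batch GD is simply mini-batch GD with the batch size set to the dataset size. A self-contained NumPy sketch under those assumptions:

```python
import numpy as np

def run_epoch(w, X, y, lr, batch_size):
    """One pass over the data; one gradient update per batch of `batch_size` rows."""
    n_updates = 0
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        w = w - lr * 2 * xb.T @ (xb @ w - yb) / len(xb)   # mean-squared-error gradient
        n_updates += 1
    return w, n_updates

rng = np.random.default_rng(3)
X, y = rng.normal(size=(1000, 3)), rng.normal(size=1000)
w0 = np.zeros(3)

_, k = run_epoch(w0, X, y, 0.01, batch_size=32)        # mini-batch GD: many updates per epoch
_, one = run_epoch(w0, X, y, 0.01, batch_size=len(X))  # batch GD: exactly 1 update per epoch
print(k, one)   # 32, 1
```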
2
votes
2 answers
What's the rationale behind mini-batch gradient descent?
I am reading a book that states
As the mini-batch size increases, the gradient computed is closer to the 'true' gradient
So, I assume that they are saying that mini-batch training only focuses on decreasing the cost function in a certain 'plane',…

ngc1300
- 133
- 5
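
The book's claim can be checked numerically: the mini-batch gradient is an unbiased estimate of the full-dataset ("true") gradient, and its spread around that gradient shrinks as the batch size grows. A small NumPy experiment with an illustrative least-squares objective (none of the numbers come from the book):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(10_000, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=10_000)
w = np.zeros(5)

def mse_grad(xb, yb, w):
    return 2 * xb.T @ (xb @ w - yb) / len(xb)

true_grad = mse_grad(X, y, w)                     # gradient over the full dataset

for batch_size in (1, 10, 100, 1000):
    errs = []
    for _ in range(200):                          # 200 random mini-batches per size
        idx = rng.choice(len(X), size=batch_size, replace=False)
        errs.append(np.linalg.norm(mse_grad(X[idx], y[idx], w) - true_grad))
    print(batch_size, np.mean(errs))              # error shrinks roughly like 1/sqrt(batch_size)
```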
2
votes
1 answer
Why is it called "batch" gradient descent if it consumes the full dataset before calculating the gradient?
While training a neural network, we can follow three methods: batch gradient descent, mini-batch gradient descent and stochastic gradient descent.
For this question, assume that your dataset has $n$ training samples and we divided it into $k$…

hanugm
- 3,571
- 3
- 18
- 50
2
votes
1 answer
When is the loss calculated, and when does the back-propagation take place?
I read different articles and keep getting confused about this point. I am not sure whether the literature is giving mixed information or I am interpreting it incorrectly.
So from reading articles my understanding (loosely) for the following terms are as…

Hazzaldo
- 279
- 2
- 9
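
For the standard mini-batch setting, the order inside the training loop is: forward pass on the batch, compute one (mean) loss for that batch, back-propagate that single loss, then update the parameters. The runnable NumPy sketch below uses a single linear layer purely for illustration; the data, shapes, and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
X, y = rng.normal(size=(256, 4)), rng.normal(size=(256, 1))
W = np.zeros((4, 1))                               # a single linear layer, for illustration
lr, batch_size = 0.05, 32

for epoch in range(5):
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        preds = xb @ W                              # 1. forward pass on the whole mini-batch
        loss = np.mean((preds - yb) ** 2)           # 2. one scalar loss for the batch
        grad_W = 2 * xb.T @ (preds - yb) / len(xb)  # 3. back-propagation of that single loss
        W -= lr * grad_W                            # 4. one parameter update per mini-batch
```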
1
vote
1 answer
What is the order of execution of steps in back-propagation algorithm in a neural network?
I am a machine learning newbie. I am trying to understand the back-propagation algorithm. I have a training dataset of 60 instances/records.
What is the correct order of the process? This one?
Forward pass of the first instance. Calculate the…

gokul
- 53
- 4
1
vote
0 answers
Why would one prefer the gradient of the sum rather than the sum of the gradients?
When gradients are aggregated over mini-batches, I sometimes see formulations like this, e.g., in the "Deep Learning" book by Goodfellow et al.
$$\mathbf{g} = \frac{1}{m} \nabla_{\mathbf{w}} \left( \sum\limits_{i=1}^{m} L \left( f \left(…

Eddie C
- 11
- 1
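
The two formulations agree because the gradient is linear: differentiating the (scaled) sum of the per-sample losses gives exactly the average of the per-sample gradients, so the choice between them is one of implementation (accumulate the loss and differentiate once, or differentiate each term and accumulate) rather than of result. Under that reading, and writing $L_i(\mathbf{w})$ as shorthand for the loss on the $i$-th sample (an assumption about the notation, not a quote from the book):
$$\mathbf{g} \;=\; \frac{1}{m} \nabla_{\mathbf{w}} \sum_{i=1}^{m} L_i(\mathbf{w}) \;=\; \frac{1}{m} \sum_{i=1}^{m} \nabla_{\mathbf{w}} L_i(\mathbf{w}).$$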
1
vote
1 answer
Is it possible to use stochastic gradient descent at the beginning, then switch to batch gradient descent with only a few training examples?
Batch gradient descent is extremely slow for large datasets, but it can find the lowest possible value for the cost function. Stochastic gradient descent is relatively fast, but it kind of finds the general area where convergence happens and it kind…

Robo
- 121
- 3
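
Mechanically, such a switch is just a change of batch size partway through training: batch size 1 for the early epochs, then the full dataset for the final updates. A minimal NumPy sketch of that schedule on an illustrative least-squares objective (whether the switch is actually worthwhile is exactly what the question asks):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 3))
y = X @ np.array([2.0, -1.0, 0.5])
w, lr = np.zeros(3), 0.05

def step(w, xb, yb):
    return w - lr * 2 * xb.T @ (xb @ w - yb) / len(xb)

# Phase 1: stochastic GD (batch size 1) to get near the minimum quickly.
for epoch in range(3):
    for i in rng.permutation(len(X)):
        w = step(w, X[i:i+1], y[i:i+1])

# Phase 2: batch GD (the full dataset per update) to refine the solution.
for _ in range(50):
    w = step(w, X, y)

print(w)   # close to [2.0, -1.0, 0.5]
```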
1
vote
2 answers
When would it make sense to perform a gradient descent step for each term of a loss function with multiple terms?
I am training a neural network using a mini-batch gradient descent algorithm.
Now, consider the following loss function, which is composed of 2 terms.
$$L = L_{\text{MSE}} + L_{\text{regularization}} \label{1}\tag{1}$$
As far as I understand,…

hanugm
- 3,571
- 3
- 18
- 50
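
Because the gradient of a sum is the sum of the gradients, a single step on the combined loss $L = L_{\text{MSE}} + L_{\text{regularization}}$ moves the weights by the sum of the two per-term gradients, whereas taking a separate step per term evaluates the second term's gradient at an already-updated point, so the two procedures are not identical (though often close for small learning rates). A small NumPy comparison on an illustrative ridge-regression loss, with assumed data and hyper-parameters:

```python
import numpy as np

rng = np.random.default_rng(7)
X, y = rng.normal(size=(64, 3)), rng.normal(size=64)
w0, lr, lam = rng.normal(size=3), 0.01, 0.1

mse_grad = lambda w: 2 * X.T @ (X @ w - y) / len(X)   # gradient of L_MSE
reg_grad = lambda w: 2 * lam * w                      # gradient of the L2 regularization term

# One step on the combined loss L = L_MSE + L_reg.
w_combined = w0 - lr * (mse_grad(w0) + reg_grad(w0))

# Two alternating steps, one per term: the second gradient is taken at a moved point.
w_tmp = w0 - lr * mse_grad(w0)
w_alternating = w_tmp - lr * reg_grad(w_tmp)

print(np.linalg.norm(w_combined - w_alternating))     # small but nonzero difference
```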
1
vote
1 answer
How many iterations of the optimisation algorithm are performed on each mini-batch in mini-batch gradient descent?
I understand the idea of mini-batch gradient descent for neural networks in that we calculate the gradient of the loss function using one mini-batch at a time and use this gradient to adjust the parameters.
My question is: how many times do we…

user50018
- 13
- 2
1
vote
2 answers
In mini-batch gradient descent, do we pass each input in the batch individually or all inputs at the same time through the layer?
In the stochastic gradient descent algorithm, the weight update happens for every training sample.
In the mini-batch gradient descent algorithm, the weight update happens for every batch of training samples.
In the batch gradient descent algorithm,…

hanugm
- 3,571
- 3
- 18
- 50
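
Mathematically the result is the same either way; in practice the whole mini-batch is usually stacked into one matrix and pushed through the layer in a single matrix multiplication, which is what makes mini-batching fast on vectorized hardware. A NumPy illustration with an assumed single dense layer (shapes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(8)
batch, d_in, d_out = 32, 10, 5
X = rng.normal(size=(batch, d_in))            # one mini-batch of 32 inputs
W, b = rng.normal(size=(d_in, d_out)), rng.normal(size=d_out)

# Option 1: pass the samples through the layer one at a time.
out_loop = np.stack([X[i] @ W + b for i in range(batch)])

# Option 2: pass the whole mini-batch at once as a single matrix product.
out_batched = X @ W + b                       # shape (32, 5)

print(np.allclose(out_loop, out_batched))     # True: same activations, far fewer kernel calls
```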
1
vote
1 answer
Why does my model not improve when training with mini-batch gradient descent, while it does with Adam?
I am currently experimenting with the U-Net. I am doing semantic segmentation on the 2018 Data Science Bowl dataset from Kaggle without any data augmentation.
In my experiments, I am trying different hyper-parameters, like using Adam, mini-batch GD…

Bert Gayus
- 545
- 3
- 12
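
A common reason plain mini-batch GD stalls where Adam does not is the update rule itself: vanilla SGD scales the raw gradient by one global learning rate, while Adam rescales each parameter using running estimates of the gradient's first and second moments. The NumPy sketch below shows the two update rules side by side; the hyper-parameters are the commonly used defaults, not values from the experiment in the question.

```python
import numpy as np

def sgd_update(w, grad, lr=0.01):
    """Plain mini-batch gradient descent: one global step size for every parameter."""
    return w - lr * grad

def adam_update(w, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: per-parameter step sizes from running first/second moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])        # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

w = np.zeros(3)
state = {"m": np.zeros(3), "v": np.zeros(3), "t": 0}
grad = np.array([1.0, 0.01, 100.0])                       # very differently scaled gradients
print(sgd_update(w, grad))                                # step proportional to the raw gradient
print(adam_update(w, grad, state))                        # roughly equal-sized steps per parameter
```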