
I am a machine learning newbie. I am trying to understand the back-propagation algorithm. I have a training dataset of 60 instances/records.

What is the correct order of the process? This one?

  • Forward pass of the first instance. Calculate the error.
  • Weight update using backpropagation.
  • Forward pass of the second instance. Calculate the error.
  • Weight update using backpropagation. And so on...

Or

  • Forward pass of all instances one by one. Keep track of the error as a vector.
  • Weight update using backpropagation.

This video https://www.youtube.com/watch?v=OwdjRYUPngE is similar to the second process. Is it correct?
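To make the two options concrete, here is a rough sketch of what I mean, using a single linear neuron with a squared-error loss as a stand-in for a real network (all names here are placeholders I made up, not a real library):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))            # 60 training instances, 3 features each
y = X @ np.array([1.0, -2.0, 0.5])      # synthetic targets
lr = 0.01                               # learning rate

def gradient(w, x_i, y_i):
    """Gradient of the squared error 0.5 * (w.x_i - y_i)^2 for one instance."""
    return (w @ x_i - y_i) * x_i

# Option 1: forward pass one instance, compute its error, update immediately.
w = np.zeros(3)
for x_i, y_i in zip(X, y):
    w -= lr * gradient(w, x_i, y_i)

# Option 2: forward pass all instances first, keep the errors, update once.
w = np.zeros(3)
grad_sum = np.zeros(3)
for x_i, y_i in zip(X, y):
    grad_sum += gradient(w, x_i, y_i)
w -= lr * grad_sum / len(X)             # a single update with the averaged gradient
```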

  • I think the video you watched is an example of _batch training_, meaning that you activate some number x of samples from your set, calculate the error over those samples, and then change the weights. So you change the weights every x samples. However, if you are a newbie, you should first focus on _online learning_: activating 1 sample, then backpropagating, and repeating that for every sample in the set. – Thomas Wagenaar Jun 15 '17 at 14:17

1 Answer


Both are feasible.

A generalization is to apply the modification of the weights after the presentation of $n$ examples. If $n=1$, this is online training. If $n=60$ (the size of your dataset), this is batch training. For intermediate values of $n$, this is mini-batch training.
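Just as an illustration (not code from the answer), here is a minimal sketch of that generalization, assuming a plain linear model with a squared-error loss and NumPy; the batch size `n` selects the regime:

```python
import numpy as np

def batch_gradient(w, X_batch, y_batch):
    # Average gradient of the squared error over the examples in the batch.
    errors = X_batch @ w - y_batch
    return X_batch.T @ errors / len(X_batch)

def train(w, X, y, lr=0.01, n=1, epochs=10):
    """Apply a weight update after every n examples.

    n = 1        -> online training
    n = len(X)   -> batch training (60 in the question)
    otherwise    -> mini-batch training
    """
    for _ in range(epochs):
        for start in range(0, len(X), n):
            grad = batch_gradient(w, X[start:start + n], y[start:start + n])
            w = w - lr * grad
    return w

# The same loop covers all three regimes just by changing n.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w_online = train(np.zeros(3), X, y, n=1)    # online
w_mini   = train(np.zeros(3), X, y, n=10)   # mini-batch
w_batch  = train(np.zeros(3), X, y, n=60)   # batch
```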

The main difference is not the computational complexity of the algorithm, but the theoretical speed of convergence to an "optimal" set of weights.

It is now generally admitted that online training is faster. Online training performs a stochastic approximation of gradient descent: the true gradient of the cost function over the whole training set is approximated by the gradient computed on each individual training example.
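In symbols (my notation, not the answer's): writing $E_i$ for the loss on example $i$, $N$ for the number of training examples, and $\eta$ for the learning rate, the batch update uses the average gradient over all examples, while the online update uses the gradient of a single example at each step:

$$w \leftarrow w - \eta \, \frac{1}{N} \sum_{i=1}^{N} \nabla E_i(w) \qquad \text{(batch)}$$

$$w \leftarrow w - \eta \, \nabla E_i(w) \qquad \text{(online, one example per update)}$$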

Intuitively, with batch learning the weights are updated with the average of the gradients over the whole training set, and averaging destroys information.

With stochastic (or mini-batch) learning, by contrast, each example gets its own say, one after the other.

This has also been discussed here.

  • You say "It is now generally admitted that online training is faster", but I am not sure. Faster in terms of convergence? Can you provide a reference (i.e. a reliable paper or book) that supports that claim? – nbro Dec 10 '20 at 18:33