
I am trying to figure out how multiprocessing works in neural network training.

In the examples I've seen, the dataset is split into $x$ parts (depending on how many workers you have), and each worker is responsible for training the network on a different part of the dataset.

I am confused regarding the optimization part:

Let's say worker 1 finishes calculating its gradient first; it then updates the network accordingly.

Then worker 2 finishes its calculation and also attempts to update the weights. However, the gradient it calculated was with respect to the network *before* it was updated by the first worker. So the second worker ends up updating the network with a stale gradient.

Did I miss something?
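To make the scenario concrete, here is a minimal framework-free sketch (the loss, data, and learning rate are all made up for illustration). Both workers compute their gradient against the same starting weights, but worker 2's update is applied after worker 1 has already moved the weights, so its gradient is stale — exactly the situation described above. This is what asynchronous SGD does:

```python
def grad(w, x, y):
    # gradient of the squared error (w*x - y)**2 with respect to w
    return 2 * x * (w * x - y)

lr = 0.1
w = 1.0                       # shared weights, call them w0

g1 = grad(w, x=1.0, y=3.0)    # worker 1's gradient, computed at w0
g2 = grad(w, x=2.0, y=3.0)    # worker 2's gradient, also computed at w0

w -= lr * g1                  # worker 1 applies its update first
w -= lr * g2                  # worker 2 then applies its now-stale gradient
```

Despite the staleness, this scheme can still converge in practice (that is the point of Hogwild!-style training, especially for sparse updates); synchronous schemes avoid the problem entirely by combining all gradients before touching the weights.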

asked by Yedidya kfir
  • Depending on your ML framework (e.g., TF, PyTorch), the docs have pretty nice explanations of several of their approaches to distributed training. Anyway, check out Hogwild! training: https://www.youtube.com/watch?v=l5JqUvTdZts – Sanyou Oct 19 '21 at 13:33
  • Multiprocessing or GPUs allow you to process several batches at the same time, but indeed the gradients and optimizer can be updated only after all relevant batches have been processed. So usually one process or one GPU is reserved for this specific task. PyTorch, for example, always collects all gradients on cuda0 (the first, or the only, GPU available). – Edoardo Guerriero Oct 19 '21 at 15:00
  • If you think about it, this isn't too different from how non-multiprocessing training works: we calculate the gradients for a whole lot of data items without updating the network, then we update the network with all the gradients at once! – user253751 Oct 22 '21 at 09:31
  • @Sanyou This is an important question, although it's quite general, given that there may be different ways to do this. If you can provide a formal answer, I would suggest that you do so, because it may be useful to other people in the future too. – nbro Oct 22 '21 at 13:04
  • @EdoardoGuerriero I wanted to write the comment above by also mentioning you, but I couldn't do that. It seems that you have some useful information that at least partially answers the question. Maybe you can elaborate on what you wrote above to also write a formal answer below. – nbro Oct 22 '21 at 13:05
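The synchronous alternative described in the comments can be sketched in a few lines (again with made-up data and learning rate; in real PyTorch this averaging is done for you by `DistributedDataParallel` via an all-reduce across workers). Every worker computes its gradient against the *same* weights, the gradients are averaged, and only then is a single update applied — so no worker ever sees stale weights:

```python
def grad(w, x, y):
    # gradient of the squared error (w*x - y)**2 with respect to w
    return 2 * x * (w * x - y)

lr = 0.1
w = 1.0
shards = [(1.0, 3.0), (2.0, 3.0), (3.0, 3.0)]  # one (x, y) shard per worker

# every "worker" computes its gradient at the same w ...
grads = [grad(w, x, y) for x, y in shards]

# ... then a single averaged update is applied
w -= lr * sum(grads) / len(grads)
```

The price of this scheme is synchronization: the update waits for the slowest worker, which is exactly the cost that asynchronous methods like Hogwild! trade away in exchange for tolerating stale gradients.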

0 Answers