Questions tagged [distributed-computing]
7 questions
5 votes · 1 answer
Why do we average gradients and not loss in distributed training?
I'm running distributed training in TensorFlow with Horovod. It runs training separately on multiple workers, each of which uses the same weights and does a forward pass on unique data. The computed gradients are averaged within the communicator…

pSoLT
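A note on the identity behind this setup: because differentiation is linear, the average of per-worker gradients equals the gradient of the average loss, which is why averaging gradients (rather than just the scalar losses) recovers exactly what single-machine training would compute. A minimal NumPy sketch (a made-up two-worker example, not Horovod code) that checks this numerically:

```python
import numpy as np

w = np.array([1.0, -2.0])                 # shared weights on both workers
x1, y1 = np.array([0.5, 1.0]), 0.0        # worker 1's sample
x2, y2 = np.array([-1.0, 2.0]), 1.0       # worker 2's sample

def grad(w, x, y):
    # gradient of the squared error 0.5 * (w.x - y)^2 w.r.t. w
    return (w @ x - y) * x

# What the allreduce computes: the mean of per-worker gradients.
g_avg = (grad(w, x1, y1) + grad(w, x2, y2)) / 2

# Gradient of the mean loss, estimated by central finite differences.
def mean_loss(w):
    return 0.5 * ((w @ x1 - y1) ** 2 + (w @ x2 - y2) ** 2) / 2

eps = 1e-6
g_num = np.array([(mean_loss(w + eps * e) - mean_loss(w - eps * e)) / (2 * eps)
                  for e in np.eye(2)])
assert np.allclose(g_avg, g_num)          # the two views agree
```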
4 votes · 2 answers
How can we use crowdsourcing for deep learning?
Most companies dealing with deep learning (automotive: Comma.ai, Mobileye, various automakers, etc.) collect large amounts of data to learn from and then use lots of computational power to train a neural network (NN) on such big data. I guess…

Kozuch
2 votes · 2 answers
Why do LLMs need massive distributed training across nodes if the models fit on one GPU and a larger batch only decreases the variance of gradients?
Why do large language models (LLMs) need massive distributed training across nodes if the models fit on one GPU and a larger batch only decreases the variance of gradients?
tl;dr: assuming models that don't need sharding across nodes, why do we…

Charlie Parker
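The usual answer turns on wall-clock time rather than gradient variance: even when a model fits on one GPU, a single device cannot get through a web-scale corpus in reasonable time. A back-of-the-envelope sketch, where every number is invented purely for illustration:

```python
# All figures below are assumptions chosen for illustration only.
tokens_to_train = 1.0e12      # assumed corpus size in tokens
tokens_per_sec = 5.0e4        # assumed single-GPU training throughput
seconds_per_day = 86_400

days_single = tokens_to_train / tokens_per_sec / seconds_per_day
print(f"1 GPU:     {days_single:,.0f} days")        # ~231 days

gpus, efficiency = 1024, 0.8  # assumed cluster size and scaling efficiency
days_cluster = days_single / (gpus * efficiency)
print(f"{gpus} GPUs: {days_cluster:.2f} days")      # well under a day
```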
2 votes · 1 answer
In how few updates can a multi-layer neural net be trained?
A single iteration of gradient descent can be parallelised across many worker nodes. We simply split the training set across the worker nodes, pass the parameters to each worker, and each worker computes gradients for its subset of the training set,…

is8ac
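The loop the question describes is synchronous data-parallel SGD. A minimal NumPy sketch of one step (the names shards, grad_fn, and lr are placeholders for illustration; in a real system the per-shard gradients run on separate workers):

```python
import numpy as np

def parallel_sgd_step(w, shards, grad_fn, lr):
    """One synchronous data-parallel step: broadcast w, compute a
    gradient per shard (in parallel on real workers), average the
    gradients, apply a single update."""
    grads = [grad_fn(w, X, y) for X, y in shards]  # serial stand-in
    return w - lr * np.mean(grads, axis=0)

# Toy usage with a least-squares gradient:
rng = np.random.default_rng(0)
shards = [(rng.normal(size=(8, 3)), rng.normal(size=8)) for _ in range(4)]
grad_fn = lambda w, X, y: X.T @ (X @ w - y) / len(y)
w = parallel_sgd_step(np.zeros(3), shards, grad_fn, lr=0.1)
```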
1 vote · 1 answer
Why can't we train neural networks in a peer-to-peer manner?
I have recently been exposed to the concept of decentralized applications. I know that neural networks require a lot of parallel computing infrastructure for training. What are the technical difficulties one may face when training neural networks in a p2p…
ram bharadwaj
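One line of work that does attempt this is decentralized (gossip) SGD, in which peers average parameters with their neighbors instead of with a central server. A toy sketch of the idea, not any particular system's protocol:

```python
import numpy as np

def gossip_sgd_step(params, neighbor_params, local_grad, lr):
    """One step for a single peer: take a local gradient step, then
    average parameters with directly connected neighbors rather than
    with a central parameter server."""
    params = params - lr * local_grad
    return np.mean([params, *neighbor_params], axis=0)
```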
1 vote · 0 answers
Do I need to maintain a separate population in each distributed environment when implementing PBT in a MARL context?
I have questions about how to implement PBT as described in Algorithm 1 (on page 5) of the paper Population Based Training of Neural Networks to train agents in a MARL (multi-agent reinforcement learning) environment.
In a single-agent RL…

Huan
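For readers skimming the tag, the exploit/explore cycle the question refers to looks roughly like the sketch below. This is a loose paraphrase of the PBT idea, not a faithful reproduction of the paper's Algorithm 1, and the 20% cutoff and perturbation factors are assumptions:

```python
import copy
import random

def pbt_exploit_explore(population):
    """population: list of dicts with 'params', 'hypers', 'score'.
    Bottom performers copy (exploit) a top performer's state, then
    perturb (explore) the copied hyperparameters."""
    ranked = sorted(population, key=lambda m: m["score"])
    cutoff = max(1, len(ranked) // 5)            # bottom/top 20% (assumed)
    for loser in ranked[:cutoff]:
        winner = random.choice(ranked[-cutoff:])
        loser["params"] = copy.deepcopy(winner["params"])
        loser["hypers"] = {k: v * random.choice([0.8, 1.2])  # assumed factors
                           for k, v in winner["hypers"].items()}
    return population
```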
0 votes · 0 answers
How to get optimal scaling with raw PyTorch + DDP?
I'm trying to set up a distributed training environment on a compute cluster that I have. I happen to know from previous experience that scaling up the batch size "naively" often isn't very useful; my experience matches the motivation behind AdaSum…

profPlum
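For context, the "naive" baseline the question contrasts with AdaSum is plain DDP plus linear learning-rate scaling. A minimal sketch, assuming a torchrun launch; the toy Linear model and the learning rate are placeholders, not a recommendation:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")                 # torchrun sets the env vars
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(10, 1).cuda()           # stand-in for a real network
model = DDP(model, device_ids=[local_rank])     # averages gradients per step

base_lr = 1e-3                                  # assumed single-GPU LR
optimizer = torch.optim.SGD(model.parameters(),
                            lr=base_lr * dist.get_world_size())  # linear scaling
```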