[Notes] List of Papers
2021. 4. 3. 16:47 · Notes
Systems
- Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks
- ZeRO-Offload: Democratizing Billion-Scale Model Training
- Bandwidth Efficient All-reduce Operation on Tree Topologies
- Efficient Barrier and AllReduce on InfiniBand Clusters using Hardware Multicast and Adaptive Algorithms
- Scaling Distributed Machine Learning with the Parameter Server
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- Parallelized Stochastic Gradient Descent
- Measuring the Effects of Data Parallelism on Neural Network Training
- PipeDream: Fast and Efficient Pipeline Parallel DNN Training
- Training Deep Nets with Sublinear Memory Cost
Frameworks
- TensorFlow: A System for Large-Scale Machine Learning
Models & Deep Learning
- Attention Is All You Need
- Improving Language Understanding by Generative Pre-Training
- Language Models are Unsupervised Multitask Learners
- Language Models are Few-Shot Learners
- Adam: A Method for Stochastic Optimization
- Gaussian Error Linear Units (GELUs)