[Notes] List of Papers
📌 Systems
- Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks
- ZeRO-Offload: Democratizing Billion-Scale Model Training
- Bandwidth Efficient All-reduce Operation on Tree Topologies
- Efficient Barrier and AllReduce on InfiniBand Clusters using Hardware Multicast and Adaptive Algorithms
- Scaling Distributed Machine Learning with the Parameter Server
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
2021.04.03