[Notes] List of Papers


📌 Systems

  • Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks
  • ZeRO-Offload: Democratizing Billion-Scale Model Training
  • Bandwidth Efficient All-reduce Operation on Tree Topologies
  • Efficient Barrier and AllReduce on InfiniBand Clusters using Hardware Multicast and Adaptive Algorithms
  • Scaling Distributed Machine Learning with the Parameter Server
  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
  • GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism
  • ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
  • Parallelized Stochastic Gradient Descent
  • Measuring the Effects of Data Parallelism on Neural Network Training
  • PipeDream: Fast and Efficient Pipeline Parallel DNN Training
  • Training Deep Nets with Sublinear Memory Cost


📌 Frameworks

  • TensorFlow: A System for Large-Scale Machine Learning


📌 Models & Deep Learning

  • Attention is All You Need
  • Improving Language Understanding by Generative Pre-Training
  • Language Models are Unsupervised Multitask Learners
  • Language Models are Few-Shot Learners
  • Adam: A Method for Stochastic Optimization
  • Gaussian Error Linear Units (GELUs)