[Papers] ZeRO-Offload: Democratizing Billion-Scale Model Training
[Link to Paper] ZeRO-Offload

PAPER SUMMARY

PROBLEM
Training large models requires enough GPU devices so that their combined memory can hold the model states (even with pipeline parallelism, model parallelism, etc.). Using that many GPUs is costly, which makes it difficult for most people to attempt training such models.

SOLUTION
Democratize large-model training with ZeRO-Offload, which exploits both CPU memory and CPU compute: gradients, optimizer states, and the optimizer computation are offloaded to the CPU, so even a single GPU can train multi-billion-parameter models.
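To make the SOLUTION concrete, below is a minimal, hypothetical PyTorch sketch of the offloading idea: FP16 parameters and gradients stay on the GPU, while FP32 master weights and the Adam states live in CPU memory and the optimizer step runs on the CPU. The tensor shapes, names, and training loop here are illustrative assumptions, not code from the paper; the real ZeRO-Offload implementation (in DeepSpeed) additionally uses an optimized CPU Adam and overlaps CPU-GPU communication with computation.

```python
import torch

# Hypothetical, simplified sketch of the ZeRO-Offload idea:
# FP16 parameters and gradients live on the GPU, while FP32 master
# weights plus the Adam optimizer states live in CPU memory, and the
# parameter update (optimizer step) runs on the CPU.

device = "cuda"

# One illustrative parameter tensor on the GPU (FP16)
gpu_params = [torch.randn(1024, 1024, device=device,
                          dtype=torch.float16, requires_grad=True)]

# FP32 master copies and the optimizer are kept in CPU memory
cpu_params = [p.detach().float().cpu() for p in gpu_params]
cpu_optimizer = torch.optim.Adam(cpu_params, lr=1e-3)

def training_step(batch):
    # Forward/backward on the GPU produces FP16 gradients
    loss = (gpu_params[0] * batch).sum()
    loss.backward()

    # Offload gradients to CPU memory and update the FP32 weights there
    for gp, cp in zip(gpu_params, cpu_params):
        cp.grad = gp.grad.detach().float().cpu()
    cpu_optimizer.step()
    cpu_optimizer.zero_grad()

    # Copy the updated FP32 weights back to the GPU as FP16
    with torch.no_grad():
        for gp, cp in zip(gpu_params, cpu_params):
            gp.copy_(cp.to(device=device, dtype=torch.float16))
            gp.grad = None

training_step(torch.randn(1024, 1024, device=device, dtype=torch.float16))
```

In practice one would enable this through DeepSpeed's ZeRO configuration (optimizer offload to the CPU device) rather than hand-rolling the transfer loop as above.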
2021.04.03