Observed results, 32 GPUs vs 64 GPUs: training the Qwen3-32B model with DeepSpeed for 12 hours

Dataset: https://modelscope.cn/datasets/AI-ModelScope/alpaca-gpt4-data-zh

32 GPUs

"train_batch_size": 256,
"train_micro_batch_size_per_gpu": 2,
"gradient_accumulation_steps": 4,

Each step takes about 14 s; training reached epoch 17 within the 12 h run.

64 GPUs

"train_batch_size": 512,
"train_micro_batch_size_per_gpu": 2,
"gradient_accumulation_steps": 4,

Each step takes about 16 s; training reached epoch 29 within the 12 h run.
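
To put the two runs side by side, the figures above can be turned into a rough throughput and scaling estimate. This is only a sketch under stated assumptions: it treats one "step" as one optimizer step over the full train_batch_size, and takes the approximate step times (14 s and 16 s) as exact. The dictionary layout and the helper name samples_per_second are illustrative and not part of DeepSpeed.

```python
# Figures copied from the notes above. Step times are approximate
# ("about 14 s" / "about 16 s"), so the results are estimates only.
runs = {
    32: {"train_batch_size": 256, "micro_batch": 2, "grad_accum": 4,
         "step_seconds": 14.0, "epochs": 17},
    64: {"train_batch_size": 512, "micro_batch": 2, "grad_accum": 4,
         "step_seconds": 16.0, "epochs": 29},
}


def samples_per_second(run):
    """Assumes one step consumes the full global batch (train_batch_size)."""
    return run["train_batch_size"] / run["step_seconds"]


for gpus, run in runs.items():
    # DeepSpeed expects:
    # train_batch_size == micro_batch_per_gpu * grad_accum_steps * num_gpus
    expected = run["micro_batch"] * run["grad_accum"] * gpus
    assert run["train_batch_size"] == expected
    print(f"{gpus} GPUs: {samples_per_second(run):.2f} samples/s")

# Throughput speedup when doubling the GPU count, and the resulting
# scaling efficiency (1.0 would be perfectly linear scaling).
speedup = samples_per_second(runs[64]) / samples_per_second(runs[32])
efficiency = speedup / (64 / 32)

# Independent cross-check: ratio of epochs reached in the same 12 h window.
epoch_ratio = runs[64]["epochs"] / runs[32]["epochs"]

print(f"throughput speedup: {speedup:.2f}x")
print(f"scaling efficiency: {efficiency:.1%}")
print(f"epoch ratio:        {epoch_ratio:.2f}x")
```

Under these assumptions the step-time estimate gives roughly a 1.75x speedup (about 87.5% scaling efficiency), which is close to the observed epoch ratio of 29/17, roughly 1.71x.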