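Currently, the communication-group buffer in MindSpeed can only be set uniformly through the environment variable HCCL_BUFFSIZE (default: 200 MB). Different communication groups often need different buffer sizes, however, so a single global value rarely fits all of them. For details, see the "HCCL_BUFFSIZE" section of the CANN Environment Variable Reference.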
Enable this feature when device memory is insufficient and memory usage needs to be reduced.
--hccl-group-buffer-adaptive
--hccl-group-buffer
["dp", "dp_cp", "cp", "mp", "mp_exp", "tp", "pp", "embd", "tp_dp_cp", "tp_dp", "tp_cp", "tp_exp", "exp", "dp_modulo_exp", "pp_new_stream", "cp2", "cp_ulysses", "cp_ring", "cp_ring_intra", "cp_ring_intra_overlap", "nd1_dim1", "ag_x_sd_rcv_overlap", "nd1_dim2", "ag_y_sd_rcv_overlap", "nd2_dim1", "nd2_dim2"]
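As a rough illustration of how per-group buffer sizes could be checked against the group names listed above, here is a minimal Python sketch. The `name:size;name:size` string format (sizes in MB) and the helper `parse_group_buffer` are assumptions made for this example; they are not taken from this document and are not MindSpeed's actual implementation.

```python
# Group names copied from the list above.
VALID_GROUPS = {
    "dp", "dp_cp", "cp", "mp", "mp_exp", "tp", "pp", "embd", "tp_dp_cp",
    "tp_dp", "tp_cp", "tp_exp", "exp", "dp_modulo_exp", "pp_new_stream",
    "cp2", "cp_ulysses", "cp_ring", "cp_ring_intra", "cp_ring_intra_overlap",
    "nd1_dim1", "ag_x_sd_rcv_overlap", "nd1_dim2", "ag_y_sd_rcv_overlap",
    "nd2_dim1", "nd2_dim2",
}


def parse_group_buffer(spec):
    """Parse a hypothetical 'name:size_mb;name:size_mb' string into a dict.

    The string format is an assumption for illustration only.
    """
    sizes = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        name, sep, size = item.partition(":")
        name = name.strip()
        if not sep:
            raise ValueError(f"missing ':' in entry {item!r}")
        if name not in VALID_GROUPS:
            raise ValueError(f"unknown communication group {name!r}")
        size_mb = int(size)  # raises ValueError on non-integer input
        if size_mb <= 0:
            raise ValueError(f"buffer size for {name!r} must be positive")
        sizes[name] = size_mb
    return sizes
```

For example, `parse_group_buffer("dp:200;tp:300;exp:400")` yields a mapping from each of the three group names to its size in MB, while a name outside the list raises `ValueError`.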
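For LLaMA-series models, enabling the adaptive scheme saves device memory without degrading performance. For MoE models, enabling the adaptive scheme and setting an appropriate load-imbalance coefficient likewise saves device memory without degrading performance.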