Load Balancing Parameter Configuration

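Load-balancing parameters are configured in "{ATB installation path}/atb-models/atb_llm/conf/config.json" in the atb-models installation directory. To change them, edit the "level", "expert_map_file", "num_redundant_experts", "aggregate_threshold", "buffer_expert_layer_num", and "num_expert_update_ready_countdown" parameters in the "models/deepseekv2/eplb" field. By default, load balancing is disabled. A typical configuration is as follows: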

{
    "models": {
        "deepseekv2": {
            "eplb": {
                "level": 1,
                "expert_map_file": "xxxx.json",
                "num_redundant_experts": 0,
                "aggregate_threshold": 2048,
                "buffer_expert_layer_num": 58,
                "num_expert_update_ready_countdown": 50
            }
        }
    }
}
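To switch the section above from its default to static redundant load balancing without hand-editing JSON, a small script can update only the "eplb" object and leave the rest of config.json untouched. The following Python sketch assumes config.json is plain JSON with the nesting shown above; the function name enable_static_eplb is hypothetical and not part of ATB.

```python
import json
from pathlib import Path


def enable_static_eplb(config_path, expert_map_file):
    """Set models.deepseekv2.eplb to static redundant load balancing (level 1).

    Hypothetical helper: only the "eplb" object is modified; every other key
    in config.json is preserved.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Create the nested sections if they are missing, then update eplb in place.
    eplb = (config.setdefault("models", {})
                  .setdefault("deepseekv2", {})
                  .setdefault("eplb", {}))
    eplb["level"] = 1  # 1: static redundant load balancing
    eplb["expert_map_file"] = str(expert_map_file)
    # Rewrite the whole file with 4-space indentation, keeping non-ASCII text as-is.
    path.write_text(json.dumps(config, indent=4, ensure_ascii=False) + "\n",
                    encoding="utf-8")
    return eplb
```

Pass the config.json path above and the path of your expert deployment table; note that the script reformats the entire file with 4-space indentation when it writes it back.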

The parameters are described below:

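Parameter	Type	Valid range	Description
level	int	[0, 3]	0: load balancing disabled. 1: static redundant load balancing. 2: dynamic redundant load balancing (not yet supported). 3: forced load balancing. Default: 0.
expert_map_file	string	Must be an existing file path	Path to the expert deployment table for static redundant load balancing. Default: "".
num_redundant_experts	int	[0, n_routed_experts]	Not supported in the current version. Number of redundant experts. Default: 0.
aggregate_threshold	int	≥1	Not supported in the current version. How often the dynamic EPLB algorithm is triggered, in decode steps. For example, 50 means the algorithm runs once every 50 decode steps; if it finds that expert load (hotness) exceeds a certain threshold, it adjusts the routing table to reduce the load on hot experts.
buffer_expert_layer_num	int	[1, num_moe_layers]	Not supported in the current version. Number of layers moved in each dynamic EPLB transfer. Because weights are transferred asynchronously, an extra buffer is needed to hold the new weights in transit so that ongoing decoding is not affected. With a value of 1, only one layer is transferred at a time, after which that layer's weights and routing table are refreshed. Extra memory = buffer_expert_layer_num × local_experts_num × 44 MB (44 MB is the size of one int8 expert).
num_expert_update_ready_countdown	int	≥1	Not supported in the current version. How often to check whether the host-to-device transfer has finished, in decode steps. Because weights are transferred asynchronously, weights and routing tables can be refreshed only after every EP device has finished its transfer, which requires communication between devices. When many layers are being transferred, lowering this check frequency (that is, using a larger value) reduces EPLB framework overhead.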
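The ranges in the table can be checked before the service starts. The sketch below is a hypothetical helper, check_eplb, not an ATB API. n_routed_experts and num_moe_layers must be supplied from the model's own configuration, and treating a missing expert_map_file as an error only when level is 1 is an assumption, based on that file describing the static redundant deployment.

```python
import os


def check_eplb(eplb, n_routed_experts, num_moe_layers):
    """Return a list of problems in an "eplb" dict; an empty list means it looks valid.

    Hypothetical helper: n_routed_experts and num_moe_layers are assumed inputs
    taken from the model configuration.
    """
    problems = []
    level = eplb.get("level", 0)  # default 0: load balancing disabled
    if not isinstance(level, int) or not 0 <= level <= 3:
        problems.append(f"level must be an int in [0, 3], got {level!r}")
    elif level == 2:
        problems.append("level 2 (dynamic redundant load balancing) is not supported yet")

    # Assumption: static redundancy (level 1) needs an existing expert map file.
    path = eplb.get("expert_map_file", "")
    if level == 1 and not os.path.isfile(path):
        problems.append(f"expert_map_file does not exist: {path!r}")

    # (key, lower bound, upper bound or None when the table gives only a lower bound)
    ranges = [
        ("num_redundant_experts", 0, n_routed_experts),
        ("aggregate_threshold", 1, None),
        ("buffer_expert_layer_num", 1, num_moe_layers),
        ("num_expert_update_ready_countdown", 1, None),
    ]
    for key, low, high in ranges:
        if key not in eplb:
            continue  # absent keys are not checked
        value = eplb[key]
        bound = f">= {low}" if high is None else f"in [{low}, {high}]"
        if not isinstance(value, int) or value < low or (high is not None and value > high):
            problems.append(f"{key} must be an int {bound}, got {value!r}")
    return problems
```

For example, running it on the typical configuration with the illustrative values n_routed_experts=256 and num_moe_layers=58 reports only that the placeholder "xxxx.json" does not exist.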
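The buffer memory formula given for buffer_expert_layer_num is plain arithmetic, so it can also be turned around to find how many layers fit in a given memory budget. In this sketch the helper names are hypothetical, local_experts_num is whatever number of experts each device hosts in your deployment (an input you supply), and the 44 MB per int8 expert comes from the table.

```python
EXPERT_SIZE_MB = 44  # size of one int8 expert, as stated in the parameter table


def eplb_buffer_mb(buffer_expert_layer_num, local_experts_num):
    """Extra memory (MB) = buffer_expert_layer_num * local_experts_num * 44 MB."""
    return buffer_expert_layer_num * local_experts_num * EXPERT_SIZE_MB


def max_buffer_layers(budget_mb, local_experts_num, num_moe_layers):
    """Largest buffer_expert_layer_num whose buffer fits in budget_mb.

    Capped at num_moe_layers; returns 0 when not even one layer fits
    (the valid range of buffer_expert_layer_num starts at 1).
    """
    per_layer_mb = eplb_buffer_mb(1, local_experts_num)
    return min(num_moe_layers, budget_mb // per_layer_mb)
```

With 8 local experts, for instance, one buffered layer costs 352 MB, while buffering all 58 layers of the typical configuration costs 20,416 MB; a 4,096 MB budget fits 11 layers.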
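aggregate_threshold and num_expert_update_ready_countdown are both counted in decode steps, and the trade-off described for the countdown (fewer cross-device checks versus a later refresh) can be seen in a toy model. This is not the framework's scheduler: it assumes every trigger leads to a rebalance, that the asynchronous transfer finishes on all EP devices a fixed number of decode steps (transfer_steps) after the trigger, and that weights and routing tables are refreshed at the first completion check that sees the transfer done. eplb_schedule and transfer_steps are hypothetical names.

```python
import math


def eplb_schedule(total_steps, aggregate_threshold, countdown, transfer_steps):
    """Toy model of dynamic EPLB timing, in decode steps (not the real scheduler).

    For each trigger, returns how many completion checks were made and the
    decode step at which weights and routing tables would be refreshed.
    """
    if min(aggregate_threshold, countdown, transfer_steps) < 1:
        raise ValueError("all step counts must be >= 1")
    events = []
    # The algorithm fires once every aggregate_threshold decode steps.
    for trigger in range(aggregate_threshold, total_steps + 1, aggregate_threshold):
        # Completion is checked every `countdown` steps; each check needs
        # communication across EP devices, so fewer checks means less overhead.
        polls = math.ceil(transfer_steps / countdown)
        refresh = trigger + polls * countdown
        events.append({"trigger": trigger, "polls": polls, "refresh": refresh})
    return events
```

Under these assumptions, with aggregate_threshold=50 and a 25-step transfer, a countdown of 10 refreshes at step 80 after 3 checks, while a countdown of 1 refreshes at step 75 but needs 25 checks.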
