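Classic configuration parameters for PD-disaggregated deployment
Only the classic configuration parameters are currently supported. Based on the max_seq_len (maximum sequence length) value you enter, which corresponds to the maxSeqLen parameter in the tables below, and on the number of D instances, the remaining parameters are filled in automatically.
Classic configuration parameters when using Atlas 800I A2 inference servers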
| Parameter type | Parameter name | Context sequence length 16k | Context sequence length 64k | Context sequence length 128k |
|---|---|---|---|---|
| P instance parameters (mindie_server_prefill_config) | maxSeqLen | 18000 | 68000 | 134000 |
| | maxInputTokenLen | 18000 | 68000 | 134000 |
| | dp | 2 | 1 | 1 |
| | cp | 1 | 2 | 2 |
| | tp | 8 | 8 | 8 |
| | sp | 1 | 8 | 8 |
| | pp | 1 | 1 | 1 |
| | moe_ep | 4 | 16 | 16 |
| | moe_tp | 4 | 1 | 1 |
| | ep_level | 1 | 1 | 1 |
| | MTP | Enabled | Enabled | Disabled |
| | enable_init_routing_cutoff | false | true | true |
| | topk_scaling_factor | Not in effect | 0.25 | 0.25 |
| | maxPrefillTokens | 18000 | 68000 | 134000 |
| D instance parameters (mindie_server_decode_config) | maxSeqLen | 18000 | 68000 | 134000 |
| | maxInputTokenLen | 18000 | 68000 | 134000 |
| | dp | 4-node D instance: 32; 8-node D instance: 64 | 4-node D instance: 32; 8-node D instance: 64 | 4-node D instance: 32; 8-node D instance: 64 |
| | tp | 1 | 1 | 1 |
| | sp | 1 | 1 | 1 |
| | cp | 1 | 1 | 1 |
| | pp | 1 | 1 | 1 |
| | moe_ep | 4-node D instance: 32; 8-node D instance: 64 | 4-node D instance: 32; 8-node D instance: 64 | 4-node D instance: 32; 8-node D instance: 64 |
| | moe_tp | 1 | 1 | 1 |
| | ep_level | 2 | 2 | 2 |
| | MTP | Enabled | Enabled | Disabled |
| | maxPrefillTokens | 18000 | 68000 | 134000 |
| | maxIterTimes | 18000 | 68000 | 134000 |
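One pattern visible in the A2 P-instance values is that dp × cp × tp and moe_ep × moe_tp come out to the same number in every column. The short Python sketch below only checks that arithmetic on values copied from the table. Reading the two products as the number of NPU ranks in one P instance is an interpretation, not something the source states, and the names `A2_PREFILL`, `attention_ranks`, and `moe_ranks` are hypothetical.

```python
# Values copied from the A2 P-instance rows (columns: 16k, 64k, 128k).
A2_PREFILL = {
    "16k": {"dp": 2, "cp": 1, "tp": 8, "moe_ep": 4, "moe_tp": 4},
    "64k": {"dp": 1, "cp": 2, "tp": 8, "moe_ep": 16, "moe_tp": 1},
    "128k": {"dp": 1, "cp": 2, "tp": 8, "moe_ep": 16, "moe_tp": 1},
}


def attention_ranks(cfg: dict) -> int:
    """Return dp * cp * tp for one column of the table."""
    return cfg["dp"] * cfg["cp"] * cfg["tp"]


def moe_ranks(cfg: dict) -> int:
    """Return moe_ep * moe_tp for one column of the table."""
    return cfg["moe_ep"] * cfg["moe_tp"]


for tier, cfg in A2_PREFILL.items():
    # Both products are 16 for every tier listed in the table.
    print(tier, attention_ranks(cfg), moe_ranks(cfg))
```

If you hand-edit one of these values, a check like this is a cheap way to notice that the two products no longer agree.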
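| Node configuration | PD-disaggregated configuration | Switch selection reference |
|---|---|---|
| 8 servers + 2 servers + 1 hot-standby server | 2*2P + 1*4D + 2 servers (dual-node) + 1 online hot-standby server | Reference switch specification: 32*400G, for example XH9210: 3 Leaf, 2 Spine |
| 16 servers + 1 hot-standby server | 4*2P + 2*4D + 1 online hot-standby server | Reference switch specification: 32*400G, for example XH9210: 5 Leaf, 4 Spine |
| N*16 servers | N*(4*2P + 1*8D), scaled linearly in 16-node units at best performance (EP64) | Reference switch specification: 32*400G, for example XH9210: 32 Leaf, 16 Spine (taking N=8, 1024 NPUs in total, as the example) |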
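For the large-scale expert parallelism solution, the only supported Atlas 800I A2 inference server model is the 64 GB HCCS variant. The NPU on-chip memory must be 64 GB, and the NPU network port optical modules must be 200G.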
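The layout strings in the node table can be checked against the node counts with a little arithmetic. The Python sketch below assumes that a term such as `4*2P` means four P instances of two nodes each, which matches the node totals in the table but is not spelled out in the source. The helper name `instance_nodes` is hypothetical.

```python
import re


def instance_nodes(layout: str) -> dict:
    """Sum the nodes used by P and D instances in a layout like '4*2P+2*4D'.

    Each term 'a*bP' is read as a instances of b nodes each (an assumption).
    Standby or extra servers are not part of the layout string.
    """
    totals = {"P": 0, "D": 0}
    for count, size, role in re.findall(r"(\d+)\*(\d+)([PD])", layout):
        totals[role] += int(count) * int(size)
    return totals


for layout in ("2*2P+1*4D", "4*2P+2*4D", "4*2P+1*8D"):
    nodes = instance_nodes(layout)
    print(layout, nodes, "total:", nodes["P"] + nodes["D"])

# The N*16 row with N=8 and 8 NPUs per A2 server gives the 1024 NPUs
# quoted in the switch column.
print(8 * 16 * 8)
```

The first layout accounts for 8 nodes and the other two for 16, which agrees with the first column once the standby and extra servers are added separately.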
Classic configuration parameters when using Atlas 800I A3 SuperPoD servers
| Parameter type | Parameter name | Context sequence length 16k | Context sequence length 64k | Context sequence length 128k |
|---|---|---|---|---|
| P instance parameters (mindie_server_prefill_config) | maxSeqLen | 18000 | 68000 | 134000 |
| | maxInputTokenLen | 18000 | 68000 | 134000 |
| | dp | 2 | 1 | 1 |
| | cp | 1 | 2 | 2 |
| | tp | 8 | 8 | 8 |
| | sp | 1 | 8 | 8 |
| | pp | 1 | 1 | 1 |
| | moe_ep | 16 | 16 | 16 |
| | moe_tp | 1 | 1 | 1 |
| | ep_level | 2 | 2 | 2 |
| | MTP | Enabled | Enabled | Disabled |
| | maxPrefillTokens | 18000 | 68000 | 134000 |
| D instance parameters (mindie_server_decode_config) | maxSeqLen | 18000 | 68000 | 134000 |
| | maxInputTokenLen | 18000 | 68000 | 134000 |
| | dp | 4-node D instance: 64; 8-node D instance: 128 | 4-node D instance: 64; 8-node D instance: 128 | 4-node D instance: 64; 8-node D instance: 128 |
| | tp | 1 | 1 | 1 |
| | sp | 1 | 1 | 1 |
| | cp | 1 | 1 | 1 |
| | pp | 1 | 1 | 1 |
| | moe_ep | 4-node D instance: 64; 8-node D instance: 128 | 4-node D instance: 64; 8-node D instance: 128 | 4-node D instance: 64; 8-node D instance: 128 |
| | moe_tp | 1 | 1 | 1 |
| | ep_level | 2 | 2 | 2 |
| | MTP | Enabled | Enabled | Disabled |
| | maxPrefillTokens | 18000 | 68000 | 134000 |
| | maxIterTimes | 18000 | 68000 | 134000 |
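The D-instance dp and moe_ep values depend only on the server type and the node count of the D instance: the A2 table lists 32 and 64 for 4 and 8 nodes, and the A3 table lists 64 and 128. The Python sketch below reproduces those values under the assumption that each node contributes a fixed number of ranks (8 on A2, 16 on A3). That per-node figure is inferred from the tables rather than stated in the source, and `RANKS_PER_NODE` and `decode_parallelism` are hypothetical names.

```python
# Ranks per D-instance node, inferred from the tables:
# A2 lists dp=32 for 4 nodes and 64 for 8; A3 lists 64 and 128.
RANKS_PER_NODE = {"A2": 8, "A3": 16}


def decode_parallelism(platform: str, nodes: int) -> dict:
    """Return the dp/tp/moe_ep/moe_tp values the tables list for a D instance."""
    if nodes not in (4, 8):
        raise ValueError("the tables only list 4-node and 8-node D instances")
    ranks = RANKS_PER_NODE[platform] * nodes
    # tp and moe_tp are 1 in every D-instance column of both tables.
    return {"dp": ranks, "tp": 1, "moe_ep": ranks, "moe_tp": 1}


print(decode_parallelism("A2", 4))
print(decode_parallelism("A3", 8))
```

The function deliberately rejects other node counts, because the source gives no values for them.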
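| Node configuration | PD-disaggregated configuration | Number of bus network switches (L2) |
|---|---|---|
| 8 servers + 1 A3 redundant node (optional) | 4*1P + 2*2D + 1 A3 redundant node (optional) | 14 |
| 16 servers + 1 A3 redundant node (optional) | 8*1P + 2*4D + 1 A3 redundant node (optional) | 28 |
| 32 servers + 1 A3 redundant node (optional) | 16*1P + 4*4D + 1 A3 redundant node (optional) | 56 |
| 48 servers | 24*1P + 6*4D | 56 |
| N*48 servers | N*(24*1P + 6*4D) | N*56 |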
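For capacity planning, the switch column of the A3 node table can be turned into a small lookup. The Python sketch below encodes only the rows listed above and the N*48 scaling rule. It counts servers without the optional redundant node, and it makes no attempt to interpolate between listed sizes, since the source gives no rule for that. The names `L2_SWITCHES` and `l2_switch_count` are hypothetical.

```python
# Server count (excluding the optional redundant node) -> L2 bus switches,
# copied from the node table.
L2_SWITCHES = {8: 14, 16: 28, 32: 56, 48: 56}


def l2_switch_count(servers: int) -> int:
    """Return the listed number of L2 bus network switches for a cluster size."""
    if servers in L2_SWITCHES:
        return L2_SWITCHES[servers]
    if servers > 0 and servers % 48 == 0:
        # N*48 servers need N*56 switches according to the last table row.
        return (servers // 48) * 56
    raise ValueError(f"no switch count is listed for {servers} servers")


print(l2_switch_count(32))
print(l2_switch_count(96))
```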
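Certificate configuration (optional)
The certificate parameter configuration file is stored at $HOME/ascend-deployer/ascend_deployer/group_vars/master/tls_config.yaml. An example of its contents follows.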
# group_vars/tls_config.yaml
tls_config:
  tls_enable: false
  kmc_ksf_master: "./security/master/tools/pmt/master/ksfa"
  kmc_ksf_standby: "./security/standby/tools/pmt/standby/ksfb"
  infer_tls_items:
    ca_cert: "./security/infer/security/certs/ca.pem"
    tls_cert: "./security/infer/security/certs/cert.pem"
    tls_key: "./security/infer/security/keys/cert.key.pem"
    tls_passwd: "./security/infer/security/pass/key_pwd.txt"
    tls_crl: "infer"
  management_tls_items:
    ca_cert: "./security/management/security/certs/ca.pem"
    tls_cert: "./security/management/security/certs/cert.pem"
    tls_key: "./security/management/security/keys/cert.key.pem"
    tls_passwd: "./security/management/security/pass/key_pwd.txt"
    tls_crl: "management"
  # ccae_tls_enable and ccae_tls_items do not need to be configured for Atlas 800I A2 inference servers
  ccae_tls_enable: false
  ccae_tls_items:
    ca_cert: "./security/ccae/security/certs/ca.pem"
    tls_cert: "./security/ccae/security/certs/cert.pem"
    tls_key: "./security/ccae/security/keys/cert.key.pem"
    tls_passwd: "./security/ccae/security/pass/key_pwd.txt"
    tls_crl: "ccae"
  cluster_tls_enable: false
  cluster_tls_items:
    ca_cert: "./security/clusterd/security/certs/ca.pem"
    tls_cert: "./security/clusterd/security/certs/cert.pem"
    tls_key: "./security/clusterd/security/keys/cert.key.pem"
    tls_passwd: "./security/clusterd/security/pass/key_pwd.txt"
    tls_crl: "clusterd"
  etcd_server_tls_enable: false
  etcd_server_tls_items:
    ca_cert: "./security/etcd_server/security/certs/ca.pem"
    tls_cert: "./security/etcd_server/security/certs/cert.pem"
    tls_key: "./security/etcd_server/security/keys/cert.key.pem"
    tls_passwd: "./security/etcd_server/security/pass/key_pwd.txt"
    kmc_ksf_master: "./security/etcd_server/tools/pmt/master/ksfa"
    kmc_ksf_standby: "./security/etcd_server/tools/pmt/standby/ksfb"
    tls_crl: ""
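Before switching any of the enable flags to true, it can help to confirm that the referenced certificate files are actually present. The Python sketch below is a hypothetical pre-flight check and not part of ascend-deployer. It takes the contents of `tls_config` as an already-parsed dictionary and assumes the relative paths are resolved against a base directory that you supply. The source does not say which directory that is, so `base_dir` here is a placeholder.

```python
from pathlib import Path

# Keys inside each *_tls_items section that point at files in the example.
FILE_KEYS = ("ca_cert", "tls_cert", "tls_key", "tls_passwd")


def missing_tls_files(tls_config: dict, base_dir: str) -> list:
    """List 'section.key: path' entries whose file does not exist under base_dir."""
    missing = []
    for section, items in tls_config.items():
        # Only look at the nested sections such as infer_tls_items.
        if not section.endswith("_tls_items") or not isinstance(items, dict):
            continue
        for key in FILE_KEYS:
            value = items.get(key)
            if value and not (Path(base_dir) / value).is_file():
                missing.append(f"{section}.{key}: {value}")
    return missing


example = {
    "tls_enable": False,
    "infer_tls_items": {
        "ca_cert": "./security/infer/security/certs/ca.pem",
        "tls_crl": "infer",
    },
}
# Prints the entries that could not be found relative to the current directory.
print(missing_tls_files(example, "."))
```

The check ignores `tls_crl` and the `kmc_ksf_*` entries because the example does not make clear whether those values are always file paths.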