Prefill-Decode Disaggregation Parameters
Currently, only the typical parameters listed in the following tables can be configured. All other parameters are set automatically based on the value of max_seq_len (maxSeqLen in the following tables) and the number of D instances.
Typical Parameters for the Atlas 800I A2 Inference Server
| Parameter Type | Parameter | 16K Context | 64K Context | 128K Context |
|---|---|---|---|---|
| P instance parameters (mindie_server_prefill_config) | maxSeqLen | 18000 | 68000 | 134000 |
| | maxInputTokenLen | 18000 | 68000 | 134000 |
| | dp | 2 | 1 | 1 |
| | cp | 1 | 2 | 2 |
| | tp | 8 | 8 | 8 |
| | sp | 1 | 8 | 8 |
| | pp | 1 | 1 | 1 |
| | moe_ep | 4 | 16 | 16 |
| | moe_tp | 4 | 1 | 1 |
| | ep_level | 1 | 1 | 1 |
| | MTP | On | On | Off |
| | enable_init_routing_cutoff | false | true | true |
| | topk_scaling_factor | Does not take effect | 0.25 | 0.25 |
| | maxPrefillTokens | 18000 | 68000 | 134000 |
| D instance parameters (mindie_server_decode_config) | maxSeqLen | 18000 | 68000 | 134000 |
| | maxInputTokenLen | 18000 | 68000 | 134000 |
| | dp | 32 (4-node D instance); 64 (8-node D instance) | 32 (4-node D instance); 64 (8-node D instance) | 32 (4-node D instance); 64 (8-node D instance) |
| | tp | 1 | 1 | 1 |
| | sp | 1 | 1 | 1 |
| | cp | 1 | 1 | 1 |
| | pp | 1 | 1 | 1 |
| | moe_ep | 32 (4-node D instance); 64 (8-node D instance) | 32 (4-node D instance); 64 (8-node D instance) | 32 (4-node D instance); 64 (8-node D instance) |
| | moe_tp | 1 | 1 | 1 |
| | ep_level | 2 | 2 | 2 |
| | MTP | On | On | Off |
| | maxPrefillTokens | 18000 | 68000 | 134000 |
| | maxIterTimes | 18000 | 68000 | 134000 |
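In every column of the table, the attention-side parallel degrees (dp x cp x tp x pp) and the MoE-side degrees (moe_ep x moe_tp) each multiply out to the number of NPUs in the instance, for example 2 x 1 x 8 x 1 = 4 x 4 = 16 for a 2-node P instance with 8 NPUs per node. The sketch below turns that observation into a pre-deployment sanity check. It is an assumption inferred from the table values, not MindIE's own validation logic; `InstanceParallelism` and `check_instance` are hypothetical names, sp is left out on the assumption that it reuses the tp group (in the tables it is always 1 or equal to tp), and the 16 NPUs per Atlas 800I A3 node used in the usage example is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceParallelism:
    """Parallel degrees of one P or D instance (field names follow the table)."""
    dp: int
    cp: int
    tp: int
    pp: int
    moe_ep: int
    moe_tp: int


def check_instance(p: InstanceParallelism, nodes: int, npus_per_node: int) -> list[str]:
    """Return a list of problems; an empty list means the degrees look consistent.

    Assumed rule (inferred from the table, not an official MindIE check):
    dp * cp * tp * pp and moe_ep * moe_tp must both equal the instance's NPU count.
    sp is not multiplied in because it is assumed to reuse the tp group.
    """
    world = nodes * npus_per_node  # total NPUs in this instance
    problems = []
    attn = p.dp * p.cp * p.tp * p.pp
    if attn != world:
        problems.append(f"dp*cp*tp*pp = {attn}, expected {world}")
    moe = p.moe_ep * p.moe_tp
    if moe != world:
        problems.append(f"moe_ep*moe_tp = {moe}, expected {world}")
    return problems


if __name__ == "__main__":
    # Atlas 800I A2 (8 NPUs per node): 2-node P instance at 16K, 8-node D instance
    print(check_instance(InstanceParallelism(2, 1, 8, 1, 4, 4), nodes=2, npus_per_node=8))
    print(check_instance(InstanceParallelism(64, 1, 1, 1, 64, 1), nodes=8, npus_per_node=8))
```

Running the same check with `npus_per_node=16` against the Atlas 800I A3 values (a 1-node P instance with dp=2, tp=8, moe_ep=16; a 4-node D instance with dp=moe_ep=64) also passes, which is consistent with the assumed node size.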
|
| Node Configuration | P/D Disaggregation Configuration | Switch Selection Reference |
|---|---|---|
| 8 + 2 + 1 | 2 x 2P + 1 x 4D + 2 (two-node cluster) + 1 (online hot backup) | Switch specifications: 32 x 400GE (for example, XH9210); three leaf switches and two spine switches |
| 16 + 1 | 4 x 2P + 2 x 4D + 1 (online hot backup) | Switch specifications: 32 x 400GE (for example, XH9210); five leaf switches and four spine switches |
| N x 16 | N x (4 x 2P + 1 x 8D), scaled linearly from the 16-node configuration with optimal performance (EP64) | Switch specifications: 32 x 400GE (for example, XH9210); 32 leaf switches and 16 spine switches (for N = 8, that is, 1024 NPUs in total) |
The MoE EP solution supports only the Atlas 800I A2 inference server (64 GB HCCS): each NPU must have 64 GB of on-chip memory, and the optical modules on the NPU network ports must be 200GE.
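The node counts in the node configuration table follow from the P/D layout: a term such as "2 x 2P" means two P instances of two nodes each. For example, 2 x 2P + 1 x 4D uses 4 + 4 = 8 nodes, and the two-node cluster and the hot-backup node bring the total to 8 + 2 + 1. The sketch below does this arithmetic. `count_pd_nodes` and `total_npus` are hypothetical helpers, and the extra management and backup nodes are deliberately not counted because they are not P or D instances.

```python
import re

# Matches terms such as "2 x 2P" or "6*4D": <instance count> x <nodes per instance><P|D>
TERM = re.compile(r"(\d+)\s*[x*]\s*(\d+)\s*([PD])")


def count_pd_nodes(layout: str) -> dict[str, int]:
    """Count the nodes taken by P and D instances in a layout such as "2 x 2P + 1 x 4D".

    Terms without an "x" (the two-node cluster, hot-backup or redundant nodes) are ignored.
    """
    totals = {"P": 0, "D": 0}
    for count, nodes, role in TERM.findall(layout):
        totals[role] += int(count) * int(nodes)
    return totals


def total_npus(layout: str, npus_per_node: int, replicas: int = 1) -> int:
    """NPUs used by P and D instances when the layout is repeated `replicas` (N) times."""
    return replicas * sum(count_pd_nodes(layout).values()) * npus_per_node


if __name__ == "__main__":
    print(count_pd_nodes("2 x 2P + 1 x 4D + 2 (two-node cluster) + 1 (online hot backup)"))
    # N x (4 x 2P + 1 x 8D) with N = 8 on 8-NPU Atlas 800I A2 nodes
    print(total_npus("4 x 2P + 1 x 8D", npus_per_node=8, replicas=8))
```

For the N x 16 row with N = 8, this gives 8 x 16 nodes x 8 NPUs = 1024 NPUs, matching the figure quoted in the switch column. The same parser accepts the compact "24*1P+6*4D" notation used elsewhere in this section.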
Typical Parameters for the Atlas 800I A3 SuperPoD Server
| Parameter Type | Parameter | 16K Context | 64K Context | 128K Context |
|---|---|---|---|---|
| P instance parameters (mindie_server_prefill_config) | maxSeqLen | 18000 | 68000 | 134000 |
| | maxInputTokenLen | 18000 | 68000 | 134000 |
| | dp | 2 | 1 | 1 |
| | cp | 1 | 2 | 2 |
| | tp | 8 | 8 | 8 |
| | sp | 1 | 8 | 8 |
| | pp | 1 | 1 | 1 |
| | moe_ep | 16 | 16 | 16 |
| | moe_tp | 1 | 1 | 1 |
| | ep_level | 2 | 2 | 2 |
| | MTP | On | On | Off |
| | maxPrefillTokens | 18000 | 68000 | 134000 |
| D instance parameters (mindie_server_decode_config) | maxSeqLen | 18000 | 68000 | 134000 |
| | maxInputTokenLen | 18000 | 68000 | 134000 |
| | dp | 64 (4-node D instance); 128 (8-node D instance) | 64 (4-node D instance); 128 (8-node D instance) | 64 (4-node D instance); 128 (8-node D instance) |
| | tp | 1 | 1 | 1 |
| | sp | 1 | 1 | 1 |
| | cp | 1 | 1 | 1 |
| | pp | 1 | 1 | 1 |
| | moe_ep | 64 (4-node D instance); 128 (8-node D instance) | 64 (4-node D instance); 128 (8-node D instance) | 64 (4-node D instance); 128 (8-node D instance) |
| | moe_tp | 1 | 1 | 1 |
| | ep_level | 2 | 2 | 2 |
| | MTP | On | On | Off |
| | maxPrefillTokens | 18000 | 68000 | 134000 |
| | maxIterTimes | 18000 | 68000 | 134000 |
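The token-length values are identical on the Atlas 800I A2 and A3, and in both tables they satisfy three simple relations: maxSeqLen leaves headroom above the nominal context length (18000 > 16 x 1024, 68000 > 64 x 1024, 134000 > 128 x 1024), maxInputTokenLen does not exceed maxSeqLen, and maxPrefillTokens is at least maxInputTokenLen. The sketch below checks a custom configuration against those relations. The rules are inferred from the table values and are not MindIE's own validation; `check_token_limits` is a hypothetical helper, and reading "16K" as 16 x 1024 tokens is an assumption.

```python
# Nominal context lengths assumed for the "16K", "64K", and "128K" columns.
CONTEXT_LENGTHS = {"16K": 16 * 1024, "64K": 64 * 1024, "128K": 128 * 1024}


def check_token_limits(context: str, max_seq_len: int,
                       max_input_token_len: int, max_prefill_tokens: int) -> list[str]:
    """Return the assumed relations that a token-length configuration violates."""
    problems = []
    target = CONTEXT_LENGTHS[context]
    if max_seq_len < target:
        problems.append(f"maxSeqLen {max_seq_len} is below the {context} context ({target})")
    if max_input_token_len > max_seq_len:
        problems.append("maxInputTokenLen exceeds maxSeqLen")
    if max_prefill_tokens < max_input_token_len:
        problems.append("maxPrefillTokens is smaller than maxInputTokenLen")
    return problems


if __name__ == "__main__":
    # Values taken from the 128K column of the tables
    print(check_token_limits("128K", 134000, 134000, 134000))
```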
|
| Node Configuration | P/D Disaggregation Configuration | Number of Bus Network Switches (L2) |
|---|---|---|
| 8 + 1 (optional) | 4 x 1P + 2 x 2D + 1 (A3 redundant node, optional) | 14 |
| 16 + 1 (optional) | 8 x 1P + 2 x 4D + 1 (A3 redundant node, optional) | 28 |
| 32 + 1 (optional) | 16 x 1P + 4 x 4D + 1 (A3 redundant node, optional) | 56 |
| 48 | 24 x 1P + 6 x 4D | 56 |
| N x 48 | N x (24 x 1P + 6 x 4D) | N x 56 |
(Optional) Certificate Configuration
The certificate parameter configuration file is stored in $HOME/ascend-deployer/ascend_deployer/group_vars/master/tls_config.yaml. The following is an example of the file content:
# group_vars/tls_config.yaml
tls_config:
  tls_enable: false
  kmc_ksf_master: "./security/master/tools/pmt/master/ksfa"
  kmc_ksf_standby: "./security/standby/tools/pmt/standby/ksfb"
  infer_tls_items:
    ca_cert: "./security/infer/security/certs/ca.pem"
    tls_cert: "./security/infer/security/certs/cert.pem"
    tls_key: "./security/infer/security/keys/cert.key.pem"
    tls_passwd: "./security/infer/security/pass/key_pwd.txt"
    tls_crl: "infer"
  management_tls_items:
    ca_cert: "./security/management/security/certs/ca.pem"
    tls_cert: "./security/management/security/certs/cert.pem"
    tls_key: "./security/management/security/keys/cert.key.pem"
    tls_passwd: "./security/management/security/pass/key_pwd.txt"
    tls_crl: "management"
  # ccae_tls_enable and ccae_tls_items do not need to be set in the Atlas 800I A2 inference server scenario.
  ccae_tls_enable: false
  ccae_tls_items:
    ca_cert: "./security/ccae/security/certs/ca.pem"
    tls_cert: "./security/ccae/security/certs/cert.pem"
    tls_key: "./security/ccae/security/keys/cert.key.pem"
    tls_passwd: "./security/ccae/security/pass/key_pwd.txt"
    tls_crl: "ccae"
  cluster_tls_enable: false
  cluster_tls_items:
    ca_cert: "./security/clusterd/security/certs/ca.pem"
    tls_cert: "./security/clusterd/security/certs/cert.pem"
    tls_key: "./security/clusterd/security/keys/cert.key.pem"
    tls_passwd: "./security/clusterd/security/pass/key_pwd.txt"
    tls_crl: "clusterd"
  etcd_server_tls_enable: false
  etcd_server_tls_items:
    ca_cert: "./security/etcd_server/security/certs/ca.pem"
    tls_cert: "./security/etcd_server/security/certs/cert.pem"
    tls_key: "./security/etcd_server/security/keys/cert.key.pem"
    tls_passwd: "./security/etcd_server/security/pass/key_pwd.txt"
    kmc_ksf_master: "./security/etcd_server/tools/pmt/master/ksfa"
    kmc_ksf_standby: "./security/etcd_server/tools/pmt/standby/ksfb"
    tls_crl: ""
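Before switching any of the *_enable flags to true, it can help to confirm that every certificate, key, password, and KMC file referenced by the enabled groups actually exists. The sketch below is a hypothetical pre-flight check, not part of ascend-deployer. It assumes the file has already been loaded with a YAML parser (for example, PyYAML's `yaml.safe_load`) and that the mapping under `tls_config` is passed in. The mapping of `tls_enable` to the infer and management item groups, and the base directory against which the relative paths are resolved, are both assumptions; `tls_crl` is skipped because its values ("infer", "management", and so on) are names rather than file paths.

```python
from pathlib import Path

# Assumed mapping from each enable flag to the item groups it governs (hypothetical).
ENABLE_FLAGS = {
    "tls_enable": ["infer_tls_items", "management_tls_items"],
    "ccae_tls_enable": ["ccae_tls_items"],
    "cluster_tls_enable": ["cluster_tls_items"],
    "etcd_server_tls_enable": ["etcd_server_tls_items"],
}
# Keys whose values are file paths; tls_crl holds a name, not a path, so it is not listed.
PATH_KEYS = ("ca_cert", "tls_cert", "tls_key", "tls_passwd", "kmc_ksf_master", "kmc_ksf_standby")


def missing_tls_files(tls_config: dict, base_dir: Path) -> list[str]:
    """List "<group>.<key>: <path>" for every referenced file that does not exist.

    Only groups whose enable flag is true are checked; relative paths are resolved
    against base_dir, which is an assumption about how the deployer resolves them.
    """
    missing = []
    for flag, groups in ENABLE_FLAGS.items():
        if not tls_config.get(flag, False):
            continue  # TLS is disabled for these groups, so their files are not needed
        for group in groups:
            items = tls_config.get(group) or {}
            for key in PATH_KEYS:
                rel = items.get(key)
                if rel and not (base_dir / rel).exists():
                    missing.append(f"{group}.{key}: {rel}")
    return missing
```

With every flag left at false, as in the example file, the function returns an empty list, because no group is checked.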