Prefill-Decode Disaggregation Parameters

Currently, only the typical parameters listed below can be configured. All other parameters are set automatically based on the value of max_seq_len (corresponding to maxSeqLen in the following tables) and the number of D instances.

Typical Parameters for the Atlas 800I A2 Inference Server

Table 1 Parameters

                           | Context Sequence Length
Parameter                  | 16K         | 64K         | 128K
---------------------------+-------------+-------------+-------
Parameter type: P instance parameters (mindie_server_prefill_config)
maxSeqLen                  | 18000       | 68000       | 134000
maxInputTokenLen           | 18000       | 68000       | 134000
dp                         | 2           | 1           | 1
cp                         | 1           | 2           | 2
tp                         | 8           | 8           | 8
sp                         | 1           | 8           | 8
pp                         | 1           | 1           | 1
moe_ep                     | 4           | 16          | 16
moe_tp                     | 4           | 1           | 1
ep_level                   | 1           | 1           | 1
MTP                        | On          | On          | Off
enable_init_routing_cutoff | false       | true        | true
topk_scaling_factor        | Ineffective | 0.25        | 0.25
maxPrefillTokens           | 18000       | 68000       | 134000
---------------------------+-------------+-------------+-------
Parameter type: D instance parameters (mindie_server_decode_config)
maxSeqLen                  | 18000       | 68000       | 134000
maxInputTokenLen           | 18000       | 68000       | 134000
dp (4-node D instance)     | 32          | 32          | 32
dp (8-node D instance)     | 64          | 64          | 64
tp                         | 1           | 1           | 1
sp                         | 1           | 1           | 1
cp                         | 1           | 1           | 1
pp                         | 1           | 1           | 1
moe_ep (4-node D instance) | 32          | 32          | 32
moe_ep (8-node D instance) | 64          | 64          | 64
moe_tp                     | 1           | 1           | 1
ep_level                   | 2           | 2           | 2
MTP                        | On          | On          | Off
maxPrefillTokens           | 18000       | 68000       | 134000
maxIterTimes               | 18000       | 68000       | 134000
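
As a sanity check on Table 1, the sketch below encodes the P instance parallelism values and verifies a pattern visible in every column: dp x cp x tp x pp equals moe_ep x moe_tp. This is an observation drawn from the table values, not a rule stated in this document, and the names P_INSTANCE_A2 and check_rank_products are hypothetical. The resulting count of 16 would match a 2-node P instance with 8 NPUs per node, assuming "2P" in Table 2 denotes a two-node P instance.

```python
# P instance values copied from Table 1 (Atlas 800I A2).
P_INSTANCE_A2 = {
    "16K":  {"dp": 2, "cp": 1, "tp": 8, "pp": 1, "moe_ep": 4,  "moe_tp": 4},
    "64K":  {"dp": 1, "cp": 2, "tp": 8, "pp": 1, "moe_ep": 16, "moe_tp": 1},
    "128K": {"dp": 1, "cp": 2, "tp": 8, "pp": 1, "moe_ep": 16, "moe_tp": 1},
}


def check_rank_products(profiles):
    """Return {context: rank_count}; raise if the two products disagree.

    Assumption (observed in Table 1, not documented as a rule):
    dp * cp * tp * pp == moe_ep * moe_tp.
    """
    result = {}
    for context, cfg in profiles.items():
        attention_side = cfg["dp"] * cfg["cp"] * cfg["tp"] * cfg["pp"]
        moe_side = cfg["moe_ep"] * cfg["moe_tp"]
        if attention_side != moe_side:
            raise ValueError(
                f"{context}: {attention_side} attention-side ranks "
                f"vs {moe_side} MoE-side ranks"
            )
        result[context] = attention_side
    return result


if __name__ == "__main__":
    print(check_rank_products(P_INSTANCE_A2))
```

The same check can be applied to the D instance rows, where dp and moe_ep carry the whole rank count and every other factor is 1.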

Table 2 Typical configurations

Node Configuration | P/D Disaggregation Configuration                                 | Switch Selection Reference
-------------------+------------------------------------------------------------------+---------------------------
8 + 2 + 1          | 2 x 2P + 1 x 4D + 2 (two-node cluster) + 1 (online hot backup)   | Switch specifications: 32 x 400GE, for example, XH9210: three leaf switches and two spine switches
16 + 1             | 4 x 2P + 2 x 4D + 1 (online hot backup)                          | Switch specifications: 32 x 400GE, for example, XH9210: five leaf switches and four spine switches
N x 16             | N x (4 x 2P + 1 x 8D)                                            | Switch specifications: 32 x 400GE, for example, XH9210: 32 leaf switches and 16 spine switches (taking N = 8 and a total of 1024 NPUs as an example)
                   | Linear expansion based on the optimal 16-node performance (EP64) |

The MoE EP solution supports only the Atlas 800I A2 inference server (64 GB HCCS). The NPU on-chip memory must be 64 GB, and the optical module of each NPU network port must be 200GE.
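
The node counts in Table 2 can be reproduced with simple arithmetic. The sketch below reads "a x bP" as a P instances of b nodes each (and likewise for D), which is an interpretation of the table notation rather than something stated explicitly. The figure of 8 NPUs per node is inferred from the "N = 8 and a total of 1024 NPUs" example, and the function names are hypothetical.

```python
NPUS_PER_NODE_A2 = 8  # inferred: 1024 NPUs / (8 x 16 nodes)


def total_nodes(p_instances, nodes_per_p, d_instances, nodes_per_d, extra=0):
    """Nodes used by the P instances, the D instances, and any extra nodes."""
    return p_instances * nodes_per_p + d_instances * nodes_per_d + extra


def npus_for(n):
    """NPU count for the 'N x 16' row: N units of 4 x 2P + 1 x 8D."""
    return n * total_nodes(4, 2, 1, 8) * NPUS_PER_NODE_A2


if __name__ == "__main__":
    # Row "8 + 2 + 1": 2 x 2P + 1 x 4D + 2 (two-node cluster) + 1 (hot backup)
    print(total_nodes(2, 2, 1, 4, extra=2 + 1))  # 11
    # Row "16 + 1": 4 x 2P + 2 x 4D + 1 (hot backup)
    print(total_nodes(4, 2, 2, 4, extra=1))      # 17
    # Row "N x 16" with N = 8
    print(npus_for(8))                           # 1024
```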

Typical Parameters for the Atlas 800I A3 SuperPoD Server

Table 3 Parameters

                           | Context Sequence Length
Parameter                  | 16K         | 64K         | 128K
---------------------------+-------------+-------------+-------
Parameter type: P instance parameters (mindie_server_prefill_config)
maxSeqLen                  | 18000       | 68000       | 134000
maxInputTokenLen           | 18000       | 68000       | 134000
dp                         | 2           | 1           | 1
cp                         | 1           | 2           | 2
tp                         | 8           | 8           | 8
sp                         | 1           | 8           | 8
pp                         | 1           | 1           | 1
moe_ep                     | 16          | 16          | 16
moe_tp                     | 1           | 1           | 1
ep_level                   | 2           | 2           | 2
MTP                        | On          | On          | Off
maxPrefillTokens           | 18000       | 68000       | 134000
---------------------------+-------------+-------------+-------
Parameter type: D instance parameters (mindie_server_decode_config)
maxSeqLen                  | 18000       | 68000       | 134000
maxInputTokenLen           | 18000       | 68000       | 134000
dp (4-node D instance)     | 64          | 64          | 64
dp (8-node D instance)     | 128         | 128         | 128
tp                         | 1           | 1           | 1
sp                         | 1           | 1           | 1
cp                         | 1           | 1           | 1
pp                         | 1           | 1           | 1
moe_ep (4-node D instance) | 64          | 64          | 64
moe_ep (8-node D instance) | 128         | 128         | 128
moe_tp                     | 1           | 1           | 1
ep_level                   | 2           | 2           | 2
MTP                        | On          | On          | Off
maxPrefillTokens           | 18000       | 68000       | 134000
maxIterTimes               | 18000       | 68000       | 134000
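
In the D instance rows of Tables 1 and 3, dp and moe_ep scale with the number of nodes in the D instance while every other parallelism factor stays at 1. A minimal sketch of that relationship follows. The per-node factors (8 for A2, 16 for A3) are inferred by dividing the table values by the node count and are not stated in this document, and decode_parallelism is a hypothetical helper, not part of any tool.

```python
# Ranks per node inferred from the tables: 32 / 4 = 8 (A2), 64 / 4 = 16 (A3).
RANKS_PER_NODE = {"A2": 8, "A3": 16}


def decode_parallelism(server, nodes):
    """Return the D instance parallelism values listed in Tables 1 and 3."""
    if server not in RANKS_PER_NODE:
        raise ValueError(f"unknown server type: {server!r}")
    if nodes not in (4, 8):
        # The tables only list 4-node and 8-node D instances.
        raise ValueError("only 4-node and 8-node D instances are listed")
    ranks = RANKS_PER_NODE[server] * nodes
    return {
        "dp": ranks, "tp": 1, "sp": 1, "cp": 1, "pp": 1,
        "moe_ep": ranks, "moe_tp": 1, "ep_level": 2,
    }


if __name__ == "__main__":
    print(decode_parallelism("A2", 4)["dp"])      # 32
    print(decode_parallelism("A3", 8)["moe_ep"])  # 128
```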

Table 4 Typical configurations

Node Configuration | P/D Disaggregation Configuration                   | Number of Bus Network Switches (L2)
-------------------+----------------------------------------------------+------------------------------------
8 + 1 (optional)   | 4 x 1P + 2 x 2D + 1 (A3 redundant node, optional)  | 14
16 + 1 (optional)  | 8 x 1P + 2 x 4D + 1 (A3 redundant node, optional)  | 28
32 + 1 (optional)  | 16 x 1P + 4 x 4D + 1 (A3 redundant node, optional) | 56
48                 | 24 x 1P + 6 x 4D                                   | 56
N x 48             | N x (24 x 1P + 6 x 4D)                             | N x 56

(Optional) Certificate Configuration

The certificate parameter configuration file is stored in $HOME/ascend-deployer/ascend_deployer/group_vars/master/tls_config.yaml. The following is an example of the file content:

# group_vars/tls_config.yaml
tls_config:
  tls_enable: false
  kmc_ksf_master: "./security/master/tools/pmt/master/ksfa"
  kmc_ksf_standby: "./security/standby/tools/pmt/standby/ksfb"
  infer_tls_items:
    ca_cert: "./security/infer/security/certs/ca.pem"
    tls_cert: "./security/infer/security/certs/cert.pem"
    tls_key: "./security/infer/security/keys/cert.key.pem"
    tls_passwd: "./security/infer/security/pass/key_pwd.txt"
    tls_crl: "infer"
  management_tls_items:
    ca_cert: "./security/management/security/certs/ca.pem"
    tls_cert: "./security/management/security/certs/cert.pem"
    tls_key: "./security/management/security/keys/cert.key.pem"
    tls_passwd: "./security/management/security/pass/key_pwd.txt"
    tls_crl: "management"

  # ccae_tls_enable and ccae_tls_items do not need to be set in the Atlas 800I A2 inference server scenario.
  ccae_tls_enable: false
  ccae_tls_items:
    ca_cert: "./security/ccae/security/certs/ca.pem"
    tls_cert: "./security/ccae/security/certs/cert.pem"
    tls_key: "./security/ccae/security/keys/cert.key.pem"
    tls_passwd: "./security/ccae/security/pass/key_pwd.txt"
    tls_crl: "ccae"
  cluster_tls_enable: false
  cluster_tls_items:
    ca_cert: "./security/clusterd/security/certs/ca.pem"
    tls_cert: "./security/clusterd/security/certs/cert.pem"
    tls_key: "./security/clusterd/security/keys/cert.key.pem"
    tls_passwd: "./security/clusterd/security/pass/key_pwd.txt"
    tls_crl: "clusterd"
  etcd_server_tls_enable: false
  etcd_server_tls_items:
    ca_cert: "./security/etcd_server/security/certs/ca.pem"
    tls_cert: "./security/etcd_server/security/certs/cert.pem"
    tls_key: "./security/etcd_server/security/keys/cert.key.pem"
    tls_passwd: "./security/etcd_server/security/pass/key_pwd.txt"
    kmc_ksf_master: "./security/etcd_server/tools/pmt/master/ksfa"
    kmc_ksf_standby: "./security/etcd_server/tools/pmt/standby/ksfb"
    tls_crl: ""

For details about how to configure and use certificates, see "Cluster Service Deployment" > "Prefill-Decode Disaggregation" > "Installation and Deployment" > "Configuring Automatic Certificate Generation" in the MindIE Motor Development Guide.
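
Before enabling TLS, it can help to confirm that the files referenced in tls_config.yaml exist. The following is a minimal sketch under several assumptions: the configuration has already been parsed into a dictionary (for example with PyYAML's yaml.safe_load, a third-party dependency not prescribed here), tls_enable governs both the infer and management sections, and relative paths resolve against a caller-supplied base directory. None of these points is confirmed by this document, and missing_tls_files is a hypothetical helper.

```python
from pathlib import Path

# Section name -> flag that enables it. Pairing tls_enable with the infer and
# management sections is an assumption made for this sketch.
SECTION_FLAGS = {
    "infer_tls_items": "tls_enable",
    "management_tls_items": "tls_enable",
    "ccae_tls_items": "ccae_tls_enable",
    "cluster_tls_items": "cluster_tls_enable",
    "etcd_server_tls_items": "etcd_server_tls_enable",
}
FILE_KEYS = ("ca_cert", "tls_cert", "tls_key", "tls_passwd")


def missing_tls_files(tls_config, base_dir="."):
    """List 'section.key' entries whose file is absent, for enabled sections only."""
    missing = []
    for section, flag in SECTION_FLAGS.items():
        if not tls_config.get(flag, False):
            continue  # section disabled: nothing to check
        items = tls_config.get(section) or {}
        for key in FILE_KEYS:
            value = items.get(key, "")
            if not value or not (Path(base_dir) / value).is_file():
                missing.append(f"{section}.{key}")
    return missing


if __name__ == "__main__":
    example = {
        "tls_enable": False,
        "cluster_tls_enable": True,
        "cluster_tls_items": {
            "ca_cert": "./security/clusterd/security/certs/ca.pem",
            "tls_cert": "./security/clusterd/security/certs/cert.pem",
            "tls_key": "./security/clusterd/security/keys/cert.key.pem",
            "tls_passwd": "./security/clusterd/security/pass/key_pwd.txt",
        },
    }
    for entry in missing_tls_files(example, base_dir="/nonexistent-base-dir"):
        print("missing:", entry)
```

With every enable flag set to false, as in the example file above, the function returns an empty list because no section is checked.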