Prefill-Decode Disaggregation Parameters

Currently, only the typical parameters listed below can be configured. All other parameters are set automatically based on the value of max_seq_len (maxSeqLen in the following tables) and the number of D instances.

Typical Parameters for the Atlas 800I A2 Inference Server

**Table 1** Parameters
Parameter Type	Parameter	Context Sequence Length 16K	Context Sequence Length 64K	Context Sequence Length 128K
P instance parameters (mindie_server_prefill_config)	maxSeqLen	18000	68000	134000
	maxInputTokenLen	18000	68000	134000
	dp	2	1	1
	cp	1	2	2
	tp	8	8	8
	sp	1	8	8
	pp	1	1	1
	moe_ep	4	16	16
	moe_tp	4	1	1
	ep_level	1	1	1
	MTP	On	On	Off
	enable_init_routing_cutoff	false	true	true
	topk_scaling_factor	Ineffective	0.25	0.25
	maxPrefillTokens	18000	68000	134000
D instance parameters (mindie_server_decode_config)	maxSeqLen	18000	68000	134000
	maxInputTokenLen	18000	68000	134000
	dp	4-node D instance: 32; 8-node D instance: 64	4-node D instance: 32; 8-node D instance: 64	4-node D instance: 32; 8-node D instance: 64
	tp	1	1	1
	sp	1	1	1
	cp	1	1	1
	pp	1	1	1
	moe_ep	4-node D instance: 32; 8-node D instance: 64	4-node D instance: 32; 8-node D instance: 64	4-node D instance: 32; 8-node D instance: 64
	moe_tp	1	1	1
	ep_level	2	2	2
	MTP	On	On	Off
	maxPrefillTokens	18000	68000	134000
	maxIterTimes	18000	68000	134000
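
The values in Table 1 appear to satisfy two products: dp x cp x tp x pp and moe_ep x moe_tp both equal the number of NPUs in an instance (16 for a 2-node P instance, 32 for a 4-node D instance, at 8 NPUs per node). This relationship is inferred from the table values and is not stated as a rule by the documentation; the helper name check_parallelism is hypothetical. A minimal Python sketch of the check, under those assumptions:

```python
from math import prod

NPUS_PER_NODE = 8  # Atlas 800I A2; implied by the 1024-NPU example in Table 2


def check_parallelism(cfg: dict, nodes: int) -> bool:
    """Return True if both parallel layouts cover every NPU of the instance.

    Assumption (inferred from Table 1, not a documented rule):
    dp * cp * tp * pp == moe_ep * moe_tp == nodes * NPUS_PER_NODE.
    sp is left out of the product: including it would break the
    equality in the 64K and 128K columns.
    """
    npus = nodes * NPUS_PER_NODE
    dense = prod(cfg[k] for k in ("dp", "cp", "tp", "pp"))
    moe = cfg["moe_ep"] * cfg["moe_tp"]
    return dense == npus and moe == npus


# P instance, 16K column (2-node P instance)
p_16k = {"dp": 2, "cp": 1, "tp": 8, "pp": 1, "moe_ep": 4, "moe_tp": 4}
# P instance, 64K column (2-node P instance)
p_64k = {"dp": 1, "cp": 2, "tp": 8, "pp": 1, "moe_ep": 16, "moe_tp": 1}
# 4-node D instance (identical in every column)
d_4node = {"dp": 32, "cp": 1, "tp": 1, "pp": 1, "moe_ep": 32, "moe_tp": 1}

print(check_parallelism(p_16k, nodes=2))    # True
print(check_parallelism(p_64k, nodes=2))    # True
print(check_parallelism(d_4node, nodes=4))  # True
```

A check like this can catch a mistyped value before deployment, but a passing result does not by itself mean the configuration is supported.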

**Table 2** Typical configurations
Node Configuration	P/D Disaggregation Configuration	Switch Selection Reference
8 + 2 + 1	2 x 2P + 1 x 4D + 2 (two-node cluster) + 1 (online hot backup)	Switch specifications: 32 x 400GE, for example, XH9210: three leaf switches and two spine switches
16 + 1	4 x 2P + 2 x 4D + 1 (online hot backup)	Switch specifications: 32 x 400GE, for example, XH9210: five leaf switches and four spine switches
N x 16	N x (4 x 2P + 1 x 8D); linear expansion based on the optimal 16-node performance (EP64)	Switch specifications: 32 x 400GE, for example, XH9210: 32 leaf switches and 16 spine switches (taking N = 8 and a total of 1024 NPUs as an example)

The MoE EP solution supports only the Atlas 800I A2 inference server (64 GB HCCS). The NPU on-chip memory must be 64 GB, and the optical modules of the NPU network ports must be 200 GE.
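
The P/D notation in Table 2 can be read as "a x bP" meaning a prefill instances of b nodes each, and likewise for D. The following Python sketch, which assumes that reading and 8 NPUs per node, counts the nodes in a layout and reproduces the 1024-NPU figure quoted for N = 8. The function names pd_nodes and total_npus are hypothetical helpers, not part of any tool.

```python
import re

NPUS_PER_NODE = 8  # Atlas 800I A2; consistent with the 1024-NPU example in Table 2


def pd_nodes(layout: str) -> dict:
    """Count prefill and decode nodes in a layout such as '4 x 2P + 2 x 4D'.

    Each term 'a x bP' is read as a prefill instances of b nodes each;
    'a x bD' is read the same way for decode instances.
    Terms without a P or D suffix (backup nodes and so on) are ignored.
    """
    counts = {"P": 0, "D": 0}
    for instances, nodes, role in re.findall(r"(\d+)\s*x\s*(\d+)([PD])", layout):
        counts[role] += int(instances) * int(nodes)
    return counts


def total_npus(layout: str, groups: int = 1) -> int:
    """NPUs used by `groups` copies of the layout (the N in 'N x 16')."""
    counts = pd_nodes(layout)
    return groups * (counts["P"] + counts["D"]) * NPUS_PER_NODE


print(pd_nodes("2 x 2P + 1 x 4D"))              # {'P': 4, 'D': 4}
print(pd_nodes("4 x 2P + 2 x 4D"))              # {'P': 8, 'D': 8}
print(total_npus("4 x 2P + 1 x 8D", groups=8))  # 1024
```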

Typical Parameters for the Atlas 800I A3 SuperPoD Server

**Table 3** Parameters
Parameter Type	Parameter	Context Sequence Length 16K	Context Sequence Length 64K	Context Sequence Length 128K
P instance parameters (mindie_server_prefill_config)	maxSeqLen	18000	68000	134000
	maxInputTokenLen	18000	68000	134000
	dp	2	1	1
	cp	1	2	2
	tp	8	8	8
	sp	1	8	8
	pp	1	1	1
	moe_ep	16	16	16
	moe_tp	1	1	1
	ep_level	2	2	2
	MTP	On	On	Off
	maxPrefillTokens	18000	68000	134000
D instance parameters (mindie_server_decode_config)	maxSeqLen	18000	68000	134000
	maxInputTokenLen	18000	68000	134000
	dp	4-node D instance: 64; 8-node D instance: 128	4-node D instance: 64; 8-node D instance: 128	4-node D instance: 64; 8-node D instance: 128
	tp	1	1	1
	sp	1	1	1
	cp	1	1	1
	pp	1	1	1
	moe_ep	4-node D instance: 64; 8-node D instance: 128	4-node D instance: 64; 8-node D instance: 128	4-node D instance: 64; 8-node D instance: 128
	moe_tp	1	1	1
	ep_level	2	2	2
	MTP	On	On	Off
	maxPrefillTokens	18000	68000	134000
	maxIterTimes	18000	68000	134000

**Table 4** Typical configurations
Node Configuration	P/D Disaggregation Configuration	Number of Bus Network Switches (L2)
8 + 1 (optional)	4 x 1P + 2 x 2D + 1 (A3 redundant node, optional)	14
16 + 1 (optional)	8 x 1P + 2 x 4D + 1 (A3 redundant node, optional)	28
32 + 1 (optional)	16 x 1P + 4 x 4D + 1 (A3 redundant node, optional)	56
48	24 x 1P + 6 x 4D	56
N x 48	N x (24 x 1P + 6 x 4D)	N x 56

(Optional) Certificate Configuration

The certificate parameter configuration file is stored in $HOME/ascend-deployer/ascend_deployer/group_vars/master/tls_config.yaml. The following is an example of the file content:

# group_vars/tls_config.yaml
tls_config:
  tls_enable: false
  kmc_ksf_master: "./security/master/tools/pmt/master/ksfa"
  kmc_ksf_standby: "./security/standby/tools/pmt/standby/ksfb"
  infer_tls_items:
    ca_cert: "./security/infer/security/certs/ca.pem"
    tls_cert: "./security/infer/security/certs/cert.pem"
    tls_key: "./security/infer/security/keys/cert.key.pem"
    tls_passwd: "./security/infer/security/pass/key_pwd.txt"
    tls_crl: "infer"
  management_tls_items:
    ca_cert: "./security/management/security/certs/ca.pem"
    tls_cert: "./security/management/security/certs/cert.pem"
    tls_key: "./security/management/security/keys/cert.key.pem"
    tls_passwd: "./security/management/security/pass/key_pwd.txt"
    tls_crl: "management"

  # ccae_tls_enable and ccae_tls_items do not need to be set in the Atlas 800I A2 inference server scenario.
  ccae_tls_enable: false
  ccae_tls_items:
    ca_cert: "./security/ccae/security/certs/ca.pem"
    tls_cert: "./security/ccae/security/certs/cert.pem"
    tls_key: "./security/ccae/security/keys/cert.key.pem"
    tls_passwd: "./security/ccae/security/pass/key_pwd.txt"
    tls_crl: "ccae"
  cluster_tls_enable: false
  cluster_tls_items:
    ca_cert: "./security/clusterd/security/certs/ca.pem"
    tls_cert: "./security/clusterd/security/certs/cert.pem"
    tls_key: "./security/clusterd/security/keys/cert.key.pem"
    tls_passwd: "./security/clusterd/security/pass/key_pwd.txt"
    tls_crl: "clusterd"
  etcd_server_tls_enable: false
  etcd_server_tls_items:
    ca_cert: "./security/etcd_server/security/certs/ca.pem"
    tls_cert: "./security/etcd_server/security/certs/cert.pem"
    tls_key: "./security/etcd_server/security/keys/cert.key.pem"
    tls_passwd: "./security/etcd_server/security/pass/key_pwd.txt"
    kmc_ksf_master: "./security/etcd_server/tools/pmt/master/ksfa"
    kmc_ksf_standby: "./security/etcd_server/tools/pmt/standby/ksfb"
    tls_crl: ""

For details about how to configure and use certificates, see "Cluster Service Deployment" > "Prefill-Decode Disaggregation" > "Installation and Deployment" > "Configuring Automatic Certificate Generation" in the MindIE Motor Development Guide.
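
If you supply your own certificates instead of generating them automatically, it can help to confirm that the files named in tls_config.yaml exist before deploying. The Python sketch below is one way to do that. It works on the tls_config mapping after a YAML parser has loaded it, and it assumes the relative paths are resolved against a base directory that you pass in; how the deployer actually resolves them is not stated here. The function name missing_tls_files and the base_dir parameter are hypothetical.

```python
from pathlib import Path

# Keys whose values are file paths in the example above.
FILE_KEYS = ("ca_cert", "tls_cert", "tls_key", "tls_passwd")


def missing_tls_files(tls_config: dict, base_dir: str = ".") -> list:
    """List configured certificate files that do not exist under base_dir.

    Only the *_tls_items groups are inspected. tls_crl is skipped because
    its values in the example ("infer", "") are not file paths, and the
    kmc_ksf_* entries are not checked in this sketch.
    """
    missing = []
    for group, items in tls_config.items():
        if not group.endswith("_tls_items") or not isinstance(items, dict):
            continue
        for key in FILE_KEYS:
            path = items.get(key)
            if path and not (Path(base_dir) / path).is_file():
                missing.append(f"{group}.{key}: {path}")
    return missing


# A trimmed copy of the example, in the form a YAML parser would return.
example = {
    "tls_enable": True,
    "infer_tls_items": {
        "ca_cert": "./security/infer/security/certs/ca.pem",
        "tls_cert": "./security/infer/security/certs/cert.pem",
        "tls_key": "./security/infer/security/keys/cert.key.pem",
        "tls_passwd": "./security/infer/security/pass/key_pwd.txt",
        "tls_crl": "infer",
    },
}

for entry in missing_tls_files(example, base_dir="."):
    print("missing:", entry)
```

An empty result only means the files are present; it does not validate the certificates themselves.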
