Configuration Parameters (Service-Specific)
- The Server configuration file is config.json, located at {MindIE installation directory}/latest/mindie-service/conf/config.json.
- When reading the configuration file, the system checks the file size first. If the file size is not in the range (0 MB, 10 MB], the system fails to read the file.
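The size check described above can be sketched as follows. This is a minimal illustration, not the Server's actual loader: the function name load_config is hypothetical, and treating 10 MB as 10 × 1024 × 1024 bytes is an assumption.

```python
import json
import os

# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
MAX_CONFIG_BYTES = 10 * 1024 * 1024


def load_config(path):
    """Parse a JSON config file only if its size lies in (0 MB, 10 MB]."""
    size = os.path.getsize(path)
    if not 0 < size <= MAX_CONFIG_BYTES:
        raise ValueError(
            f"config file size {size} bytes is outside (0, {MAX_CONFIG_BYTES}]"
        )
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)
```

An empty file and a file larger than the upper bound are both rejected before any parsing takes place.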
Parameters in the Configuration File
| Parameter | Value Type | Value Range | Description |
|---|---|---|---|
| Version | std::string | "1.0.0" | Configuration file version, which is fixed at 1.0.0 and cannot be modified. |
| ServerConfig | map | - | Server configuration, such as ip:port, network request, and network security. For details, see Parameters in ServerConfig. |
| BackendConfig | map | - | Model backend configuration, including scheduling and model configurations. For details, see Parameters in BackendConfig. |
| LogConfig | map | - | Log-level configuration. For details, see Parameters in LogConfig. |
| EnableDynamicAdjustTimeoutConfig | bool | | Dynamic configuration parameter of the timeout interval. If this parameter is set to true, all inference-related timeout intervals are dynamically set to the maximum value. |
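As a rough picture of how these top-level entries nest, the sketch below builds a skeleton as a Python dictionary and checks its shape. The nesting follows the tables in this section. The empty maps, the false value shown for EnableDynamicAdjustTimeoutConfig, treating that key as optional, and the helper name check_top_level are all assumptions made for illustration.

```python
import json

# Skeleton only: nested maps are left empty; their keys are listed in the
# per-section parameter tables.
config_skeleton = {
    "Version": "1.0.0",
    "ServerConfig": {},
    "BackendConfig": {
        "ModelDeployConfig": {"ModelConfig": {}},
        "ScheduleConfig": {},
    },
    "LogConfig": {},
    # Illustrative value; this section does not state the default.
    "EnableDynamicAdjustTimeoutConfig": False,
}


def check_top_level(config):
    """Return a list of problems found in the top-level layout."""
    problems = []
    if config.get("Version") != "1.0.0":
        problems.append("Version must be the string 1.0.0")
    for key in ("ServerConfig", "BackendConfig", "LogConfig"):
        if not isinstance(config.get(key), dict):
            problems.append(f"{key} must be a map")
    # Assumption: the flag is optional; if present it must be a boolean.
    flag = config.get("EnableDynamicAdjustTimeoutConfig", False)
    if not isinstance(flag, bool):
        problems.append("EnableDynamicAdjustTimeoutConfig must be a bool")
    return problems


if __name__ == "__main__":
    print(json.dumps(config_skeleton, indent=2))
    print(check_top_level(config_skeleton))
```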
Parameters in ServerConfig
|
Parameter |
Value Type |
Value Range |
Description |
|---|---|---|---|
|
ipAddress |
std::string |
IPv4 or IPv6 address. |
Mandatory. The default value is 127.0.0.1. IP address bound to the service-plane RESTful API provided by EndPoint.
NOTE:
Listening on the all-zero address (0.0.0.0) invalidates triple-plane isolation and does not meet security configuration requirements, so 0.0.0.0 is rejected by default. To bind to 0.0.0.0, you must also set allowAllZeroIpListening to true in the configuration file. |
|
managementIpAddress |
std::string |
IPv4 or IPv6 address. |
Optional. The default value is 127.0.0.2. IP address bound to the internal RESTful API provided by EndPoint.
NOTE:
Listening on the all-zero address (0.0.0.0) invalidates triple-plane isolation and does not meet security configuration requirements, so 0.0.0.0 is rejected by default. To bind to 0.0.0.0, you must also set allowAllZeroIpListening to true in the configuration file. |
|
port |
int32_t |
[1024, 65535] |
Mandatory. The default value is 1025. Port number bound to the service-plane RESTful API provided by EndPoint. If the IP address of a physical machine (PM) or host is used for communication, ensure that the port number does not conflict. |
|
managementPort |
int32_t |
[1024, 65535] |
Optional. The default value is 1026. Port number bound to the internal APIs provided by EndPoint. (For details about the internal APIs, see Table 1.)
There are four solutions for the service plane and internal APIs:
|
|
metricsPort |
int32_t |
[1024, 65535] |
Optional. The default value is 1027. Port number of the service management and control metric API (Prometheus format). The value can be the same as or different from that of managementPort. |
|
allowAllZeroIpListening |
Bool |
|
Mandatory. Whether to allow listening on the all-zero IP address (0.0.0.0). The default and recommended value is false. Setting the value to true may expose the user environment to risks, so the environment itself must provide adequate protection.
|
|
maxLinkNum |
uint32_t |
[1, 4096] |
Mandatory. The default value is 1000. Maximum number of concurrent RESTful requests supported by EndPoint. EndPoint processes up to maxLinkNum requests concurrently and holds up to 2 × maxLinkNum further requests in the queue, so request number (3 × maxLinkNum + 1) is rejected. The recommended value is 300. The achievable value depends on model performance: typically, 1,000 concurrent requests are feasible only for a small model with short sequence lengths. |
|
httpsEnabled |
Bool |
|
Mandatory. Whether to enable HTTPS communication security authentication. The default and recommended value is true. Disabling it exposes the service to high network security risks.
If this parameter is set to false, subsequent HTTPS communication parameters are ignored. |
|
fullTextEnabled |
Bool |
|
Optional. The default value is false. Whether to enable the streaming API to return all historical results.
|
|
tlsCaPath |
std::string |
The length range of the absolute file path is [1, 4096]. The actual path is the combined result of the project path and tlsCaPath. |
Root certificate path. Only the relative path under the software package installation path is supported. This parameter takes effect when httpsEnabled is set to true and is mandatory in this case. The default value is security/ca/. |
|
tlsCaFile |
std::set<std::string> |
The length range of the absolute file path is [1, 4096]. The minimum number of elements in the list is 1 and the maximum number is 3. |
List of root certificate names on the service plane. This parameter takes effect when httpsEnabled is set to true and is mandatory in this case. The default value is ["ca.pem"]. |
|
tlsCert |
std::string |
The length range of the absolute file path is [1, 4096]. The actual path is the combined result of the project path and tlsCert. |
Path of the service certificate file on the service plane. Only the relative path under the software package installation path is supported. This parameter takes effect when httpsEnabled is set to true and is mandatory in this case. The default value is security/certs/server.pem. |
|
tlsPk |
std::string |
The length range of the absolute file path is [1, 4096]. The actual path is the combined result of the project path and tlsPk. |
Path of the private key file of the service certificate on the service plane. Only the relative path under the software package installation path is supported. The private key length must be at least 3072 bits. This parameter takes effect when httpsEnabled is set to true and is mandatory in this case. The default value is security/keys/server.key.pem. |
|
tlsPkPwd |
std::string |
The length range of the absolute file path is [0, 4096]. The actual path is the combined result of the project path and tlsPkPwd. |
Path of the file that stores the encryption password of the service certificate private key on the service plane. Only the relative path under the software package installation path is supported. This parameter is optional and takes effect when httpsEnabled is set to true. The default value is security/pass/key_pwd.txt. If the private key is encrypted but this file is not provided, the system prompts you to enter the private key encryption password in the interactive window at startup. |
|
tlsCrlPath |
std::string |
The path length range of tlsCrlPath+tlsCrlFiles is [0, 4096]. The actual path is the combined result of the project path and tlsCrlPath. |
Path of the service certificate CRL directory on the service plane. Only the relative path under the software package installation path is supported.
|
|
tlsCrlFiles |
std::set<std::string> |
The path length range of tlsCrlPath+tlsCrlFiles is [1, 4096]. The minimum number of elements in the list is 1 and the maximum number is 3. |
CRL name list on the service plane. If httpsEnabled is set to true, this parameter is optional. The default value is ["server_crl.pem"]. |
|
managementTlsCaFile |
std::set<std::string> |
It is recommended that the length range of tlsCaPath+managementTlsCaFile be [0, 4096]. The minimum number of elements in the list is 1 and the maximum number is 3. |
Name list of root certificates for internal APIs. The certificates of the internal APIs and service plane are stored in the same path (tlsCaPath). This parameter takes effect when httpsEnabled = true and ipAddress != managementIpAddress, and is mandatory in this case. The default value is ["management_ca.pem"]. |
|
managementTlsCert |
std::string |
The length range of the file path is [1, 4096]. The actual path is the combined result of the project path and managementTlsCert. |
Path of the service certificate file for internal APIs. Only the relative path under the software package installation path is supported. This parameter takes effect when httpsEnabled = true and ipAddress != managementIpAddress, and is mandatory in this case. The default value is security/certs/management/server.pem. |
|
managementTlsPk |
std::string |
The length range of the file path is [1, 4096]. The actual path is the combined result of the project path and managementTlsPk. |
Path of the private key file of the service certificate for internal APIs. Only the relative path under the software package installation path is supported. The private key length must be at least 3072 bits. This parameter takes effect when httpsEnabled = true and ipAddress != managementIpAddress, and is mandatory in this case. The default value is security/keys/management/server.key.pem. |
|
managementTlsPkPwd |
std::string |
The length range of the file path is [0, 4096]. The actual path is the combined result of the project path and managementTlsPkPwd. |
Path of the file that stores the encryption password of the service certificate private key for internal APIs. This parameter is optional and takes effect when httpsEnabled = true and ipAddress != managementIpAddress. The default value is security/pass/management/key_pwd.txt. If the private key is encrypted but this file is not provided, the system prompts you to enter the private key encryption password in the interactive window at startup. |
|
managementTlsCrlPath |
std::string |
The path length range of managementTlsCrlPath+managementTlsCrlFiles is [1, 4096]. The actual path is the combined result of the project path and managementTlsCrlPath. |
Path of the certificate CRL folder for internal APIs. Only the relative path under the software package installation path is supported.
|
|
managementTlsCrlFiles |
std::set<std::string> |
The path length range of managementTlsCrlPath+managementTlsCrlFiles is [1, 4096]. The minimum number of elements in the list is 1 and the maximum number is 3. |
CRL name list for internal APIs. If httpsEnabled is set to true, this parameter is optional. The default value is ["server_crl.pem"]. |
|
metricsTlsCaFile |
std::set<std::string> |
It is recommended that the length range of tlsCaPath+metricsTlsCaFile be [0, 4096]. The minimum number of elements in the list is 1 and the maximum number is 3. |
Name list of root certificates for internal APIs. The certificates of the internal APIs and service plane are stored in the same path (tlsCaPath). This parameter takes effect when httpsEnabled = true and ipAddress != managementIpAddress, and is mandatory in this case. The default value is ["metrics_ca.pem"]. |
|
metricsTlsCert |
std::string |
The length range of the file path is [1, 4096]. The actual path is the combined result of the project path and metricsTlsCert. |
Path of the service certificate file for internal APIs. Only the relative path under the software package installation path is supported. This parameter takes effect when httpsEnabled = true and ipAddress != managementIpAddress, and is mandatory in this case. The default value is security/certs/metrics/server.pem. |
|
metricsTlsPk |
std::string |
The length range of the file path is [1, 4096]. The actual path is the combined result of the project path and metricsTlsPk. |
Path of the private key file of the service certificate for internal APIs. Only the relative path under the software package installation path is supported. The private key length must be at least 3072 bits. This parameter takes effect when httpsEnabled = true and ipAddress != managementIpAddress, and is mandatory in this case. The default value is security/keys/metrics/server.key.pem. |
|
metricsTlsPkPwd |
std::string |
The length range of the file path is [0, 4096]. The actual path is the combined result of the project path and metricsTlsPkPwd. |
Path of the file that stores the encryption password of the service certificate private key for internal APIs. This parameter is optional and takes effect when httpsEnabled = true and ipAddress != managementIpAddress. The default value is security/pass/metrics/key_pwd.txt. If the private key is encrypted but this file is not provided, the system prompts you to enter the private key encryption password in the interactive window at startup. |
|
metricsTlsCrlPath |
std::string |
The path length range of metricsTlsCrlPath+metricsTlsCrlFiles is [1, 4096]. The actual path is the combined result of the project path and metricsTlsCrlPath. |
Path of the certificate CRL folder for internal APIs. Only the relative path under the software package installation path is supported.
|
|
metricsTlsCrlFiles |
std::set<std::string> |
The path length range of metricsTlsCrlPath+metricsTlsCrlFiles is [1, 4096]. The minimum number of elements in the list is 1 and the maximum number is 3. |
CRL name list for internal APIs. If httpsEnabled is set to true, this parameter is optional. The default value is ["server_crl.pem"]. |
|
kmcKsfMaster |
std::string |
The length range of the file path is [1, 4096]. The actual path is the combined result of the project path and kmcKsfMaster. |
Path of the KMC keystore file. Only the relative path under the software package installation path is supported. This parameter takes effect when httpsEnabled is set to true and is mandatory in this case. The default value is tools/pmt/master/ksfa. |
|
kmcKsfStandby |
std::string |
The length range of the file path is [1, 4096]. The actual path is the combined result of the project path and kmcKsfStandby. |
Path of the KMC keystore backup file. Only the relative path under the software package installation path is supported. This parameter takes effect when httpsEnabled is set to true and is mandatory in this case. The default value is tools/pmt/standby/ksfb. |
|
inferMode |
std::string |
|
Mandatory. The default value is standard. Whether prefill-decode disaggregation is enabled.
|
|
interCommTLSEnabled |
Bool |
|
Optional. Whether to enable TLS for communication between instances in a cluster. The default value is true, in which case the related certificates must be configured.
If the value is false or inferMode is standard, ignore the parameters related to internal communication of a cluster. |
|
interCommPort |
uint16_t |
[1024, 65535] |
Optional. The default value is 1121. Communication port between instances in a cluster. |
|
interCommTlsCaPath |
std::string |
The path length of interCommTlsCaPath+interCommTlsCaFiles depends on the OS configuration (PATH_MAX for Linux). The actual path is the combined result of the project path and interCommTlsCaPath. |
Optional. The default value is security/grpc/ca/. If TLS is enabled for communication between instances in a cluster, this parameter is used to specify the path of the CA file. |
|
interCommTlsCaFiles |
std::set<std::string> |
The path length of interCommTlsCaPath+interCommTlsCaFiles depends on the OS configuration (PATH_MAX for Linux). The actual path is the combined result of the project path and interCommTlsCaFiles. |
Optional. The default value is ["ca.pem"]. If TLS is enabled for communication between instances in a cluster, this parameter is used to specify the name of the CA file. |
|
interCommTlsCert |
std::string |
The path length depends on the OS configuration (PATH_MAX for Linux). The actual path is the combined result of the project path and interCommTlsCert. |
Optional. The default value is security/grpc/certs/server.pem. If TLS is enabled for communication between instances in a cluster, the file specified by this parameter is used as the certificate. |
|
interCommPk |
std::string |
The path length depends on the OS configuration (PATH_MAX for Linux). The actual path is the combined result of the project path and interCommPk. |
Optional. The default value is security/grpc/keys/server.key.pem. If TLS is enabled for communication between instances in a cluster, the file specified by this parameter is used as the private key. |
|
interCommPkPwd |
std::string |
The path length depends on the OS configuration (PATH_MAX for Linux). The actual path is the combined result of the project path and interCommPkPwd. |
Optional. The default value is security/grpc/pass/key_pwd.txt. If TLS is enabled for communication between instances in a cluster, the file specified by this parameter is used as the private key password. |
|
interCommTlsCrlPath |
std::string |
The path length of interCommTlsCrlPath+interCommTlsCrlFiles depends on the OS configuration (PATH_MAX for Linux). The actual path is the combined result of the project path and interCommTlsCrlPath. |
Optional. The default value is security/grpc/certs/. If TLS is enabled for communication between instances in a cluster, use this parameter to specify the path of the CRL file. |
|
interCommTlsCrlFiles |
std::set<std::string> |
The path length of interCommTlsCrlPath+interCommTlsCrlFiles depends on the OS configuration (PATH_MAX for Linux). The actual path is the combined result of the project path and interCommTlsCrlFiles. |
Optional. The default value is ["server_crl.pem"]. If TLS is enabled for communication between instances in a cluster, use this parameter to specify the name of the CRL file. |
|
openAiSupport |
std::string |
String |
Optional. The default value is vllm. Whether to use the OpenAI API format compatible with vLLM.
This parameter supports hot update. |
|
tokenTimeout |
uint32_t |
[1, 3600] |
Inference timeout interval of each token. The default value is 600, in seconds. In the prefill-decode disaggregation scenario, this parameter must be set to the same value for prefill and decode nodes. |
|
e2eTimeout |
uint32_t |
[1, 65535] |
End-to-end (from request receiving to inference completion) timeout interval. The default value is 600, in seconds. In the prefill-decode disaggregation scenario, this parameter must be set to the same value for prefill and decode nodes. |
|
maxRequestLength |
uint32_t |
[1, 100] |
Optional. The default value is 40, in MB. Maximum size of the request body. |
|
maxJsonDepth |
uint32_t |
[10, 100] |
Optional. The default value is 10. The maximum nesting depth of JSON in the input request. |
|
distDPServerEnabled |
Bool |
|
Mandatory. The default value is false. Whether to enable distributed deployment. This parameter takes effect only in MoE EP scenarios. |
- If the network environment is insecure, disabling HTTPS communication (httpsEnabled = false) poses high network security risks.
- If the compute node that runs the inference service is connected to both the WAN and the LAN, binding to 0.0.0.0 could break network isolation and create significant security vulnerabilities. Therefore, the EndPoint IP address cannot be bound to 0.0.0.0 by default. If you still need to use 0.0.0.0, ensure that the environment can protect against the risks of all-zero listening, and set allowAllZeroIpListening to true to allow it manually. You bear the security risks of enabling all-zero listening.
- In addition to the tokenTimeout and e2eTimeout parameters in the configuration file, the time parameters related to inference timeout also include the timeout parameter from the client in some APIs (for example, Token Inference API). If either of the two timeout intervals is reached, a timeout occurs. When a request times out, the Server returns a timeout error to the client and terminates the inference process of the request.
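Three of the ServerConfig rules above lend themselves to a short worked sketch: the all-zero address rule, the maxLinkNum admission arithmetic, and the rule that whichever timeout is reached first applies. The function names are hypothetical. Treating the IPv6 unspecified address (::) like 0.0.0.0, and reducing the timeout rule to a simple minimum of a server-side and a client-side value, are simplifying assumptions rather than documented Server behavior.

```python
import ipaddress


def check_listen_address(ip, allow_all_zero_ip_listening):
    """Reject the all-zero address unless it has been explicitly allowed."""
    addr = ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    # Assumption: "::" is treated the same way as 0.0.0.0.
    if addr.is_unspecified and not allow_all_zero_ip_listening:
        raise ValueError(
            f"{ip} requires allowAllZeroIpListening to be set to true"
        )
    return addr


def admission_limits(max_link_num):
    """Worked arithmetic for maxLinkNum: N in flight, 2N queued, 3N + 1 rejected."""
    if not 1 <= max_link_num <= 4096:
        raise ValueError("maxLinkNum must be in [1, 4096]")
    processing = max_link_num
    queued = 2 * max_link_num
    return {
        "processing": processing,
        "queued": queued,
        "first_rejected": processing + queued + 1,
    }


def first_deadline(server_timeout_s, client_timeout_s=None):
    """Return the timeout that is reached first (client value is optional)."""
    if client_timeout_s is None:
        return server_timeout_s
    return min(server_timeout_s, client_timeout_s)
```

With the default maxLinkNum of 1000, admission_limits(1000) reports 1000 requests in processing, 2000 queued, and request 3001 as the first one rejected. With the recommended value of 300, the first rejected request is number 901.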
Parameters in BackendConfig
|
Parameter |
Value Type |
Value Range |
Description |
|---|---|---|---|
|
backendName |
std::string |
The value is a string of 1 to 50 characters and can contain only lowercase letters and underscores (_). The value cannot start or end with an underscore (_). |
Mandatory. Only mindieservice_llm_engine is supported. Inference backend name. You can use this parameter to obtain the backend instance. |
|
modelInstanceNumber |
uint32_t |
[1, 10] |
Mandatory. The default value is 1. Number of model instances. In the single-model multi-node inference scenario, the value must be 1. |
|
npuDeviceIds |
std::vector<std::set<size_t>> |
Set this parameter based on the model and environment. |
Mandatory. The default value is [[0,1,2,3]]. Devices to be enabled. The npuIds allocated to each model instance are represented by logical processor IDs.
|
|
tokenizerProcessNumber |
uint32_t |
[1, 32] |
Mandatory. The default value is 8. Number of tokenizer processes. When there are many CPU cores, you can increase the value to improve tokenizer performance. |
|
multiNodesInferEnabled |
Bool |
|
Optional. The default value is false.
|
|
multiNodesInferPort |
int32_t |
[1024, 65535] |
Optional. The default value is 1120. Port number for cross-machine communication, which is used in multi-node inference scenarios. |
|
interNodeTLSEnabled |
Bool |
|
Optional. Whether to enable certificate security authentication for cross-machine communication during multi-node inference. The default value is true. If this parameter is set to false, the subsequent interNode TLS parameters can be ignored.
|
|
interNodeTlsCaPath |
std::string |
It is recommended that the path length of interNodeTlsCaPath+interNodeTlsCaFiles be less than or equal to 4096. The actual path is the combined result of the project path and interNodeTlsCaPath. The upper limit depends on the operating system. The minimum length is 1. |
Path of the root certificate directory. Only the relative path under the software package installation path is supported. This parameter takes effect when interNodeTLSEnabled is set to true and is mandatory in this case. The default value is security/grpc/ca/. |
|
interNodeTlsCaFiles |
std::set<std::string> |
It is recommended that the path length of interNodeTlsCaPath+interNodeTlsCaFiles be less than or equal to 4096. The actual path is the combined result of the project path and interNodeTlsCaFiles. The upper limit depends on the operating system. The minimum length is 1. |
Root certificate name list. This parameter takes effect when interNodeTLSEnabled is set to true and is mandatory in this case. The default value is ["ca.pem"]. |
|
interNodeTlsCert |
std::string |
It is recommended that the length of the file path be less than or equal to 4096. The actual path is the combined result of the project path and interNodeTlsCert. The upper limit depends on the operating system. The minimum length is 1. |
Path of the service certificate file. Only the relative path under the software package installation path is supported. This parameter takes effect when interNodeTLSEnabled is set to true and is mandatory in this case. The default value is security/grpc/certs/server.pem. |
|
interNodeTlsPk |
std::string |
It is recommended that the length of the file path be less than or equal to 4096. The actual path is the combined result of the project path and interNodeTlsPk. The upper limit depends on the operating system. The minimum length is 1. |
Path of the private key file of the service certificate. Only the relative path under the software package installation path is supported. This parameter takes effect when interNodeTLSEnabled is set to true and is mandatory in this case. The default value is security/grpc/keys/server.key.pem. |
|
interNodeTlsPkPwd |
std::string |
It is recommended that the length of the file path be less than or equal to 4096. This parameter can be left empty. If this parameter is not left empty, the actual path is the combined result of the project path and interNodeTlsPkPwd. The upper limit depends on the operating system. The minimum length is 1. |
Path of the file that stores the encryption password of the service certificate private key. Only the relative path under the software package installation path is supported. This parameter takes effect when interNodeTLSEnabled is set to true and is mandatory in this case. The default value is security/grpc/pass/mindie_server_key_pwd.txt. |
|
interNodeTlsCrlPath |
std::string |
It is recommended that the path length of interNodeTlsCrlPath+interNodeTlsCrlFiles be less than or equal to 4096. The actual path is the combined result of the project path and interNodeTlsCrlPath. The upper limit depends on the operating system. The minimum length is 1. |
Optional. The default value is security/grpc/certs/. Path of the service certificate CRL directory. This parameter takes effect when interNodeTLSEnabled is set to true. |
|
interNodeTlsCrlFiles |
std::set<std::string> |
It is recommended that the path length of interNodeTlsCrlPath+interNodeTlsCrlFiles be less than or equal to 4096. The actual path is the combined result of the project path and interNodeTlsCrlFiles. The upper limit depends on the operating system. The minimum length is 1. |
Optional. The default value is ["server_crl.pem"]. Service certificate CRL. This parameter takes effect when interNodeTLSEnabled is set to true. |
|
interNodeKmcKsfMaster |
std::string |
It is recommended that the length of the file path be less than or equal to 4096. The actual path is the combined result of the project path and interNodeKmcKsfMaster. The upper limit depends on the operating system. The minimum length is 1. |
Path of the KMC keystore file. Only the relative path under the software package installation path is supported. This parameter takes effect when interNodeTLSEnabled is set to true and is mandatory in this case. The default value is tools/pmt/master/ksfa. |
|
interNodeKmcKsfStandby |
std::string |
It is recommended that the length of the file path be less than or equal to 4096. The actual path is the combined result of the project path and interNodeKmcKsfStandby. The upper limit depends on the operating system. The minimum length is 1. |
Path of the KMC keystore backup file. Only the relative path under the software package installation path is supported. This parameter takes effect when interNodeTLSEnabled is set to true and is mandatory in this case. The default value is tools/pmt/standby/ksfb. |
|
ModelDeployConfig |
map |
- |
Configurations for model deployment. For details, see Parameters in ModelDeployConfig. |
|
ScheduleConfig |
map |
- |
Scheduling configurations. For details, see Parameters in ScheduleConfig. |
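The backendName format rule and the shape of npuDeviceIds can be checked with a few lines of Python. This is a sketch with hypothetical helper names. Requiring exactly one device group per model instance is an assumption inferred from the description of npuDeviceIds, not a rule stated in the table.

```python
import re

# 1 to 50 characters, lowercase letters and underscores only,
# no leading or trailing underscore.
_BACKEND_NAME = re.compile(r"[a-z](?:[a-z_]{0,48}[a-z])?")


def is_valid_backend_name(name):
    """Check the backendName format rule from the table."""
    return _BACKEND_NAME.fullmatch(name) is not None


def check_npu_device_ids(npu_device_ids, model_instance_number):
    """Return a list of problems for npuDeviceIds and modelInstanceNumber."""
    problems = []
    if not 1 <= model_instance_number <= 10:
        problems.append("modelInstanceNumber must be in [1, 10]")
    # Assumption: one device group is listed per model instance.
    if len(npu_device_ids) != model_instance_number:
        problems.append("expected one device group per model instance")
    for index, group in enumerate(npu_device_ids):
        if not group:
            problems.append(f"group {index} is empty")
        if len(set(group)) != len(group):
            problems.append(f"group {index} contains duplicate IDs")
        if any(not isinstance(dev, int) or dev < 0 for dev in group):
            problems.append(f"group {index} contains an invalid ID")
    return problems
```

The only supported backend name, mindieservice_llm_engine, passes the format check, and the default [[0,1,2,3]] with modelInstanceNumber set to 1 yields no problems.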
Parameters in ModelDeployConfig
| Parameter | Value Type | Value Range | Description |
|---|---|---|---|
| maxSeqLen | uint32_t | The upper limit is determined by the graphics memory and user requirements. The value must be greater than 0. | Mandatory. The default value is 2560. Maximum sequence length. Select a proper value for maxSeqLen based on the inference scenario. If maxSeqLen is greater than the maximum sequence length supported by the model, the inference accuracy may be affected. |
| maxInputTokenLen | uint32_t | [1, 4194304] | Mandatory. The default value is 2048. Maximum length of the input token IDs. The effective value is min(maxInputTokenLen, maxSeqLen - 1). |
| truncation | Bool | | Optional. The default value is false. Whether to perform truncation in parameter validity verification. maxInputTokenLen = min(maxInputTokenLen, maxSeqLen - 1) |
| ModelConfig | map | - | Model configurations, including postprocessing parameters. For details, see Parameters in ModelConfig. |
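The relation maxInputTokenLen = min(maxInputTokenLen, maxSeqLen - 1) from the table can be worked through in a few lines. The function name is hypothetical, and the range checks simply restate the table.

```python
def effective_max_input_token_len(max_input_token_len, max_seq_len):
    """Apply maxInputTokenLen = min(maxInputTokenLen, maxSeqLen - 1)."""
    if max_seq_len <= 0:
        raise ValueError("maxSeqLen must be greater than 0")
    if not 1 <= max_input_token_len <= 4194304:
        raise ValueError("maxInputTokenLen must be in [1, 4194304]")
    return min(max_input_token_len, max_seq_len - 1)
```

With the defaults (maxInputTokenLen = 2048, maxSeqLen = 2560) the configured value is kept. If maxInputTokenLen were raised to 4096 while maxSeqLen stayed at 2560, the effective limit would be 2559.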
Parameters in ModelConfig
Parameters in ScheduleConfig
Due to the adjustment of the recomputation scheduling policy, the performance of different versions may fluctuate under the same scheduling parameters. For details about how to obtain the optimal performance, see Performance Tuning.
|
Parameter |
Value Type |
Value Range |
Description |
|---|---|---|---|
|
templateType |
std::string |
|
Mandatory. The default value is Standard. Inference type.
This field does not take effect in prefill-decode disaggregation scenarios. |
|
templateName |
std::string |
The value can only be Standard_LLM. |
Mandatory. The default value is Standard_LLM. Name of a scheduling workflow. |
|
cacheBlockSize |
uint32_t |
[1, 128] |
Size of a KV cache block, in tokens. Mandatory. The default and recommended value is 128. Any other value must be a power of 2. |
|
maxPrefillBatchSize |
uint32_t |
[1, maxBatchSize] |
Mandatory. The default value is 50. Maximum prefill batch size. A prefill batch is formed as soon as either the maxPrefillBatchSize or the maxPrefillTokens limit is reached. Use this parameter to limit the batch size in the prefill phase. If no limit is needed, set this parameter to 0 (the engine then uses the value of maxBatchSize) or to the same value as maxBatchSize. |
|
maxPrefillTokens |
uint32_t |
[1, 4194304]. The value must be greater than or equal to the value of maxInputTokenLen. |
Mandatory. The default value is 8192. During each prefill, the total number of input tokens in the current batch cannot exceed maxPrefillTokens. A prefill batch is formed as soon as either the maxPrefillTokens or the maxPrefillBatchSize limit is reached. Avoid setting this parameter too high; if the graphics memory overflows, reduce the value. |
|
prefillTimeMsPerReq |
uint32_t |
[0, 1000] |
Mandatory. The default value is 150. This parameter is used together with decodeTimeMsPerReq to determine whether prefill or decode should be selected for the next inference. This parameter is valid only when supportSelectBatch is set to true. The unit is ms. For details about the scheduling policy process, see Figure 1.
|
|
prefillPolicyType |
uint32_t |
0 |
Mandatory. The default value is 0. Scheduling policy in the prefill phase. For details about the scheduling policy process, see Figure 2. 0: FCFS (first-come first-served) |
|
decodeTimeMsPerReq |
uint32_t |
[0, 1000] |
Mandatory. The default value is 50. This parameter is used together with prefillTimeMsPerReq to determine whether prefill or decode should be selected for the next inference. This parameter is valid only when supportSelectBatch is set to true. The unit is ms. For details about the scheduling policy process, see Figure 1.
|
|
decodePolicyType |
uint32_t |
0 |
Mandatory. The default value is 0. Scheduling policy in the decode phase. For details about the scheduling policy process, see Figure 2. 0: FCFS (first-come first-served) |
|
maxBatchSize |
uint32_t |
[1, 5000]. The value must be greater than or equal to the value of maxPreemptCount.
NOTE:
For the Atlas 300I Duo inference card, the value range is [1, 2000]. |
Mandatory. The default value is 200. Maximum decode batch size.
|
|
maxIterTimes |
uint32_t |
[1, maxSeqLen] |
Mandatory. The default value is 512. Global maximum output length of the model.
|
|
maxPreemptCount |
uint32_t |
[0, maxBatchSize]. If the value is greater than 0, the value of cpuMemSize cannot be 0. |
Mandatory. The default value is 0. Maximum number of requests that can be preempted in a batch, that is, the maximum number of requests that can be preempted in a round of scheduling. The maximum value is maxBatchSize. If the value is greater than 0, preemption is enabled. |
|
supportSelectBatch |
Bool |
|
Mandatory. The default value is false. Batch selection policy. This field does not take effect in prefill-decode disaggregation scenarios.
|
|
maxQueueDelayMicroseconds |
uint32_t |
[500, 1000000] |
Mandatory. The default value is 5000. Maximum time, in μs, that a request waits in the queue while the batch is still below the maxBatchSize, maxPrefillBatchSize, or maxPrefillTokens limit. Once the waiting time reaches this value, the next inference starts even if none of these limits has been reached. |
|
maxFirstTokenWaitTime |
uint32_t |
[0, 3600000] |
Optional. The default value is 2500 ms. Maximum queuing time after a request arrives. After the waiting time reaches the value of this parameter, the current round of scheduling is allowed to preempt requests to reduce the Time to First Token (TTFT). This field does not take effect in prefill-decode disaggregation scenarios. |
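Several ScheduleConfig parameters constrain one another. The sketch below collects those cross-field rules from the table and adds a simplified model of when a prefill batch is dispatched. The helper names, the DEFAULT_SCHEDULE dictionary (built from the default values listed in the table), and the optional cpu_mem_size argument are assumptions for illustration. The real scheduler also weighs prefillTimeMsPerReq and decodeTimeMsPerReq, which this sketch ignores.

```python
DEFAULT_SCHEDULE = {
    "cacheBlockSize": 128,
    "maxPrefillBatchSize": 50,
    "maxPrefillTokens": 8192,
    "maxBatchSize": 200,
    "maxIterTimes": 512,
    "maxPreemptCount": 0,
    "maxQueueDelayMicroseconds": 5000,
}


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def check_schedule_config(s, max_input_token_len, max_seq_len, cpu_mem_size=None):
    """Return a list of violated rules; an empty list means none were found."""
    problems = []
    block = s["cacheBlockSize"]
    if not (1 <= block <= 128 and is_power_of_two(block)):
        problems.append("cacheBlockSize must be a power of 2 in [1, 128]")
    max_batch = s["maxBatchSize"]
    if not 1 <= max_batch <= 5000:
        problems.append("maxBatchSize must be in [1, 5000]")
    if not 0 <= s["maxPreemptCount"] <= max_batch:
        problems.append("maxPreemptCount must be in [0, maxBatchSize]")
    if s["maxPreemptCount"] > 0 and cpu_mem_size == 0:
        problems.append("cpuMemSize cannot be 0 when maxPreemptCount > 0")
    prefill_batch = s["maxPrefillBatchSize"]
    # 0 means "no separate prefill limit; use maxBatchSize".
    if prefill_batch != 0 and not 1 <= prefill_batch <= max_batch:
        problems.append("maxPrefillBatchSize must be 0 or in [1, maxBatchSize]")
    if not 1 <= s["maxPrefillTokens"] <= 4194304:
        problems.append("maxPrefillTokens must be in [1, 4194304]")
    if s["maxPrefillTokens"] < max_input_token_len:
        problems.append("maxPrefillTokens must be >= maxInputTokenLen")
    if not 1 <= s["maxIterTimes"] <= max_seq_len:
        problems.append("maxIterTimes must be in [1, maxSeqLen]")
    if not 500 <= s["maxQueueDelayMicroseconds"] <= 1000000:
        problems.append("maxQueueDelayMicroseconds must be in [500, 1000000]")
    return problems


def prefill_batch_ready(num_requests, num_tokens, waited_us, s):
    """Simplified trigger: a size limit is reached or the queue delay elapsed."""
    request_limit = s["maxPrefillBatchSize"] or s["maxBatchSize"]
    return (
        num_requests >= request_limit
        or num_tokens >= s["maxPrefillTokens"]
        or waited_us >= s["maxQueueDelayMicroseconds"]
    )
```

Calling check_schedule_config(DEFAULT_SCHEDULE, 2048, 2560) with the default maxInputTokenLen and maxSeqLen returns an empty list. Changing cacheBlockSize to 96 or lowering maxPrefillTokens below maxInputTokenLen each adds one entry.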

