Multi-Node Inference
If the weights of a model are too large to fit in the NPU memory of a single inference server, multi-node inference is required.
Prerequisites
- Server requires Python 3.10.x or Python 3.11.x. Python 3.10.13 is used as an example in this section. If Python 3.10.13 is not the default version, add the following environment variables (use the actual Python path):
export LD_LIBRARY_PATH=/usr/local/python3.10.13/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/python3.10.13/bin:$PATH
- The NPU driver and firmware, CANN package, PyTorch, ATB Models, and MindIE have been installed on the server or in the container.
- If you start the inference service in containerized mode, ensure that the shared memory is greater than or equal to 1 GB.
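As a quick way to confirm the Python prerequisite above, the following sketch checks whether a version string falls in the 3.10.x or 3.11.x range. The function name python_version_supported is hypothetical and not part of MindIE; the call at the end only runs if python3 is found on PATH.

```shell
# Return 0 when the version string is 3.10.x or 3.11.x, and 1 otherwise.
# python_version_supported is an illustrative helper, not a MindIE command.
python_version_supported() {
    case "$1" in
        3.10.*|3.11.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the interpreter that is first on PATH, if there is one.
if command -v python3 >/dev/null 2>&1; then
    version=$(python3 -c 'import platform; print(platform.python_version())')
    python_version_supported "$version" || echo "Unsupported Python version: $version"
fi
```

If the message is printed, adjust PATH and LD_LIBRARY_PATH as described in the first prerequisite and run the check again.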
Constraints
- The Atlas 800I A2 inference server and Atlas 800I A3 SuperPoD Server environments are supported. A maximum of four nodes and 32 cards are supported for multi-node inference. For details about the models supported by multi-node inference, see Model List.
- The default value of maxLinkNum is 1000. The recommended value is 300. Whether 1000 concurrent requests can be sustained depends on model performance. Typically, 1000 concurrent requests can be used only for a small model with a short sequence length.
- The default sampling parameters for the weights of different nodes must be consistent. If the sampling parameters are not configured, the inference service may stop responding.
Related Environment Variables
| Name | Description |
|---|---|
| MIES_CONTAINER_IP | For containerized deployment, set this parameter to the IP address of the container. If the container shares an IP address with a bare metal server, set this parameter to the IP address of the bare metal server. This IP address is used for Google Remote Procedure Call (gRPC) between multiple nodes and for request receiving on the EndPoint's service plane. This parameter is not required for bare metal deployment. |
| HOST_IP | For bare metal deployment (not recommended), set this parameter to the IP address of the PM or VM. This parameter is not set during containerized deployment. |
| RANK_TABLE_FILE | Absolute path of the ranktable.json file. |
| MIES_CONFIG_JSON_PATH | Path of the config.json file. |
| HCCL_DETERMINISTIC | Deterministic computation for HCCL communication. For multi-node inference, you are advised to set this parameter to true. |
When Server is started, it determines whether single-node inference or multi-node inference is enabled through multiNodesInferEnabled.
- multiNodesInferEnabled = false: single-node inference. Server does not read the RANK_TABLE_FILE environment variable during startup. However, when the underlying model acceleration library is initialized, it attempts to read this environment variable. Therefore, in the single-node inference scenario, if this environment variable is set, ensure that the file content is correct (that is, server_count=1; node IP address, device_ip, and rank_id must be correct).
- multiNodesInferEnabled = true: multi-node inference. During startup, Server reads the RANK_TABLE_FILE environment variable and checks whether the ranktable file content is valid.
- The node whose rank ID is 0 is the master node, and the other nodes are slave nodes.
- The master service instance can receive inference requests from users, while the slave service instance cannot.
When multi-node inference is enabled, npuDeviceIds and worldSize in the config.json file become invalid. The card IDs in use and the total number of ranks are determined based on the ranktable file.
Example of the Ranktable File
The permission on the ranktable.json file must be set to 640. For details, see the following example. (This file needs to be created by users.)
{
"version": "1.0",
"server_count": "2",
"server_list": [
{
"server_id": "IP address of the master node",
"container_ip": "IP address of container corresponding to the master node",
"device": [
{ "device_id": "0", "device_ip": "10.20.0.2", "rank_id": "0" },
{ "device_id": "1", "device_ip": "10.20.0.3", "rank_id": "1" },
{ "device_id": "2", "device_ip": "10.20.0.4", "rank_id": "2" },
{ "device_id": "3", "device_ip": "10.20.0.5", "rank_id": "3" },
{ "device_id": "4", "device_ip": "10.20.0.6", "rank_id": "4" },
{ "device_id": "5", "device_ip": "10.20.0.7", "rank_id": "5" },
{ "device_id": "6", "device_ip": "10.20.0.8", "rank_id": "6" },
{ "device_id": "7", "device_ip": "10.20.0.9", "rank_id": "7" }
]
},
{
"server_id": "IP address of the slave node",
"container_ip": "IP address of container corresponding to the slave node",
"device": [
{ "device_id": "0", "device_ip": "10.20.0.10", "rank_id": "8" },
{ "device_id": "1", "device_ip": "10.20.0.11", "rank_id": "9" },
{ "device_id": "2", "device_ip": "10.20.0.12", "rank_id": "10" },
{ "device_id": "3", "device_ip": "10.20.0.13", "rank_id": "11" },
{ "device_id": "4", "device_ip": "10.20.0.14", "rank_id": "12" },
{ "device_id": "5", "device_ip": "10.20.0.15", "rank_id": "13" },
{ "device_id": "6", "device_ip": "10.20.0.16", "rank_id": "14" },
{ "device_id": "7", "device_ip": "10.20.0.17", "rank_id": "15" }
]
}
],
"status": "completed"
}
Parameter description:
- IP address of the master/slave node: Change it based on the actual situation.
- Container IP address of the master/slave node: Generally, the IP address is the same as that of the master/slave node. If --net=host is used upon container startup, the IP address must be the same as the IP address of the host. Change the IP address as required.
- device_id: sequence number of the NPU on the actual node.
- device_ip: IP address of the NPU, which can be configured using hccn_tool.
- rank_id: rank ID of the inference process.
The ranktable.json file is configured using the environment variable RANK_TABLE_FILE. If you provide the file, you need to ensure the security of the file and create the file on both the master and slave nodes.
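Because the total number of ranks is derived from the ranktable file, a quick sanity check before startup can catch copy-and-paste mistakes. The sketch below is a text-based heuristic rather than a JSON parser: it assumes the file follows the layout of the example above, with server_count stored as a quoted number. The function name ranktable_summary is hypothetical.

```shell
# Print a short summary of a ranktable.json file and return 0 only when the
# declared server_count equals the number of "server_id" entries listed.
# This is a text heuristic that assumes the layout of the example file.
ranktable_summary() {
    local file="$1"
    local declared servers ranks
    # Value of "server_count", stored as a quoted number in the example.
    declared=$(sed -n 's/.*"server_count"[[:space:]]*:[[:space:]]*"\([0-9][0-9]*\)".*/\1/p' "$file" | head -n 1)
    # Number of server entries and rank entries actually present.
    servers=$(grep -o '"server_id"' "$file" | wc -l | tr -d '[:space:]')
    ranks=$(grep -o '"rank_id"' "$file" | wc -l | tr -d '[:space:]')
    echo "declared_servers=${declared:-unknown} listed_servers=${servers} ranks=${ranks}"
    [ "${declared:-x}" = "$servers" ]
}
```

For the two-node example above, running ranktable_summary on the file would be expected to report two declared servers, two listed servers, and 16 ranks. Run it on both the master and slave nodes, since each node needs its own copy of the file.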
Procedure
Perform the following operations on both the master and slave nodes.
- Create and start a Docker container. The following uses an 8-card Ascend environment as an example. The startup command is for reference only. You can modify it as required.
docker run -it -d --net=host --shm-size=1g \
    --name container_name \
    --device=/dev/davinci_manager \
    --device=/dev/hisi_hdc \
    --device=/dev/devmm_svm \
    --device=/dev/davinci0 \
    --device=/dev/davinci1 \
    --device=/dev/davinci2 \
    --device=/dev/davinci3 \
    --device=/dev/davinci4 \
    --device=/dev/davinci5 \
    --device=/dev/davinci6 \
    --device=/dev/davinci7 \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
    -v /usr/local/sbin:/usr/local/sbin:ro \
    -v /path-to-weights:/path-to-weights:ro \
    mindie:2.3.0-800I-A2-aarch64
- container_name: name of the newly started container. Change it based on the actual situation.
- --net=host: During the test and verification, set the container network to the host network to ensure that Docker containers can communicate with each other.
- --shm-size=1g: size of the shared memory (/dev/shm) of a specified container. You can set the size as required. 1g is an example value.
The value cannot exceed the size of the remaining physical memory of the host. You can run the free -h command to view the size of the remaining physical memory. When data parallelism (DP) is enabled, the shared memory size (shm-size) must be adjusted proportionally as the DP value grows beyond 1.
- For a DP value of 2, set shm-size to at least 2 GB.
- For a DP value of 4, set shm-size to at least 3 GB.
- For a DP value of 8, set shm-size to at least 5 GB.
- For a DP value of 16, set shm-size to at least 9 GB.
- --device: used to map devices (such as hardware or files) on the host to the Docker container.
- /usr/local/Ascend: installation directory of the Ascend firmware and driver. Change it based on the actual situation.
- path-to-weights: weight mount directory. Change it based on the actual situation.
- mindie:2.3.0-800I-A2-aarch64: image name of the newly started container. Change it based on the actual situation.
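The shared memory guidance for data parallelism can be captured in a small lookup helper. The sketch below only encodes the DP values listed above, plus the 1 GB baseline for DP 1; it deliberately returns an error for any other value instead of extrapolating. The function name min_shm_gb_for_dp is hypothetical.

```shell
# Print the minimum --shm-size in GB documented for a given DP value.
# Only the documented values are covered; anything else is reported as unknown.
min_shm_gb_for_dp() {
    case "$1" in
        1)  echo 1 ;;
        2)  echo 2 ;;
        4)  echo 3 ;;
        8)  echo 5 ;;
        16) echo 9 ;;
        *)  echo "no documented minimum for DP=$1" >&2; return 1 ;;
    esac
}
```

One possible use is to build the option value when starting the container, for example --shm-size="$(min_shm_gb_for_dp 4)g". The result is a lower bound, so a larger value can be chosen as long as it stays within the remaining physical memory of the host.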
- Go to the MindIE installation directory as the installation user.
cd {MindIE installation directory}/latest
- Check whether the directory and file permissions match the following. If not, run the corresponding commands to modify the permissions.
chmod 750 mindie-service
chmod -R 550 mindie-service/bin
chmod 550 mindie-service/lib
chmod 440 mindie-service/lib/*
chmod 550 mindie-service/lib/grpc
chmod 440 mindie-service/lib/grpc/*
chmod -R 550 mindie-service/include
chmod -R 550 mindie-service/scripts
chmod 750 mindie-service/logs
chmod 750 mindie-service/conf
chmod 640 mindie-service/conf/config.json
chmod 700 mindie-service/security
chmod -R 700 mindie-service/security/*
If the file permissions do not meet the requirements, Server will fail to start.
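Checking the permissions by eye is error-prone, so a small helper can compare each path against its expected mode. This is a minimal sketch that assumes GNU stat (the -c option) is available, as it normally is on Linux; the function name check_mode is hypothetical.

```shell
# Compare the octal mode of a path with an expected value.
# Returns 0 on a match, 1 on a mismatch, and 2 if the path cannot be read.
# Assumes GNU stat, which supports the -c '%a' format option.
check_mode() {
    local path="$1" expected="$2" actual
    actual=$(stat -c '%a' "$path") || return 2
    if [ "$actual" != "$expected" ]; then
        echo "MISMATCH $path: expected $expected, found $actual"
        return 1
    fi
}
```

For example, check_mode mindie-service/conf/config.json 640 and check_mode mindie-service/logs 750 can be run from the latest directory. The helper checks a single path, so directories changed with chmod -R need one call per entry of interest.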
- Set parameters in the container as required.
Before the configuration, see the precautions in 3.
- Go to the conf directory and open the config.json file.
cd ../conf
vim config.json
- Press i to enter the insert mode, set multiNodesInferEnabled to true to enable multi-node inference, and modify parameters in Table 1 as required. For details about the parameters, see "Core Concepts and Configurations" > "Configuration Parameters (Serving)" in MindIE LLM Development Guide.
Table 1 Multi-node inference configuration

| Configuration Option | Description |
|---|---|
| multiNodesInferPort | Port number for cross-node communication. |
| interNodeTLSEnabled | Whether to enable certificate security authentication for cross-node communication. true: enabled. false: disabled. If it is disabled, ignore the following parameters. |
| interNodeTlsCaPath | Path of the root certificate. This option takes effect when interNodeTLSEnabled is set to true. |
| interNodeTlsCaFiles | Root certificate name list. This option takes effect when interNodeTLSEnabled is set to true. |
| interNodeTlsCert | Path of the service certificate. This option takes effect when interNodeTLSEnabled is set to true. |
| interNodeTlsPk | Path of the private key file of the service certificate. This option takes effect when interNodeTLSEnabled is set to true. |
| interNodeTlsPkPwd | Path of the encryption private key file of the service certificate. This option takes effect when interNodeTLSEnabled is set to true. |
| interNodeTlsCrlPath | Path of the service certificate revocation list. This option takes effect when interNodeTLSEnabled is set to true. |
| interNodeTlsCrlFiles | Name of the service certificate revocation list. This option takes effect when interNodeTLSEnabled is set to true. |
| interNodeKmcKsfMaster | Path of the KMC keystore file. This option takes effect when interNodeTLSEnabled is set to true. |
| interNodeKmcKsfStandby | Path of the backup KMC keystore file. This option takes effect when interNodeTLSEnabled is set to true. |
- If HTTPS communication is disabled (httpsEnabled = false), there might be high network security risks.
- Ensure that the user group and username of the config.json file under modelWeightPath are the same as those of the current user. In addition, ensure that the link is not a soft link, and the file permission is not higher than 640. If the requirements are not met, startup will fail.
- In a data center, if cross-node communication security authentication does not need to be enabled, set interNodeTLSEnabled to false. If it is disabled (that is, interNodeTLSEnabled = false), high network security risks exist.
- Press Esc, type :wq!, and press Enter to save the changes and exit.
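After saving the file, it can be worth confirming that the switch was actually set. The sketch below is a text-based check that assumes the key and its value appear on one line in the form "multiNodesInferEnabled": true; it does not parse JSON, and the function name multi_node_enabled is hypothetical.

```shell
# Return 0 if the given config.json sets multiNodesInferEnabled to true.
# Text-based check only; assumes the key and value are on the same line.
multi_node_enabled() {
    grep -Eq '"multiNodesInferEnabled"[[:space:]]*:[[:space:]]*true' "$1"
}
```

For example, multi_node_enabled config.json || echo "multi-node inference is not enabled" can be run in the conf directory on each node.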
- (Optional) Enable gRPC two-way authentication (that is, set interNodeTLSEnabled to true).
- Use the certificate management tool to import certificates. For details, see "Auxiliary Tools" > "MindIE Service Tools" > "CertTools" in MindIE Motor Development Guide.
- When three-plane isolation is enabled for HTTPS, you are advised not to use the same security certificate for the HTTPS service plane and management plane. Using the same security certificate can cause high network security risks.
- You are advised not to use the same security certificate for HTTPS and gRPC. Using the same security certificate can cause high network security risks.
- When importing certificates, ensure that the permissions required by the CA certificate tool, service certificate tool, private key certificate tool, and CRL tool are 600, 600, 400, and 600, respectively.
- Table 2 lists the certificate file information.
- If the certificate import times out, rectify the fault by referring to Starting the haveged Service.
Table 2 Certificate file information

| Certificate File | Default Destination Path | Description |
|---|---|---|
| Root certificate | mindie-service/security/grpc/ca/ | Required when interNodeTLSEnabled is set to true. |
| Service certificate | mindie-service/security/grpc/certs/ | Required when interNodeTLSEnabled is set to true. |
| Private key of the service certificate | mindie-service/security/grpc/keys/ | Private key file encryption is supported. Required when interNodeTLSEnabled is set to true. |
| Service CRL | mindie-service/security/grpc/certs/ | Mandatory. |
| Encrypted password of the service certificate private key | mindie-service/security/grpc/pass/ | Mandatory. |
- Run the following commands in {MindIE installation directory}/latest to modify the user permissions on the certificate files:
chmod 400 mindie-service/security/grpc/ca/*
chmod 400 mindie-service/security/grpc/certs/*
chmod 400 mindie-service/security/grpc/keys/*
chmod 400 mindie-service/security/grpc/pass/*
- (Optional) Enable HTTPS authentication (that is, set httpsEnabled to true).
- Use the certificate import script to import certificates. Table 3 describes the certificate information.
- When three-plane isolation is enabled for HTTPS, you are advised not to use the same security certificate for the HTTPS service plane and management plane. Using the same security certificate can cause high network security risks.
- You are advised not to use the same security certificate for HTTPS and gRPC. Using the same security certificate can cause high network security risks.
- The permission on the script for importing a certificate varies with the specific certificate type. In the case of a CA certificate, service certificate, or CRL certificate, ensure that the permission is 600. In the case of a private key certificate, ensure that the permission is 400.
- For details about the certificate import script of MindIE Service, see "Auxiliary Tools" > "MindIE Service Tools" > "CertTools" in MindIE Motor Development Guide.
- If the certificate import times out, rectify the fault by referring to Starting the haveged Service.
Table 3 Certificate file list

| Certificate File | Default Destination Path | Description |
|---|---|---|
| Root certificate | {MindIE installation directory}/latest/mindie-service/security/ca/ | Multiple CA certificates are supported. This file is mandatory when HTTPS is enabled. |
| Service certificate | {MindIE installation directory}/latest/mindie-service/security/certs/ | This file is mandatory when HTTPS is enabled. |
| Private key of the service certificate | {MindIE installation directory}/latest/mindie-service/security/keys/ | Private key file encryption is supported. This file is mandatory when HTTPS is enabled. |
| Service CRL | {MindIE installation directory}/latest/mindie-service/security/certs/ | This file is optional after HTTPS is enabled. |
| Encrypted password of the service certificate private key | {MindIE installation directory}/latest/mindie-service/security/pass/ | Optional. |
- Run the following commands in {MindIE installation directory}/latest to modify the user permissions on the certificate files:
chmod 400 mindie-service/security/ca/*
chmod 400 mindie-service/security/certs/*
chmod 400 mindie-service/security/keys/*
chmod 400 mindie-service/security/pass/*
- Configure environment variables.
source /usr/local/Ascend/cann/set_env.sh # CANN
source /usr/local/Ascend/nnal/atb/set_env.sh # ATB
source /usr/local/Ascend/atb-models/set_env.sh # ATB Models
In physical machine (PM) installation mode, the environment variable configuration file of ATB Models is located in the directory where the package was decompressed, as shown in Environment Variable Configuration. Change the path as required.
- Copy the model weight file (prepared by yourself) to the directory specified by modelWeightPath in config.json.
cp -r {path_of_the_model_weight_file} /data/atb_testdata/weights/llama1-65b-safetensors
- Load environment variables.
source mindie-service/set_env.sh
- Configure the environment variables RANK_TABLE_FILE and MIES_CONTAINER_IP (ranktable in Example of the Ranktable File is used as an example).
- Container corresponding to the master node
export MIES_CONTAINER_IP=IP address of the master node
export RANK_TABLE_FILE=${path}/ranktable.json
export HCCL_DETERMINISTIC=true
- Container corresponding to the slave node
export MIES_CONTAINER_IP=IP address of the slave node
export RANK_TABLE_FILE=${path}/ranktable.json
export HCCL_DETERMINISTIC=true
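Before starting the service, the exported variables can be checked in each container. The following sketch verifies that MIES_CONTAINER_IP is set and that RANK_TABLE_FILE is an absolute path to an existing file, and prints a note if HCCL_DETERMINISTIC is not true. The function name preflight_env and the message texts are hypothetical; the checks only reflect the requirements stated in this section.

```shell
# Check the multi-node environment variables in the current shell.
# Returns 0 when the required variables look usable, 1 otherwise.
preflight_env() {
    local status=0
    if [ -z "${MIES_CONTAINER_IP:-}" ]; then
        echo "MIES_CONTAINER_IP is not set"
        status=1
    fi
    case "${RANK_TABLE_FILE:-}" in
        "") echo "RANK_TABLE_FILE is not set"; status=1 ;;
        /*) if [ ! -f "$RANK_TABLE_FILE" ]; then
                echo "RANK_TABLE_FILE does not exist: $RANK_TABLE_FILE"
                status=1
            fi ;;
        *)  echo "RANK_TABLE_FILE must be an absolute path"; status=1 ;;
    esac
    # Advisory only: deterministic HCCL communication is recommended, not required.
    if [ "${HCCL_DETERMINISTIC:-}" != "true" ]; then
        echo "note: HCCL_DETERMINISTIC is not set to true"
    fi
    return "$status"
}
```

Running preflight_env after the export commands in both the master and slave containers gives an early warning before the daemon is started. It does not validate the IP address itself or the content of the ranktable file.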
- Start the service by running the startup command in the /{MindIE installation directory}/latest/mindie-service directory. This operation must be performed in containers on both the master and slave nodes.
- (Recommended) Start the service in background process mode.
nohup ./bin/mindieservice_daemon > output.log 2>&1 &
If the following information is printed in the file captured by the standard output stream, the startup is successful:
Daemon start success!
- Start the service directly.
./bin/mindieservice_daemon
If the following information is displayed, the service is started successfully:
Daemon start success!
- According to security requirements, the permission on the bin directory is 550, and the directory does not have the write permission. However, during inference, the operator generates the kernel_meta folder in the current directory. This operation requires the write permission. Therefore, mindieservice_daemon and llm_engine_test cannot be directly started in the bin directory.
- The output.log file captured by the standard output stream supports user-defined files and paths.
- If an error indicating that the lib*.so dependency is missing is reported during service startup, rectify the fault by referring to Error "libboost_thread.so.1.82.0 Cannot Be Found" Is Displayed When MindIE Motor Is Started.
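When the service is started in background process mode, the success message has to be looked up in the captured log. The sketch below polls a log file once per second until the message "Daemon start success!" appears or a timeout expires. The function name wait_for_daemon and the 60-second default are assumptions chosen for illustration; loading a large model may take considerably longer.

```shell
# Wait until the log file contains the startup success message.
# $1: path of the log file; $2: timeout in seconds (defaults to 60, an assumed value).
# Returns 0 when the message is found, 1 when the timeout expires.
wait_for_daemon() {
    local log="$1" timeout="${2:-60}" waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if grep -q 'Daemon start success!' "$log" 2>/dev/null; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

For example, wait_for_daemon output.log 600 || tail -n 50 output.log would show the end of the log if the message did not appear within ten minutes. The check needs to be run on each node, because the service is started on both the master and slave nodes.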