Single-Node Inference

Prerequisites

  • The NPU driver and firmware, CANN package, PyTorch, ATB Models, and MindIE have been installed on the server or in the container.
  • If HTTPS two-way authentication is enabled, prepare the service certificate, server private key, and signature verification certificate in advance.
  • If you start the inference service in containerized mode, ensure that the shared memory is greater than or equal to 1 GB.
  • Server requires Python 3.10.x or Python 3.11.x. Python 3.10.13 is used as an example in this section. If Python 3.10.13 is not the default version, add the following environment variables (use the actual Python path):
    export LD_LIBRARY_PATH=/usr/local/python3.10.13/lib:$LD_LIBRARY_PATH
    export PATH=/usr/local/python3.10.13/bin:$PATH
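The shared-memory and Python version requirements above can be checked before you continue. The following is a minimal sketch and is not part of MindIE: the function names shm_kib_ok and python_version_ok are hypothetical, and the commented usage assumes a Linux host where shared memory is mounted at /dev/shm.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with MindIE) for the prerequisite checks.

# Succeeds if a size given in KiB is at least 1 GB (1048576 KiB).
shm_kib_ok() {
    [ "$1" -ge 1048576 ]
}

# Succeeds if a version string such as "3.10.13" is 3.10.x or 3.11.x.
python_version_ok() {
    case "$1" in
        3.10.*|3.11.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage (assumes /dev/shm and python3 are present):
#   shm_kib_ok "$(df -Pk /dev/shm | awk 'NR==2 {print $2}')" \
#       || echo "shared memory is below 1 GB"
#   python_version_ok "$(python3 -c 'import platform; print(platform.python_version())')" \
#       || echo "unsupported Python version"
```

In a container, run the check inside the container, because the shared memory size is set when the container is created.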

Procedure

  1. Go to the MindIE installation directory as the installation user.
    cd {MindIE installation directory}/latest
  2. Check that the directory and file permissions match the following. If they do not, run the corresponding commands to set them.
    chmod 750 mindie-service
    chmod -R 550 mindie-service/bin
    chmod 550 mindie-service/lib
    chmod 440 mindie-service/lib/*
    chmod 550 mindie-service/lib/grpc
    chmod 440 mindie-service/lib/grpc/*
    chmod -R 550 mindie-service/include
    chmod -R 550 mindie-service/scripts
    chmod 750 mindie-service/logs
    chmod 750 mindie-service/conf
    chmod 640 mindie-service/conf/config.json
    chmod 700 mindie-service/security
    chmod -R 700 mindie-service/security/*

    If the permissions do not meet these requirements, Server fails to start.
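To see which paths deviate before changing anything, the modes can be compared with the expected values. This is a minimal sketch under stated assumptions: check_mode is a hypothetical helper, not a MindIE tool, and it relies on GNU stat as found on Linux.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MindIE): compare the octal mode of a
# path with the expected value. Relies on GNU stat (-c '%a').
check_mode() {
    expected=$1
    path=$2
    actual=$(stat -c '%a' "$path") || return 2
    if [ "$actual" != "$expected" ]; then
        echo "MISMATCH $path: expected $expected, found $actual"
        return 1
    fi
}

# Example, run from {MindIE installation directory}/latest:
#   check_mode 750 mindie-service
#   check_mode 640 mindie-service/conf/config.json
#   check_mode 700 mindie-service/security
```

The helper only reports differences. Use the chmod commands listed in this step to correct them.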

  3. Set parameters as required.

    Before the configuration, pay attention to the following points:

    • If HTTPS communication is disabled (httpsEnabled = false), there might be high network security risks.
    • The default value of maxLinkNum is 1000. The recommended value is 300. Due to model performance restrictions, 1000 concurrent requests are supported only for small models with a short sequence length.
    • If you provide the configuration file of MindIE Server through the environment variable MIES_CONFIG_JSON_PATH, you need to ensure the security of the configuration file.
    • modelWeightPath specifies the directory that stores the model weights. You provide all files in this directory and are responsible for their security. The owner and group of the config.json file in this directory must match the current user, the file must not be a soft link, and its permission must not be higher than 750. Otherwise, Server fails to start.
    • tlsCaFile lists the CA certificate files used by the RESTful interface of the service plane. You provide these files and are responsible for their security.
    • tlsCert is the service certificate file used by the RESTful interface of the service plane. You provide this file and are responsible for its security.
    • tlsPk is the private key file of the service certificate used by the RESTful interface of the service plane. An encrypted private key file is recommended. You provide this file and are responsible for its security.
    • tlsCrlFiles lists the CRL files used by the RESTful interface of the service plane. You provide these files and are responsible for their security.
    • managementTlsCaFile lists the CA certificate files used by the RESTful interface of the management plane. You provide these files and are responsible for their security.
    • managementTlsCert is the service certificate file used by the RESTful interface of the management plane. You provide this file and are responsible for its security.
    • managementTlsPk is the private key file of the service certificate used by the RESTful interface of the management plane. An encrypted private key file is recommended. You provide this file and are responsible for its security.
    • managementTlsCrlFiles lists the CRL files used by the RESTful interface of the management plane. You provide these files and are responsible for their security.
    • interCommTlsCaFiles lists the CA certificate files used for communication between the prefill and decoding nodes in the prefill-decoding disaggregation scenario. You provide these files and are responsible for their security.
    • interCommTlsCert is the service certificate file used for communication between the prefill and decoding nodes in the prefill-decoding disaggregation scenario. You provide this file and are responsible for its security.
    • interCommPk is the private key file of the service certificate used for communication between the prefill and decoding nodes in the prefill-decoding disaggregation scenario. An encrypted private key file is recommended. You provide this file and are responsible for its security.
    • interCommTlsCrlFiles lists the CRL files used for communication between the prefill and decoding nodes in the prefill-decoding disaggregation scenario. You provide these files and are responsible for their security.
    • interNodeTlsCaFiles lists the CA certificate files used for communication between the master and slave nodes in the multi-node scenario. You provide these files and are responsible for their security.
    • interNodeTlsCert is the service certificate file used for communication between the master and slave nodes in the multi-node scenario. You provide this file and are responsible for its security.
    • interNodeTlsPk is the private key file of the service certificate used for communication between the master and slave nodes in the multi-node scenario. An encrypted private key file is recommended. You provide this file and are responsible for its security.
    • interNodeTlsCrlFiles lists the CRL files used for communication between the master and slave nodes in the multi-node scenario. You provide these files and are responsible for their security.
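The modelWeightPath constraints in the list above can be checked ahead of startup. The sketch below is hypothetical and not part of MindIE. It assumes that "not higher than 750" means that no permission bit outside 750 is set, and it relies on GNU stat as found on Linux.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MindIE). Checks the config.json file
# in a model weight directory: not a soft link, owned by the current user and
# group, and no permission bits beyond 750 (assumed interpretation).
check_weight_config() {
    cfg=$1/config.json
    if [ -L "$cfg" ]; then
        echo "$cfg is a soft link"; return 1
    fi
    if [ ! -f "$cfg" ]; then
        echo "$cfg not found"; return 1
    fi
    # Owner and group must match the current user.
    if [ "$(stat -c '%u' "$cfg")" != "$(id -u)" ] ||
       [ "$(stat -c '%g' "$cfg")" != "$(id -g)" ]; then
        echo "$cfg is not owned by the current user and group"; return 1
    fi
    # 027 covers the group-write bit and all bits for other users.
    mode=$(stat -c '%a' "$cfg")
    if [ $(( 0$mode & 027 )) -ne 0 ]; then
        echo "$cfg has mode $mode, which exceeds 750"; return 1
    fi
}

# Example:
#   check_weight_config /data/atb_testdata/weights/llama1-65b-safetensors
```

A zero exit status means that all three checks passed. Server performs its own verification at startup, so this check only helps you find the problem earlier.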
    1. Go to the conf directory and open the config.json file.
      cd mindie-service/conf
      vim config.json
    2. Press i to enter the insert mode and modify parameters as required. For details, see "Core Concepts and Configurations" > "Configuration Parameters (Serving)" in MindIE LLM Development Guide.

      The format of the config.json file is as follows:

      {
          "Version": "2.3.0",
          "ServerConfig" :
          {
              "ipAddress" : "127.0.0.1",
              "managementIpAddress": "127.0.0.2",
              "port" : 1025,
              "managementPort" : 1026,
              "metricsPort" : 1027,
              "allowAllZeroIpListening" : false,
              "maxLinkNum" : 1000,
              "httpsEnabled" : true,
              "fullTextEnabled" : false,
              "tlsCaPath" : "security/ca/",
              "tlsCaFile" : ["ca.pem"],
              "tlsCert" : "security/certs/server.pem",
              "tlsPk" : "security/keys/server.key.pem",
              "tlsPkPwd" : "security/pass/key_pwd.txt",
              "tlsCrlPath" : "security/certs/",
              "tlsCrlFiles" : ["server_crl.pem"],
              "managementTlsCaFile" : ["management_ca.pem"],
              "managementTlsCert" : "security/certs/management/server.pem",
              "managementTlsPk" : "security/keys/management/server.key.pem",
              "managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
              "managementTlsCrlPath" : "security/management/certs/",
              "managementTlsCrlFiles" : ["server_crl.pem"],
              "metricsTlsCaFile" : ["metrics_ca.pem"],
              "metricsTlsCert" : "security/certs/metrics/server.pem",
              "metricsTlsPk" : "security/keys/metrics/server.key.pem",
              "metricsTlsPkPwd" : "security/pass/metrics/key_pwd.txt",
              "metricsTlsCrlPath" : "security/metrics/certs/",
              "metricsTlsCrlFiles" : ["server_crl.pem"],
              "kmcKsfMaster" : "tools/pmt/master/ksfa",
              "kmcKsfStandby" : "tools/pmt/standby/ksfb",
              "inferMode" : "standard",
              "interCommTLSEnabled" : true,
              "interCommPort" : 1121,
              "interCommTlsCaPath" : "security/grpc/ca/",
              "interCommTlsCaFiles" : ["ca.pem"],
              "interCommTlsCert" : "security/grpc/certs/server.pem",
              "interCommPk" : "security/grpc/keys/server.key.pem",
              "interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
              "interCommTlsCrlPath" : "security/grpc/certs/",
              "interCommTlsCrlFiles" : ["server_crl.pem"],
              "openAiSupport" : "vllm",
              "tokenTimeout" : 600,
              "e2eTimeout" : 600,
              "distDPServerEnabled": false
          },
      
          "BackendConfig": {
              "backendName" : "mindieservice_llm_engine",
              "modelInstanceNumber" : 1,
              "npuDeviceIds" : [[0,1,2,3]],
              "tokenizerProcessNumber" : 8,
              "multiNodesInferEnabled": false,
              "multiNodesInferPort": 1120,
              "interNodeTLSEnabled": true,
              "interNodeTlsCaPath": "security/grpc/ca/",
              "interNodeTlsCaFiles": ["ca.pem"],
              "interNodeTlsCert": "security/grpc/certs/server.pem",
              "interNodeTlsPk": "security/grpc/keys/server.key.pem",
              "interNodeTlsPkPwd": "security/grpc/pass/mindie_server_key_pwd.txt",
              "interNodeTlsCrlPath" : "security/grpc/certs/",
              "interNodeTlsCrlFiles" : ["server_crl.pem"],
              "interNodeKmcKsfMaster": "tools/pmt/master/ksfa",
              "interNodeKmcKsfStandby": "tools/pmt/standby/ksfb",
              "ModelDeployConfig":
              {
                  "maxSeqLen" : 2560,
                  "maxInputTokenLen" : 2048,
                  "truncation" : false,
                  "ModelConfig" : [
                      {
                          "modelInstanceType": "Standard",
                          "modelName" : "llama_65b",
                          "modelWeightPath" : "/data/atb_testdata/weights/llama1-65b-safetensors",
                          "worldSize" : 4,
                          "cpuMemSize" : 5,
                          "npuMemSize" : -1,
                          "backendType": "atb",
                          "trustRemoteCode": false,
                          "async_scheduler_wait_time": 120,
                          "kv_trans_timeout" : 10,
                          "kv_link_timeout" : 1080
                      }
                  ]
              },
       
              "ScheduleConfig":
              {
                  "templateType": "Standard",
                  "templateName" : "Standard_LLM",
                  "cacheBlockSize" : 128,
                  "maxPrefillBatchSize" : 50,
                  "maxPrefillTokens" : 8192,
                  "prefillTimeMsPerReq" : 150,
                  "prefillPolicyType" : 0,
                  "decodeTimeMsPerReq" : 50,
                  "decodePolicyType" : 0,
                  "maxBatchSize" : 200,
                  "maxIterTimes" : 512,
                  "maxPreemptCount" : 0,
                  "supportSelectBatch" : false,
                  "maxQueueDelayMicroseconds" : 5000,
                  "maxFirstTokenWaitTime": 2500
              }
          },
          "LogConfig": {
              "dynamicLogLevel" : "",
              "dynamicLogLevelValidHours" : 2,
              "dynamicLogLevelValidTime" : ""
          }
      }
    3. Press Esc, type :wq!, and press Enter to save the changes and exit.
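After saving, two of the points listed before the configuration (httpsEnabled and maxLinkNum) can be spot-checked from the shell. This is a rough sketch and not a MindIE tool: cfg_value and review_config are hypothetical names, and the line-based matching assumes one key per line with values that contain no whitespace, as in the sample above. It is not a JSON parser.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MindIE): print the scalar value of the
# first line of the form  "key" : value  in a file. Array and object values
# are skipped because they do not match the pattern.
cfg_value() {
    key=$1
    file=$2
    sed -n "s/^[[:space:]]*\"$key\"[[:space:]]*:[[:space:]]*\"\{0,1\}\([^\",[:space:]]*\)\"\{0,1\},\{0,1\}[[:space:]]*\$/\1/p" "$file" | head -n 1
}

# Print reminders based on httpsEnabled and maxLinkNum.
review_config() {
    file=$1
    if [ "$(cfg_value httpsEnabled "$file")" = "false" ]; then
        echo "warning: httpsEnabled is false"
    fi
    links=$(cfg_value maxLinkNum "$file")
    if [ -n "$links" ] && [ "$links" -gt 300 ]; then
        echo "note: maxLinkNum is $links; the recommended value is 300"
    fi
}

# Example, run from the conf directory:
#   review_config config.json
```

For full validation of the fields, use the pre-check tool referred to in the startup step rather than this sketch.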
  4. (Optional) Enable HTTPS authentication (that is, set httpsEnabled to true).
    1. Use the certificate import script to import certificates. Table 1 describes the certificate information.
      • When three-plane isolation is enabled for HTTPS, you are advised not to use the same security certificate for the HTTPS service plane and management plane. Using the same security certificate can cause high network security risks.
      • You are advised not to use the same security certificate for HTTPS and gRPC. Using the same security certificate can cause high network security risks.
      • The permission required on a file to be imported depends on its type: 600 for a CA certificate, service certificate, or CRL, and 400 for a private key.
      • For details about the certificate import script of MindIE Service, see "Auxiliary Tools" > "MindIE Service Tools" > "CertTools" in MindIE Motor Development Guide.
      • If the certificate import times out, rectify the fault by referring to Starting the haveged Service.
      Table 1 Certificate file list

      Certificate File | Default Destination Path | Description
      -----------------|--------------------------|------------
      Root certificate | {MindIE installation directory}/latest/mindie-service/security/ca/ | Multiple CA certificates are supported. This file is mandatory when HTTPS is enabled.
      Service certificate | {MindIE installation directory}/latest/mindie-service/security/certs/ | This file is mandatory when HTTPS is enabled.
      Private key of the service certificate | {MindIE installation directory}/latest/mindie-service/security/keys/ | Private key file encryption is supported. This file is mandatory when HTTPS is enabled.
      Service CRL | {MindIE installation directory}/latest/mindie-service/security/certs/ | This file is optional after HTTPS is enabled.
      Encrypted password of the service certificate private key | {MindIE installation directory}/latest/mindie-service/security/pass/ | Optional.

    2. Run the following commands in {MindIE installation directory}/latest to modify the user permissions on the certificate files:
      chmod 400 mindie-service/security/ca/*
      chmod 400 mindie-service/security/certs/*
      chmod 400 mindie-service/security/keys/*
      chmod 400 mindie-service/security/pass/*
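To confirm the result of the chmod commands above, the certificate directories can be scanned for files whose mode is not 400. A minimal sketch, assuming a POSIX find; list_not_400 is a hypothetical helper and not a MindIE tool.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MindIE): list regular files under a
# directory (recursively) whose mode is not exactly 400. No output means that
# every file matches.
list_not_400() {
    find "$1" -type f ! -perm 400
}

# Example, run from {MindIE installation directory}/latest:
#   for d in ca certs keys pass; do
#       list_not_400 "mindie-service/security/$d"
#   done
```

Unlike the chmod commands, which act only on the entries directly inside each directory, find also descends into subdirectories, so it may report files that those commands did not touch.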
  5. Configure environment variables.
    source /usr/local/Ascend/cann/set_env.sh        # CANN
    source /usr/local/Ascend/nnal/atb/set_env.sh    # ATB
    source /usr/local/Ascend/atb-models/set_env.sh  # ATB Models

    In PM installation mode, the path of the environment variable configuration file of ATB Models is the current decompression directory, as shown in Environment Variable Configuration. Change the path as required.

  6. Copy the model weight files (which you provide) to the directory specified by modelWeightPath in step 3 (substep 2).
    cp -r {path_of_the_model_weight_file} /data/atb_testdata/weights/llama1-65b-safetensors
  7. Go to the {MindIE installation directory}/latest directory and load the environment variables.
    cd ../../
    source mindie-service/set_env.sh
  8. Start the service. Run the startup command in the {MindIE installation directory}/latest/mindie-service directory.

    Before starting the service, you are advised to use the pre-check tool of MindStudio to verify the fields in the configuration file and check the validity of the configuration. For details, see Link.

    • (Recommended) Start the service in background process mode.
      nohup ./bin/mindieservice_daemon > output.log 2>&1 &

      If the file that captures standard output (output.log in this example) contains the following message, the service has started successfully:

      Daemon start success!
      
    • Start the service directly.
      ./bin/mindieservice_daemon

      If the following information is displayed, the service is started successfully:

      Daemon start success!
      
    • Ascend-CANN-Toolkit generates a kernel_meta_temp_xxxx directory, which stores the CCE files of operators, in the directory where the service is started. Start the inference service from a directory on which the current user has write permission (for example, Ascend-mindie-server_{version}_linux-{arch}_{abi} or a temporary directory in Ascend-mindie-server_{version}_linux-{arch}).
    • Before starting the service as a different user, run the rm -f /dev/shm/* command to delete the shared files created by the previous user. Otherwise, inference may fail because the new user lacks read and write permissions on those files.
    • For security, the permission on the bin directory is 550, and the directory does not have the write permission. Therefore, mindieservice_daemon cannot be started in the bin directory.
    • output.log, which captures the standard output stream, is an example. You can use a different file name and path.
    • If an error indicating that the lib*.so dependency is missing is reported during service startup, rectify the fault by referring to Error "libboost_thread.so.1.82.0 Cannot Be Found" Is Displayed When MindIE Motor Is Started.
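When the service is started in background process mode, the success message can be polled for instead of opening the log by hand. The sketch below is hypothetical and not part of MindIE: wait_for_start is an illustrative name, and the default timeout of 300 seconds is an arbitrary assumption that you should adjust to the model load time.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MindIE): poll a log file once per
# second until the success message appears or the timeout (seconds) expires.
wait_for_start() {
    log=$1
    timeout=${2:-300}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if grep -q "Daemon start success!" "$log" 2>/dev/null; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Example, run from the mindie-service directory:
#   nohup ./bin/mindieservice_daemon > output.log 2>&1 &
#   wait_for_start output.log 600 || tail -n 50 output.log
```

A non-zero exit status only means that the message did not appear within the timeout. Check the log to determine whether the service failed or is still loading.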