Inference Service Fault Recovery

When appliances are deployed in hybrid mode or Kubernetes is not deployed, abnormal inference processes cannot be effectively recovered. This section provides an example of automatic recovery of an inference service fault. In this example, the startup script serves as the container entrypoint to automatically launch the inference process, monitor its status, and restart it if an exception occurs.

  • Single-node inference of MindIE Server is supported.
  • Multi-node inference of MindIE Server is not supported. If only the inference process in one container is restarted, the service cannot be recovered.
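
The launch, monitor, and restart cycle described above can be sketched as a small shell loop. This is an illustrative sketch only and not the contents of infer_start.sh; the function name supervise and its log messages are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a restart loop; NOT the real infer_start.sh.

# supervise MAX_RESTARTS CMD [ARGS...]
# Runs CMD in the foreground. If CMD exits with a non-zero status,
# it is started again, at most MAX_RESTARTS times.
supervise() {
    local max_restarts="$1"
    shift
    local restarts=0
    local rc
    while true; do
        rc=0
        "$@" || rc=$?
        if [ "$rc" -eq 0 ]; then
            return 0
        fi
        echo "service exited with code ${rc}"
        if [ "$restarts" -ge "$max_restarts" ]; then
            echo "restart limit reached (${restarts}/${max_restarts}), giving up"
            return "$rc"
        fi
        echo "restarting service, cur: ${restarts}, max: ${max_restarts}"
        restarts=$((restarts + 1))
    done
}
```

With a limit of 0, a failed command is not restarted and its exit code is returned to the caller, which for a container entrypoint means that the container exits.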

Procedure

The following uses the Qwen3-1.7B model as an example.

  1. Obtain the MindIE container image.
  2. View the MindIE image on the node.
    docker images | grep mindie

    Command output:

    ...
    swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie   2.1.RC2-800I-A2-py311-openeuler24.03-lts   a4708118cd12        6 weeks ago         16GB
    ...
  3. Obtain the Qwen3-1.7B model weight file.
    # Create a directory for storing the model weight file.
    mkdir -p /data/atlas_dls/public/infer/model_weight
    cd /data/atlas_dls/public/infer/model_weight/
    # Install git-lfs if it is not installed to manage large files and binary files.
    yum install -y git-lfs 
    # Enable git-lfs.
    git lfs install
    # Download the weight file.
    git clone https://www.modelscope.cn/Qwen/Qwen3-1.7B.git
    # Modify the permission on the weight file.
    chmod -R 750 Qwen3-1.7B/
    # (Optional) If the image runs as a non-root user, the weight path must be owned by the default user (UID 1000) in the image.
    chown -R 1000:1000 Qwen3-1.7B/

    Weight quantization is required by certain models. For details, see the model's README.md file in ModelZoo-PyTorch.
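
If git-lfs was not active when the repository was cloned, large weight files are left behind as small text pointer files instead of the actual weights. The following is a minimal sketch for spotting this, assuming the standard Git LFS pointer format (a file smaller than 1024 bytes that starts with "version https://git-lfs.github.com/spec/v1"). The function name find_lfs_pointers is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files that are still Git LFS pointers.

# find_lfs_pointers DIR
# Prints every small file under DIR (outside .git) that contains the LFS pointer header.
find_lfs_pointers() {
    find "$1" -type f ! -path '*/.git/*' -size -1024c \
        -exec grep -l '^version https://git-lfs.github.com/spec/v1' {} + || true
}
```

For example, find_lfs_pointers /data/atlas_dls/public/infer/model_weight/Qwen3-1.7B prints nothing when all weight files were downloaded. If it lists files, running git lfs pull inside the cloned repository fetches the real content.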

  4. Copy the config.json file from the MindIE container to the node directory.
    1. Create a directory on the node.
      mkdir -p /data/atlas_dls/public/infer/script/Qwen3-1.7B
    2. Start the container and mount the /data/atlas_dls/public/infer/script/Qwen3-1.7B directory to the container.
      docker run --rm -it \
      -v /data/atlas_dls/public/infer/script/Qwen3-1.7B:/data/atlas_dls/public/infer/script/Qwen3-1.7B \
      <mindie image:tag>  /bin/bash

      Replace <mindie image:tag> with the actual image name and tag.

    3. In the container, copy config.json to /data/atlas_dls/public/infer/script/Qwen3-1.7B.
      cp  $MIES_INSTALL_PATH/conf/config.json /data/atlas_dls/public/infer/script/Qwen3-1.7B/

      The environment variable MIES_INSTALL_PATH in the container specifies the MindIE Server installation path, which is /usr/local/Ascend/mindie/latest/mindie-service by default. Replace it with the actual installation path.

    4. Exit the container.
      exit
    5. View config.json in the /data/atlas_dls/public/infer/script/Qwen3-1.7B directory on the node.
      ll /data/atlas_dls/public/infer/script/Qwen3-1.7B/

      Command output:

      ...
      -rw-r----- 1 root root 3920 Nov.  8 11:53 config.json
      ...
  5. Modify config.json.
    1. Open config.json.
      vi /data/atlas_dls/public/infer/script/Qwen3-1.7B/config.json
    2. Press i to enter the insert mode and modify the following parameters as required. For details about the parameters, see "Core Concepts and Configurations" > "Configuration Parameters (Serving)" in MindIE LLM Development Guide.
      {
          …
          "ServerConfig" :
          {
              "ipAddress" : "127.0.0.1",
              "managementIpAddress" : "127.0.0.2",
              "port" : 1025,
              "managementPort" : 1026,
              "metricsPort" : 1027,
              …
              "httpsEnabled" : false,
              …
          },

          "BackendConfig" : {
              …
              "npuDeviceIds" : [[0,1]],
              …
              "ModelDeployConfig" :
              {
                  …
                  "truncation" : false,
                  "ModelConfig" : [
                      {
                          …
                          "modelName" : "qwen3",
                          "modelWeightPath" : "/job/model_weight/",
                          "worldSize" : 2,
                          …
                      }
                  ]
              },
              …
          }
      }

      modelWeightPath indicates the model weight path mounted to the container.

      httpsEnabled specifies whether HTTPS is enabled. If it is set to true, HTTPS is enabled and you must configure certificates for two-way authentication. If it is set to false, HTTPS is disabled. You are advised to enable HTTPS and configure the required certificate files, such as the service certificate and private key, by referring to "Auxiliary Tools" > "MindIE Service Tools" > "CertTools" in MindIE Motor Development Guide.

    3. Press Esc, type :wq!, and press Enter to save the changes and exit.
  6. Go to the mindcluster-deploy repository and select a version branch based on mindcluster-deploy Version Description. Obtain the startup script infer_start.sh from the samples/inference/without-k8s/ directory, save it to the /data/atlas_dls/public/infer/script/Qwen3-1.7B/ directory on the node, and edit the script.
    1. Open infer_start.sh.
      vi /data/atlas_dls/public/infer/script/Qwen3-1.7B/infer_start.sh
    2. Press i to enter the insert mode and modify the configuration in the script as required.
      ...
      if [[ -z "${MIES_INSTALL_PATH}" ]]; then
          export MIES_INSTALL_PATH=/usr/local/Ascend/mindie/latest/mindie-service # Installation directory of MindIE Server in the image. If the installation directory is different, change it as required.
      fi
      ...
      mkdir -p /job/script/alllog/
      INFER_LOG_PATH=/job/script/alllog/output_$(date +%Y%m%d_%H%M%S).log # Log flushing path
       
      # config.json
      export MIES_CONFIG_JSON_PATH=/job/script/config.json # Path of the configuration file for starting the inference job, which is mounted to the container during container startup.
      # (Optional) Other user-defined steps
      ...
    3. Press Esc, type :wq!, and press Enter to save the changes and exit.
    4. Add the execution permission to the script.
      chmod +x /data/atlas_dls/public/infer/script/Qwen3-1.7B/infer_start.sh

    Directory structure of /data/atlas_dls/public/infer/:

    ├── model_weight
    │   └── Qwen3-1.7B
    └── script
        └── Qwen3-1.7B
            ├── config.json
            └── infer_start.sh
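
Before starting the container, the layout above can be verified with a short check. This is a sketch under the assumption that the directory names match the tree shown here; the function name check_layout is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the directory layout expected by the later steps.

# check_layout BASE_DIR MODEL_DIR_NAME
# Returns 0 if the weight directory, config.json, and an executable
# infer_start.sh are all present; otherwise prints what is missing and returns 1.
check_layout() {
    local base="$1" model="$2" status=0
    if [ ! -d "$base/model_weight/$model" ]; then
        echo "missing directory: $base/model_weight/$model"
        status=1
    fi
    if [ ! -f "$base/script/$model/config.json" ]; then
        echo "missing file: $base/script/$model/config.json"
        status=1
    fi
    if [ ! -x "$base/script/$model/infer_start.sh" ]; then
        echo "missing or not executable: $base/script/$model/infer_start.sh"
        status=1
    fi
    return "$status"
}
```

For the example in this section, the call would be check_layout /data/atlas_dls/public/infer Qwen3-1.7B.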
  7. Start the container and start the MindIE task.
    • Use Ascend Docker Runtime to mount processors and devices.
      docker run -it -d --net=host --shm-size=1g \
      --name <container-name> \
      -e ASCEND_VISIBLE_DEVICES=0,1 \
      -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
      -v /usr/local/sbin:/usr/local/sbin:ro \
      -v /data/atlas_dls/public/infer/script/Qwen3-1.7B/:/job/script/ \
      -v /data/atlas_dls/public/infer/model_weight/Qwen3-1.7B/:/job/model_weight/ \
      --entrypoint /job/script/infer_start.sh  <mindie image:tag>  <restart_times>
      • <container-name> indicates the container name.
      • Replace <mindie image:tag> with the actual image name and tag.
      • <restart_times> is passed to infer_start.sh and specifies the maximum number of service restarts. The value is an integer. If this parameter is omitted, the default value 0 is used. If the number of restarts exceeds this limit, the container exits.
      • You can change the value of the environment variable ASCEND_VISIBLE_DEVICES as required to mount different numbers of processors. The processor ID must be the same as that contained in the npuDeviceIds field in config.json.
    • Do not use Ascend Docker Runtime to mount processors and devices.
      docker run -it -d --net=host --shm-size=1g \
      --name <container-name> \
      --device=/dev/davinci0:rwm \
      --device=/dev/davinci1:rwm \
      --device=/dev/davinci_manager:rwm \
      --device=/dev/devmm_svm:rwm \
      --device=/dev/hisi_hdc:rwm \
      -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
      -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
      -v /usr/local/sbin:/usr/local/sbin:ro \
      -v /data/atlas_dls/public/infer/script/Qwen3-1.7B/:/job/script/ \
      -v /data/atlas_dls/public/infer/model_weight/Qwen3-1.7B/:/job/model_weight/ \
      --entrypoint /job/script/infer_start.sh  <mindie image:tag>  <restart_times>

    You can add or delete the --device parameter to mount different numbers of processors and devices. The processor ID must be the same as that contained in the npuDeviceIds field in config.json.
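
Because the processor IDs passed to the container must match npuDeviceIds in config.json, a quick comparison before running docker run can catch a mismatch early. The following is a rough sketch that assumes npuDeviceIds is written on a single line with one inner list, as in the example in this section. It is a line-oriented text match and not a JSON parser, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare npuDeviceIds in config.json with the IDs given to the container.

# npu_ids_from_config CONFIG_JSON
# Prints the IDs of the first npuDeviceIds entry as a comma-separated list, for example "0,1".
npu_ids_from_config() {
    sed -n 's/^[[:space:]]*"npuDeviceIds"[[:space:]]*:[[:space:]]*\[\[\([0-9,[:space:]]*\)]].*$/\1/p' "$1" |
        head -n 1 | tr -d '[:space:]'
}

# check_device_ids CONFIG_JSON IDS
# IDS is the comma-separated list mounted into the container (for example "0,1").
check_device_ids() {
    local configured
    configured=$(npu_ids_from_config "$1")
    if [ "$configured" = "$2" ]; then
        echo "device IDs match: $configured"
    else
        echo "device IDs differ: config.json has [$configured], container gets [$2]"
        return 1
    fi
}
```

For the first command above, the second argument would be the value of ASCEND_VISIBLE_DEVICES. For the second command, it would be the numbers of the mounted /dev/davinci devices.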

  8. View container logs.
    docker logs -f <container-name>

    If the following information is displayed, the container is started successfully.

    ...
    Daemon start success!
    ...
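
Instead of watching the log manually, the startup message can be polled for with a timeout. This is a minimal sketch; the function name wait_for_line is hypothetical, and the timeout is counted in one-second polling intervals, so it is only approximate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the output of a command until a fixed string appears.

# wait_for_line TEXT TIMEOUT_SECONDS CMD [ARGS...]
# Runs CMD about once per second and returns 0 as soon as its output
# (stdout and stderr) contains TEXT. Returns 1 when the timeout expires.
wait_for_line() {
    local text="$1" timeout="$2" waited=0
    shift 2
    while [ "$waited" -lt "$timeout" ]; do
        if "$@" 2>&1 | grep -F -- "$text" >/dev/null; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

A possible call is wait_for_line 'Daemon start success!' 600 docker logs <container-name>, where 600 is an arbitrary example value that should be adjusted to the model loading time.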
  9. Open a new terminal window and run the following command to access the service. If a response is returned successfully, the inference service has been deployed.
    curl -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    -X POST -d '{
        "model": "<model_name>",
        "messages": [
            {"role": "system", "content": "you are a helpful assistant."},
            {"role": "user", "content": "How many r are in the word \"strawberry\""}
        ],
        "max_tokens": 256,
        "stream": false,
        "do_sample": true,
        "ignore_eos": true,
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20
    }' \
    http://<ipAddress>:<port>/v1/chat/completions
    • Replace <model_name> with the value of modelName in config.json.
    • Replace <ipAddress> with the value of ipAddress in config.json.
    • Replace <port> with the value of port in config.json.
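
The request URL can also be assembled from config.json rather than typed by hand. The following sketch assumes that each key and value sits on its own line, as in the example configuration in this section. It uses a simple text match and not a JSON parser, and the function names cfg_value and service_url are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the request URL from values in config.json.

# cfg_value KEY CONFIG_JSON
# Prints the first scalar value found for KEY (string, number, or boolean).
cfg_value() {
    sed -n 's/^[[:space:]]*"'"$1"'"[[:space:]]*:[[:space:]]*"\{0,1\}\([^",[:space:]]*\).*$/\1/p' "$2" |
        head -n 1
}

# service_url CONFIG_JSON
# Prints the chat completions URL, using https when httpsEnabled is true.
service_url() {
    local scheme=http
    if [ "$(cfg_value httpsEnabled "$1")" = "true" ]; then
        scheme=https
    fi
    printf '%s://%s:%s/v1/chat/completions\n' \
        "$scheme" "$(cfg_value ipAddress "$1")" "$(cfg_value port "$1")"
}
```

The output of service_url /data/atlas_dls/public/infer/script/Qwen3-1.7B/config.json can then be used as the last argument of the curl command.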
  10. Check whether the service is automatically restarted after a fault occurs.
    1. Simulate a service fault on the node.
      # Query the process information on the NPU, including the process ID.
      npu-smi info
      # Kill the process to simulate a fault. Replace <process_id> with the process ID.
      kill -9 <process_id>
    2. View container logs.
      docker logs -f <container-name>

      If the following information is displayed, the service has been restarted successfully:

      Daemon is killing...
      ...
      [EntryPoint Script Log]running job failed. exit code: 137
      [EntryPoint Script Log]restart mindie service daemon, cur: 0, max: 1
      ...
      Daemon start success!
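
The exit code 137 in the log is consistent with a process terminated by kill -9: shells such as bash report a process killed by a signal as 128 plus the signal number, and SIGKILL is signal 9. The following standalone sketch reproduces this with a throwaway child shell and is unrelated to the MindIE processes themselves.

```shell
#!/usr/bin/env bash
# Start a child shell that kills itself with SIGKILL, then inspect its exit status.
rc=0
sh -c 'kill -9 $$' || rc=$?
echo "exit code: ${rc}"

# Exit statuses above 128 conventionally mean "terminated by signal (status - 128)".
if [ "$rc" -gt 128 ]; then
    echo "terminated by signal $((rc - 128))"
fi
```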
  11. Stop the container.
    docker stop <container-name>