Mode 1: Image Installation

This section describes how to install MindIE from a container image. Before you begin, ensure that the server can connect to the network.

The MindIE image in the Ascend community image repository can be obtained and used directly. However, it runs as the root user by default, which poses security risks. Use this image only for development and debugging. For commercial use, build your own image.

Prerequisites

  • Ensure that the NPU driver and firmware have been installed on the host. If they have not been installed, follow "Selecting an Installation Scenario" in CANN Software Installation Guide (Commercial Edition) or in CANN Software Installation Guide (Community Edition). Select the installation scenario below, and then refer to the section "Installing the NPU Driver and Firmware".
    • Installation mode: installation on a physical machine
    • OS: See Hardware Mapping and Supported OSs for the OSs supported by MindIE.
    • Service scenario: training & inference & development and debugging
  • Docker 24.x.x or later has been installed on the host.
  • Before configuring the source, make sure that the installation environment can connect to the network.
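
The Docker version prerequisite above can be checked with a short script. The following is a minimal sketch, not an official check: docker_major_ok is a hypothetical helper name, and the script only compares the major version reported by the Docker server against 24.

```shell
# Sketch: verify that the installed Docker meets the 24.x.x requirement.
# docker_major_ok is a hypothetical helper, not part of Docker or MindIE.

# Return 0 if a version string such as "24.0.7" has a major version >= 24.
docker_major_ok() {
    major=${1%%.*}
    case $major in
        ''|*[!0-9]*) return 1 ;;   # empty or not a plain number
    esac
    [ "$major" -ge 24 ]
}

# Query the server version only when the docker CLI is present on this host.
if command -v docker >/dev/null 2>&1; then
    version=$(docker version --format '{{.Server.Version}}' 2>/dev/null || true)
    if docker_major_ok "$version"; then
        echo "Docker $version meets the 24.x.x requirement"
    else
        echo "Docker version '$version' is missing or older than 24.x.x"
    fi
fi
```

If the Docker daemon is not running, the version string is empty and the script reports the requirement as not met.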

Obtaining the MindIE Container Image

  1. Visit the Ascend image repository, where you can download the MindIE image.
  2. Click the login button in the upper right corner of the page and log in with your Huawei account. (If you do not have a Huawei account, register one first.)
  3. On the Image Version tab of the MindIE image download page, find the image that matches your device model and click Download in the Operation column of that image.
  4. Download the image by following the displayed image download guide.

Using an Image

  1. Run the following command to start the container. The command is for reference only; modify it as required. For details about the command parameters, see Table 1.
    # For multimodal understanding models, if the maximum service concurrency is high, setting --shm-size to at least 100 GB is recommended.
    docker run -it -d --net=host --shm-size=1g \
        --name <container-name> \
        --device=/dev/davinci_manager:rwm \
        --device=/dev/hisi_hdc:rwm \
        --device=/dev/devmm_svm:rwm \
        --device=/dev/davinci0:rwm \
        -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
        -v /usr/local/Ascend/firmware/:/usr/local/Ascend/firmware:ro \
        -v /usr/local/sbin:/usr/local/sbin:ro \
        -v /path-to-weights:/path-to-weights:ro \
        mindie:2.3.0-800I-A2-py311-openeuler24.03-lts bash
    • mindie:2.3.0-800I-A2-py311-openeuler24.03-lts is the image name and tag, which can be changed as required. You can run the following command on the host to view the images that exist on the host:
      docker images
    • The mount permission of the --device parameter is set to rwm rather than the more restrictive rw or r for the following reasons:
      • On an Atlas 800I A2 inference server, if the mount permission is set to rw, you can access the container, run the npu-smi command to view the NPU usage, and run the MindIE service. However, if the mounted NPU (davinciX in the mount option; for example, npu0 corresponds to davinci0) is occupied by another task, the npu-smi command reports an error and the MindIE task cannot be executed (torch.npu.set_device() fails).
      • On an Atlas 800I A3 SuperPoD Server, if the mount permission is set to rw, the npu-smi command reports an error after you access the container, and the MindIE task cannot be executed (torch.npu.set_device() fails).
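
When several NPUs are mounted, the --device options with the rwm permission can be generated instead of typed by hand. The following is a minimal sketch; device_flags is a hypothetical helper name, and the NPU IDs 0 and 1 are example values.

```shell
# Sketch: build --device options with the rwm permission for a list of NPU IDs.
# device_flags is a hypothetical helper used only for illustration.
device_flags() {
    flags=""
    for id in "$@"; do
        flags="$flags --device=/dev/davinci${id}:rwm"
    done
    # Strip the leading space before printing.
    printf '%s\n' "${flags# }"
}

# Example: options for NPUs 0 and 1.
device_flags 0 1
```

The output can be placed into the docker run command in place of the single --device=/dev/davinci0:rwm line. The three management devices (davinci_manager, hisi_hdc, devmm_svm) still need their own --device options.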
    Table 1 Parameter description

    Parameter

    Description

    -it

    Keeps the standard input of the container open (-i) and allocates a pseudo-terminal for it (-t). In this way, you can interact with the container from the terminal, for example, to run commands.

    -d

    Runs the container in the background. The current terminal is not blocked, so you can perform other operations after the container is started.

    --net

    When set to host, the container shares the network configuration of the host so that it can directly access the network interfaces of the host. This setting applies to scenarios where low latency and direct access to network resources are required.

    --shm-size

    Specifies the size of the shared memory (/dev/shm) of the container. You can set the size as required. 1g is an example value. For multimodal understanding models, if the maximum service concurrency is high, setting this parameter to at least 100 GB is recommended.

    The value cannot exceed the remaining physical memory of the host. You can run the free -h command to view the remaining physical memory. When data parallelism (DP) is enabled, the shared memory size (shm-size) must be increased as the DP value grows beyond 1.

    • For a DP value of 2, set shm-size to at least 2 GB.
    • For a DP value of 4, set shm-size to at least 3 GB.
    • For a DP value of 8, set shm-size to at least 5 GB.
    • For a DP value of 16, set shm-size to at least 9 GB.

    --name

    Assigns a name to the container. <container-name> is the identifier of the container. You can set it as required. It must be unique in the current system. If this parameter is not set, Docker automatically assigns a random name.

    --device

    Maps the devices of the host to the container. Each --device parameter shares the host device (such as a hardware accelerator card or other hardware devices) with the container so that the container can directly access the device.

    • /dev/davinci_manager: Da Vinci-related management device.
    • /dev/hisi_hdc: HDC-related management device.
    • /dev/devmm_svm: Memory-related management device.
    • /dev/davinciX: NPU device. X indicates the ID, for example, davinci0.
    NOTE:

    You can run the following command to query the number and names of devices. Change --device=**** as required.

    ls -l /dev/ | grep davinci

    -v

    Mounts directories on the host to the corresponding directories in the container. The ro option makes the mounted directories read-only in the container.

    • /usr/local/Ascend/driver: This path contains the hardware driver file. The driver is installed on the host and can be used in the container only after being mapped to the container. Change it according to the actual path of the driver.
    • /usr/local/sbin: This path contains the NPU status query commands such as npu-smi. Change it according to the actual path.
    • /path-to-weights: This is the directory that stores the model weights. It is mounted so that the container can access them. Change it according to the actual path. (Place both the weight files and the dataset files in this path.)
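
The DP-to-shm-size minimums listed in Table 1 can be captured in a small lookup. The following is a minimal sketch; min_shm_gb is a hypothetical helper name. It covers only the four DP values listed in the table and rejects any other value rather than guessing.

```shell
# Sketch: look up the minimum --shm-size (in GB) for the DP values in Table 1.
# min_shm_gb is a hypothetical helper; unlisted DP values are rejected, not interpolated.
min_shm_gb() {
    case $1 in
        2)  echo 2 ;;
        4)  echo 3 ;;
        8)  echo 5 ;;
        16) echo 9 ;;
        *)  echo "no documented minimum for DP=$1" >&2; return 1 ;;
    esac
}

# Example: print the option for a DP value of 8.
dp=8
echo "--shm-size=$(min_shm_gb "$dp")g"
```

The printed value is a lower bound. Check it against the remaining physical memory reported by free -h before using it in the docker run command.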
  2. Access the container.
    docker exec -it <container-name> bash
  3. Install the dependencies.

    Before using a model for inference, install its dependencies. The dependency installation file (requirements_xxx.txt) of each model is stored in /usr/local/Ascend/atb-models/requirements/models. Take the LLaMA3 series models as an example. Run the following commands to install the dependencies:

    cd /usr/local/Ascend/atb-models/requirements/models
    pip3 install -r requirements_llama3.txt
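
For a model other than LLaMA3, the same installation can be written against the requirements_xxx.txt naming pattern. The following is a minimal sketch; requirements_path is a hypothetical helper name, and the model suffix passed to it must match a file that actually exists in the directory.

```shell
# Sketch: build the path of a model's requirements file and install it if present.
# requirements_path is a hypothetical helper; the directory is the one named in this step.
REQ_DIR=/usr/local/Ascend/atb-models/requirements/models

requirements_path() {
    printf '%s/requirements_%s.txt\n' "$REQ_DIR" "$1"
}

req_file=$(requirements_path llama3)
if [ -f "$req_file" ]; then
    pip3 install -r "$req_file"
else
    echo "not found: $req_file (list $REQ_DIR to see the available files)"
fi
```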
  4. Enable MindIE log printing.
    export MINDIE_LOG_TO_STDOUT="true"
  5. Conduct inference using a model.

    The LLaMA3 series models are used as an example. For details, see $ATB_SPEED_HOME_PATH/examples/models/llama3/README.md in the container. For other models, see Model List.

    Run the following commands to perform inference:

    cd $ATB_SPEED_HOME_PATH
    python examples/run_pa.py --model_path /path-to-weights # Change the weight path.

    By default, the inference result is printed in an "Answer" log line, as follows:

    2024-11-18 11:08:13,291 [INFO] [pid: 389497] logging.py-180: Answer[0]:  Deep learning is a subset of machine learning that uses neural networks to learn from data. Neural networks are
    2024-11-18 11:08:13,291 [INFO] [pid: 389497] logging.py-180: Generate[0] token num: (0, 20)

    If you want to customize an input question, set --input_texts. For example:

    python examples/run_pa.py --model_path /path-to-weights --input_texts "What is deep learning?"  # Change the weight path.

    $ATB_SPEED_HOME_PATH has been set in the .bashrc file. You do not need to modify it.

  6. Start the service.

    MindIE Motor is an inference serving framework designed for general-purpose models, establishing an adaptable and open inference service structure. It interfaces with prominent industry inference frameworks, meeting the high-performance inference needs of LLMs. For details, see MindIE Motor Development Guide.

    A simple startup procedure is as follows:

    1. Modify $MIES_INSTALL_PATH/conf/config.json. For details about the parameter meanings and configuration rules, see "Core Concepts and Configurations" > "Configuration Parameters (Serving)" in MindIE LLM Development Guide.

    2. Start the service in background process mode.

    cd $MIES_INSTALL_PATH
    nohup ./bin/mindieservice_daemon > output.log 2>&1 &

    3. If the following message is printed in output.log (the file that captures the standard output of the daemon), the startup is successful:

    Daemon start success!

    $MIES_INSTALL_PATH has been set in the .bashrc file. You do not need to modify it.
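
Because the daemon is started in the background, the success message may take some time to appear in output.log. The following is a minimal sketch of polling for it; wait_for_line is a hypothetical helper name, and the 300-second timeout in the usage comment is an assumed value, not a documented startup time.

```shell
# Sketch: poll a log file until a given line appears or a timeout (in seconds) expires.
# wait_for_line is a hypothetical helper, not part of MindIE.
wait_for_line() {
    file=$1
    pattern=$2
    timeout=$3
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # -F matches the pattern as a fixed string, not as a regular expression.
        if grep -qF -- "$pattern" "$file" 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Example usage after starting the daemon from $MIES_INSTALL_PATH:
#   wait_for_line output.log "Daemon start success!" 300 && echo "service is up"
```

If the function returns a non-zero status, inspect output.log for errors before retrying.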