Mode 3: Container Installation
This section describes how to install MindIE in a container. Before you start, ensure that the server can connect to the network.
Prerequisites
- The NPU driver and firmware have been installed on the host. If they have not been installed, install them as instructed in "Selecting an Installation Scenario" in CANN Software Installation Guide (Commercial Edition) or "Selecting an Installation Scenario" in CANN Software Installation Guide (Community Edition). Select an installation scenario as follows:
- Installation mode: installation on a physical machine
- OS: See Hardware Mapping and Supported OSs for the OSs supported by MindIE.
- Service scenario: training & inference & development and debugging
- The software packages to be installed have been prepared on the host as described in Software Package Preparation.
- Docker 24.x.x or later has been installed on the host.
- Before configuring the source, make sure that the installation environment can connect to the network.
- Before starting the container, ensure that Ascend Docker Runtime has been installed. If it is not installed, install it by referring to "Installing Ascend Docker Runtime in the Docker Scenario" in "Installation" > "Installation and Deployment" > "Manual Installation" > "Ascend Docker Runtime" in MindCluster Cluster Scheduling Installation Guide.
Procedure
- Pull the OS image.
docker pull ubuntu:22.04
Ubuntu 22.04 is used as an example. You can select another supported OS version, but ensure that the OS image to be pulled meets the requirements in Hardware Mapping and Supported OSs.
The default apt source in a new container may be incorrect or slow. Configure an apt source suitable for Ubuntu 22.04 so that packages can be downloaded and the download speed is improved.
The installation downloads related dependencies, so the installation environment must be able to connect to the network.
Run the following command as the root user to check whether the source is valid:
apt update
If an error is reported when you run the command or install dependencies, check the network connection, or replace the source in the /etc/apt/sources.list file with an available source or a mirror source. (For details about how to configure a Huawei mirror source, visit the Huawei open-source mirror website.)
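Replacing the source usually means pointing the default Ubuntu hosts in /etc/apt/sources.list at a mirror. The following is a minimal sketch, not an official procedure: switch_apt_mirror is a hypothetical helper, and the mirror URL you pass in is an assumption. It assumes the mirror uses the official archive layout, that is, /ubuntu/ for x86_64 and /ubuntu-ports/ for aarch64 images, and also serves security updates under /ubuntu/; confirm this on the mirror website before using it.

```shell
#!/usr/bin/env bash
# Sketch only: switch_apt_mirror is a hypothetical helper, not part of MindIE or Ubuntu.

# Back up an apt sources file, then point the default Ubuntu hosts at a mirror.
#   $1: sources file (normally /etc/apt/sources.list on Ubuntu 22.04)
#   $2: mirror base URL without a trailing slash (assumed to use the official
#       layout: /ubuntu/ for x86_64, /ubuntu-ports/ for aarch64)
switch_apt_mirror() {
  local list_file="$1" mirror="$2"
  cp "$list_file" "${list_file}.bak" || return 1   # keep the original for rollback
  sed -i -E \
    -e "s#https?://(archive|security)\.ubuntu\.com/ubuntu/?#${mirror}/ubuntu/#g" \
    -e "s#https?://ports\.ubuntu\.com/ubuntu-ports/?#${mirror}/ubuntu-ports/#g" \
    "$list_file"
}

# Usage inside the container, as root (the mirror URL is a placeholder):
#   switch_apt_mirror /etc/apt/sources.list "https://<mirror-host>"
#   apt update
```

If apt update still fails after the switch, restore the backup with cp /etc/apt/sources.list.bak /etc/apt/sources.list.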
- Start the container and mount host directories. In container installation mode, you do not need to install the driver in the container. You only need to mount the directories shown in the following example to the container based on the product type.
Modify the mount information in the following command based on the actual product paths and requirements, and then start the container.
# For multimodal understanding models with high maximum service concurrency, set --shm-size to at least 100 GB (for example, --shm-size=100g).
docker run -it -d --net=host --shm-size=1g \
    --name <container-name> \
    --device=/dev/davinci_manager:rwm \
    --device=/dev/hisi_hdc:rwm \
    --device=/dev/devmm_svm:rwm \
    --device=/dev/davinci0:rwm \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
    -v /usr/local/Ascend/firmware/:/usr/local/Ascend/firmware:ro \
    -v /usr/local/sbin:/usr/local/sbin:ro \
    -v /path-to-weights:/path-to-weights:ro \
    ubuntu:22.04 bash
The --device mount permission is set to rwm rather than the more restrictive rw or r for the following reasons:
- For the Atlas 800I A2 inference server, if the mount permission is set to rw, you can still access the container, run the npu-smi command to view NPU usage, and run the MindIE service. However, if the mounted NPU (davinciX in the mount option; for example, NPU 0 corresponds to davinci0) is occupied by other tasks, the npu-smi command reports an error and the MindIE task cannot be executed because torch.npu.set_device() fails.
- For the Atlas 800I A3 SuperPoD Server, if the mount permission is set to rw, the npu-smi command reports an error after you access the container, and the MindIE task cannot be executed because torch.npu.set_device() fails.
Table 1 Parameters
Parameter
Description
--shm-size=1g
Specifies the size of the container's shared memory (/dev/shm). Set it as required; 1g is an example value. For multimodal understanding models with high maximum service concurrency, set --shm-size to at least 100 GB.
The value cannot exceed the remaining physical memory of the host, which you can view by running the free -h command. When data parallelism (DP) is enabled with a DP value greater than 1, increase shm-size with the DP value as follows:
- For a DP value of 2, set shm-size to at least 2 GB.
- For a DP value of 4, set shm-size to at least 3 GB.
- For a DP value of 8, set shm-size to at least 5 GB.
- For a DP value of 16, set shm-size to at least 9 GB.
--name
Specifies the container name. Set it as required.
--device
Maps a host device into the container. One or more devices can be mounted.
Devices to be mounted include:
- /dev/davinci_manager: Da Vinci-related management device.
- /dev/hisi_hdc: HDC-related management device.
- /dev/devmm_svm: Memory-related management device.
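- /dev/davinci0: NPU to be mounted. The number in the device name is the NPU ID; for example, davinci0 is NPU 0.
NOTE: You can run the following command to query the number and names of the NPU devices on the host, and then modify the --device options as required.
ls -l /dev/ | grep davinci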
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro
Mounts the host directory /usr/local/Ascend/driver to the container. Change it according to the actual driver path.
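-v /usr/local/Ascend/firmware/:/usr/local/Ascend/firmware:ro
Mounts the host directory /usr/local/Ascend/firmware to the container in read-only mode.
-v /usr/local/sbin:/usr/local/sbin:ro
Mounts the host directory /usr/local/sbin, which contains tools such as npu-smi, to the container.
-v /path-to-weights:/path-to-weights:ro
Mounts the host directory that contains the model weights to the container. Replace /path-to-weights with the actual path.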
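The shm-size rules and the device query in Table 1 can be turned into a small pre-flight check before you run docker run. The following is a minimal sketch under stated assumptions, not a MindIE tool: the function names min_shm_gb, shm_fits, npu_ids, and device_flags are hypothetical; DP=1 is mapped to the 1g example value (an assumption); and the host's remaining physical memory is taken from the MemAvailable field of /proc/meminfo, which is what free -h reports in its available column. The listed values happen to fit 1 + DP/2 GB, but because other DP values are not documented, the sketch rejects them instead of extrapolating.

```shell
#!/usr/bin/env bash
# Pre-flight sketch for the docker run command. All function names are hypothetical.

# Minimum --shm-size in GB for a DP value, copied from Table 1.
# DP=1 returns the 1g example value (assumption); DP values not listed are rejected.
min_shm_gb() {
  case "$1" in
    1) echo 1 ;;
    2) echo 2 ;;
    4) echo 3 ;;
    8) echo 5 ;;
    16) echo 9 ;;
    *) echo "no documented shm-size for DP=$1" >&2; return 1 ;;
  esac
}

# Succeeds if <gb> GB fits in MemAvailable (assumed to be the host's remaining memory).
# The meminfo path is a parameter so the check can be run against a sample file.
shm_fits() {
  local want_gb="$1" meminfo="${2:-/proc/meminfo}" avail_kb
  avail_kb=$(awk '/^MemAvailable:/ {print $2}' "$meminfo")
  [ -n "$avail_kb" ] && [ $(( want_gb * 1024 * 1024 )) -le "$avail_kb" ]
}

# Print the NPU IDs of the davinci<N> device nodes in a directory (default /dev), sorted.
npu_ids() {
  local dev_dir="${1:-/dev}" f
  for f in "$dev_dir"/davinci[0-9]*; do
    [ -e "$f" ] || continue        # the glob matched nothing
    echo "${f##*/davinci}"
  done | sort -n
}

# Print one --device option per line: the three management devices, then each NPU ID given.
device_flags() {
  local id
  printf -- '--device=/dev/%s:rwm\n' davinci_manager hisi_hdc devmm_svm
  for id in "$@"; do
    printf -- '--device=/dev/davinci%s:rwm\n' "$id"
  done
}

# Example: size shared memory for DP=8 and mount every NPU found on the host.
#   gb=$(min_shm_gb 8) && shm_fits "$gb" && echo "--shm-size=${gb}g"
#   device_flags $(npu_ids)
```

For example, device_flags 0 1 prints the three management-device options followed by --device=/dev/davinci0:rwm and --device=/dev/davinci1:rwm, which you can copy into the docker run command.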
- Check whether the npu-smi tool is successfully mounted. (The default path is /usr/local/sbin/. Change it as required.)
- Run the following command to view the file list in the directory and check that the npu-smi tool exists:
ls -l /usr/local/sbin/
- Check the permission settings of npu-smi. Ensure that the npu-smi file has execute permission. If it does not, run the following command to change the permission:
chmod 555 /usr/local/sbin/npu-smi
- Verify the execution permission.
Run the npu-smi info command and check whether NPU information is displayed. If nothing is displayed, check the preceding steps again.
npu-smi info
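The existence and permission checks above can be wrapped in one function so that npu-smi info is only run once the file checks pass. This is a minimal sketch; check_npu_smi and its return codes are hypothetical, and the default path is the /usr/local/sbin mount point used in this section.

```shell
#!/usr/bin/env bash
# Sketch only: check_npu_smi is a hypothetical helper wrapping the checks above.
# Returns 0 if npu-smi exists and is executable, 1 if it is missing, 2 if it is not executable.
check_npu_smi() {
  local tool="${1:-/usr/local/sbin/npu-smi}"   # default mount path from this section
  if [ ! -e "$tool" ]; then
    echo "npu-smi not found at $tool; check the -v /usr/local/sbin mount" >&2
    return 1
  fi
  if [ ! -x "$tool" ]; then
    echo "npu-smi is not executable; run: chmod 555 $tool" >&2
    return 2
  fi
  return 0
}

# Usage:
#   check_npu_smi && /usr/local/sbin/npu-smi info
```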
- Access the container.
docker exec -it <container-name> /bin/bash
- Add the driver library (.so) directories under /usr/local/Ascend/driver/ to LD_LIBRARY_PATH as follows:
export LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64/common:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64/driver:$LD_LIBRARY_PATH
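The export lines add the driver directories again each time they are run, and when LD_LIBRARY_PATH starts out empty they leave a trailing colon, which the dynamic linker treats as the current directory. A minimal sketch of an idempotent alternative follows; prepend_ld_path is a hypothetical helper, not part of CANN or MindIE, and it produces the same order as the two export lines (driver before common).

```shell
#!/usr/bin/env bash
# Sketch only: prepend_ld_path is a hypothetical helper, not part of CANN or MindIE.
# Prepends a directory to LD_LIBRARY_PATH once, without leaving an empty entry.
prepend_ld_path() {
  local dir="$1"
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$dir:"*) ;;                                    # already present: do nothing
    *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
  esac
  export LD_LIBRARY_PATH
}

# Same result as the two export lines: driver, then common, then any existing entries.
prepend_ld_path /usr/local/Ascend/driver/lib64/common
prepend_ld_path /usr/local/Ascend/driver/lib64/driver
```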
- Install the Python environment in the container. For details, see Compiling and Installing Python.
- Install the dependencies, CANN, PyTorch, and ATB Models in the container. For details, see the sections from Installing CANN to Installing ATB Models.
- Install MindIE in the container. For details, see Installing the MindIE Software Package. After the installation is complete, you can deploy MindIE services in the container.