Running the Installation Command

Before You Start

  • For batch installation, start from step 3 after completing the related configurations described in Function Description.
  • Before running the --install, --install-scene, and --patch options to install CANN and Toolbox, sign the Huawei Enterprise End User License Agreement (EULA). That is, enter y or Y to confirm that you have read and agreed to this agreement. Then, the installation process automatically starts.
  • Before using the ascend-deployer tool for installation, ensure that the available memory of the device where the tool is located is at least 16 GB.
  • When installing MindX DL and MEF Center, ensure that the available drive space of the Docker container, file system, or root directory in the system is greater than 30% after an extra 18 GB space (estimated size for MindX DL images and training/inference images) is used. You can run the bash ascend_deployer_path/ascend_deployer/playbooks/roles/mindx.basic/files/space.sh command to check the space. ok indicates that the check is successful, and bad indicates that the check fails.
  • If the obtained driver and firmware package is an Ascend-hdk package (for example, Ascend-hdk-*-npu_*-{arch}.zip), you do not need to set the cus_npu_info parameter. Otherwise, for the Atlas 300I Pro, Atlas 300V Pro, Atlas 300V, Atlas 300T (model 9000), and Atlas 300T Pro, you must configure cus_npu_info in inventory_file before installing the driver and firmware. For these products, set cus_npu_info to 300i-pro, 300v-pro, 300v, 300t, and 300t-pro, respectively. The following is an example:
    [worker]
    localhost ansible_connection='local' cus_npu_info='300i-pro'
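
The memory and drive space prerequisites above can be verified with a short script before you start. The following is a minimal sketch, not the logic of the space.sh script shipped with ascend-deployer. The function names mem_ok and space_ok are hypothetical, and the sketch assumes that the 30% requirement means that the free space remaining after 18 GB is reserved must exceed 30% of the total size of the file system.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; thresholds come from the prerequisites listed above.

# Succeeds if the available memory (in kB) is at least 16 GB.
mem_ok() {
    local avail_kb=$1
    [ "$avail_kb" -ge $((16 * 1024 * 1024)) ]
}

# Succeeds if, after 18 GB is reserved, more than 30% of the file system
# is still free. Arguments: total size and available size, both in kB.
space_ok() {
    local total_kb=$1 avail_kb=$2
    local reserve_kb=$((18 * 1024 * 1024))
    local left_kb=$((avail_kb - reserve_kb))
    [ "$left_kb" -gt 0 ] && [ $((left_kb * 100)) -gt $((total_kb * 30)) ]
}

# Example usage against the live system (commented out so that sourcing
# this file has no side effects):
# mem_ok "$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)" && echo "memory: ok"
# read -r total avail < <(df -k --output=size,avail / | tail -n 1)
# space_ok "$total" "$avail" && echo "space: ok"
```

For the authoritative drive space result, still run the space.sh command described above.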

Procedure

  1. Log in to the target device as the installation user of the software package.
  2. Upload the entire ascend-deployer directory to the home directory on the target device (for example, $HOME). Skip this step if you use the download function of ascend-deployer on the target device.
    • To use the offline deployment tool, a non-root user must have operation permission on the ascend-deployer directory and must set umask to 022 in advance. Before changing umask, ensure that the new value meets the security requirements of your organization.
    • If the installation is performed as the root user and other users need to use the Python installed by the root user, set umask to 022 in advance. Before changing umask, ensure that the new value meets the security requirements of your organization.
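
The effect of the umask setting mentioned above can be observed directly in a shell. The following sketch sets umask to 022 for the current session only and shows the resulting permissions on a temporary file and directory. The names demo_file and demo_dir are placeholders, and stat -c assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Show the current mask, then set 022 for this shell session only.
umask
umask 022

# Create a scratch file and directory to observe the resulting permissions.
tmpdir=$(mktemp -d)
touch "$tmpdir/demo_file"
mkdir "$tmpdir/demo_dir"

# Read the octal permission bits (GNU stat), then clean up.
file_mode=$(stat -c %a "$tmpdir/demo_file")
dir_mode=$(stat -c %a "$tmpdir/demo_dir")
rm -rf "$tmpdir"

echo "file: $file_mode, directory: $dir_mode"
```

With umask 022, new files are readable by other users but writable only by their owner, which is what allows other users to use software installed by the installation user.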
  3. Go to the ascend-deployer directory and run the installation script. (The installation user must have the execute permission on the install.sh script.) You can select the installation mode (scenario-specific installation or software-specific installation) based on the actual requirements.
    If you installed ascend-deployer with pip on the local target device, you can run the ascend-deployer command from any path on that device to perform the installation. The only difference from using the installation script is that bash install.sh in the following commands is replaced with ascend-deployer, for example, ascend-deployer --install-scene=auto.
    • Scenario-specific installation (only for the root user)

      Docker is automatically installed in all scenarios to facilitate container deployment, and the corresponding Docker group is created during the installation. Before installing the system dependencies, check whether Docker is already installed in the system. If it is, uninstall it and then install the system dependencies.

      1. (Optional) Perform a pre-installation check. Run the following command to check whether the installation is supported in the selected installation scenario. If the check passes, proceed with the installation.
        bash install.sh --install-scene=<scene_name> --check --stdout_callback=ansible_log

        ascend-deployer provides several basic installation scenarios. For details about <scene_name>, see Optional Installation Scenarios. --stdout_callback=ansible_log is optional; it selects ansible_log as the plugin used for on-screen output.

        Example command:

        bash install.sh --install-scene=auto --check    # Check whether the driver, firmware, CANN, MindSpore, TensorFlow, PyTorch, and other required software can be installed.
      2. Run the installation command:
        bash install.sh --install-scene=<scene_name>

        For details about <scene_name>, see Optional Installation Scenarios.

        Example commands:

        bash install.sh --install-scene=auto      # Install the driver, firmware, CANN, MindSpore, TensorFlow, PyTorch, and other required software.
        bash install.sh --install-scene=dl        # Install the driver, firmware, MindX DL-related components, and other required software.
        bash install.sh --install-scene=mef       # Install MEF Center-related components.
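
The optional check and the installation command can be chained so that the installation starts only when the check succeeds. The following is a minimal sketch. It assumes that install.sh exits with a non-zero status when the check fails, which this guide does not state. The function name install_scene and the INSTALLER variable are hypothetical.

```shell
#!/usr/bin/env bash
# Command used to launch the installer. Override it (for example,
# INSTALLER=echo) to preview the commands without running anything.
INSTALLER=${INSTALLER:-bash install.sh}

# Run the pre-installation check for a scenario, then install only if
# the check succeeded. Assumption: a failed check returns non-zero.
install_scene() {
    local scene=$1
    if ! $INSTALLER --install-scene="$scene" --check; then
        echo "pre-installation check failed for scenario '$scene'" >&2
        return 1
    fi
    $INSTALLER --install-scene="$scene"
}

# Example (run from the ascend-deployer directory as the root user):
# install_scene auto
```

Scenarios that install CANN or Toolbox still prompt for the EULA confirmation, so the function is intended for interactive use.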
      3. If CANN and Toolbox need to be installed in a specific scenario, sign the Huawei Enterprise End User License Agreement (EULA) to proceed with the installation. That is, enter y or Y to accept the agreement, or enter any other character to reject it. After you accept the agreement, the installation starts automatically.
        If the current language environment does not meet the requirements, run one of the following commands to configure the default language environment:
        # Set the language to Chinese (simplified).
        export LANG=zh_CN.UTF-8
        # Set the language to English.
        export LANG=en_US.UTF-8
      4. If the driver and firmware need to be installed, restart the system immediately after the installation is complete. To restart a single server, run the reboot command. For batch installation, perform the following steps to restart all devices.
        1. If ascend-deployer is deployed on one of the servers to be installed, mask (comment out) the entry for the local server in inventory_file, as shown in the following line. Otherwise, when the restart command in substep 3 (Restart the servers) is executed, the local server may restart before the command has been sent to the other servers, and those servers will not be restarted. If ascend-deployer is deployed on a general-purpose server, skip this step.
          #ip_address of the local server ansible_ssh_user="root" # Mask the IP address of the local server.
        2. Configure the Ansible environment variables.
          export PATH=$PATH:/root/.local/python3.7.5/bin
          export LD_LIBRARY_PATH=/root/.local/python3.7.5/lib:$LD_LIBRARY_PATH
        3. Restart the servers.
          ansible -i inventory_file all -m shell -a 'reboot'
        4. After the other servers to be installed have restarted, unmask (uncomment) the entry for the local server in inventory_file and run the reboot command to restart the local server.
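
Masking and unmasking the local server entry for a batch restart can be scripted. The following is a minimal sketch. The function names mask_host and unmask_host are hypothetical, the sketch assumes that each host entry starts at the beginning of a line with its address or name, and sed -i assumes GNU sed. The address 192.0.2.10 in the usage comments is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for masking the local server in inventory_file.
# Assumption: each host entry starts at the beginning of a line.

# Comment out the entry whose first field is exactly the given host.
mask_host() {
    local inventory=$1 host=$2
    local pattern=${host//./\\.}   # escape dots so they match literally
    sed -E -i "s/^${pattern}([[:space:]]|\$)/#&/" "$inventory"
}

# Remove the leading '#' from that entry again.
unmask_host() {
    local inventory=$1 host=$2
    local pattern=${host//./\\.}
    sed -E -i "s/^#(${pattern}([[:space:]]|\$))/\1/" "$inventory"
}

# Typical sequence on the server that runs ascend-deployer:
# mask_host inventory_file 192.0.2.10
# ansible -i inventory_file all -m shell -a 'reboot'
# unmask_host inventory_file 192.0.2.10
# reboot
```

Matching on the complete first field prevents an entry such as 192.0.2.100 from being masked when 192.0.2.10 is requested.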
    • Software-specific installation
      The root user can install all the software downloaded using the ascend-deployer tool. A non-root user can install only the software listed in Table 1. Install sys_pkg (system components), NPU (driver and firmware, for Ascend devices), and other required software as the root user, and then install the software listed in Table 1 as the non-root user.
      Table 1 Software list

      Software | Description
      -------- | -----------
      Python, GCC | Python 3.x.x and GCC 7.3.0 are installed in the $HOME/.local/ directory.
      Framework | It contains TensorFlow, PyTorch, and MindSpore.
      CANN software | It contains NNAE, NNRT, TFPlugin, Toolkit, and kernels. The root user installs them in the /usr/local/Ascend directory. A common user installs them in the $HOME/Ascend directory.
      Toolbox | The root user installs it in the /usr/local/Ascend directory. A common user installs it in the $HOME/Ascend directory. If the Toolbox of a version earlier than MindX 3.0.0 is installed by a non-root user, Ascend Docker Runtime will not be installed. If you need to use Ascend Docker Runtime, install the Toolbox as the root user.
      MindStudio | It is installed in the $HOME directory. If a non-root user needs to install MindStudio, install MindStudio as the root user (some dependencies need to be installed by the root user) and then install MindStudio as the non-root user.
      MindX DL | If Kubernetes, Docker, driver, and firmware are available, you can install the following components: Ascend Device Plugin, Ascend Docker Runtime, HCCL-Controller, NodeD, NPU-Exporter, Volcano, Ascend Operator, and Resilience-Controller.
      MindIO | The root user installs it in the /usr/local/Ascend directory.

      1. (Optional) Perform a pre-installation check. Run the following command to check whether the installation is supported for the software package to be installed. If the check passes, proceed with the installation.
        bash install.sh --install=<package_name> --check --stdout_callback=ansible_log

        You can run the bash install.sh --help command to view the value range of <package_name>. --stdout_callback=ansible_log is optional; it selects ansible_log as the plugin used for on-screen output. Example command:

        bash install.sh --install=toolkit --check --stdout_callback=ansible_log
      2. Run the installation command:
        bash install.sh --install=<package_name_1>,<package_name_2>

        You can run the bash install.sh --help command to view the value range of <package_name_x>. Example commands:

        bash install.sh --install=sys_pkg,python,npu       # Install system components, Python, driver, and firmware.
        bash install.sh --install=toolkit                  # Install the Toolkit.
        bash install.sh --install=kernels                  # Install kernels.
        bash install.sh --install=tfplugin                 # Install TFPlugin.
        bash install.sh --install=tensorflow               # Install TensorFlow.
        bash install.sh --install=pytorch                  # Install PyTorch.
        bash install.sh --install=mindspore                # Install MindSpore.
        bash install.sh --install=ief                      # Install the IEF agent.
        bash install.sh --install=mindstudio               # Install MindStudio.
        bash install.sh --install=ascend-device-plugin     # Install Ascend Device Plugin.
        bash install.sh --install=ascend-docker-runtime    # Install Ascend Docker Runtime.
        bash install.sh --install=mindio                   # Install MindIO.

        • During the installation, run the date -s command to calibrate the time in the operating environment to the correct UTC time.
        • Install the software packages in the sequence of "sys_pkg > Python > NPU > CANN (including the Toolkit and NNRT), MindX DL, and MindIO > AI framework (TensorFlow, MindSpore, or PyTorch)". During the installation, ensure that the CANN package version in the resources directory matches the NPU version.
        • TensorFlow 2.6.5 has vulnerabilities. For details about how to handle the vulnerabilities, see the Vulnerabilities and Fixing Solutions.
        • By default, kernels are installed in the Toolkit of the same version. If the Toolkit is not installed, kernels are installed in NNAE. If neither the Toolkit nor NNAE is installed, kernels are not installed by default. The installation path (using installation along with the Toolkit as an example) is software_package_installation_path/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/kernel.
        • During Toolkit installation, the HCCL performance tester is automatically compiled and installed. The installation path is software_package_installation_path/ascend-toolkit/latest/tools/hccl_test.
        • If the ascend-deployer tool cannot run due to environment variables, you need to configure ASCENDPATH as required.
        • If the GCC version is earlier than 7.3.0, ascend-deployer automatically installs GCC 7.3.0, which takes a long time. To avoid this, you can manually upgrade GCC in advance. After installing GCC 7.3.0, you need to configure environment variables or create a soft link to use it. For details, see Configure GCC environment variables.
        • The device health status is obtained before the NPU is installed. If the device is faulty, the installation stops.
        • After the NPU is installed, determine whether to reboot the system based on the system prompts. If you need to reboot the system, run the reboot command.
        • Some components involve runtime dependencies. For example, PyTorch requires Toolkit or NNAE to provide runtime dependencies, TensorFlow depends on the runtime dependencies provided by TFPlugin + Toolkit or TFPlugin + NNAE when invoking NPU resources, and MindSpore requires the driver and Toolkit to provide runtime dependencies.
        • Ensure that Python has been installed before installing Python libraries, such as TensorFlow, MindSpore, and PyTorch.
        • If --install= is set to mindspore, the MindSpore package downloaded in Performing Online Download will be installed. You can also install it by following the instructions on the MindSpore official website. Pay attention to the version mapping between MindSpore, drivers, firmware, and CANN software.
        • After the IEF agent has been installed, log in to the IEF console and choose Edge Resources > Edge Nodes in the navigation pane to view the status of managed edge nodes in the edge node list. If the node status is Running, the management is successful.
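
The installation sequence recommended in the notes above can be applied mechanically to a list of requested packages. The following is a minimal sketch. The function name order_packages and the ORDER variable are hypothetical, and the list contains only package names that appear in the example commands above and are covered by the recommended sequence. This guide does not state whether install.sh reorders a comma-separated list internally, so the sketch simply emits the packages in the documented order.

```shell
#!/usr/bin/env bash
# Recommended order: sys_pkg > Python > NPU > CANN, MindX DL, MindIO > AI framework.
# Only names taken from the example commands above are listed.
ORDER="sys_pkg python npu toolkit kernels tfplugin ascend-device-plugin ascend-docker-runtime mindio tensorflow pytorch mindspore"

# Print the requested packages as a comma-separated list in the
# recommended order. Names that are not in ORDER are silently dropped.
order_packages() {
    local result="" pkg wanted
    for pkg in $ORDER; do
        for wanted in "$@"; do
            if [ "$pkg" = "$wanted" ]; then
                result="${result:+$result,}$pkg"
                break
            fi
        done
    done
    printf '%s\n' "$result"
}

# Example (run from the ascend-deployer directory):
# bash install.sh --install="$(order_packages pytorch toolkit sys_pkg npu python)"
```

Run bash install.sh --help to confirm the valid package names for your version before extending the list.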
      3. If CANN and Toolbox need to be installed in a specific scenario, sign the Huawei Enterprise End User License Agreement (EULA) to proceed with the installation. That is, enter y or Y to accept the agreement, or enter any other character to reject it. After you accept the agreement, the installation starts automatically.
        If the current language environment does not meet the requirements, run one of the following commands to configure the default language environment:
        # Set the language to Chinese (simplified).
        export LANG=zh_CN.UTF-8
        # Set the language to English.
        export LANG=en_US.UTF-8
      4. If the driver and firmware need to be installed, restart the system immediately after the installation is complete. To restart a single server, run the reboot command. For batch installation, perform the following steps to restart all devices.
        1. If ascend-deployer is deployed on one of the servers to be installed, mask (comment out) the entry for the local server in inventory_file, as shown in the following line. Otherwise, when the restart command in substep 3 (Restart the servers) is executed, the local server may restart before the command has been sent to the other servers, and those servers will not be restarted. If ascend-deployer is deployed on a general-purpose server, skip this step.
          #ip_address of the local server ansible_ssh_user="root" # Mask the IP address of the local server.
        2. Configure the Ansible environment variables.
          export PATH=$PATH:/root/.local/python3.7.5/bin
          export LD_LIBRARY_PATH=/root/.local/python3.7.5/lib:$LD_LIBRARY_PATH
        3. Restart the servers.
          ansible -i inventory_file all -m shell -a 'reboot'
        4. After the other servers to be installed have restarted, unmask (uncomment) the entry for the local server in inventory_file and run the reboot command to restart the local server.
  4. After the installation is complete, the report directory is generated in the current path. It contains the installation reports report.csv and report.json, which record the server IP addresses, driver types, installed software packages and versions, target nodes for installation, and MindX DL installation results.