Obtaining Software Packages

To download the software packages, see Downloading Software Packages. To obtain the corresponding source code, see Source Code of Open-Source Software.

Downloading Software Packages

By downloading the software, you agree to the terms and conditions of the Huawei Enterprise End User License Agreement (EULA).

{version} indicates the software version, and {arch} indicates the CPU architecture.

Table 1 Packages of components

Ascend Docker Runtime (How to Obtain: Link)

  • ascend-docker-cli, ascend-docker-destroy, ascend-docker-hook, ascend-docker-plugin-install-helper, ascend-docker-runtime: Executable programs required for running Ascend Docker Runtime. Running them directly is not recommended.
  • assets: Image resources used in the documentation.
  • base.list*: Default mount lists. During installation, the program installs a different mount list depending on install-type.
  • run_main.sh: Installation script. Running it directly is not recommended.
  • uninstall.sh: Uninstallation script. Running it directly is not recommended.
  • README.md: Description of Ascend Docker Runtime, including its design principles.

NPU Exporter (How to Obtain: Link)

  • npu-exporter: NPU Exporter binary file.
  • Dockerfile: Dockerfile for building the NPU Exporter image.
  • Dockerfile-310P-1usoc: Dockerfile for building the NPU Exporter image on Atlas 200I SoC A1 core boards.
  • run_for_310P_1usoc.sh: Script that starts NPU Exporter inside its image on Atlas 200I SoC A1 core boards.
  • npu-exporter-v{version}.yaml: NPU Exporter startup configuration file.
  • npu-exporter-310P-1usoc-v{version}.yaml: NPU Exporter startup configuration file for Atlas 200I SoC A1 core boards.
  • metricConfiguration.json: Default configuration file for metric groups.
  • pluginConfiguration.json: Custom configuration file for metric groups.

Ascend Device Plugin (How to Obtain: Link)

  • device-plugin: Ascend Device Plugin binary file.
  • Dockerfile: Dockerfile for building the Ascend Device Plugin image.
  • Dockerfile-310P-1usoc: Dockerfile for building the Ascend Device Plugin image on Atlas 200I SoC A1 core boards.
  • run_for_310P_1usoc.sh: Script that starts Ascend Device Plugin inside its image on Atlas 200I SoC A1 core boards.
  • faultCode.json: Mapping between processor fault codes and fault recovery modes. NOTICE: This is a system configuration file. Do not modify it unless necessary; otherwise, errors may occur during system troubleshooting.
  • SwitchFaultCode.json: Mapping between interconnect device fault codes and fault recovery modes. NOTICE: This is a system configuration file. Do not modify it unless necessary; otherwise, errors may occur during system troubleshooting.
  • faultCustomization.json: Default configuration file for processor fault frequency and duration. NOTICE: This is a system configuration file. Do not modify it unless necessary; otherwise, errors may occur during system troubleshooting.
  • deviceNameCustomization.json: Custom device name configuration file. NOTICE: This is a system configuration file. Do not modify it unless necessary; otherwise, errors may occur during system troubleshooting or device management.
  • device-plugin-310-v{version}.yaml: Configuration file for inference servers equipped with Atlas 300I inference cards when Volcano is not used.
  • device-plugin-310-volcano-v{version}.yaml: Configuration file for inference servers equipped with Atlas 300I inference cards when Volcano is used.
  • device-plugin-310P-v{version}.yaml: Configuration file for Atlas inference products when Volcano is not used.
  • device-plugin-310P-volcano-v{version}.yaml: Configuration file for Atlas inference products when Volcano is used.
  • device-plugin-310P-1usoc-v{version}.yaml: Configuration file for Atlas 200I SoC A1 core boards when Volcano is not used.
  • device-plugin-310P-1usoc-volcano-v{version}.yaml: Configuration file for Atlas 200I SoC A1 core boards when Volcano is used.
  • device-plugin-910-v{version}.yaml: Configuration file for Atlas training products or Atlas A2 training products when Volcano is not used.
  • device-plugin-volcano-v{version}.yaml: Configuration file for Atlas training products or Atlas A2 training products when Volcano is used.

Volcano (How to Obtain: Link)

  • volcano-npu_{version}_linux-{arch}.so: Shared library (.so) of the Huawei-developed NPU scheduling plugin for Volcano.
  • Dockerfile-scheduler: Dockerfile for building the volcano-scheduler image.
  • Dockerfile-controller: Dockerfile for building the volcano-controller image.
  • volcano-v{version}.yaml: Volcano startup configuration file.
  • vc-scheduler: volcano-scheduler binary file.
  • vc-controller-manager: volcano-controller binary file.

NOTE:

Select a Volcano version that is compatible with your Kubernetes version. For details, see Kubernetes compatibility on the Volcano official website.

  • Volcano v1.7.0 is compatible with Kubernetes 1.19.x to 1.28.x.
  • Volcano v1.9.0 is compatible with Kubernetes 1.21.x to 1.28.x.

Ascend Operator (How to Obtain: Link)

  • ascend-operator: Ascend Operator binary file.
  • Dockerfile: Dockerfile for building the Ascend Operator image.
  • ascend-operator-v{version}.yaml: Ascend Operator startup configuration file.

NodeD (How to Obtain: Link)

  • noded: NodeD binary file.
  • noded-v{version}.yaml: NodeD startup configuration file.
  • noded-dpc-v{version}.yaml: NodeD startup configuration file to use when DPC fault detection is required.
  • NodeDConfiguration.json: Mapping between hardware fault codes and fault recovery modes.
  • pingmesh-config.yaml: pingmesh configuration file.
  • fdConfig.yaml: Fault diagnosis configuration file.
  • Dockerfile: Dockerfile for building the NodeD image.

ClusterD (How to Obtain: Link)

  • clusterd: ClusterD binary file.
  • clusterd-v{version}.yaml: ClusterD startup configuration file.
  • fdConfig.yaml: Fault diagnosis configuration file.
  • Dockerfile: Dockerfile for building the ClusterD image.
  • faultDuration.json: Configuration file for fault handling duration.
  • relationFaultCustomization.json: Configuration file for fault handling policies.
  • publicFaultConfiguration.json: Configuration file for public faults.

TaskD (How to Obtain: Link)

  • taskd-{version}-py3-none-linux_{arch}.whl: Python wheel package for the resumable training feature.

Resilience Controller (How to Obtain: Link)

  • resilience-controller: Resilience Controller binary file.
  • cert-importer: Binary file of the certificate import tool.
  • Dockerfile: Dockerfile for building the Resilience Controller image.
  • resilience-controller-v{version}.yaml: Resilience Controller startup configuration file (no KubeConfig file required).
  • resilience-controller-without-token-v{version}.yaml: Resilience Controller startup configuration file (KubeConfig file required).
  • lib: Dynamic library files on which the encryption component depends.

Elastic Agent

  • mindx_elastic-{version}-py3-none-linux_{arch}.whl: Python wheel package for the resumable training feature.

Container Manager (How to Obtain: Link)

  • container-manager: Container Manager binary file.
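Choosing among the Ascend Device Plugin YAML files in Table 1 comes down to two questions: which hardware the node uses, and whether Volcano is the scheduler. The sketch below encodes that choice as a simple lookup and fills in {version}. The product keys ("atlas-300i" and so on) and the example version string are hypothetical labels chosen for this illustration, not names used by the product.

```python
# Minimal sketch: pick the Ascend Device Plugin startup YAML listed in Table 1.
# The product keys are hypothetical labels; the file-name templates come from Table 1.

TEMPLATES = {
    # product key: (file when Volcano is NOT used, file when Volcano IS used)
    # Inference server equipped with Atlas 300I inference cards
    "atlas-300i": ("device-plugin-310-v{version}.yaml",
                   "device-plugin-310-volcano-v{version}.yaml"),
    # Atlas inference products
    "atlas-inference": ("device-plugin-310P-v{version}.yaml",
                        "device-plugin-310P-volcano-v{version}.yaml"),
    # Atlas 200I SoC A1 core boards
    "atlas-200i-soc-a1": ("device-plugin-310P-1usoc-v{version}.yaml",
                          "device-plugin-310P-1usoc-volcano-v{version}.yaml"),
    # Atlas training products or Atlas A2 training products
    "atlas-training": ("device-plugin-910-v{version}.yaml",
                       "device-plugin-volcano-v{version}.yaml"),
}


def device_plugin_yaml(product: str, use_volcano: bool, version: str) -> str:
    """Return the Device Plugin YAML file name with {version} filled in."""
    try:
        without_volcano, with_volcano = TEMPLATES[product]
    except KeyError:
        raise ValueError(f"unknown product key: {product!r}") from None
    template = with_volcano if use_volcano else without_volcano
    return template.format(version=version)


if __name__ == "__main__":
    # "6.0.0" is only an example value for {version}.
    print(device_plugin_yaml("atlas-training", use_volcano=True, version="6.0.0"))
```

A table-driven lookup keeps the mapping in one place, so adding a product means adding one dictionary entry rather than another branch.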

Resilience Controller and Elastic Agent reached end of life in version 7.3.0. To use these components, obtain their packages from a version earlier than 7.3.0.
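The Volcano note in Table 1 amounts to a range check on the Kubernetes minor version. The sketch below encodes only the two ranges quoted in that note; the function names are hypothetical, and for any other Volcano version, consult the compatibility table on the Volcano official website.

```python
# Minimal sketch: check a Kubernetes version against the Volcano compatibility
# ranges quoted in the Volcano note of Table 1 (only these two versions are encoded).

# Volcano version -> (lowest, highest) compatible Kubernetes (major, minor)
COMPAT = {
    "v1.7.0": ((1, 19), (1, 28)),
    "v1.9.0": ((1, 21), (1, 28)),
}


def k8s_minor(version: str) -> tuple[int, int]:
    """Parse '1.25.3' or 'v1.25.3' into (major, minor); the patch level is ignored."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def is_compatible(volcano: str, kubernetes: str) -> bool:
    """True if the Kubernetes minor version falls inside the quoted range (inclusive)."""
    low, high = COMPAT[volcano]
    # Tuples compare element by element, so (1, 19) <= (1, 20) <= (1, 28) holds.
    return low <= k8s_minor(kubernetes) <= high


if __name__ == "__main__":
    for k8s in ("1.20.4", "1.25.0", "1.29.1"):
        print(k8s, {v: is_compatible(v, k8s) for v in COMPAT})
```

Comparing (major, minor) tuples matches the "1.19.x to 1.28.x" wording, where any patch release of a listed minor version is in range.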

Verifying the Digital Signature

To avoid using a software package that was tampered with during transmission or storage, download the package's digital signature file together with the package and use it to check the package's integrity.

After downloading a software package from the Support website, verify its PGP digital signature as described in the OpenPGP Signature Verification Guide. If the package fails verification, do not use it, and contact Huawei technical support.

Perform this verification again before installing or updating the software package.

For carriers, visit https://support.huawei.com/carrier/digitalSignatureAction.

For enterprises, visit https://support.huawei.com/enterprise/en/tool/pgp-verify-TL1000000054.
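The PGP check described above can be scripted. The sketch below is not the procedure from the OpenPGP Signature Verification Guide; it assumes that GnuPG (gpg) is installed and on PATH, that the Huawei public key has already been imported as that guide describes, and that the signature is a detached signature file downloaded next to the package. The helper names and the .asc file name in the example are hypothetical.

```python
# Minimal sketch: verify a detached OpenPGP signature by calling the GnuPG CLI.
# Assumptions: gpg is on PATH and the signing public key is already imported.
import subprocess
import sys


def gpg_verify_command(package: str, signature: str) -> list[str]:
    """Build 'gpg --verify <signature> <package>' for a detached signature."""
    return ["gpg", "--verify", signature, package]


def signature_ok(package: str, signature: str) -> bool:
    """Return True only if gpg exits with status 0 (good signature)."""
    result = subprocess.run(
        gpg_verify_command(package, signature),
        capture_output=True,
        text=True,
    )
    # gpg writes its verdict to stderr; forward it for troubleshooting.
    sys.stderr.write(result.stderr)
    return result.returncode == 0
```

For example, signature_ok("package.zip", "package.zip.asc") returns True only when gpg reports a good signature; if it returns False, do not use the package. gpg also exits with status 0 for a good signature from a key you have not marked as trusted, so confirm the signing key's fingerprint separately.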

Source Code of Open-Source Software

The cluster scheduling components Ascend Docker Runtime, NPU Exporter, Ascend Device Plugin, Volcano, Ascend Operator, NodeD, and ClusterD are open source. To view the source code or customize a component, obtain its source code as listed in Table 2.