Preparing an Image

You can use either of the following methods to prepare an image. After obtaining the image, create node tags, users, log directories, and namespaces for the installed components in sequence.

  • (Recommended) Creating an image: This section uses Ascend Operator as an example to describe how to create the image required for deploying a cluster scheduling component in a container. The Dockerfile in the software package is for reference only. You can customize an image based on this example.
  • Pulling an image from the Ascend image repository: Pull the component image from the Ascend image repository and, if necessary, rename it to match the name in the component startup YAML file.

NOTE:

  • After an image is pulled or created, harden it promptly, for example by fixing vulnerabilities in the base image and in any third-party dependencies that are installed.
  • Import the image to the container runtime used by Kubernetes. For example, Kubernetes 1.24 and later use containerd as the container runtime by default, so you need to import the pulled or created image to containerd.
  • NPU Exporter and Ascend Device Plugin run as the root user. The LD_LIBRARY_PATH environment variable is configured in the corresponding Dockerfile, and its value contains the path of the driver library. Files in this library are used while the components are running. You are advised to specify root as the running user during driver installation to prevent privilege escalation risks caused by inconsistent users.

Creating an Image

  1. Obtain the software package of the cluster scheduling component to be installed. For details, see Obtaining Software Packages.
  2. Decompress the software package and upload it to any directory on the image creation server. Take Ascend Operator as an example. Save the package to the /home/ascend-operator directory. The directory structure is as follows:
    root@node:/home/ascend-operator# ll
    total 41388
    drwxr-xr-x 2 root root     4096 Aug 26 20:20 ./
    drwxr-xr-x 6 root root     4096 Aug 26 20:20 ../
    -r-x------ 1 root root 41992192 Aug 26 02:02 ascend-operator*
    -r-------- 1 root root   372291 Aug 26 02:02 ascend-operator-v{version}.yaml
    -r-------- 1 root root      482 Aug 26 02:02 Dockerfile
    To deploy NPU Exporter and Ascend Device Plugin in containerized mode on an Atlas 200I SoC A1 core board, perform the following operations:
    1. When creating an image, check the UIDs and GIDs of the HwHiAiUser, HwDmUser, and HwBaseUser users on the host and record the values.
    2. Check whether the UIDs and GIDs specified for the HwHiAiUser, HwDmUser, and HwBaseUser users in Dockerfile-310P-1usoc are the same as those on the host. If they are the same, no change is needed. If they differ, manually modify Dockerfile-310P-1usoc so that they match. In addition, ensure that the UIDs and GIDs of these three users are the same on every host.
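The host-side part of this check can be sketched as follows. This is a minimal sketch, assuming the standard id utility is available on the host; print_ids is a hypothetical helper name that is not part of the software package. It prints the UID and GID of each user so that the values can be compared with those in Dockerfile-310P-1usoc.

```shell
#!/bin/sh
# Print "name uid gid" for one user, or "name missing" if the user does not exist.
# print_ids is a hypothetical helper, not part of the software package.
print_ids() {
    user="$1"
    if id "$user" >/dev/null 2>&1; then
        echo "$user $(id -u "$user") $(id -g "$user")"
    else
        echo "$user missing"
    fi
}

# Record the values for the three users named in this step.
for u in HwHiAiUser HwDmUser HwBaseUser; do
    print_ids "$u"
done
```

Run the script on every host and compare the output: the three lines should be identical across hosts.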
  3. Check whether the following base images exist on the node where images of the cluster scheduling components are created.
    • Run the docker images | grep ubuntu command to check the Ubuntu image. The image size of the ARM architecture is different from that of the x86_64 architecture.
      ubuntu              22.04               6526a1858e5d        2 years ago         64.2MB
    • If Volcano needs to be installed, check whether the alpine image exists. Run the docker images | grep alpine command. Note that the image sizes of the ARM and x86_64 architectures are different.
      alpine            latest              a24bb4013296        2 years ago         5.57MB
      

    If the preceding base images do not exist, use commands in Table 1 to pull the base images. (To pull images, ensure that the server can connect to the Internet.)

    Table 1 Commands for obtaining base images

    Base Image     Image Pulling Command                             Description
    ubuntu:22.04   docker pull ubuntu:22.04                          The system architecture is automatically identified during image pulling.
    alpine:latest  x86_64 architecture:                              -
                     docker pull alpine:latest
                   ARM architecture:
                     docker pull arm64v8/alpine:latest
                     docker tag arm64v8/alpine:latest alpine:latest
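The architecture-dependent alpine commands in Table 1 can be expressed as a small helper. This is a minimal sketch: alpine_pull_cmds is a hypothetical function name, and treating the uname -m values aarch64 and arm64 as the ARM case is an assumption. The script only prints the commands and does not run Docker.

```shell
#!/bin/sh
# Print the alpine pull commands from Table 1 for a machine architecture.
# The argument is expected to be a value reported by `uname -m` (assumption).
alpine_pull_cmds() {
    case "$1" in
        x86_64)
            echo "docker pull alpine:latest" ;;
        aarch64|arm64)
            echo "docker pull arm64v8/alpine:latest"
            echo "docker tag arm64v8/alpine:latest alpine:latest" ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1 ;;
    esac
}

# Print (do not run) the commands for the current machine.
alpine_pull_cmds "$(uname -m)" || true
```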

  4. Go to the extracted component directories one by one and run the docker build commands to create images. For details about the commands, see Table 2.
    Table 2 Commands for creating component images

    Product                       Component              Image Creation Command
    Other products                Ascend Device Plugin   docker build --no-cache -t ascend-k8sdeviceplugin:{tag} ./
    Atlas 200I SoC A1 core board  Ascend Device Plugin   docker build --no-cache -t ascend-k8sdeviceplugin:{tag} -f Dockerfile-310P-1usoc ./
    Other products                NPU Exporter           docker build --no-cache -t npu-exporter:{tag} ./
    Atlas 200I SoC A1 core board  NPU Exporter           docker build --no-cache -t npu-exporter:{tag} -f Dockerfile-310P-1usoc ./
    Other products                Ascend Operator        docker build --no-cache -t ascend-operator:{tag} ./
    Other products                Resilience Controller  docker build --no-cache -t resilience-controller:{tag} ./
    Other products                NodeD                  docker build --no-cache -t noded:{tag} ./
    Other products                ClusterD               docker build --no-cache -t clusterd:{tag} ./
    Other products                Volcano                See the version-specific commands that follow.

    For Volcano, go to the directory where Volcano is decompressed and run the commands for the required version:

    • For v1.7.0, run the following commands:
      docker build --no-cache -t volcanosh/vc-scheduler:v1.7.0 ./ -f ./Dockerfile-scheduler
      docker build --no-cache -t volcanosh/vc-controller-manager:v1.7.0 ./ -f ./Dockerfile-controller
    • For v1.9.0, run the following commands:
      docker build --no-cache -t volcanosh/vc-scheduler:v1.9.0 ./ -f ./Dockerfile-scheduler
      docker build --no-cache -t volcanosh/vc-controller-manager:v1.9.0 ./ -f ./Dockerfile-controller

    Description: {tag} must be consistent with the software package version. For example, if the software package version is 7.3.0, the value of {tag} is v7.3.0.

    NOTE:

    Ensure that the GID and UID of HwDmUser and HwBaseUser in Dockerfile-310P-1usoc are the same as those on the physical machine.


    The following uses Ascend Operator as an example to describe how to create an image by running the docker build --no-cache -t ascend-operator:v{version} . command. The command output is as follows:
    DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
                Install the buildx component to build images with BuildKit:
                https://docs.docker.com/go/buildx/
    Sending build context to Docker daemon  42.37MB
    Step 1/5 : FROM ubuntu:22.04 as build
     ---> 1f37bb13f08a
    Step 2/5 : RUN useradd -d /home/hwMindX -u 9000 -m -s /usr/sbin/nologin hwMindX &&     usermod root -s /usr/sbin/nologin
     ---> Running in d43f1927b1fd
    Removing intermediate container d43f1927b1fd
     ---> 9f1d64e06ee6
    Step 3/5 : COPY ./ascend-operator  /usr/local/bin/
     ---> 5022b58c516e
    Step 4/5 : RUN chown -R hwMindX:hwMindX /usr/local/bin/ascend-operator  &&    chmod 500 /usr/local/bin/ascend-operator &&    chmod 750 /home/hwMindX &&    echo 'umask 027' >> /etc/profile &&     echo 'source /etc/profile' >> /home/hwMindX/.bashrc
     ---> Running in a781bde3dc56
    Removing intermediate container a781bde3dc56
     ---> 3d7e2ee7a3bd
    Step 5/5 : USER hwMindX
     ---> Running in 338954be8d99
    Removing intermediate container 338954be8d99
     ---> 103f6a2b43a5
    Successfully built 103f6a2b43a5
    Successfully tagged ascend-operator:v{version}
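The rule that {tag} follows the software package version can be captured in a short helper. This is a minimal sketch: image_tag and build_cmd are hypothetical function names, and the script only prints the docker build command instead of running it.

```shell
#!/bin/sh
# Derive the image tag from the software package version (7.3.0 -> v7.3.0).
image_tag() {
    echo "v$1"
}

# Print the build command for an image name and package version.
# An optional third argument names a non-default Dockerfile.
# Both functions are hypothetical helpers for illustration.
build_cmd() {
    name="$1"
    tag="$(image_tag "$2")"
    if [ -n "${3:-}" ]; then
        echo "docker build --no-cache -t $name:$tag -f $3 ./"
    else
        echo "docker build --no-cache -t $name:$tag ./"
    fi
}

build_cmd ascend-operator 7.3.0
build_cmd npu-exporter 7.3.0 Dockerfile-310P-1usoc
```

Run the printed command from the directory that contains the extracted component package and its Dockerfile.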
    
  5. Skip this step if the following conditions are met:
    • The created images of cluster scheduling components have been uploaded to the private image repository. Each node can pull the images from the private image repository.
    • The component image has been created on each node where cluster scheduling components are installed.
    If the preceding conditions are not met, you need to manually distribute component images to each node. The following uses NodeD as an example to describe how to distribute images to the target node using an offline image package.
    1. Save the created image as an offline image.
      docker save noded:v{version} > noded-v{version}-linux-aarch64.tar
    2. Copy the image to the target node.
      scp noded-v{version}-linux-aarch64.tar root@{IP_address_of_the_target_node}:{storage_path}
    3. Log in to each node as the root user to load the offline image.
      docker load < noded-v{version}-linux-aarch64.tar
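When several nodes need the image, the three commands above can be generated per node. This is a minimal sketch: offline_tar and distribute_cmds are hypothetical helper names, the IP address 192.0.2.10 and the /tmp storage path are placeholder values, and loading the image over ssh is an assumption that stands in for logging in to the node manually. The script only prints the commands.

```shell
#!/bin/sh
# Build the offline package file name used in this step,
# for example noded-v7.3.0-linux-aarch64.tar.
offline_tar() {
    echo "$1-$2-linux-$3.tar"
}

# Print (do not run) the save, copy, and load commands for one target node.
# Arguments: image name, tag, architecture, node IP, storage path on the node.
distribute_cmds() {
    pkg="$(offline_tar "$1" "$2" "$3")"
    echo "docker save $1:$2 > $pkg"
    echo "scp $pkg root@$4:$5"
    # Assumption: the image is loaded over ssh instead of an interactive login.
    echo "ssh root@$4 'docker load < $5/$pkg'"
}

distribute_cmds noded v7.3.0 aarch64 192.0.2.10 /tmp
```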
  6. (Optional) Import the offline image to containerd. This step applies only to the scenario where containerd is used as the container runtime.

    The following uses NodeD as an example to describe how to import the offline image.

    ctr -n k8s.io images import noded-v{version}-linux-aarch64.tar
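If several offline packages have been copied to one directory on a node, the import command can be generated for each of them. This is a minimal sketch: import_cmds is a hypothetical helper name, and the assumption is that every .tar file in the directory is an offline image package. The script prints the commands without running ctr.

```shell
#!/bin/sh
# Print (do not run) one containerd import command per .tar file in a directory.
# import_cmds is a hypothetical helper for illustration.
import_cmds() {
    for f in "$1"/*.tar; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern when no .tar file exists
        echo "ctr -n k8s.io images import $f"
    done
}

# List the import commands for packages in the current directory.
import_cmds .
```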

Pulling an Image from the Ascend Image Repository

  1. Ensure that the server can access the Internet and access the Ascend image repository.
  2. In the navigation tree on the left, choose Cluster Scheduling and select the component image according to the following table. The pulled image can be deployed using the component startup YAML file only after it is renamed. For details, see Step 3.
    Table 3 Image list

    Component             Image Name                           Image Tag                                    Node from Which Images Are Pulled
    Volcano               vc-scheduler, vc-controller-manager  v1.7.0-v7.3.0 or v1.9.0-v7.3.0, as required  Management node
    Ascend Operator       ascend-operator                      v7.3.0                                       Management node
    ClusterD              clusterd                             v7.3.0                                       Management node
    NodeD                 noded                                v7.3.0                                       Compute node
    NPU Exporter          npu-exporter                         v7.3.0                                       Compute node
    Ascend Device Plugin  ascend-k8sdeviceplugin               v7.3.0                                       Compute node

    If you do not have the download permission, apply for the permission as prompted. After your application is approved by the administrator, you can download the images.

  3. If the name of a cluster scheduling component image pulled from the Ascend image repository differs from the name in the component startup YAML file, rename the image obtained in Step 2 before starting the component. You are advised to delete the image with the original name after renaming.
    1. Rename the image (select a corresponding command based on the component in use).
      docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-operator:v7.3.0 ascend-operator:v7.3.0
      
      docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/npu-exporter:v7.3.0 npu-exporter:v7.3.0
      
      docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-k8sdeviceplugin:v7.3.0 ascend-k8sdeviceplugin:v7.3.0
      
      # Change the image tag to v1.9.0-v7.3.0 if Volcano 1.9.0 is required.
      docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/vc-controller-manager:v1.7.0-v7.3.0 volcanosh/vc-controller-manager:v1.7.0
      docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/vc-scheduler:v1.7.0-v7.3.0 volcanosh/vc-scheduler:v1.7.0
      
      docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/noded:v7.3.0 noded:v7.3.0
      
      docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/clusterd:v7.3.0 clusterd:v7.3.0
    2. (Optional) Delete the original image (select a corresponding command based on the component in use).
      docker rmi swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-operator:v7.3.0
      docker rmi swr.cn-south-1.myhuaweicloud.com/ascendhub/npu-exporter:v7.3.0
      docker rmi swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-k8sdeviceplugin:v7.3.0
      # Change the image tag to v1.9.0-v7.3.0 if Volcano 1.9.0 is required.
      docker rmi swr.cn-south-1.myhuaweicloud.com/ascendhub/vc-controller-manager:v1.7.0-v7.3.0
      docker rmi swr.cn-south-1.myhuaweicloud.com/ascendhub/vc-scheduler:v1.7.0-v7.3.0
      docker rmi swr.cn-south-1.myhuaweicloud.com/ascendhub/noded:v7.3.0
      docker rmi swr.cn-south-1.myhuaweicloud.com/ascendhub/clusterd:v7.3.0
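The renaming pattern in this step follows a simple rule: drop the repository prefix, and for the Volcano images add the volcanosh/ prefix and drop the trailing package version. This is a minimal sketch of that rule: local_name and rename_cmds are hypothetical helper names, and the rule is inferred from the commands above rather than from a published naming specification. The script only prints the commands.

```shell
#!/bin/sh
# Repository prefix used by the commands in this step.
REPO_PREFIX="swr.cn-south-1.myhuaweicloud.com/ascendhub/"

# Map a repository image reference to the local name (hypothetical helper).
local_name() {
    ref=${1#"$REPO_PREFIX"}            # drop the repository prefix
    case "$ref" in
        vc-*)
            # Volcano: add volcanosh/ and drop the trailing -v<package version>.
            echo "volcanosh/${ref%-v*}" ;;
        *)
            echo "$ref" ;;
    esac
}

# Print (do not run) the rename command and the optional delete command.
rename_cmds() {
    echo "docker tag $1 $(local_name "$1")"
    echo "docker rmi $1"
}

rename_cmds "${REPO_PREFIX}vc-scheduler:v1.7.0-v7.3.0"
```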
  4. (Optional) Import the offline image to containerd. This step applies only to the scenario where containerd is used as the container runtime.

    The following uses NodeD as an example to describe how to import the offline image.

    1. Save the renamed image as an offline image.
      docker save noded:v{version} > noded-v{version}-linux-aarch64.tar
    2. Import the offline image to containerd.
      ctr -n k8s.io images import noded-v{version}-linux-aarch64.tar