Creating a Container Image Using a Dockerfile (PyTorch)

Prerequisites

Obtain the software packages for your OS, together with the Dockerfile and script files required to build the image, by referring to Table 1.

In the software package names, {version} indicates the version number, {arch} indicates the architecture, and {chip_type} indicates the processor type. In CANN 6.3.RC3, 6.2.RC3, and later versions, the installer displays the prompt "Do you accept EULA to install CANN (Y/N)". In the compilation examples, the installation command contains the --quiet parameter by default, which means the EULA is accepted by default. You can modify the parameter as required.

Table 1 Required software

| Package | Description | How to Obtain |
| --- | --- | --- |
| Ascend-cann-toolkit_{version}_linux-{arch}.run | CANN Toolkit package. | Link |
| Ascend-cann-{chip_type}-ops_{version}_linux-{arch}.run | CANN operator package. For versions earlier than CANN 8.5.0, the package name is Ascend-cann-kernels-{chip_type}_{version}_linux-{arch}.run. | Link |
| apex-0.1+ascend-cp3x-cp3x-linux_{arch}.whl | Mixed precision module. x can be 8, 9, 10, or 11. Currently, Python 3.8, Python 3.9, Python 3.10, and Python 3.11 are supported. | Compile the Apex software package as required. |
| x86_64: torch-v{version}+cpu-cp3x-cp3x-linux_x86_64.whl; ARM: torch-v{version}-cp3x-cp3x-manylinux_2_17_aarch64.manylinux2014_aarch64.whl | Official PyTorch package. x can be 8, 9, 10, or 11. Currently, Python 3.8, Python 3.9, Python 3.10, and Python 3.11 are supported. {version} indicates the PyTorch version. Currently, PyTorch 2.1.0 to 2.7.1 are supported. | Link. NOTE: Select a proper PyTorch version as required. |
| torch_npu-v{version}.post{version}-cp3x-cp3x-manylinux_2_17_{arch}.manylinux2014_{arch}.whl | Ascend Extension for PyTorch plugin. x can be 8, 9, 10, or 11. Currently, Python 3.8, Python 3.9, Python 3.10, and Python 3.11 are supported. | Link. NOTE: Select a torch_npu version that matches PyTorch. The PyTorch model in the MindSpeed-LLM code repository requires Ascend Extension for PyTorch 2.1.0 or later. |
| Dockerfile | Required for creating an image. | For details, see Dockerfile compilation example. |
| dllogger-master | PyTorch log tool. | Link |
| ascend_install.info | Driver installation information file. | Copy the /etc/ascend_install.info file from the host. |
| version.info | Driver version information file. | Copy the /usr/local/Ascend/driver/version.info file from the host. |
| prebuild.sh | Script used to prepare for the setup of the training operating environment, for example, configuring the proxy. | For details, see Step 3. |
| install_ascend_pkgs.sh | Script for installing the Ascend software package. | For details, see Step 4. |
| postbuild.sh | Script for deleting the installation packages, scripts, and proxy configurations that do not need to be retained in the container. | For details, see Step 5. |
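
The naming pattern of the two CANN packages in Table 1 can be sketched as a pair of shell functions. This is only an illustration: the function names are hypothetical, and the version check compares just the numeric major.minor prefix, so a release candidate of 8.5 is treated like 8.5.0. Always confirm the result against the file you actually downloaded.

```shell
#!/bin/bash
# Sketch: expand the {version}, {arch}, and {chip_type} placeholders.
# Function names are hypothetical helpers, not part of any Ascend tool.

# Print the CANN Toolkit package name.
toolkit_pkg_name() {
    local version="$1" arch="$2"
    echo "Ascend-cann-toolkit_${version}_linux-${arch}.run"
}

# Print the operator package name. Versions earlier than CANN 8.5.0
# use the "kernels" naming described in Table 1.
# Assumption: only the numeric major.minor prefix is compared.
ops_pkg_name() {
    local version="$1" chip_type="$2" arch="$3"
    local major minor _rest
    IFS=. read -r major minor _rest <<< "$version"
    if (( 10#$major > 8 || (10#$major == 8 && 10#$minor >= 5) )); then
        echo "Ascend-cann-${chip_type}-ops_${version}_linux-${arch}.run"
    else
        echo "Ascend-cann-kernels-${chip_type}_${version}_linux-${arch}.run"
    fi
}
```

For example, `ops_pkg_name 8.5.0 "$CHIP_TYPE" aarch64` prints the "ops" form, while a 8.0.x version prints the "kernels" form. CHIP_TYPE here stands for whatever processor type string appears in your package name.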

To avoid using software packages that have been tampered with during transmission or storage, download their digital signature files for integrity check while downloading the software packages.

After the software package is downloaded from the Support website, verify its PGP digital signature by referring to the OpenPGP Signature Verification Guide. If the verification fails, do not use the software package, and contact Huawei technical support.

The verification is also required before the installation or update of the software package.

Carriers: Visit https://support.huawei.com/carrier/digitalSignatureAction.

Enterprises: Visit https://support.huawei.com/enterprise/en/tool/pgp-verify-TL1000000054.
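
As a rough illustration of the check described above, the following sketch lists .run packages that have no signature file next to them and then verifies the rest with gpg. It rests on assumptions that are not taken from this guide: the signature is a detached file named after the package plus a suffix (.asc here, adjustable through the hypothetical SIG_SUFFIX variable), and the vendor public key has already been imported. Follow the OpenPGP Signature Verification Guide for the authoritative procedure.

```shell
#!/bin/bash
# Assumption: signatures are detached files named <package><SIG_SUFFIX>.
SIG_SUFFIX="${SIG_SUFFIX:-.asc}"

# Print the name of every .run package in a directory that has no
# signature file next to it.
missing_signatures() {
    local dir="$1" pkg
    for pkg in "$dir"/*.run; do
        [ -e "$pkg" ] || continue              # glob matched nothing
        [ -f "${pkg}${SIG_SUFFIX}" ] || basename "$pkg"
    done
}

# Verify every package that has a signature file; stop at the first failure.
# Requires gpg with the signer's public key already imported.
verify_signatures() {
    local dir="$1" pkg
    for pkg in "$dir"/*.run; do
        [ -f "${pkg}${SIG_SUFFIX}" ] || continue
        gpg --verify "${pkg}${SIG_SUFFIX}" "$pkg" || return 1
    done
}
```

A typical use would be to run `missing_signatures /home/test` first, download any missing signature files, and only then run `verify_signatures /home/test`.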

The following uses Ubuntu OS with Python 3.10 and CANN 8.5.0 as an example to describe how to build a container image using a Dockerfile. Modify the steps as required.

Procedure

  1. Upload the software packages, deep learning framework, host driver installation information file, and driver version information file to the same directory (for example, /home/test) on the server.
    • Ascend-cann-toolkit_{version}_linux-{arch}.run
    • Ascend-cann-{chip_type}-ops_{version}_linux-{arch}.run
    • apex-0.1+ascend-cp310-cp310-linux_{arch}.whl
    • torch-v{version}+cpu.cxx11.abi-cp310-cp310-linux_{arch}.whl or torch-v{version}-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    • torch_npu-v{version}.post{version}-cp310-cp310-manylinux_2_17_{arch}.manylinux2014_{arch}.whl
    • dllogger-master
    • ascend_install.info
    • version.info
  2. Log in to the server as the root user.
  3. Perform the following steps to prepare the prebuild.sh file:
    1. Go to the directory where the software packages are stored and run the following command to create the prebuild.sh file:
      vi prebuild.sh
    2. For the content to write, see the prebuild.sh compilation example, which uses Ubuntu. After writing the content, run the :wq command to save the file and exit.
  4. Perform the following steps to prepare the install_ascend_pkgs.sh file:
    1. Go to the directory where the software packages are stored and run the following command to create the install_ascend_pkgs.sh file:
      vi install_ascend_pkgs.sh
    2. For the content to write, see the install_ascend_pkgs.sh compilation example, which uses Ubuntu. After writing the content, run the :wq command to save the file and exit.
  5. Perform the following steps to prepare the postbuild.sh file:
    1. Go to the directory where the software packages are stored and run the following command to create the postbuild.sh file:
      vi postbuild.sh
    2. For the content to write, see the postbuild.sh compilation example, which uses Ubuntu. After writing the content, run the :wq command to save the file and exit.
  6. Perform the following steps to create a Dockerfile:
    1. Go to the directory where the software packages are stored and run the following command to create a Dockerfile:
      vi Dockerfile
    2. For the content to write, see the Dockerfile compilation example, which uses Ubuntu. After writing the content, run the :wq command to save the file and exit.
  7. Go to the directory where the software packages are stored and run the following command to create a container image. Do not omit the period (.) at the end of the command.
    docker build -t Image name_System architecture:Image tag .

    The following table describes the command parameters.

    Table 2 Command parameters

    | Parameter | Description |
    | --- | --- |
    | -t | Specifies the image name and, optionally, a tag, in the name:tag format. |
    | Image name_System architecture:Image tag | Image name and tag. Change them based on the actual situation. |

    For example:
    docker build -t test_train_arm64:v1.0 .

    If "Successfully built xxx" is displayed, the image has been created.

  8. After the image is created, run the following command to view the image information:
    docker images

    Command output:

    REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
    test_train_arm64    v1.0                d82746acd7f0        27 minutes ago      749MB
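
The build steps above can be wrapped in a small pre-check. The sketch below is a hypothetical helper, not part of the original procedure: it reports files from step 1 that are missing from the build directory and composes the image name in the Image name_System architecture:Image tag form. The mapping from `uname -m` output to the suffix (aarch64 to arm64) simply follows the test_train_arm64 example and is an assumption you can change.

```shell
#!/bin/bash
# Map a machine architecture (as printed by `uname -m`) to the suffix used
# in the example image name. Assumption: this is only a naming convention.
arch_suffix() {
    case "$1" in
        aarch64|arm64) echo "arm64" ;;
        x86_64|amd64)  echo "x86_64" ;;
        *)             echo "$1" ;;
    esac
}

# Compose "<name>_<arch>:<tag>" for `docker build -t`.
# The third argument is optional and defaults to the local architecture.
image_ref() {
    local name="$1" tag="$2" machine="${3:-$(uname -m)}"
    echo "${name}_$(arch_suffix "$machine"):${tag}"
}

# Print every fixed-name file from step 1 that is missing from the directory.
# Versioned packages (.run and .whl files) are not checked here.
missing_build_files() {
    local dir="$1" f
    for f in Dockerfile prebuild.sh install_ascend_pkgs.sh postbuild.sh \
             ascend_install.info version.info dllogger-master; do
        [ -e "$dir/$f" ] || echo "$f"
    done
}

# Example use (run from the directory that holds the packages):
#   missing=$(missing_build_files .)
#   [ -z "$missing" ] && docker build -t "$(image_ref test_train v1.0)" .
```

Checking the directory first avoids a build that fails late, after the system packages and Python have already been compiled.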

Compilation Examples

  1. Compilation example of prebuild.sh
    Compilation example of prebuild.sh for the Ubuntu ARM OS
    #!/bin/bash
    #--------------------------------------------------------------------------------
    # Use the bash syntax to write script code and prepare for the installation, for example, configuring the proxy.
    # This script will be executed before the formal creation process is started.
    #
    # Note: After this script is executed, it will not be automatically cleared. If it does not need to be retained in the image, clear it from the postbuild.sh script.
    #--------------------------------------------------------------------------------
    # DNS settings. Replace each xxx.xxx.xxx.xxx with the IP address of a DNS server. You can enter multiple IP addresses as required.
    # Comments are kept outside the here-documents so that they are not written to the configuration files.
    tee /etc/resolv.conf <<- EOF
    nameserver xxx.xxx.xxx.xxx
    nameserver xxx.xxx.xxx.xxx
    nameserver xxx.xxx.xxx.xxx
    EOF
    # APT proxy settings. Replace the placeholders with the IP address and port number of the HTTP and HTTPS proxy servers.
    tee /etc/apt/apt.conf.d/80proxy <<- EOF
    Acquire::http::Proxy "http://xxx.xxx.xxx.xxx:xxx";
    Acquire::https::Proxy "http://xxx.xxx.xxx.xxx:xxx";
    EOF
    chmod 777 -R /tmp
    rm -rf /var/lib/apt/lists/*
    # APT mirror settings (The following uses Ubuntu 18.04 Arm as an example. Set the information as required.)
    tee /etc/apt/sources.list <<- EOF
    deb http://mirrors.aliyun.com/ubuntu-ports/ bionic main restricted universe multiverse
    deb-src http://mirrors.aliyun.com/ubuntu-ports/ bionic main restricted universe multiverse
    deb http://mirrors.aliyun.com/ubuntu-ports/ bionic-security main restricted universe multiverse
    deb-src http://mirrors.aliyun.com/ubuntu-ports/ bionic-security main restricted universe multiverse
    deb http://mirrors.aliyun.com/ubuntu-ports/ bionic-updates main restricted universe multiverse
    deb-src http://mirrors.aliyun.com/ubuntu-ports/ bionic-updates main restricted universe multiverse
    deb http://mirrors.aliyun.com/ubuntu-ports/ bionic-proposed main restricted universe multiverse
    deb-src http://mirrors.aliyun.com/ubuntu-ports/ bionic-proposed main restricted universe multiverse
    deb http://mirrors.aliyun.com/ubuntu-ports/ bionic-backports main restricted universe multiverse
    deb-src http://mirrors.aliyun.com/ubuntu-ports/ bionic-backports main restricted universe multiverse
    EOF

    Compilation example of prebuild.sh for the Ubuntu x86_64 OS

    #!/bin/bash
    #--------------------------------------------------------------------------------
    # Use the bash syntax to write script code and prepare for the installation, for example, configuring the proxy.
    # This script will be executed before the formal creation process is started.
    #
    # Note: After this script is executed, it will not be automatically cleared. If it does not need to be retained in the image, clear it from the postbuild.sh script.
    #--------------------------------------------------------------------------------
    # APT proxy settings. Replace the placeholders with the IP address and port number of the HTTP and HTTPS proxy servers.
    # Comments are kept outside the here-document so that they are not written to the configuration file.
    tee /etc/apt/apt.conf.d/80proxy <<- EOF
    Acquire::http::Proxy "http://xxx.xxx.xxx.xxx:xxx";
    Acquire::https::Proxy "http://xxx.xxx.xxx.xxx:xxx";
    EOF

    # APT mirror settings (The following uses Ubuntu 18.04 x86_64 as an example. Set the information as required.)
    tee /etc/apt/sources.list <<- EOF
    deb http://mirrors.ustc.edu.cn/ubuntu/ bionic main multiverse restricted universe
    deb http://mirrors.ustc.edu.cn/ubuntu/ bionic-backports main multiverse restricted universe
    deb http://mirrors.ustc.edu.cn/ubuntu/ bionic-proposed main multiverse restricted universe
    deb http://mirrors.ustc.edu.cn/ubuntu/ bionic-security main multiverse restricted universe
    deb http://mirrors.ustc.edu.cn/ubuntu/ bionic-updates main multiverse restricted universe
    deb-src http://mirrors.ustc.edu.cn/ubuntu/ bionic main multiverse restricted universe
    deb-src http://mirrors.ustc.edu.cn/ubuntu/ bionic-backports main multiverse restricted universe
    deb-src http://mirrors.ustc.edu.cn/ubuntu/ bionic-proposed main multiverse restricted universe
    deb-src http://mirrors.ustc.edu.cn/ubuntu/ bionic-security main multiverse restricted universe
    deb-src http://mirrors.ustc.edu.cn/ubuntu/ bionic-updates main multiverse restricted universe
    EOF
  2. Compilation example of install_ascend_pkgs.sh
    #--------------------------------------------------------------------------------
    # Use the bash syntax to write script code and install the Ascend software package.
    #
    # Note: After this script is executed, it will not be automatically cleared. If it does not need to be retained in the image, clear it from the postbuild.sh script.
    #--------------------------------------------------------------------------------
    umask 0022
    cp ascend_install.info /etc/
    # Copy the /usr/local/Ascend/driver/version.info file on the host to the current directory before creating the container image.
    mkdir -p /usr/local/Ascend/driver/
    cp version.info /usr/local/Ascend/driver/
    # Ascend-cann-toolkit_{version}_linux-{arch}.run
    chmod +x Ascend-cann-toolkit_{version}_linux-{arch}.run
    chmod +x Ascend-cann-{chip_type}-ops_{version}_linux-{arch}.run
    ./Ascend-cann-toolkit_{version}_linux-{arch}.run --install-path=/usr/local/Ascend/ --install --quiet
    echo y | ./Ascend-cann-{chip_type}-ops_{version}_linux-{arch}.run --install
    # The driver files copied above are needed only while the toolkit package is installed, so clear them afterwards. During container startup, the driver is mounted by Ascend Docker.
    rm -f version.info
    rm -rf /usr/local/Ascend/driver/
  3. Compilation example of postbuild.sh
    #--------------------------------------------------------------------------------
    # Use the bash syntax to write the script code and delete the installation packages, scripts, and proxy configurations that do not need to be retained in the container.
    # This script will be run after the formal creation process ends.
    #
    # Note: After this script terminates, it is automatically cleared and will not be left in the image. The script and Working Dir are stored in /tmp.
    #--------------------------------------------------------------------------------
    rm -f ascend_install.info
    rm -f prebuild.sh
    rm -f install_ascend_pkgs.sh
    rm -f Dockerfile
    rm -f Ascend-cann-toolkit_{version}_linux-{arch}.run
    rm -f Ascend-cann-{chip_type}-ops_{version}_linux-{arch}.run
    rm -f apex-0.1+ascend-cp310-cp310-linux_{arch}.whl
    rm -f torch-v{version}+cpu.cxx11.abi-cp310-cp310-linux_{arch}.whl
    rm -f torch_npu-v{version}.post{version}-cp310-cp310-manylinux_2_17_{arch}.manylinux2014_{arch}.whl
    rm -f /etc/apt/apt.conf.d/80proxy
  4. Dockerfile compilation example
    • Dockerfile example of Python 3.10 for the Ubuntu ARM OS
      FROM ubuntu:18.04
      ARG PYTORCH_PKG=torch-v{version}+cpu.cxx11.abi-cp310-cp310-linux_aarch64.whl
      ARG PYTORCH_NPU_PKG=torch_npu-v{version}.post{version}-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
      ARG APEX_PKG=apex-0.1+ascend-cp310-cp310-linux_aarch64.whl
      ARG HOST_ASCEND_BASE=/usr/local/Ascend
      ARG TOOLKIT_PATH=/usr/local/Ascend/cann
      ARG INSTALL_ASCEND_PKGS_SH=install_ascend_pkgs.sh
      ARG PREBUILD_SH=prebuild.sh
      ARG POSTBUILD_SH=postbuild.sh
      WORKDIR /tmp
      COPY . ./
      # Trigger prebuild.sh.
      RUN bash -c "test -f $PREBUILD_SH && bash $PREBUILD_SH || true"
      ENV http_proxy=http://xxx.xxx.xxx.xxx:xxx
      ENV https_proxy=http://xxx.xxx.xxx.xxx:xxx
      # System packages
      RUN apt update && \
          apt install -y --no-install-recommends curl g++ pkg-config unzip wget build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev \
              libblas3 liblapack3 liblapack-dev openssl libssl-dev libblas-dev gfortran libhdf5-dev libffi-dev libicu60 libxml2 \
              patch libbz2-dev llvm libncursesw5-dev xz-utils liblzma-dev m4 dos2unix libopenblas-dev libsqlite3-dev
      RUN wget https://www.python.org/ftp/python/3.10.5/Python-3.10.5.tgz
      RUN tar -zxvf Python-3.10.5.tgz && cd Python-3.10.5 && ./configure --prefix=/usr/local/python3.10.5 --enable-shared && make && make install 
      RUN ln -s /usr/local/python3.10.5/bin/python3.10 /usr/local/python3.10.5/bin/python && \
          ln -s /usr/local/python3.10.5/bin/pip3.10 /usr/local/python3.10.5/bin/pip
      # Configure the Python pip mirror.
      RUN mkdir -p ~/.pip \
      && echo '[global] \n\
      index-url=https://pypi.doubanio.com/simple/\n\
      trusted-host=pypi.doubanio.com' >> ~/.pip/pip.conf
      
      ENV LD_LIBRARY_PATH=/usr/local/python3.10.5/lib:$LD_LIBRARY_PATH
      ENV PATH=/usr/local/python3.10.5/bin:$PATH 
      ENV PYTHONPATH=/usr/local/python3.10.5/lib/python3.10/site-packages:$PYTHONPATH 
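      # Python packages. torchvision 0.16.0 pairs with PyTorch 2.1.0. Change the torchvision version to match the PyTorch package you install.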
      RUN pip3 install decorator && \
          pip3 install sympy && \
          pip3 install cffi && \
          pip3 install pyyaml && \
          pip3 install pathlib2 && \
          pip3 install grpcio && \
          pip3 install grpcio-tools && \
          pip3 install protobuf && \
          pip3 install scipy && \
          pip3 install requests && \
          pip3 install attrs && \
          pip3 install Pillow==9.1.0 && \
          pip3 install torchvision==0.16.0 && \
          pip3 install numpy==1.23.5 && \
          pip3 install psutil && \
          pip3 install absl-py
      
      # Create the HwHiAiUser user and group. The UID and GID must be the same as those on the physical machine to avoid generating ownerless files. In this example, the user and the corresponding group are created automatically, and both the UID and GID are 1000.
      RUN useradd -d /home/HwHiAiUser -u 1000 -m -s /bin/bash HwHiAiUser
      # Ascend packages
      RUN umask 0022 && bash $INSTALL_ASCEND_PKGS_SH
      RUN umask 0022 && pip3 install $APEX_PKG
      RUN umask 0022 && pip3 install $PYTORCH_PKG
      RUN umask 0022 && pip3 install $PYTORCH_NPU_PKG
      RUN cd /tmp/dllogger-master/ && \
          python3 setup.py build && \
          python3 setup.py install
      # Environment variables
      ENV HCCL_WHITELIST_DISABLE=1
      ENV PYTHONPATH=/tmp/dllogger-master
      # Create /lib64/ld-linux-aarch64.so.1.
      RUN umask 0022 && \
          if [ ! -d "/lib64" ]; \
          then \
              mkdir /lib64 && ln -sf /lib/ld-linux-aarch64.so.1 /lib64/ld-linux-aarch64.so.1; \
          fi
      ENV http_proxy=""
      ENV https_proxy=""
      # Trigger postbuild.sh, then delete it. rm -f does not fail if the script is absent.
      RUN bash -c "test -f $POSTBUILD_SH && bash $POSTBUILD_SH || true" && \
          rm -f $POSTBUILD_SH
    • Dockerfile example of Python 3.10 for the Ubuntu x86_64 OS
      FROM ubuntu:18.04
      ARG PYTORCH_PKG=torch-v{version}+cpu.cxx11.abi-cp310-cp310-linux_x86_64.whl
      ARG PYTORCH_NPU_PKG=torch_npu-v{version}.post{version}-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
      ARG APEX_PKG=apex-0.1+ascend-cp310-cp310-linux_x86_64.whl
      ARG HOST_ASCEND_BASE=/usr/local/Ascend
      ARG TOOLKIT_PATH=/usr/local/Ascend/cann
      ARG INSTALL_ASCEND_PKGS_SH=install_ascend_pkgs.sh
      ARG PREBUILD_SH=prebuild.sh
      ARG POSTBUILD_SH=postbuild.sh
      WORKDIR /tmp
      COPY . ./
      # Trigger prebuild.sh.
      RUN bash -c "test -f $PREBUILD_SH && bash $PREBUILD_SH || true"
      ENV http_proxy=http://xxx.xxx.xxx.xxx:xxx
      ENV https_proxy=http://xxx.xxx.xxx.xxx:xxx
      # System packages
      RUN apt update && \
          apt install -y --no-install-recommends curl g++ pkg-config unzip wget build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev \
              libblas3 liblapack3 liblapack-dev openssl libssl-dev libblas-dev gfortran libhdf5-dev libffi-dev libicu60 libxml2 \
              patch libbz2-dev llvm libncursesw5-dev xz-utils liblzma-dev m4 dos2unix libopenblas-dev libsqlite3-dev
      RUN wget https://www.python.org/ftp/python/3.10.5/Python-3.10.5.tgz
      RUN tar -zxvf Python-3.10.5.tgz && cd Python-3.10.5 && ./configure --prefix=/usr/local/python3.10.5 --enable-shared && make && make install 
      RUN ln -s /usr/local/python3.10.5/bin/python3.10 /usr/local/python3.10.5/bin/python && \
          ln -s /usr/local/python3.10.5/bin/pip3.10 /usr/local/python3.10.5/bin/pip
      # Configure the Python pip mirror.
      RUN mkdir -p ~/.pip \
      && echo '[global] \n\
      index-url=https://pypi.doubanio.com/simple/\n\
      trusted-host=pypi.doubanio.com' >> ~/.pip/pip.conf
      
      ENV LD_LIBRARY_PATH=/usr/local/python3.10.5/lib:$LD_LIBRARY_PATH
      ENV PATH=/usr/local/python3.10.5/bin:$PATH 
      ENV PYTHONPATH=/usr/local/python3.10.5/lib/python3.10/site-packages:$PYTHONPATH 
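      # Python packages. torchvision 0.16.0 pairs with PyTorch 2.1.0. Change the torchvision version to match the PyTorch package you install.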
      RUN pip3 install decorator && \
          pip3 install sympy && \
          pip3 install cffi && \
          pip3 install pyyaml && \
          pip3 install pathlib2 && \
          pip3 install grpcio && \
          pip3 install grpcio-tools && \
          pip3 install protobuf && \
          pip3 install scipy && \
          pip3 install requests && \
          pip3 install attrs && \
          pip3 install Pillow==9.1.0 && \
          pip3 install torchvision==0.16.0 && \
          pip3 install numpy==1.23.5 && \
          pip3 install psutil && \
          pip3 install absl-py
      
      # Create the HwHiAiUser user and group. The UID and GID must be the same as those on the physical machine to avoid generating ownerless files. In this example, the user and the corresponding group are created automatically, and both the UID and GID are 1000.
      RUN useradd -d /home/HwHiAiUser -u 1000 -m -s /bin/bash HwHiAiUser
      # Ascend packages
      RUN bash $INSTALL_ASCEND_PKGS_SH
      RUN pip3 install $APEX_PKG
      RUN pip3 install $PYTORCH_PKG
      RUN pip3 install $PYTORCH_NPU_PKG
      RUN cd /tmp/dllogger-master/ && \
          python3 setup.py build && \
          python3 setup.py install
      # Environment variables
      ENV HCCL_WHITELIST_DISABLE=1
      ENV PYTHONPATH=/tmp/dllogger-master
      ENV http_proxy=""
      ENV https_proxy=""
      # Trigger postbuild.sh, then delete it. rm -f does not fail if the script is absent.
      RUN bash -c "test -f $POSTBUILD_SH && bash $POSTBUILD_SH || true" && \
          rm -f $POSTBUILD_SH
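
Because the Dockerfile examples build Python 3.10.5 and install cp310 wheels, a mismatch between the Python version and the wheel tags is an easy mistake to make when adapting them to another Python version. The following sketch, using hypothetical helper names, extracts the CPython tag from each wheel file name and reports wheels that do not match. It assumes the file names follow the cp3x-cp3x pattern shown in this guide.

```shell
#!/bin/bash
# Extract the CPython tag (for example, cp310) from a wheel file name.
# Returns a non-zero status if the name has no cp3x-cp3x pair.
wheel_py_tag() {
    local file="${1##*/}"
    if [[ "$file" =~ -(cp3[0-9]+)-cp3[0-9]+- ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

# Convert a Python version such as 3.10.5 into its wheel tag (cp310).
python_tag() {
    local major minor _rest
    IFS=. read -r major minor _rest <<< "$1"
    echo "cp${major}${minor}"
}

# Print every wheel whose tag does not match the given Python version.
# Usage: mismatched_wheels <python-version> <wheel>...
mismatched_wheels() {
    local py_version="$1" want tag whl
    shift
    want=$(python_tag "$py_version")
    for whl in "$@"; do
        tag=$(wheel_py_tag "$whl") || tag="unknown"
        [ "$tag" = "$want" ] || echo "$whl ($tag, expected $want)"
    done
}
```

Running `mismatched_wheels 3.10.5 *.whl` in the package directory before the build prints nothing when all wheels carry the cp310 tag.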