Creating a Container Image Using a Dockerfile (PyTorch)

Prerequisites

Obtain the software packages for your OS, along with the Dockerfile and script files required to build the image, as listed in Table 1.

In the deep learning engine software package names, {version} indicates the version, and {arch} indicates the architecture.

Table 1 Required software

| Software Package | Description | How to Obtain |
| --- | --- | --- |
| Ascend-cann-nnae_{version}_linux-{arch}.run | Deep learning engine software package. | Link |
| apex-0.1+ascend-cp37-cp37m-linux_{arch}.whl | Mixed precision module. | - |
| torch-1.5.0+ascend.post2-cp37-cp37m-linux_{arch}.whl | PyTorch Adapter plugin. | - |
| Dockerfile | Required for creating an image. | Prepared by users. |
| dllogger-master | PyTorch log tool. | Link |
| ascend_install.info | Driver installation information file. | Copy the /etc/ascend_install.info file from the host. |
| version.info | Driver version information file. | Copy the /usr/local/Ascend/driver/version.info file from the host. |
| prebuild.sh | Script used to prepare for the installation of the training operating environment, for example, by configuring the proxy. | Prepared by users. |
| install_ascend_pkgs.sh | Script for installing the Ascend software package. | Prepared by users. |
| postbuild.sh | Script for deleting the installation packages, scripts, and proxy configurations that do not need to be retained in the container. | Prepared by users. |
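
The {version} and {arch} placeholders in the package names can be expanded mechanically. The following is a minimal sketch only: expand_pkg_name is a hypothetical helper, 1.2.3 is a made-up version, and aarch64 and x86_64 are the architecture strings that appear in the wheel names of the Dockerfile examples (their use in the .run file name is an assumption).

```shell
#!/bin/bash
# Sketch: expand the {version} and {arch} placeholders used in the package names.
# expand_pkg_name is a hypothetical helper, not part of any Ascend tooling.
expand_pkg_name() {
    # $1: name template, $2: version, $3: architecture
    # The values must not contain "/" or "&", which are special to sed.
    printf '%s\n' "$1" | sed -e "s/{version}/$2/g" -e "s/{arch}/$3/g"
}

# 1.2.3 is a made-up version used only for illustration.
expand_pkg_name 'Ascend-cann-nnae_{version}_linux-{arch}.run' 1.2.3 aarch64
expand_pkg_name 'apex-0.1+ascend-cp37-cp37m-linux_{arch}.whl' 1.2.3 x86_64
```

A template without a {version} placeholder, such as the apex wheel name, is returned with only {arch} replaced.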

To prevent a software package from being maliciously tampered with during transmission or storage, download the corresponding digital signature file for integrity verification when downloading the software package.

After the software package is downloaded, verify its PGP digital signature according to the OpenPGP Signature Verification Guide. If the software package fails the verification, do not use the software package, and contact Huawei technical support.

Before using a software package for installation or upgrade, verify its digital signature again according to the OpenPGP Signature Verification Guide to ensure that the package has not been tampered with.

For carrier users, visit https://support.huawei.com/carrier/digitalSignatureAction.

For enterprise users, visit https://support.huawei.com/enterprise/en/tool/pgp-verify-TL1000000054.
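
The verification step can be scripted so that a package is never used when the check fails. The following is a rough sketch, not the procedure from the OpenPGP Signature Verification Guide: verify_pkg is a hypothetical helper, and it assumes that GnuPG (gpg) is installed, that the vendor public key has already been imported as the guide describes, and that the signature file is named after the package with an .asc suffix.

```shell
#!/bin/bash
# Sketch: refuse to use a package unless its detached PGP signature verifies.
# Assumptions: gpg is installed, the vendor public key is already imported,
# and the signature file is <package>.asc unless given as the second argument.
verify_pkg() {
    local pkg sig
    pkg=$1
    sig=${2:-$pkg.asc}
    if [ ! -f "$pkg" ] || [ ! -f "$sig" ]; then
        echo "package or signature file not found: $pkg" >&2
        return 2
    fi
    # gpg --verify <signature> <signed file> exits non-zero if verification fails.
    if gpg --verify "$sig" "$pkg"; then
        echo "signature OK: $pkg"
    else
        echo "signature verification FAILED, do not use: $pkg" >&2
        return 1
    fi
}
```

For example, run verify_pkg with the downloaded .run file as its argument before copying the file into the build directory, and stop if the return value is not 0.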

This section uses Ubuntu as an example.

Procedure

  1. Upload the software packages, deep learning framework, host driver installation information file, and driver version information file to the same directory (for example, /home/test) on the server.
    • Ascend-cann-nnae_{version}_linux-{arch}.run
    • apex-0.1+ascend-cp37-cp37m-linux_{arch}.whl
    • torch-1.5.0+ascend.post2-cp37-cp37m-linux_{arch}.whl
    • dllogger-master
    • ascend_install.info
    • version.info
  2. Log in to the server as the root user.
  3. Perform the following steps to prepare the prebuild.sh file:
    1. Go to the directory where the software packages are stored and run the following command to create the prebuild.sh file:

      vim prebuild.sh

    2. Write the file content by referring to the compilation example of prebuild.sh (Ubuntu is used as the example). Then enter :wq to save the file and exit.
  4. Perform the following steps to prepare the install_ascend_pkgs.sh file:
    1. Go to the directory where the software packages are stored and run the following command to create the install_ascend_pkgs.sh file:

      vim install_ascend_pkgs.sh

    2. Write the file content by referring to the compilation example of install_ascend_pkgs.sh (Ubuntu is used as the example). Then enter :wq to save the file and exit.
  5. Perform the following steps to prepare the postbuild.sh file:
    1. Go to the directory where the software packages are stored and run the following command to create the postbuild.sh file:

      vim postbuild.sh

    2. Write the file content by referring to the compilation example of postbuild.sh (Ubuntu is used as the example). Then enter :wq to save the file and exit.
  6. Perform the following steps to create the Dockerfile:
    1. Go to the directory where the software packages are stored and run the following command to create the Dockerfile:

      vim Dockerfile

    2. Write the file content by referring to the Dockerfile compilation example (Ubuntu is used as the example). Then enter :wq to save the file and exit.

      The Dockerfile examples use ubuntu:18.04 as the base image. You can also pull it from Docker Hub in advance by running the docker pull ubuntu:18.04 command.

  7. Go to the directory where the software packages are stored and run the following command to create a container image. Do not omit the period (.) at the end of the command; it specifies the current directory as the build context.

    docker build -t {image-name}_{system-architecture}:{image-tag} .

    Example:

    docker build -t test_train_arm64:v1.0 .

    Table 2 describes the command parameters.

    Table 2 Parameters

    | Parameter | Description |
    | --- | --- |
    | -t | Specifies the image name and, optionally, a tag. |
    | {image-name}_{system-architecture}:{image-tag} | Image name and tag. Change them based on the actual situation. |

    If "Successfully built xxx" is displayed, the image has been created.

  8. After the image is created, run the following command to view the image information:

    docker images

    Example:

    REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
    test_train_arm64    v1.0                d82746acd7f0        27 minutes ago      749MB
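
Because the Dockerfile examples copy the whole directory into the image, a missing file only surfaces partway through the build. The following sketch checks the build directory first. check_build_context is a hypothetical helper, and the glob patterns stand in for the {version} and {arch} parts of the file names.

```shell
#!/bin/bash
# Sketch: check that the build directory holds every input that the procedure
# above expects before running docker build.
check_build_context() {
    local dir pattern missing
    dir=${1:-.}
    missing=0
    for pattern in \
        'Ascend-cann-nnae_*_linux-*.run' \
        'apex-0.1+ascend-cp37-cp37m-linux_*.whl' \
        'torch-1.5.0+ascend.post2-cp37-cp37m-linux_*.whl' \
        'dllogger-master' \
        'ascend_install.info' \
        'version.info' \
        'prebuild.sh' \
        'install_ascend_pkgs.sh' \
        'postbuild.sh' \
        'Dockerfile'
    do
        # $pattern is deliberately unquoted so that the shell expands the glob.
        # If nothing matches, the pattern stays literal and the -e test fails.
        set -- "$dir"/$pattern
        if [ ! -e "$1" ]; then
            echo "missing: $pattern" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A possible use, with /home/test as the example directory from step 1: run check_build_context /home/test, and start docker build only if it returns 0.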

Compilation Examples

Modify the software package version and architecture based on the actual situation.
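
To decide which architecture string applies, the machine name reported by uname -m can be mapped to the suffix used in the wheel file names. This is a minimal sketch: wheel_arch is a hypothetical helper, and the accepted aliases (arm64, amd64) are assumptions added for convenience.

```shell
#!/bin/bash
# Sketch: map a machine name to the {arch} suffix used in the wheel file names
# of the examples (aarch64 or x86_64).
wheel_arch() {
    case "$1" in
        aarch64|arm64) echo aarch64 ;;
        x86_64|amd64)  echo x86_64 ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Print the suffix for the machine that runs this script, if it is supported.
wheel_arch "$(uname -m)" || true
```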

  1. Compilation example of prebuild.sh
    Compilation example of prebuild.sh for the Ubuntu ARM OS
    #!/bin/bash
    #--------------------------------------------------------------------------------
    # Write this script in bash to prepare for the installation, for example, by configuring the proxy.
    # This script is run before the formal image creation process starts.
    #
    # Note: This script is not automatically deleted after it is run. If it does not need to be retained in the image, delete it in the postbuild.sh script.
    #--------------------------------------------------------------------------------
    # DNS settings. Replace each xxx.xxx.xxx.xxx with the IP address of a DNS server. Add or remove nameserver lines as required.
    tee /etc/resolv.conf <<- EOF
    nameserver xxx.xxx.xxx.xxx
    nameserver xxx.xxx.xxx.xxx
    nameserver xxx.xxx.xxx.xxx
    EOF
    # APT proxy settings. Replace xxx.xxx.xxx.xxx:xxx with the IP address and port number of the HTTP and HTTPS proxy servers.
    tee /etc/apt/apt.conf.d/80proxy <<- EOF
    Acquire::http::Proxy "http://xxx.xxx.xxx.xxx:xxx";
    Acquire::https::Proxy "http://xxx.xxx.xxx.xxx:xxx";
    EOF
    chmod 777 -R /tmp
    # Clear the cached package lists (-rf also removes subdirectories such as partial/).
    rm -rf /var/lib/apt/lists/*
    # APT source settings (The following uses Ubuntu 18.04 ARM as an example. Set the information as required.)
    tee /etc/apt/sources.list <<- EOF
    deb http://mirrors.aliyun.com/ubuntu-ports/ bionic main restricted universe multiverse
    deb-src http://mirrors.aliyun.com/ubuntu-ports/ bionic main restricted universe multiverse
    deb http://mirrors.aliyun.com/ubuntu-ports/ bionic-security main restricted universe multiverse
    deb-src http://mirrors.aliyun.com/ubuntu-ports/ bionic-security main restricted universe multiverse
    deb http://mirrors.aliyun.com/ubuntu-ports/ bionic-updates main restricted universe multiverse
    deb-src http://mirrors.aliyun.com/ubuntu-ports/ bionic-updates main restricted universe multiverse
    deb http://mirrors.aliyun.com/ubuntu-ports/ bionic-proposed main restricted universe multiverse
    deb-src http://mirrors.aliyun.com/ubuntu-ports/ bionic-proposed main restricted universe multiverse
    deb http://mirrors.aliyun.com/ubuntu-ports/ bionic-backports main restricted universe multiverse
    deb-src http://mirrors.aliyun.com/ubuntu-ports/ bionic-backports main restricted universe multiverse
    EOF

    Compilation example of prebuild.sh for the Ubuntu x86 OS

    #!/bin/bash
    #--------------------------------------------------------------------------------
    # Write this script in bash to prepare for the installation, for example, by configuring the proxy.
    # This script is run before the formal image creation process starts.
    #
    # Note: This script is not automatically deleted after it is run. If it does not need to be retained in the image, delete it in the postbuild.sh script.
    #--------------------------------------------------------------------------------
    # APT proxy settings. Replace xxx.xxx.xxx.xxx:xxx with the IP address and port number of the HTTP and HTTPS proxy servers.
    tee /etc/apt/apt.conf.d/80proxy <<- EOF
    Acquire::http::Proxy "http://xxx.xxx.xxx.xxx:xxx";
    Acquire::https::Proxy "http://xxx.xxx.xxx.xxx:xxx";
    EOF
    
    # APT source settings (The following uses Ubuntu 18.04 x86 as an example. Set the information as required.)
    tee /etc/apt/sources.list <<- EOF
    deb http://mirrors.ustc.edu.cn/ubuntu/ bionic main multiverse restricted universe
    deb http://mirrors.ustc.edu.cn/ubuntu/ bionic-backports main multiverse restricted universe
    deb http://mirrors.ustc.edu.cn/ubuntu/ bionic-proposed main multiverse restricted universe
    deb http://mirrors.ustc.edu.cn/ubuntu/ bionic-security main multiverse restricted universe
    deb http://mirrors.ustc.edu.cn/ubuntu/ bionic-updates main multiverse restricted universe
    deb-src http://mirrors.ustc.edu.cn/ubuntu/ bionic main multiverse restricted universe
    deb-src http://mirrors.ustc.edu.cn/ubuntu/ bionic-backports main multiverse restricted universe
    deb-src http://mirrors.ustc.edu.cn/ubuntu/ bionic-proposed main multiverse restricted universe
    deb-src http://mirrors.ustc.edu.cn/ubuntu/ bionic-security main multiverse restricted universe
    deb-src http://mirrors.ustc.edu.cn/ubuntu/ bionic-updates main multiverse restricted universe
    EOF
  2. Compilation example of install_ascend_pkgs.sh
    #--------------------------------------------------------------------------------
    # Write this script in bash to install the Ascend software package.
    #
    # Note: This script is not automatically deleted after it is run. If it does not need to be retained in the image, delete it in the postbuild.sh script.
    #--------------------------------------------------------------------------------
    umask 0022
    cp ascend_install.info /etc/
    # Copy the /usr/local/Ascend/driver/version.info file on the host to the current directory before creating the container image.
    mkdir -p /usr/local/Ascend/driver/
    cp version.info /usr/local/Ascend/driver/
    # Ascend-cann-nnae_{version}_linux-{arch}.run
    chmod +x Ascend-cann-nnae_{version}_linux-{arch}.run
    ./Ascend-cann-nnae_{version}_linux-{arch}.run --install-path=/usr/local/Ascend/ --install --quiet
    # The following files are needed only to install the NNAE package, so delete them afterwards. At container startup, the driver files are mounted by Ascend Docker.
    rm -f version.info
    rm -rf /usr/local/Ascend/driver/
  3. Compilation example of postbuild.sh
    #--------------------------------------------------------------------------------
    # Write this script in bash to delete the installation packages, scripts, and proxy configurations that do not need to be retained in the container.
    # This script is run after the formal image creation process ends.
    #
    # Note: This script is automatically deleted after it is run and is not left in the image. The scripts are stored in the working directory (/tmp).
    #--------------------------------------------------------------------------------
    rm -f ascend_install.info
    rm -f prebuild.sh
    rm -f install_ascend_pkgs.sh
    rm -f Dockerfile
    rm -f Ascend-cann-nnae_{version}_linux-{arch}.run
    rm -f apex-0.1+ascend-cp37-cp37m-linux_{arch}.whl
    rm -f torch-1.5.0+ascend.post2-cp37-cp37m-linux_{arch}.whl
    rm -f /etc/apt/apt.conf.d/80proxy
    
  4. Compilation example of Dockerfile
    Dockerfile example for Ubuntu ARM
    FROM ubuntu:18.04
    
    ARG PYTORCH_PKG=torch-1.5.0+ascend.post2-cp37-cp37m-linux_aarch64.whl
    ARG APEX_PKG=apex-0.1+ascend-cp37-cp37m-linux_aarch64.whl
    ARG HOST_ASCEND_BASE=/usr/local/Ascend
    ARG NNAE_PATH=/usr/local/Ascend/nnae/latest
    # ARG TF_PLUGIN_PATH=/usr/local/Ascend/tfplugin/latest
    ARG INSTALL_ASCEND_PKGS_SH=install_ascend_pkgs.sh
    ARG PREBUILD_SH=prebuild.sh
    ARG POSTBUILD_SH=postbuild.sh
    
    WORKDIR /tmp
    COPY . ./
    
    # Trigger prebuild.sh.
    RUN bash -c "test -f $PREBUILD_SH && bash $PREBUILD_SH || true"
    
    ENV http_proxy=http://xxx.xxx.xxx.xxx:xxx
    ENV https_proxy=http://xxx.xxx.xxx.xxx:xxx
    
    # System package
    RUN apt update && \
        apt install --no-install-recommends \
            python3.7 python3.7-dev \
            curl g++ pkg-config unzip \
            libblas3 liblapack3 liblapack-dev \
            libblas-dev gfortran libhdf5-dev \
            libffi-dev libicu60 libxml2 -y
    
    # Create a Python soft link.
    RUN ln -s /usr/bin/python3.7 /usr/bin/python
    
    # Configure the Python pip source.
    RUN mkdir -p ~/.pip \
    && echo '[global] \n\
    index-url=https://pypi.doubanio.com/simple/\n\
    trusted-host=pypi.doubanio.com' >> ~/.pip/pip.conf
    
    # pip3.7 (the default get-pip.py no longer supports Python 3.7, so use the version-specific script)
    RUN curl -k https://bootstrap.pypa.io/pip/3.7/get-pip.py -o get-pip.py && \
        cd /tmp && \
        apt-get download python3-distutils && \
        dpkg-deb -x python3-distutils_*.deb / && \
        rm python3-distutils_*.deb && \
        cd - && \
        python3.7 get-pip.py && \
        rm get-pip.py
    
    # Create the HwHiAiUser user and group. The UID and GID must be the same as those on the physical machine to avoid generating ownerless files.
    # In this example, the user and its group are created automatically, and the UID and GID are both 1000.
    RUN useradd -d /home/HwHiAiUser -u 1000 -m -s /bin/bash HwHiAiUser
    
    # Python package
    RUN pip3.7 install numpy && \
        pip3.7 install decorator && \
        pip3.7 install sympy==1.4 && \
        pip3.7 install cffi && \
        pip3.7 install pyyaml && \
        pip3.7 install pathlib2 && \
        pip3.7 install grpcio && \
        pip3.7 install grpcio-tools && \
        pip3.7 install protobuf && \
        pip3.7 install scipy && \
        pip3.7 install requests && \
        pip3.7 install attrs && \
        pip3.7 install Pillow==6.2.2 && \
        pip3.7 install torchvision==0.2.2.post3
    
    # Ascend package
    RUN umask 0022 && bash $INSTALL_ASCEND_PKGS_SH
    
    RUN umask 0022 && pip3.7 install $APEX_PKG
    
    RUN umask 0022 && pip3.7 install $PYTORCH_PKG
    
    RUN cd /tmp/dllogger-master/ && \
        python3.7 setup.py build && \
        python3.7 setup.py install
    
    # Environment variables
    ENV HCCL_WHITELIST_DISABLE=1
    ENV PYTHONPATH=/tmp/dllogger-master
    
    # Create /lib64/ld-linux-aarch64.so.1.
    RUN umask 0022 && \
        if [ ! -d "/lib64" ]; \
        then \
            mkdir /lib64 && ln -sf /lib/ld-linux-aarch64.so.1 /lib64/ld-linux-aarch64.so.1; \
        fi
    
    ENV http_proxy=""
    ENV https_proxy=""
    
    # Trigger postbuild.sh, then delete it (-f avoids a build failure if the file is absent).
    RUN bash -c "test -f $POSTBUILD_SH && bash $POSTBUILD_SH || true" && \
        rm -f $POSTBUILD_SH

    Dockerfile example for Ubuntu x86

    FROM ubuntu:18.04
    
    ARG PYTORCH_PKG=torch-1.5.0+ascend.post2-cp37-cp37m-linux_x86_64.whl
    ARG APEX_PKG=apex-0.1+ascend-cp37-cp37m-linux_x86_64.whl
    ARG HOST_ASCEND_BASE=/usr/local/Ascend
    ARG NNAE_PATH=/usr/local/Ascend/nnae/latest
    ARG INSTALL_ASCEND_PKGS_SH=install_ascend_pkgs.sh
    ARG PREBUILD_SH=prebuild.sh
    ARG POSTBUILD_SH=postbuild.sh
    WORKDIR /tmp
    COPY . ./
    
    # Trigger prebuild.sh.
    RUN bash -c "test -f $PREBUILD_SH && bash $PREBUILD_SH || true"
    
    # System package
    RUN apt update && \
        apt install --no-install-recommends \
            python3.7 python3.7-dev \
            curl g++ pkg-config unzip \
            libblas3 liblapack3 liblapack-dev \
            libblas-dev gfortran libhdf5-dev \
            libffi-dev libicu60 libxml2 -y
    
    ENV http_proxy=http://xxx.xxx.xxx.xxx:xxx
    ENV https_proxy=http://xxx.xxx.xxx.xxx:xxx
    
    # Create a Python soft link.
    RUN ln -s /usr/bin/python3.7 /usr/bin/python
    
    # Configure the Python pip source.
    RUN mkdir -p ~/.pip \
    && echo '[global] \n\
    index-url=https://pypi.doubanio.com/simple/\n\
    trusted-host=pypi.doubanio.com' >> ~/.pip/pip.conf
    
    # pip3.7 (the default get-pip.py no longer supports Python 3.7, so use the version-specific script)
    RUN curl -k https://bootstrap.pypa.io/pip/3.7/get-pip.py -o get-pip.py && \
        cd /tmp && \
        apt-get download python3-distutils && \
        dpkg-deb -x python3-distutils_*.deb / && \
        rm python3-distutils_*.deb && \
        cd - && \
        python3.7 get-pip.py && \
        rm get-pip.py
    
    # Create the HwHiAiUser user and group. The UID and GID must be the same as those on the physical machine to avoid generating ownerless files.
    # In this example, the user and its group are created automatically, and the UID and GID are both 1000.
    RUN useradd -d /home/HwHiAiUser -u 1000 -m -s /bin/bash HwHiAiUser
    
    # Python package
    RUN pip3.7 install numpy && \
        pip3.7 install decorator && \
        pip3.7 install sympy==1.4 && \
        pip3.7 install cffi==1.12.3 && \
        pip3.7 install pyyaml && \
        pip3.7 install pathlib2 && \
        pip3.7 install grpcio && \
        pip3.7 install grpcio-tools && \
        pip3.7 install protobuf && \
        pip3.7 install scipy && \
        pip3.7 install requests && \
        pip3.7 install attrs && \
        pip3.7 install Pillow==8.3.2 && \
        pip3.7 install torchvision==0.6.0
    
    # Ascend package
    RUN bash $INSTALL_ASCEND_PKGS_SH
    
    RUN pip3.7 install $APEX_PKG
    
    RUN pip3.7 install $PYTORCH_PKG
    
    # Locate the directory that contains setup.py in the downloaded dllogger package and change the path below accordingly.
    RUN cd /tmp/dllogger-master/ && \
        python3.7 setup.py build && \
        python3.7 setup.py install
    
    # Environment variables
    ENV HCCL_WHITELIST_DISABLE=1
    ENV PYTHONPATH=/tmp/dllogger-master
    
    ENV http_proxy=""
    ENV https_proxy=""
    
    # Trigger postbuild.sh, then delete it (-f avoids a build failure if the file is absent).
    RUN bash -c "test -f $POSTBUILD_SH && bash $POSTBUILD_SH || true" && \
        rm -f $POSTBUILD_SH
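
Both Dockerfile examples declare the wheel file names with ARG, so the --build-arg option of docker build can override them at build time instead of editing the file. The following sketch only prints the resulting command. build_cmd_for_arch is a hypothetical helper, and the other differences between the ARM and x86 examples (for example, the /lib64 link and the pinned pip package versions) still require the matching Dockerfile.

```shell
#!/bin/bash
# Sketch: print a docker build command that overrides the PYTORCH_PKG and
# APEX_PKG build arguments declared in the Dockerfile examples.
build_cmd_for_arch() {
    # $1: wheel architecture suffix (aarch64 or x86_64), $2: image name and tag
    printf 'docker build --build-arg PYTORCH_PKG=%s --build-arg APEX_PKG=%s -t %s .\n' \
        "torch-1.5.0+ascend.post2-cp37-cp37m-linux_$1.whl" \
        "apex-0.1+ascend-cp37-cp37m-linux_$1.whl" \
        "$2"
}

build_cmd_for_arch aarch64 test_train_arm64:v1.0
```

Printing the command first makes it easy to review the file names before running the build in the directory that holds the packages.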