Creating a Post-training Image for Reinforcement Learning (verl)

verl is a flexible, efficient, and production-ready reinforcement learning (RL) training framework designed for the post-training phase of large language models (LLMs). This section describes how to create a verl-based post-training image on Ubuntu 20.04.

Obtaining Software Packages

Obtain the software packages of the corresponding OS and the Dockerfile and script files required for packaging the image by referring to Table 1.

Table 1 Required software packages

Software package: Kernels
Mandatory: Yes
Description: CANN binary operator package. The value of arch can be aarch64 or x86_64. The following example uses 8.2.RC1.
How to obtain: Link
NOTE: Obtain a software package that matches the server model.

Software package: CANN
Mandatory: Yes
Description: CANN development kit, which is used to install ToolKit and NNAL. The following example uses 8.2.RC1.
How to obtain: Link
NOTE: Obtain a software package that matches the server model.

Software package: get-pip.py
Mandatory: Yes
Description: Required for installing the pip module.
How to obtain:
    curl -k https://bootstrap.pypa.io/get-pip.py -o get-pip.py

Software package: version.info
Mandatory: Yes
Description: Driver version information file.
How to obtain: Copy the /usr/local/Ascend/driver/version.info file from the host.

Software package: ascend_install.info
Mandatory: Yes
Description: Driver installation information file.
How to obtain: Copy the /etc/ascend_install.info file from the host.

Software package: vLLM
Mandatory: Yes
Description: Inference engine (v0.9.1 branch) used in the example.
How to obtain:
    git clone -b v0.9.1 https://github.com/vllm-project/vllm.git
After the package is downloaded, change the torch version in vllm/requirements/build.txt to 2.5.1.

Software package: vllm-ascend
Mandatory: Yes
Description: Adaptation plugin of vLLM on NPUs. Use commit ID 4014ad2a46e01c79fd8d98d6283404d0bc414dce.
How to obtain:
    git clone -b v0.9.1-dev https://github.com/vllm-project/vllm-ascend.git
    cd vllm-ascend
    git checkout 4014ad2a46e01c79fd8d98d6283404d0bc414dce
Then, change the torch-npu version in requirements.txt to 2.5.1.post1.

Software package: Megatron-LM
Mandatory: Yes
Description: Megatron v0.12.1 is used as the training backend.
How to obtain:
    git clone https://github.com/NVIDIA/Megatron-LM.git
    cd Megatron-LM
    git checkout core_v0.12.1

Software package: MindSpeed
Mandatory: Yes
Description: MindSpeed is used as the training backend. Use commit ID 1f13e6fdbfd701ea7e045c8d6bb2469fab9775a7.
How to obtain:
    git clone https://gitcode.com/Ascend/MindSpeed.git
    cd MindSpeed
    git checkout 1f13e6fdbfd701ea7e045c8d6bb2469fab9775a7

Software package: verl
Mandatory: Yes
Description: Post-training framework. Use commit ID 02f4386ae89c9a25863dca0bb8b6e119b2f01385.
How to obtain:
    git clone https://github.com/volcengine/verl.git
    cd verl
    git checkout 02f4386ae89c9a25863dca0bb8b6e119b2f01385

Software package: rl-plugin
Mandatory: Yes
Description: Adaptation plugin of verl on NPUs. Use commit ID 9a679fc3be95d162b78d42e9e3df569c30a89a5e.
How to obtain:
    git clone https://gitcode.com/Ascend/MindSpeed-RL.git
    cd MindSpeed-RL/rl-plugin
    git checkout 9a679fc3be95d162b78d42e9e3df569c30a89a5e

Software package: Dockerfile
Mandatory: Yes
Description: Required for creating an image.
How to obtain: -
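
The git-based entries in Table 1 can be collected with a short helper script. The following is a minimal sketch, not an official tool. The helper names (run, clone_pinned, pin_requirement, fetch_all) are hypothetical. The URLs, branches, and commit IDs are copied from Table 1. The sed pattern assumes that the requirement files pin the package on a line of the form torch==<version>, which should be checked against the actual files. By default the script only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch only. Helper names are hypothetical; URLs, branches, and commit IDs come from Table 1.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# clone_pinned <url> <dir> <ref> [branch]
# Clone <url> into <dir> (optionally from <branch>) and check out <ref>.
clone_pinned() {
    url=$1; dir=$2; ref=$3; branch=${4:-}
    if [ -n "$branch" ]; then
        run git clone -b "$branch" "$url" "$dir"
    else
        run git clone "$url" "$dir"
    fi
    run git -C "$dir" checkout "$ref"
}

# pin_requirement <file> <package> <version>
# Assumption: the file pins the package as "<package>==<something>" at the start of a line.
pin_requirement() {
    sed "s/^$2==.*/$2==$3/" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

fetch_all() {
    clone_pinned https://github.com/vllm-project/vllm.git vllm v0.9.1 v0.9.1
    run pin_requirement vllm/requirements/build.txt torch 2.5.1
    clone_pinned https://github.com/vllm-project/vllm-ascend.git vllm-ascend \
        4014ad2a46e01c79fd8d98d6283404d0bc414dce v0.9.1-dev
    run pin_requirement vllm-ascend/requirements.txt torch-npu 2.5.1.post1
    clone_pinned https://github.com/NVIDIA/Megatron-LM.git Megatron-LM core_v0.12.1
    clone_pinned https://gitcode.com/Ascend/MindSpeed.git MindSpeed \
        1f13e6fdbfd701ea7e045c8d6bb2469fab9775a7
    clone_pinned https://github.com/volcengine/verl.git verl \
        02f4386ae89c9a25863dca0bb8b6e119b2f01385
    clone_pinned https://gitcode.com/Ascend/MindSpeed-RL.git MindSpeed-RL \
        9a679fc3be95d162b78d42e9e3df569c30a89a5e
}

fetch_all
```

Running the script with DRY_RUN=0 in the build directory performs the clones and edits. After that, it is still worth opening the two requirement files to confirm that the expected versions are in place.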

To avoid using a software package that has been tampered with during transmission or storage, download its digital signature file for integrity check while downloading the software package.

After the software package is downloaded from the Support website, verify its PGP digital signature by referring to the OpenPGP Signature Verification Guide. If the software package fails the verification, do not use the software package, and contact Huawei technical support.

The verification is also required before the installation or update of the software package.

For carriers, visit https://support.huawei.com/carrier/digitalSignatureAction.

For enterprise customers, visit https://support.huawei.com/enterprise/en/tool/pgp-verify-TL1000000054.
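
On a host with GnuPG installed, the signature check can be wrapped in a small function. This is a minimal sketch under two assumptions: the vendor public key has already been imported as described in the OpenPGP Signature Verification Guide, and the detached signature is stored next to the package as <package>.asc. The function name verify_package is hypothetical.

```shell
#!/bin/sh
# verify_package <package-file>
# Returns 0 only if gpg accepts the detached signature <package-file>.asc.
# Returns 2 if the package or its signature file is missing.
verify_package() {
    pkg=$1
    sig="$pkg.asc"   # assumed signature file name
    if [ ! -f "$pkg" ] || [ ! -f "$sig" ]; then
        echo "missing package or signature: $pkg" >&2
        return 2
    fi
    # gpg --verify <signature> <signed-file> checks a detached signature.
    gpg --verify "$sig" "$pkg"
}
```

For example, verify_package Ascend-cann-toolkit_8.2.RC1_linux-aarch64.run can be called before the package is copied into the build directory, and the build can be stopped if the function returns a non-zero value.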

  • The following uses two Atlas 800T A2 training servers running Ubuntu 20.04 with Python 3.10 and CANN 8.2.RC1 as an example to describe how to create a training image. Modify the steps as required.
  • For details about the procedure and version mapping, see related documents.
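
Because the example Dockerfile copies the entire build directory into the image, it can help to confirm that every item from Table 1 is present before starting the procedure. The sketch below is an illustration only. check_build_context is a hypothetical helper that takes the directory and the list of expected paths as arguments, so no file names are hard-coded.

```shell
#!/bin/sh
# check_build_context <dir> <path>...
# Print each path that is missing under <dir> and return 1 if any is missing.
check_build_context() {
    ctx=$1; shift
    missing=0
    for p in "$@"; do
        if [ ! -e "$ctx/$p" ]; then
            echo "missing: $p"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call passes the current directory followed by Dockerfile, get-pip.py, version.info, ascend_install.info, the three CANN .run packages, and the vllm, vllm-ascend, Megatron-LM, MindSpeed, verl, and MindSpeed-RL/rl-plugin directories.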

Procedure

  1. Prepare the required software packages on the host by referring to Table 1.
  2. Write the Dockerfile as follows.
    FROM ubuntu:20.04 
    WORKDIR /root 
    COPY . . 
      
    ARG HOST_ASCEND_BASE=/usr/local/Ascend 
    
    ARG TOOLKIT_PATH=/usr/local/Ascend/toolkit/latest 
    ARG TOOLKIT=Ascend-cann-toolkit_8.2.RC1_linux-aarch64.run
    ARG NNAL=Ascend-cann-nnal_8.2.RC1_linux-aarch64.run
    # Use the kernels package that matches your server model.
    ARG KERNEL=Atlas-A3-cann-kernels_8.2.RC1_linux-aarch64.run
     
    RUN echo "nameserver 114.114.114.114" > /etc/resolv.conf 
      
    RUN echo "deb http://repo.huaweicloud.com/ubuntu-ports/ focal main restricted universe multiverse\n\
    deb http://repo.huaweicloud.com/ubuntu-ports/ focal-updates main restricted universe multiverse\n\
    deb http://repo.huaweicloud.com/ubuntu-ports/ focal-backports main restricted universe multiverse\n\
    deb http://ports.ubuntu.com/ubuntu-ports/ focal-security main restricted universe multiverse" > /etc/apt/sources.list
     
    RUN umask 0022 && apt update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends software-properties-common
    # -y and DEBIAN_FRONTEND=noninteractive prevent interactive prompts from blocking the build.
    RUN umask 0022 && add-apt-repository -y ppa:deadsnakes/ppa && apt update && apt autoremove -y python python3 && \
        DEBIAN_FRONTEND=noninteractive apt install -y python3.10 python3.10-dev vim patch gcc g++ make cmake build-essential \
        libbz2-dev libreadline-dev wget curl llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev liblzma-dev m4 dos2unix \
        libopenblas-dev git libjemalloc2 libomp-dev net-tools
     
     
    # Create Python soft links. The -f option replaces a link that already exists.
    RUN ln -sf /usr/bin/python3.10 /usr/bin/python && \
        ln -sf /usr/bin/python3.10 /usr/bin/python3 && \
        ln -sf /usr/bin/python3.10-config /usr/bin/python-config && \
        ln -sf /usr/bin/python3.10-config /usr/bin/python3-config
      
    RUN umask 0022 && python get-pip.py
    
    # Configure the pip mirror.
    RUN mkdir -p ~/.pip \
    && echo '[global] \n\
    index-url=https://mirrors.huaweicloud.com/repository/pypi/simple\n\
    trusted-host=mirrors.huaweicloud.com' >> ~/.pip/pip.conf
     
    # Time zone
    RUN ln -sf /usr/share/zoneinfo/UTC /etc/localtime 
      
     
    # Create the HwHiAiUser user and group. Keep the UID and GID the same as those on the physical machine to avoid ownerless files.
    # In this example, the user and the corresponding group are created automatically, and the UID and GID are both 1000.
    RUN useradd -d /home/HwHiAiUser -u 1000 -m -s /bin/bash HwHiAiUser 
      
    # Ascend package
    # Before the build, copy /usr/local/Ascend/driver/version.info and /etc/ascend_install.info from the host to the current directory.
    RUN umask 0022 && \
        cp ascend_install.info /etc/ && \
        mkdir -p /usr/local/Ascend/driver/ && \
        cp version.info /usr/local/Ascend/driver/ && \
        chmod +x $TOOLKIT && \
        chmod +x $KERNEL && \
        chmod +x $NNAL
      
    RUN umask 0022 && ./$TOOLKIT --install-path=/usr/local/Ascend/ --install --quiet 
    RUN umask 0022 && . /usr/local/Ascend/ascend-toolkit/set_env.sh && ./$KERNEL --install --quiet 
    RUN umask 0022 && . /usr/local/Ascend/ascend-toolkit/set_env.sh && ./$NNAL --install --quiet 
     
  3. Build the image. Note that the period (.) at the end of the command must not be omitted.
    docker build -t verl-train:v1 .
  4. Start the container, and then install the inference and training packages (vLLM, vllm-ascend, Megatron-LM, MindSpeed, verl, and the verl plugin) in it.
    docker run -it \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    verl-train:v1 /bin/bash

    Run the following commands in the container:

    source /usr/local/Ascend/driver/bin/setenv.bash;
    source /usr/local/Ascend/ascend-toolkit/set_env.sh;
    source /usr/local/Ascend/nnal/atb/set_env.sh;
    source /usr/local/Ascend/nnal/asdsip/set_env.sh;
    # Install vLLM.
    cd vllm && \
        pip install -r requirements/build.txt -i https://mirrors.aliyun.com/pypi/simple/ && \
        pip install -r requirements/common.txt -i https://mirrors.aliyun.com/pypi/simple/ && \
        VLLM_TARGET_DEVICE=empty python setup.py develop && \
        cd ..
    # Install vllm-ascend.
    cd vllm-ascend && pip install -v -e . && cd ..
    # Install Megatron.
    cd Megatron-LM && git checkout core_v0.12.1 && pip install -e .  && cd ..
      
    # Install MindSpeed.
    cd MindSpeed && pip install -e . && cd ..
      
    # Install verl.
    cd verl && pip install -e . && cd ..
      
    # Install the verl plugin.
    cd MindSpeed-RL/rl-plugin && pip install -v -e . && cd ../..
    • If an error message is displayed indicating that the CMake path of torch cannot be found during the installation of vllm-ascend, run the following command to specify CMAKE_PREFIX_PATH for installation:
      CMAKE_PREFIX_PATH=/usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/ pip install -v -e .
    • If an error message is displayed indicating that the README.md file cannot be found during the installation of the verl plugin, create a README.md file in MindSpeed-RL/rl-plugin. The file content can be arbitrary.
    • After the installation is complete, check the torch and torchvision versions. If torch is not 2.5.1 or torchvision is not 0.20.1, reinstall torch 2.5.1 and torchvision 0.20.1.
  5. Run the following commands in a new terminal window on the host to save the container as an image. To make the image more robust, you can also add a HEALTHCHECK [OPTIONS] CMD command instruction to the Dockerfile based on service requirements so that Docker can check the container running status.
    # Search for the container ID.
    docker ps | grep verl-train
    # Commit the container as the image. Replace <container_id> with the actual container ID.
    docker commit <container_id> verl-train:v1
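
Before the container is committed in step 5, the torch and torchvision versions noted in step 4 can be checked inside the container with a small script. This is a minimal sketch, assuming that pip is on PATH in the container. The helper names (installed_version, needs_reinstall, ensure_version) are hypothetical, and ignoring a local version suffix such as +cpu is a design choice of this sketch, not something the procedure prescribes.

```shell
#!/bin/sh
# installed_version <package>
# Print the version reported by "pip show", or nothing if the package is absent.
installed_version() {
    pip show "$1" 2>/dev/null | sed -n 's/^Version: //p'
}

# needs_reinstall <actual> <expected>
# Succeed if the versions differ. A local suffix such as "+cpu" is ignored.
needs_reinstall() {
    [ "${1%%+*}" != "$2" ]
}

# ensure_version <package> <expected>
# Reinstall the package with pip if the installed version does not match.
ensure_version() {
    pkg=$1; want=$2
    have=$(installed_version "$pkg")
    if needs_reinstall "$have" "$want"; then
        echo "$pkg: found '${have:-none}', reinstalling $want"
        pip install "$pkg==$want"
    else
        echo "$pkg: $want OK"
    fi
}
```

Calling ensure_version torch 2.5.1 followed by ensure_version torchvision 0.20.1 covers the two packages named in step 4.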