Creating a Post-training Image for Reinforcement Learning (verl)

verl is a flexible, efficient, and production-ready reinforcement learning (RL) training framework designed for the post-training phase of large language models (LLMs). This section describes how to create a verl-based post-training image that runs Ubuntu 20.04.

Obtaining Software Packages

Obtain the software packages for your OS, together with the Dockerfile and script files required for building the image, by referring to Table 1.

**Table 1** Required software packages
Software Package	Mandatory (Yes/No)	Description	How to Obtain
Kernels	Yes	CANN binary operator package. The arch part of the package name can be aarch64 or x86_64. The following example uses 8.2.RC1.	Link. NOTE: Obtain a software package that matches the server model.
CANN	Yes	CANN development kit, which is used to install Toolkit and NNAL. The following example uses 8.2.RC1.	Link. NOTE: Obtain a software package that matches the server model.
get-pip.py	Yes	Required for installing the pip module.	curl -k https://bootstrap.pypa.io/get-pip.py -o get-pip.py
version.info	Yes	Driver version information file.	Copy the /usr/local/Ascend/driver/version.info file from the host.
ascend_install.info	Yes	Driver installation information file.	Copy the /etc/ascend_install.info file from the host.
vLLM	Yes	Inference engine (v0.9.1 branch) used in the example.	git clone -b v0.9.1 https://github.com/vllm-project/vllm.git After the package is downloaded, change the torch version in vllm/requirements/build.txt to 2.5.1.
vllm-ascend	Yes	Adaptation plugin of vLLM on NPUs. Use commit ID 4014ad2a46e01c79fd8d98d6283404d0bc414dce.	git clone -b v0.9.1-dev https://github.com/vllm-project/vllm-ascend.git && cd vllm-ascend && git checkout 4014ad2a46e01c79fd8d98d6283404d0bc414dce Then, change the torch-npu version in requirements.txt to 2.5.1.post1.
Megatron-LM	Yes	Megatron v0.12.1 is used as the training backend.	git clone https://github.com/NVIDIA/Megatron-LM.git && cd Megatron-LM && git checkout core_v0.12.1
MindSpeed	Yes	MindSpeed is used as the training backend. Use commit ID 1f13e6fdbfd701ea7e045c8d6bb2469fab9775a7.	git clone https://gitcode.com/Ascend/MindSpeed.git && cd MindSpeed && git checkout 1f13e6fdbfd701ea7e045c8d6bb2469fab9775a7
verl	Yes	Post-training framework. Use commit ID 02f4386ae89c9a25863dca0bb8b6e119b2f01385.	git clone https://github.com/volcengine/verl.git && cd verl && git checkout 02f4386ae89c9a25863dca0bb8b6e119b2f01385
rl-plugin	Yes	Adaptation plugin of verl on NPUs. Use commit ID 9a679fc3be95d162b78d42e9e3df569c30a89a5e.	git clone https://gitcode.com/Ascend/MindSpeed-RL.git && cd MindSpeed-RL/rl-plugin && git checkout 9a679fc3be95d162b78d42e9e3df569c30a89a5e
Dockerfile	Yes	Required for creating an image.	-
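
The version changes listed in the How to Obtain column for vLLM and vllm-ascend can also be scripted. The following is a minimal sketch rather than part of the original procedure: pin_requirement is a hypothetical helper, and it assumes that each requirement starts a line in the form name, comparison operator, version, which may not match the files in your checkout. Review the result (for example with git diff) before building.

```shell
#!/bin/sh
# Hypothetical helper (not part of vLLM or vllm-ascend): rewrite the version
# specifier of one package in a pip requirements file.
# Assumes the requirement starts a line as "<name><operator><version>".
pin_requirement() {
    file=$1
    pkg=$2
    ver=$3
    tmp="${file}.tmp.$$"
    # Match only the exact package name followed by a comparison operator, so
    # that pinning "torch" does not touch "torchvision" or "torch-npu".
    sed -E "s/^${pkg}[[:space:]]*(==|>=|<=|~=|!=|>|<)[^;#[:space:]]*/${pkg}==${ver}/" \
        "$file" > "$tmp" && mv "$tmp" "$file"
}

# Usage, run from the directory that contains the cloned repositories:
#   pin_requirement vllm/requirements/build.txt torch 2.5.1
#   pin_requirement vllm-ascend/requirements.txt torch-npu 2.5.1.post1
```

The helper writes to a temporary file and then moves it into place instead of using sed -i, because the in-place option behaves differently across sed implementations.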

To avoid using a software package that has been tampered with during transmission or storage, download its digital signature file for integrity check while downloading the software package.

After the software package is downloaded from the Support website, verify its PGP digital signature by referring to the OpenPGP Signature Verification Guide. If the software package fails the verification, do not use the software package, and contact Huawei technical support.

The verification is also required before the installation or update of the software package.

For carriers, visit https://support.huawei.com/carrier/digitalSignatureAction.

For enterprise customers, visit https://support.huawei.com/enterprise/en/tool/pgp-verify-TL1000000054.

The following uses two Atlas 800T A2 training servers running Ubuntu 20.04 with Python 3.10 and CANN 8.2.RC1 as an example to describe how to create a training image. Modify the steps as required.
For details about the procedure and version mapping, see related documents.

Procedure

Prepare the required software packages on the host by referring to Table 1.
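
Because the example Dockerfile copies the whole build directory into the image, a missing file only shows up as a failure partway through the build. The sketch below checks for the items in Table 1 up front. check_build_context is a hypothetical helper, and the .run file names are taken from the ARG defaults of the example Dockerfile, so adjust them if your packages are named differently.

```shell
#!/bin/sh
# Hypothetical pre-build check: print every required item that is missing
# from the build directory (default: current directory).
# Returns 0 when everything is present and 1 otherwise.
check_build_context() {
    dir=${1:-.}
    missing=0
    for item in \
        Dockerfile get-pip.py version.info ascend_install.info \
        Ascend-cann-toolkit_8.2.RC1_linux-aarch64.run \
        Ascend-cann-nnal_8.2.RC1_linux-aarch64.run \
        Atlas-A3-cann-kernels_8.2.RC1_linux-aarch64.run \
        vllm vllm-ascend Megatron-LM MindSpeed verl MindSpeed-RL
    do
        # -e accepts both regular files and cloned repository directories.
        if [ ! -e "$dir/$item" ]; then
            echo "missing: $item"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: run the check in the build directory once the Dockerfile is written.
#   check_build_context . && echo "build directory is complete"
```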

Write the Dockerfile as follows.

FROM ubuntu:20.04 
WORKDIR /root 
COPY . . 
  
ARG HOST_ASCEND_BASE=/usr/local/Ascend 

ARG TOOLKIT_PATH=/usr/local/Ascend/toolkit/latest 
ARG TOOLKIT=Ascend-cann-toolkit_8.2.RC1_linux-aarch64.run
ARG NNAL=Ascend-cann-nnal_8.2.RC1_linux-aarch64.run
# Set KERNEL to the kernels package that matches your server model (see Table 1).
ARG KERNEL=Atlas-A3-cann-kernels_8.2.RC1_linux-aarch64.run
 
RUN echo "nameserver 114.114.114.114" > /etc/resolv.conf 
  
RUN echo "deb http://repo.huaweicloud.com/ubuntu-ports/ focal main restricted universe multiverse\n\ 
deb http://repo.huaweicloud.com/ubuntu-ports/ focal-updates main restricted universe multiverse\n\ 
deb http://repo.huaweicloud.com/ubuntu-ports/ focal-backports main restricted universe multiverse\n\ 
deb http://ports.ubuntu.com/ubuntu-ports/ focal-security main restricted universe multiverse" > /etc/apt/sources.list 
 
RUN umask 0022 && apt update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends software-properties-common
RUN umask 0022 && add-apt-repository ppa:deadsnakes/ppa && apt update && apt autoremove -y python python3 && apt install -y python3.10 python3.10-dev vim patch gcc g++ make cmake build-essential libbz2-dev libreadline-dev wget curl llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev liblzma-dev m4 dos2unix libopenblas-dev git libjemalloc2 libomp-dev net-tools
 
 
# Create Python soft links.
RUN ln -s /usr/bin/python3.10 /usr/bin/python
RUN unlink /usr/bin/python3
RUN ln -s /usr/bin/python3.10 /usr/bin/python3
RUN ln -s /usr/bin/python3.10-config /usr/bin/python-config
RUN ln -s /usr/bin/python3.10-config /usr/bin/python3-config
  
RUN umask 0022 && python get-pip.py

# Configure the pip mirror.
RUN mkdir -p ~/.pip \ 
&& echo '[global] \n\ 
index-url=https://mirrors.huaweicloud.com/repository/pypi/simple\n\ 
trusted-host=mirrors.huaweicloud.com' >> ~/.pip/pip.conf 
 
# Time zone
RUN ln -sf /usr/share/zoneinfo/UTC /etc/localtime 
  
 
# Create the HwHiAiUser user and owner. Ensure that the UID and GID are the same as those on the physical machine to avoid ownerless files. In the example, the user and corresponding group are automatically created, and the UID and GID are both 1000.
RUN useradd -d /home/HwHiAiUser -u 1000 -m -s /bin/bash HwHiAiUser 
  
# Ascend package
# Before the build, copy /usr/local/Ascend/driver/version.info and /etc/ascend_install.info from the host to the current directory.
RUN umask 0022 &&  \ 
    cp ascend_install.info /etc/ && \ 
    mkdir -p /usr/local/Ascend/driver/ && \ 
    cp version.info /usr/local/Ascend/driver/ && \ 
    chmod +x $TOOLKIT && \ 
    chmod +x $KERNEL && \
    chmod +x $NNAL
  
RUN umask 0022 && ./$TOOLKIT --install-path=/usr/local/Ascend/ --install --quiet 
RUN umask 0022 && . /usr/local/Ascend/ascend-toolkit/set_env.sh && ./$KERNEL --install --quiet 
RUN umask 0022 && . /usr/local/Ascend/ascend-toolkit/set_env.sh && ./$NNAL --install --quiet

Build the image. Note that the period (.) at the end of the command must not be omitted.
```
docker build -t verl-train:v1 .
```
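
The package file names are declared as ARG defaults in the Dockerfile, so they can be overridden at build time with docker build --build-arg instead of editing the file. The sketch below only prints the resulting command so that it can be reviewed before it is run. build_command is a hypothetical helper, and the file names passed to it are placeholders for the packages you actually downloaded.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a docker build command that
# overrides the package-name ARGs declared in the example Dockerfile.
# Arguments: kernels, toolkit, and NNAL .run file names, in that order.
# An empty or omitted argument keeps the Dockerfile default.
build_command() {
    kernel=${1:-}
    toolkit=${2:-}
    nnal=${3:-}
    cmd="docker build"
    if [ -n "$kernel" ]; then
        cmd="$cmd --build-arg KERNEL=$kernel"
    fi
    if [ -n "$toolkit" ]; then
        cmd="$cmd --build-arg TOOLKIT=$toolkit"
    fi
    if [ -n "$nnal" ]; then
        cmd="$cmd --build-arg NNAL=$nnal"
    fi
    echo "$cmd -t verl-train:v1 ."
}

# Usage (the file name is a placeholder):
#   build_command my-kernels-package.run
```

Overriding a file name this way does not change the apt sources in the example Dockerfile, which point to the ubuntu-ports repositories.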

Start the container, in which the inference and training packages will be installed.

docker run -it \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
verl-train:v1 /bin/bash

Run the following commands in the container:

source /usr/local/Ascend/driver/bin/setenv.bash;
source /usr/local/Ascend/ascend-toolkit/set_env.sh;
source /usr/local/Ascend/nnal/atb/set_env.sh;
source /usr/local/Ascend/nnal/asdsip/set_env.sh;
# Install vLLM.
cd vllm && pip install -r requirements/build.txt -i https://mirrors.aliyun.com/pypi/simple/ && pip install -r requirements/common.txt -i https://mirrors.aliyun.com/pypi/simple/ && VLLM_TARGET_DEVICE=empty python setup.py develop && cd ..
# Install vllm-ascend.
cd vllm-ascend && pip install -v -e . && cd ..
# Install Megatron.
cd Megatron-LM && git checkout core_v0.12.1 && pip install -e .  && cd ..
  
# Install MindSpeed.
cd MindSpeed && pip install -e . && cd ..
  
# Install verl.
cd verl && pip install -e . && cd ..
  
# Install the verl plugin.
cd MindSpeed-RL/rl-plugin && pip install -v -e . && cd ../..

If an error message is displayed indicating that the CMake path of torch cannot be found during the installation of vllm-ascend, run the following command to specify CMAKE_PREFIX_PATH for installation:
```
CMAKE_PREFIX_PATH=/usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/ pip install -v -e .
```
If an error message is displayed indicating that the README.md file cannot be found during the installation of the verl plugin, create a README.md file in MindSpeed-RL/rl-plugin. Its content can be arbitrary.
After the installation is complete, check the installed versions. If the torch version is not 2.5.1 or the torchvision version is not 0.20.1, reinstall torch 2.5.1 and torchvision 0.20.1.
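
The version check can be sketched as follows. The three function names are hypothetical helpers, and the sketch assumes that python in the container is the interpreter the packages were installed into. It only reports mismatches and leaves the reinstallation to you.

```shell
#!/bin/sh
# Compare an installed version string with the expected one, ignoring a local
# build suffix such as "+cpu". Returns 0 when they match.
version_matches() {
    expected=$1
    actual=${2%%+*}
    [ "$actual" = "$expected" ]
}

# Query the installed version of a package through the Python standard
# library. Prints nothing if the package is not installed.
installed_version() {
    python -c 'import sys, importlib.metadata as m; print(m.version(sys.argv[1]))' "$1" 2>/dev/null
}

# Print a message for each package whose version differs from the example.
check_torch_versions() {
    version_matches 2.5.1 "$(installed_version torch)" || echo "torch is not 2.5.1"
    version_matches 0.20.1 "$(installed_version torchvision)" || echo "torchvision is not 0.20.1"
}

# Usage inside the container:
#   check_torch_versions
# If a mismatch is reported, reinstall the expected versions, for example:
#   pip install torch==2.5.1 torchvision==0.20.1
```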

Run the following commands in a new window to save the image. To make the Dockerfile more secure, you can define a health check based on service requirements: add a HEALTHCHECK [OPTIONS] CMD instruction to the Dockerfile so that Docker periodically checks the container running status.
```
# Search for the container ID.
docker ps | grep verl-train
# Commit the container as the image. Replace <container_id> with the actual container ID.
docker commit <container_id> verl-train:v1
```
