Creating a MindFormers Training Image (MindSpore)
The goal of MindSpore Transformers (MindFormers for short) is to build a full-process development suite for foundation model training, fine-tuning, evaluation, inference, and deployment. It provides mainstream Transformer-based pre-trained models and SOTA downstream task applications in the industry, covering various parallel features. It is expected to help users easily implement foundation model training and innovative R&D.
The MindSpore Transformers documentation covers software installation and a quick start, which you can use as a reference when creating the image.
You can create a training image based on a base training image and MindFormers documentation. For details about how to create the base training image, see Creating a Container Image Using a Dockerfile (MindSpore).
This section describes how to create a training image based on Ubuntu 20.04.
Obtaining Software Packages
Obtain the software packages of the corresponding OS and prepare the Dockerfile and script file required by the image by referring to Table 1. In the software package name, {version} indicates the version number, {arch} indicates the architecture, and {chip_type} indicates the processor type.
| Software Package | Mandatory (Yes/No) | Description | How to Obtain |
|---|---|---|---|
| MindFormers code repository | Yes | Used to build a full-process development suite for foundation model training, fine-tuning, evaluation, inference, and deployment. It provides mainstream Transformer-based pre-trained models and SOTA downstream task applications in the industry, covering various parallel features. | Run in order: `git clone https://gitee.com/mindspore/mindformers.git`; `cd mindformers`; `git checkout f06a946af29c8c7e002a6c49458f513d47b642e5` |
| requirements.txt | No | When MindSpore is installed using pip, an error may be reported during dependency installation. In this case, you can install dependencies first. | `wget https://gitee.com/mindspore/mindspore/raw/r2.4.1/requirements.txt` |
| mindspore-{version}-cp3x-cp3x-linux_aarch64.whl | Yes | MindSpore .whl package | |
| mindio_ttp-{version}-py3-none-linux_{arch}.whl | Yes | MindIO TFT installation package | |
| Ascend-cann-{chip_type}-ops_{version}_linux-{arch}.run | Yes. For versions earlier than CANN 8.5.0, the package name is Ascend-cann-kernels-{chip_type}_{version}_linux-{arch}.run. | CANN operator package | NOTE: Obtain a software package that matches the server model. |
| Ascend-cann-toolkit_{version}_linux-{arch}.run | Yes | CANN Toolkit package | NOTE: Obtain a software package that matches the server model. |
| taskd-{version}-py3-none-linux_{arch}.whl | Yes | .whl package of the resumable training component | |
| version.info | Yes (CANN installation dependency) | Driver version information file | Copy the /usr/local/Ascend/driver/version.info file from the host. |
| ascend_install.info | Yes (CANN installation dependency) | Driver installation information file | Copy the /etc/ascend_install.info file from the host. |
| get-pip.py | Yes | Required for installing the pip module | `curl -k https://bootstrap.pypa.io/get-pip.py -o get-pip.py` |
| Dockerfile | Yes | Required for creating an image | - |
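
Before building, it can help to confirm that the mandatory items in Table 1 are present in the build directory. The following is a minimal sketch and not part of the official procedure: the function name `check_build_context` is hypothetical, and the glob patterns are assumptions derived from the package name templates in Table 1 (for CANN versions earlier than 8.5.0, the ops pattern would need to be changed to the kernels package name). The optional requirements.txt is not checked.

```shell
# Hypothetical helper: print each mandatory build-context entry that is
# missing from the given directory and return non-zero if any is missing.
check_build_context() {
    dir="$1"
    missing=0
    for pattern in \
        "mindformers" \
        "mindspore-*.whl" \
        "mindio_ttp-*.whl" \
        "Ascend-cann-toolkit_*.run" \
        "Ascend-cann-*-ops_*.run" \
        "taskd-*.whl" \
        "version.info" \
        "ascend_install.info" \
        "get-pip.py" \
        "Dockerfile"
    do
        # Expand the glob. If nothing matches, the pattern stays literal
        # and the existence test below fails.
        set -- "$dir"/$pattern
        if [ ! -e "$1" ]; then
            echo "missing: $pattern"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, running `check_build_context .` in the directory that holds the Dockerfile lists anything that still needs to be downloaded or copied from the host.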
To avoid using a software package that has been tampered with during transmission or storage, download its digital signature file for integrity check while downloading the software package.
After the software package is downloaded from the Support website, verify its PGP digital signature by referring to the OpenPGP Signature Verification Guide. If the verification fails, do not use the software package, and contact Huawei technical support.
Before using software for installation or upgrade, verify the digital signature to ensure that the software has not been tampered with.
For carriers, visit https://support.huawei.com/carrier/digitalSignatureAction.
For enterprises, visit https://support.huawei.com/enterprise/en/tool/pgp-verify-TL1000000054.
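
As an illustration of the signature check, the sketch below wraps GnuPG's detached-signature verification. It assumes that the Huawei public key has already been imported as described in the OpenPGP Signature Verification Guide and that the signature file is named after the package with an `.asc` suffix; both the naming convention and the function name `verify_signature` are assumptions, so follow the guide where it differs.

```shell
# Hypothetical helper: verify a package against its detached PGP signature.
# Assumes the signature is stored next to the package as <package>.asc
# and that the signer's public key is already in the local keyring.
verify_signature() {
    pkg="$1"
    sig="$pkg.asc"
    if [ ! -f "$pkg" ] || [ ! -f "$sig" ]; then
        echo "package or signature file not found: $pkg" >&2
        return 2
    fi
    # gpg exits with a non-zero status if the signature does not verify.
    gpg --verify "$sig" "$pkg"
}
```

A call such as `verify_signature Ascend-cann-toolkit_8.5.0_linux-aarch64.run` returns a non-zero status when a file is missing or the signature does not verify, in which case the package should not be used.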
This section uses a single Atlas 800T A2 training server running on Ubuntu 20.04 with Python 3.10 as an example to describe how to create an image. Modify the related steps as required.
Procedure
- Prepare software packages on the host.
- Create a Dockerfile as follows.

      FROM ubuntu:20.04
      WORKDIR /root
      COPY . .

      ARG HOST_ASCEND_BASE=/usr/local/Ascend
      ARG TOOLKIT_PATH=/usr/local/Ascend/cann
      # The following uses CANN 8.5.0 as an example. Modify the following information based on the actual situation.
      ARG TOOLKIT=Ascend-cann-toolkit_8.5.0_linux-aarch64.run
      ARG OPS=Ascend-cann-910b-ops_8.5.0_linux-aarch64.run
      ARG MINDIO_TTP_WHL=mindio_ttp-1.0.0-py3-none-linux_aarch64.whl
      ARG MINDFORMERS=mindformers
      ARG MINDSPORE_REQUIREMENTS=requirements.txt
      ARG MINDSPORE_WHL=mindspore-2.5.0-cp310-cp310-linux_aarch64.whl
      ARG TASKD_WHL=taskd-7.0.RC1-py3-none-linux_aarch64.whl

      RUN echo "nameserver 114.114.114.114" > /etc/resolv.conf

      RUN echo "deb http://repo.huaweicloud.com/ubuntu-ports/ focal main restricted universe multiverse\n\
      deb http://repo.huaweicloud.com/ubuntu-ports/ focal-updates main restricted universe multiverse\n\
      deb http://repo.huaweicloud.com/ubuntu-ports/ focal-backports main restricted universe multiverse\n\
      deb http://ports.ubuntu.com/ubuntu-ports/ focal-security main restricted universe multiverse" > /etc/apt/sources.list

      ARG DEBIAN_FRONTEND=noninteractive

      RUN umask 0022 && apt update && \
          apt-get install -y --no-install-recommends \
          software-properties-common

      RUN umask 0022 && add-apt-repository ppa:deadsnakes/ppa && \
          apt update && \
          apt autoremove -y python python3 && \
          apt install -y python3.10 python3.10-dev

      # Create Python soft links.
      RUN ln -s /usr/bin/python3.10 /usr/bin/python
      RUN ln -s /usr/bin/python3.10 /usr/bin/python3
      RUN ln -s /usr/bin/python3.10-config /usr/bin/python-config
      RUN ln -s /usr/bin/python3.10-config /usr/bin/python3-config

      # System packages
      RUN umask 0022 && apt update && \
          apt-get install -y --no-install-recommends \
          gcc g++ make cmake vim \
          zlib1g zlib1g-dev \
          openssl libsqlite3-dev libssl-dev \
          libffi-dev unzip pciutils \
          net-tools libblas-dev \
          gfortran libblas3 libopenblas-dev \
          curl unzip liblapack3 liblapack-dev \
          libhdf5-dev libxml2 patch

      # Time zone
      # RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
      RUN ln -sf /usr/share/zoneinfo/UTC /etc/localtime

      # Configure the pip mirror.
      RUN mkdir -p ~/.pip \
      && echo '[global] \n\
      index-url=https://mirrors.huaweicloud.com/repository/pypi/simple\n\
      trusted-host=mirrors.huaweicloud.com' >> ~/.pip/pip.conf

      # pip3.10
      RUN cd /tmp && \
          apt-get download python3-distutils && \
          dpkg-deb -x python3-distutils_*.deb / && \
          rm python3-distutils_*.deb && \
          cd - && \
          python get-pip.py && \
          rm get-pip.py

      RUN umask 0022 && \
          pip install sympy==1.4 && \
          pip install cffi && \
          pip install pathlib2 && \
          pip install grpcio && \
          pip install grpcio-tools && \
          pip install absl-py && \
          pip install datasets && \
          pip install tokenizers==0.20.1 && \
          pip install pyOpenSSL

      # Create the HwHiAiUser user and group. The UID and GID must be the same as
      # those on the physical machine to avoid generating ownerless files.
      # In this example, the user and the corresponding group are created
      # automatically, and the UID and GID are both 1000.
      RUN useradd -d /home/HwHiAiUser -u 1000 -m -s /bin/bash HwHiAiUser

      # Ascend packages
      # Copy the /usr/local/Ascend/driver/version.info file on the host to the current directory first.
      RUN umask 0022 && \
          cp ascend_install.info /etc/ && \
          mkdir -p /usr/local/Ascend/driver/ && \
          cp version.info /usr/local/Ascend/driver/ && \
          chmod +x $TOOLKIT && \
          chmod +x $OPS

      RUN umask 0022 && ./$TOOLKIT --install-path=/usr/local/Ascend/ --install --quiet
      RUN echo "source /usr/local/Ascend/cann/set_env.sh" >> ~/.bashrc
      RUN umask 0022 && ./$OPS --install --quiet

      # After the toolkit package is installed, remove the following files.
      # They are mounted by Ascend Docker during container startup.
      RUN rm -f version.info && \
          rm -rf /usr/local/Ascend/driver/

      # Install MindSpore.
      RUN umask 0022 && pip uninstall te topi hccl -y && \
          pip install sympy && \
          pip install /usr/local/Ascend/cann/lib64/hccl-*-py3-none-any.whl

      RUN umask 0022 && \
          pip install -r $MINDSPORE_REQUIREMENTS && \
          pip install $MINDSPORE_WHL

      # Install MindFormers.
      RUN umask 0022 && cd $MINDFORMERS && \
          pip install -r requirements.txt

      # Adaptation for MindCluster resumable training without loss
      RUN umask 0022 && \
          pip install $MINDIO_TTP_WHL --target=$(pip show mindspore | awk '/Location:/ {print $2}') && \
          pip install $TASKD_WHL

      # Environment variable
      ENV HCCL_WHITELIST_DISABLE=1

      # Create /lib64/ld-linux-aarch64.so.1.
      RUN umask 0022 && \
          if [ ! -d "/lib64" ]; \
          then \
              mkdir /lib64 && ln -sf /lib/ld-linux-aarch64.so.1 /lib64/ld-linux-aarch64.so.1; \
          fi

      # Install the job scheduling dependency library.
      RUN pip install apscheduler

      RUN rm -rf tmp && \
          rm -f $TOOLKIT && \
          rm -f $OPS && \
          rm -f $MINDIO_TTP_WHL && \
          rm -f $MINDSPORE_REQUIREMENTS && \
          rm -f $MINDSPORE_WHL

      # Build the preceding content into the image mindformers-dl:v1.

  To make the Dockerfile more secure, you can add a HEALTHCHECK instruction to it based on service requirements. The HEALTHCHECK [OPTIONS] CMD instruction runs a command in the container to check the running status of the container.
- Build the image. Run the following command to generate the image. Do not omit the period (.) at the end of the command.
docker build -t mindformers-dl:v1 .
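
After the build finishes, you may want to confirm that the image is available locally. The following is a minimal sketch under the assumption that the Docker CLI is on the PATH; the function name `image_exists` is hypothetical. It relies on `docker image inspect`, which exits with a non-zero status when the named image is not found.

```shell
# Hypothetical helper: succeed only if the named image exists locally.
image_exists() {
    docker image inspect "$1" >/dev/null 2>&1
}
```

For example, `image_exists mindformers-dl:v1 && echo "image is ready"` prints the message only when the build produced the image.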