This section describes how to build a Rec SDK training image on top of an existing base image.
```dockerfile
# please configure: use the base image that matches your environment
FROM rec_sdk-tf1:6.0.0

WORKDIR /tmp
COPY . ./
RUN chmod 777 /tmp

# please configure: install only the dependencies you need; remove or comment out the code for any that are not required

# Driver path
ARG ASCEND_BASE=/usr/local/Ascend

# CANN-related arguments
ARG TOOLKIT_PKG=Ascend-cann-toolkit*.run
ARG TFPLUGIN_PKG=Ascend-cann-tfplugin*.run

# Remove the old CANN installation
RUN rm -rf $ASCEND_BASE/ascend-toolkit

# Install ascend-toolkit and tfplugin
RUN umask 0022 && \
    chmod +x $TOOLKIT_PKG && \
    bash $TOOLKIT_PKG --quiet --install --install-path=$ASCEND_BASE && \
    source $ASCEND_BASE/ascend-toolkit/set_env.sh && \
    chmod +x ./$TFPLUGIN_PKG && \
    bash $TFPLUGIN_PKG --quiet --install --install-for-all && \
    source $ASCEND_BASE/tfplugin/set_env.sh && \
    rm -f ./$TFPLUGIN_PKG && \
    rm -rf /root/.cache/pip && \
    rm -f $TOOLKIT_PKG

# Install Rec SDK; choose tf1 or tf2 in the wheel path
RUN tar -zxvf Ascend-mindxsdk-mxrec*.tar.gz && \
    pip3 install mindxsdk-mxrec/{tf1|tf2}_whl/mx_rec-*.whl --force-reinstall
```
```shell
docker build -t {image_name}:{image_tag} -f Dockerfile .
```
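The Dockerfile copies the whole build context into `/tmp` and then installs packages by wildcard, so a missing package only shows up partway through the build. The following is a minimal pre-flight sketch, assuming the three package name patterns used in the Dockerfile; the function name `check_build_context` is hypothetical and not part of Rec SDK or Docker.

```shell
#!/bin/sh
# Hypothetical helper: verify that the build context contains the files
# the Dockerfile expects. Prints each missing pattern to stderr and
# returns non-zero if any is absent.
check_build_context() {
    context_dir="${1:-.}"
    missing=0
    for pattern in 'Dockerfile' \
                   'Ascend-cann-toolkit*.run' \
                   'Ascend-cann-tfplugin*.run' \
                   'Ascend-mindxsdk-mxrec*.tar.gz'; do
        # Leaving $pattern unquoted lets the shell expand the glob. If
        # nothing matches, the pattern stays as a literal word and the
        # -e test below fails on it.
        set -- "$context_dir"/$pattern
        if [ ! -e "$1" ]; then
            echo "missing: $pattern" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

One possible use is to gate the build on the check, for example `check_build_context . && docker build -t {image_name}:{image_tag} -f Dockerfile .`.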
Dependency | Download link |
---|---|
gcc-7.3.0 | |
cmake-3.20.6 | |
ucx | |
openmpi-4.1.5 | |
python-3.7.5 | |
hdf5-1.10.5 | |
CANN software package, TensorFlow adapter plugin for Ascend, and Rec SDK software package | See "Environment Preparation". |
TensorFlow (1.15.0/2.6.5) | |
version.info, ascend_install.info | Both files are required when installing CANN. By default, version.info is located at /usr/local/Ascend/driver/version.info and ascend_install.info at /etc/ascend_install.info. Copy both files from these locations into the same directory. |
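The copy step for version.info and ascend_install.info can be sketched as a small shell function. This assumes the default source paths listed in the table and that the target is the directory later used as the Docker build context; the function name `stage_driver_info` and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy version.info and ascend_install.info into one
# directory (for example, the Docker build context).
# $1: target directory
# $2: path to version.info (defaults to the location given in the table)
# $3: path to ascend_install.info (defaults to the location given in the table)
stage_driver_info() {
    context_dir="$1"
    version_info="${2:-/usr/local/Ascend/driver/version.info}"
    install_info="${3:-/etc/ascend_install.info}"

    # Fail early with a clear message if either source file is absent.
    for src in "$version_info" "$install_info"; do
        if [ ! -f "$src" ]; then
            echo "not found: $src" >&2
            return 1
        fi
    done

    mkdir -p "$context_dir" &&
        cp "$version_info" "$install_info" "$context_dir"/
}
```

On a host with the driver installed in the default location, `stage_driver_info .` would place both files in the current directory.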
```dockerfile
# please configure: use the base image that matches your environment
FROM swr.cn-south-1.myhuaweicloud.com/ascendhub/centos:7.6.1810

WORKDIR /tmp
COPY . ./
RUN chmod 777 /tmp

# Install only the dependencies you need; remove or comment out the code for any that are not required.
# Make sure the names of the downloaded packages match the names used below;
# otherwise installation may fail with a "file not found" error.

# 1. Install the build environment
RUN yum makecache && \
    yum -y install centos-release-scl && \
    yum -y install devtoolset-7 && \
    yum -y install devtoolset-7-gcc-c++ && \
    yum -y install epel-release && \
    yum -y install wget zlib-devel bzip2 bzip2-devel openssl-devel ncurses-devel openssh-clients openssh-server sqlite-devel openmpi-devel \
    readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libffi-devel hdf5-devel patch pciutils lcov vim dos2unix gcc-c++ \
    autoconf automake libtool git && \
    yum clean all && \
    rm -rf /var/cache/yum && \
    echo "source /opt/rh/devtoolset-7/enable" >> /etc/profile
# Note: openssh-server is needed for the two-node training sample; it can be omitted for single-node training

# 2. Install gcc-7.3.0
# The sed command makes line 246 of download_prerequisites extract archives with --no-same-owner
RUN source /etc/profile && \
    tar -zxvf gcc-7.3.0.tar.gz && \
    cd gcc-7.3.0 && \
    wget https://mirrors.huaweicloud.com/gnu/gmp/gmp-6.1.0.tar.bz2 && \
    wget https://mirrors.huaweicloud.com/gnu/mpfr/mpfr-3.1.4.tar.bz2 && \
    wget https://mirrors.huaweicloud.com/gnu/mpc/mpc-1.0.3.tar.gz && \
    wget https://mindx.obs.cn-south-1.myhuaweicloud.com/opensource/isl-0.16.1.tar.bz2 && \
    sed -i '246s/tar -xf /tar --no-same-owner -xf /' contrib/download_prerequisites && \
    ./contrib/download_prerequisites && \
    ./configure --enable-languages=c,c++ --disable-multilib --with-system-zlib --prefix=/usr/local/gcc7.3.0 && \
    make -j && make -j install && cd .. && \
    find gcc-7.3.0/ -name libstdc++.so.6.0.24 -exec cp {} /lib64/ \; && \
    rm -rf gcc-7.3.0*
ENV LD_LIBRARY_PATH=/usr/local/gcc7.3.0/lib64:$LD_LIBRARY_PATH \
    PATH=/usr/local/gcc7.3.0/bin:$PATH

# 3. Install cmake
RUN source /etc/profile && gcc -v && tar -zxf cmake-3.20.6.tar.gz && \
    cd cmake-3.20.6 && \
    ./bootstrap && make && make install && cd .. && \
    rm -rf cmake-3.20.6*

# 4. Install ucx
RUN source /etc/profile && gcc -v && unzip master.zip && \
    cd ucx-master && \
    ./autogen.sh && \
    ./contrib/configure-release --prefix=/usr/local/ucx && \
    make && make install && cd .. && \
    rm -rf ucx-master* master.zip

# 5. Install openmpi (must be configured with ucx)
RUN source /etc/profile && gcc -v && tar -zxvf openmpi-4.1.5.tar.gz && \
    cd openmpi-4.1.5 && \
    ./configure --enable-orterun-prefix-by-default --prefix=/usr/local/openmpi --with-ucx=/usr/local/ucx && \
    make -j 16 && make install && cd .. && \
    rm -rf openmpi-4.1.5*
ENV LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH \
    PATH=/usr/local/openmpi/bin:$PATH

# 6. Install python3.7.5
RUN source /etc/profile && gcc -v && tar -xvf Python-3.7.5.tar.xz && \
    cd Python-3.7.5 && \
    mkdir -p build && cd build && \
    ../configure --enable-shared --prefix=/usr/local/python3.7.5 && \
    make -j && make install && \
    cd ../../ && rm -rf Python-3.7.5* && \
    ldconfig
ENV PATH=$PATH:/usr/local/python3.7.5/bin \
    LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/python3.7.5/lib

# Configure the pip index
RUN mkdir ~/.pip && touch ~/.pip/pip.conf && \
    echo "[global]" > ~/.pip/pip.conf && \
    echo "trusted-host=pypi.douban.com" >> ~/.pip/pip.conf && \
    echo "index-url=http://pypi.douban.com/simple/" >> ~/.pip/pip.conf && \
    echo "timeout=200" >> ~/.pip/pip.conf

# 7. Install hdf5
RUN source /etc/profile && gcc -v && tar -zxvf hdf5-1.10.5.tar.gz && \
    cd hdf5-1.10.5 && \
    ./configure --prefix=/usr/local/hdf5 && \
    make && make install && cd .. && rm -rf hdf5-1.10.5*
ENV CPATH=/usr/local/hdf5/include/:/usr/local/hdf5/lib/
RUN ln -s /usr/local/hdf5/lib/libhdf5.so /usr/lib/libhdf5.so && \
    ln -s /usr/local/hdf5/lib/libhdf5_hl.so /usr/lib/libhdf5_hl.so

ENV CC=/usr/lib64/openmpi/bin/mpicc
# 8. Install Python packages
RUN pip3.7 install -U pip && \
    pip3.7 install numpy && \
    pip3.7 install decorator && \
    pip3.7 install sympy==1.4 && \
    pip3.7 install cffi==1.12.3 && \
    pip3.7 install pyyaml && \
    pip3.7 install pathlib2 && \
    pip3.7 install grpcio && \
    pip3.7 install grpcio-tools && \
    pip3.7 install protobuf==3.20.0 && \
    pip3.7 install scipy && \
    pip3.7 install requests && \
    pip3.7 install mpi4py && \
    pip3.7 install scikit-learn && \
    pip3.7 install easydict && \
    pip3.7 install attrs && \
    pip3.7 install pytest==7.1.1 && \
    pip3.7 install pytest-cov==4.1.0 && \
    pip3.7 install pytest-html && \
    pip3.7 install Cython && \
    pip3.7 install h5py==3.1.0 && \
    pip3.7 install pandas && \
    rm -rf /root/.cache/pip
# CC is used when installing mpi4py and is unset once installation is complete.
# Note: unset only affects the shell of this RUN; the ENV CC set above still applies to later instructions.
RUN unset CC

# 9. Set the driver path environment variables
ARG ASCEND_BASE=/usr/local/Ascend
ENV LD_LIBRARY_PATH=$ASCEND_BASE/driver/lib64:$ASCEND_BASE/driver/lib64/common:$ASCEND_BASE/driver/lib64/driver:$LD_LIBRARY_PATH

# 10. CANN-related arguments
ARG TOOLKIT_PKG=Ascend-cann-toolkit*.run
ARG TOOLKIT_PATH=$ASCEND_BASE/ascend-toolkit/latest

# 11. TensorFlow-related arguments
ARG TFPLUGIN_PKG=Ascend-cann-tfplugin*.run
# MODIFIED: set the TensorFlow version to 1.15.0 or 2.6.5 (append it to TF_PKG); on Arm, use the corresponding whl package instead
ARG TF_PKG=tensorflow-cpu==

# 12. Install ascend-toolkit, tfplugin, and other Python dependencies
RUN umask 0022 && \
    mkdir -p $ASCEND_BASE/driver && \
    cp version.info $ASCEND_BASE/driver/ && \
    cp ascend_install.info /etc/ && \
    chmod +x $TOOLKIT_PKG && \
    bash $TOOLKIT_PKG --quiet --install --install-path=$ASCEND_BASE && \
    source $ASCEND_BASE/ascend-toolkit/set_env.sh && \
    chmod +x ./$TFPLUGIN_PKG && \
    bash $TFPLUGIN_PKG --quiet --install --install-for-all && \
    source $ASCEND_BASE/tfplugin/set_env.sh && \
    rm -f ./$TFPLUGIN_PKG && \
    pip3.7 install $TF_PKG && \
    HOROVOD_WITH_MPI=1 HOROVOD_WITH_TENSORFLOW=1 pip3.7 install horovod --no-cache-dir && \
    pip3.7 install tf_slim && \
    pip3.7 install funcsigs && \
    rm -rf /root/.cache/pip && \
    rm -f $TOOLKIT_PKG && \
    rm -rf $ASCEND_BASE/driver && \
    rm -rf /etc/ascend_install.info

# 13. Install Rec SDK; choose tf1 or tf2 in the wheel path
RUN tar -zxvf Ascend-mindxsdk-mxrec*.tar.gz && \
    pip3 install mindxsdk-mxrec/{tf1|tf2}_whl/mx_rec-*.whl --force-reinstall

# 14. Clean up the temporary directory
RUN rm -rf ./*
```
```shell
docker build -t {image_name}:{image_tag} -f Dockerfile .
```
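The Dockerfile leaves two values to fill in by hand: the TensorFlow version after `tensorflow-cpu==` and the `{tf1|tf2}` part of the Rec SDK wheel path. The two need to agree. The sketch below derives the wheel directory from the TensorFlow version. It assumes the directory names `tf1_whl` and `tf2_whl` implied by the placeholder, and that major version 1 maps to tf1 and major version 2 to tf2; the function name `rec_sdk_whl_dir` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a TensorFlow version string to the wheel
# directory inside the unpacked Rec SDK package.
rec_sdk_whl_dir() {
    case "$1" in
        1.*) echo "mindxsdk-mxrec/tf1_whl" ;;
        2.*) echo "mindxsdk-mxrec/tf2_whl" ;;
        *)
            echo "unsupported TensorFlow version: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `rec_sdk_whl_dir 1.15.0` prints `mindxsdk-mxrec/tf1_whl`. Because `TF_PKG` is declared with `ARG`, the TensorFlow package can also be supplied at build time with `--build-arg TF_PKG=tensorflow-cpu==1.15.0` rather than by editing the Dockerfile.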