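You can prepare the images in either of the following two ways. After obtaining the images, create node labels, users, log directories, and namespaces, in that order, for the components you are installing.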
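Component | Image name | Image tag | Node that pulls the image |
---|---|---|---|
MindCluster Resilience Controller | resilience-controller | v6.0.RC2 | Management node |
MindCluster Volcano | vc-controller-manager, vc-scheduler | Select as needed: v1.4.0-v6.0.RC2 or v1.7.0-v6.0.RC2 | Management node |
MindCluster HCCL Controller | hccl-controller | v6.0.RC2 | Management node |
MindCluster Ascend Operator | ascend-operator | v6.0.RC2 | Management node |
MindCluster ClusterD | clusterd | v6.0.RC2 | Management node |
MindCluster NodeD | noded | v6.0.RC2 | Compute node |
MindCluster NPU Exporter | npu-exporter | v6.0.RC2 | Compute node |
MindCluster Ascend Device Plugin | ascend-k8sdeviceplugin | v6.0.RC2 | Compute node |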
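If you do not have download permission, request it as prompted on the page. After submitting the request, wait for the administrator to review it. You can download the images once the request is approved.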
docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/resilience-controller:v6.0.RC2 resilience-controller:v6.0.RC2
docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-operator:v6.0.RC2 ascend-operator:v6.0.RC2
docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/npu-exporter:v6.0.RC2 npu-exporter:v6.0.RC2
docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-k8sdeviceplugin:v6.0.RC2 ascend-k8sdeviceplugin:v6.0.RC2
# To use MindCluster Volcano 1.4.0, change the image tag to v1.4.0-v6.0.RC2
docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/vc-controller-manager:v1.7.0-v6.0.RC2 volcanosh/vc-controller-manager:v1.7.0
docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/vc-scheduler:v1.7.0-v6.0.RC2 volcanosh/vc-scheduler:v1.7.0
docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/noded:v6.0.RC2 noded:v6.0.RC2
docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/hccl-controller:v6.0.RC2 hccl-controller:v6.0.RC2
docker tag swr.cn-south-1.myhuaweicloud.com/ascendhub/clusterd:v6.0.RC2 clusterd:v6.0.RC2
docker rmi swr.cn-south-1.myhuaweicloud.com/ascendhub/resilience-controller:v6.0.RC2
docker rmi swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-operator:v6.0.RC2
docker rmi swr.cn-south-1.myhuaweicloud.com/ascendhub/npu-exporter:v6.0.RC2
docker rmi swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-k8sdeviceplugin:v6.0.RC2
# To use MindCluster Volcano 1.4.0, change the image tag to v1.4.0-v6.0.RC2
docker rmi swr.cn-south-1.myhuaweicloud.com/ascendhub/vc-controller-manager:v1.7.0-v6.0.RC2
docker rmi swr.cn-south-1.myhuaweicloud.com/ascendhub/vc-scheduler:v1.7.0-v6.0.RC2
docker rmi swr.cn-south-1.myhuaweicloud.com/ascendhub/noded:v6.0.RC2
docker rmi swr.cn-south-1.myhuaweicloud.com/ascendhub/hccl-controller:v6.0.RC2
docker rmi swr.cn-south-1.myhuaweicloud.com/ascendhub/clusterd:v6.0.RC2
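The tag and rmi commands above follow one pattern per image, so they can be generated rather than typed one by one. The following is a minimal sketch, not an official script: the function name `print_retag_commands` and the variables are hypothetical, and the local tag used for Volcano 1.4.0 (`volcanosh/...:v1.4.0`) is an assumption that mirrors the v1.7.0 commands shown above.

```shell
#!/bin/sh
# Sketch only: names below are illustrative, not part of MindCluster.
REGISTRY="swr.cn-south-1.myhuaweicloud.com/ascendhub"
TAG="v6.0.RC2"
# Volcano version in use: v1.7.0 (default here) or v1.4.0.
VOLCANO_VERSION="${VOLCANO_VERSION:-v1.7.0}"
# Trim these lists to the images actually pulled on the current node.
COMPONENTS="resilience-controller ascend-operator npu-exporter ascend-k8sdeviceplugin noded hccl-controller clusterd"
VOLCANO_IMAGES="vc-controller-manager vc-scheduler"

# Print one "docker tag" and one "docker rmi" line per image.
# Nothing is executed; the caller decides whether to run the output.
print_retag_commands() {
  for name in $COMPONENTS; do
    echo "docker tag ${REGISTRY}/${name}:${TAG} ${name}:${TAG}"
    echo "docker rmi ${REGISTRY}/${name}:${TAG}"
  done
  for name in $VOLCANO_IMAGES; do
    # Assumption: the local Volcano tag is the bare Volcano version.
    echo "docker tag ${REGISTRY}/${name}:${VOLCANO_VERSION}-${TAG} volcanosh/${name}:${VOLCANO_VERSION}"
    echo "docker rmi ${REGISTRY}/${name}:${VOLCANO_VERSION}-${TAG}"
  done
}
```

Review the printed commands first, then run `print_retag_commands | sh` if they match the images on the node. Because management nodes and compute nodes pull different images, the lists usually need trimming per node.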
root@node:/home/ascend-operator# ll
total 41388
drwxr-xr-x 2 root root     4096 Aug 26 20:20 ./
drwxr-xr-x 6 root root     4096 Aug 26 20:20 ../
-r-x------ 1 root root 41992192 Aug 26 02:02 ascend-operator*
-r-------- 1 root root   372291 Aug 26 02:02 ascend-operator-v6.0.RC2.yaml
-r-------- 1 root root      482 Aug 26 02:02 Dockerfile
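Before building, it can help to confirm that the extracted directory holds the files shown in the listing. The sketch below is illustrative only: `check_build_dir` is a hypothetical helper, and it assumes that each component directory contains a binary named after the component plus a file named `Dockerfile`, as in the Ascend Operator listing.

```shell
#!/bin/sh
# Sketch only: check that a component directory contains the binary and a Dockerfile.
# Usage: check_build_dir <directory> <binary-name>
# Prints each missing file and returns non-zero if anything is missing.
check_build_dir() {
  dir="$1"
  binary="$2"
  status=0
  for f in "$binary" Dockerfile; do
    if [ ! -f "${dir}/${f}" ]; then
      echo "missing: ${dir}/${f}"
      status=1
    fi
  done
  return $status
}
```

For the listing above, the call would be `check_build_dir /home/ascend-operator ascend-operator`.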
ubuntu 18.04 6526a1858e5d 2 years ago 64.2MB
alpine latest a24bb4013296 2 years ago 5.57MB
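If the base images above do not exist, pull them with the relevant commands in Table 2 (pulling images requires the server to have Internet access).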
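The presence check can be scripted. The following sketch assumes the two base images are `ubuntu:18.04` and `alpine:latest`, as in the `docker images` output above; `missing_base_images` is a hypothetical helper name, and the input is expected to be one `repository:tag` per line.

```shell
#!/bin/sh
# Sketch only: report which required base images are absent.
REQUIRED_BASE_IMAGES="ubuntu:18.04 alpine:latest"

# Reads "repository:tag" lines on stdin and prints every required image
# that does not appear in that list.
missing_base_images() {
  present="$(cat)"
  for image in $REQUIRED_BASE_IMAGES; do
    printf '%s\n' "$present" | grep -qxF "$image" || echo "$image"
  done
}
```

A possible use is `docker images --format '{{.Repository}}:{{.Tag}}' | missing_base_images`, followed by `docker pull` for each image it prints.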
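Node product type | Component name | Image build command | Description |
---|---|---|---|
Other products | MindCluster Ascend Device Plugin | docker build --no-cache -t ascend-k8sdeviceplugin:{tag} ./ | {tag} must match the version on the software package. For example, if the package version is 6.0.RC2, {tag} is v6.0.RC2. Note: make sure the GID and UID of HwDmUser and HwBaseUser in Dockerfile-310P-1usoc are the same as those on the physical machine. |
Atlas 200I SoC A1 core board | MindCluster Ascend Device Plugin | docker build --no-cache -t ascend-k8sdeviceplugin:{tag} -f Dockerfile-310P-1usoc ./ | Same as above. |
Other products | MindCluster NPU Exporter | docker build --no-cache -t npu-exporter:{tag} ./ | Same as above. |
Atlas 200I SoC A1 core board | MindCluster NPU Exporter | docker build --no-cache -t npu-exporter:{tag} -f Dockerfile-310P-1usoc ./ | Same as above. |
Other products | MindCluster HCCL Controller | docker build --no-cache -t hccl-controller:{tag} ./ | Same as above. |
Other products | MindCluster Ascend Operator | docker build --no-cache -t ascend-operator:{tag} ./ | Same as above. |
Other products | MindCluster Resilience Controller | docker build --no-cache -t resilience-controller:{tag} ./ | Same as above. |
Other products | MindCluster NodeD | docker build --no-cache -t noded:{tag} ./ | Same as above. |
Other products | MindCluster ClusterD | docker build --no-cache -t clusterd:{tag} ./ | Same as above. |
Other products | MindCluster Volcano | Go to the directory where the MindCluster Volcano component is extracted, then select and enter one of the following version paths. | - |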
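The build commands in the table differ only in the component name, the tag, and whether the Atlas 200I SoC A1 Dockerfile is selected. A minimal sketch of that pattern follows; `build_command` is a hypothetical helper, it only prints the command, and it covers the components built with a plain `docker build` (not MindCluster Volcano, whose commands depend on the version path).

```shell
#!/bin/sh
# Sketch only: print the docker build command for a component.
# Usage: build_command <image-name> <package-version> [soc]
#   <package-version> is the version on the software package, e.g. 6.0.RC2
#   soc = "yes" selects Dockerfile-310P-1usoc (Atlas 200I SoC A1 core board)
build_command() {
  component="$1"
  version="$2"
  soc="${3:-no}"
  # {tag} is the package version prefixed with "v".
  cmd="docker build --no-cache -t ${component}:v${version}"
  if [ "$soc" = "yes" ]; then
    cmd="${cmd} -f Dockerfile-310P-1usoc"
  fi
  echo "${cmd} ./"
}
```

For example, `build_command npu-exporter 6.0.RC2 yes` prints the Atlas 200I SoC A1 variant for NPU Exporter. Run the printed command from the directory where the component package was extracted.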
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            Install the buildx component to build images with BuildKit:
            https://docs.docker.com/go/buildx/

Sending build context to Docker daemon  42.37MB
Step 1/5 : FROM ubuntu:18.04 as build
 ---> 1f37bb13f08a
Step 2/5 : RUN useradd -d /home/hwMindX -u 9000 -m -s /usr/sbin/nologin hwMindX && usermod root -s /usr/sbin/nologin
 ---> Running in d43f1927b1fd
Removing intermediate container d43f1927b1fd
 ---> 9f1d64e06ee6
Step 3/5 : COPY ./ascend-operator /usr/local/bin/
 ---> 5022b58c516e
Step 4/5 : RUN chown -R hwMindX:hwMindX /usr/local/bin/ascend-operator && chmod 500 /usr/local/bin/ascend-operator && chmod 750 /home/hwMindX && echo 'umask 027' >> /etc/profile && echo 'source /etc/profile' >> /home/hwMindX/.bashrc
 ---> Running in a781bde3dc56
Removing intermediate container a781bde3dc56
 ---> 3d7e2ee7a3bd
Step 5/5 : USER hwMindX
 ---> Running in 338954be8d99
Removing intermediate container 338954be8d99
 ---> 103f6a2b43a5
Successfully built 103f6a2b43a5
Successfully tagged ascend-operator:v6.0.RC2
docker save hccl-controller:v6.0.RC2 > hccl-controller-v6.0.RC2-linux-aarch64.tar
scp hccl-controller-v6.0.RC2-linux-aarch64.tar root@{target node IP address}:{save path}
docker load < hccl-controller-v6.0.RC2-linux-aarch64.tar
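When the same image has to reach several nodes, the save and copy steps can be generated per node. This is a sketch under assumptions: `archive_name` and `print_distribute_commands` are hypothetical helpers, the archive naming simply follows the HCCL Controller example above, and the architecture defaults to aarch64 unless another value is passed.

```shell
#!/bin/sh
# Sketch only: derive an archive name and print the save/copy commands.

# hccl-controller:v6.0.RC2 -> hccl-controller-v6.0.RC2-linux-aarch64.tar
# Usage: archive_name <image:tag> [arch]
archive_name() {
  image="$1"
  arch="${2:-aarch64}"
  # Replace ":" and "/" so the image reference is a valid file name.
  echo "$(printf '%s' "$image" | sed 's#[:/]#-#g')-linux-${arch}.tar"
}

# Usage: print_distribute_commands <image:tag> <save-path> <node-ip>...
# Prints one "docker save" line and one "scp" line per target node.
print_distribute_commands() {
  image="$1"
  dest_dir="$2"
  shift 2
  file="$(archive_name "$image")"
  echo "docker save ${image} > ${file}"
  for node in "$@"; do
    echo "scp ${file} root@${node}:${dest_dir}"
  done
}
```

After copying, run `docker load` against the archive on each target node, as in the last command above. The helper prints commands instead of running them so that they can be checked first.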