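In a K8s cluster, if a node that contains Ascend AI processors is used as a K8s management node, that node acts as both a management node and a compute node. In addition to the management-node label, it must also carry the compute-node labels that match its Ascend AI processor type. In production environments, management nodes are typically general-purpose servers without Ascend AI processors.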
Procedure
- Run the following command on any node to query the node names.
kubectl get node
Example output:
NAME     STATUS   ROLES    AGE   VERSION
ubuntu   Ready    worker   23h   v1.17.3
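When the node names are needed in a script, the NAME column can be extracted from this output. The following is a minimal sketch; the helper name node_names is hypothetical and not part of the product, and it assumes the column layout shown in the example output.

```shell
# Print only the NAME column of `kubectl get node` style output,
# skipping the header row. The function name is illustrative only.
node_names() {
    awk 'NR > 1 { print $1 }'
}

# Usage on a machine with cluster access:
#   kubectl get node | node_names
```

kubectl can also print names directly with `kubectl get nodes -o name`, which prefixes each entry with `node/`.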
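- Label each node according to Table 1 so that the cluster scheduling components can schedule across worker nodes of different form factors. The labeling command takes the following form.
kubectl label nodes <hostname> <label>
For example, for the hostname "ubuntu" and the label "masterselector=dls-master-node", the command is as follows.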
kubectl label nodes ubuntu masterselector=dls-master-node
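After labeling, it can be useful to check the result, correct a wrong value, or remove a label. The sketch below wraps standard kubectl options (`--overwrite`, the trailing `-` that deletes a label key, and the `-l` selector). The function names and the KUBECTL override variable are assumptions made for illustration, not part of the product.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the
# commands without touching a cluster. This variable is an assumption
# of this sketch.
KUBECTL=${KUBECTL:-kubectl}

# Add a label, or replace its value if the key already exists on the node.
set_label() {
    "$KUBECTL" label nodes "$1" "$2" --overwrite
}

# Remove a label by key: kubectl deletes a label when the key ends with "-".
remove_label() {
    "$KUBECTL" label nodes "$1" "$2-"
}

# List the nodes that currently carry a given label.
nodes_with_label() {
    "$KUBECTL" get nodes -l "$1"
}
```

For example, `nodes_with_label masterselector=dls-master-node` would list the nodes labeled as management nodes, and `remove_label ubuntu masterselector` would remove that label from the node "ubuntu".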
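- For details about each node label in Table 1, see the section on K8s native objects.
- Configure all the labels listed in Table 1 for each node, according to its node type and product type.
- The chip model value can be queried with the npu-smi info command; the "Name" field in the output gives the chip model. For the {xxx} placeholder below, use the characters "910" as the chip model value.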
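The {xxx} substitution described in the last note can be sketched as a small helper that builds the accelerator-type label from the chip name and the NPU count. This is only an illustration: the function name is hypothetical, the sample name "910B3" is an assumed value of the "Name" field, and the exact npu-smi info output format is not reproduced here.

```shell
# Build the accelerator-type label from a chip name and an NPU count.
#   $1: value of the "Name" field from npu-smi info (e.g. 910B3, assumed)
#   $2: NPU count suffix used in Table 1 (8 or 16)
accelerator_type_label() {
    # Keep only the first run of digits in the name, e.g. "910B3" -> "910".
    model=$(printf '%s\n' "$1" | sed -E 's/^[^0-9]*([0-9]+).*$/\1/')
    case $model in
        ''|*[!0-9]*)
            echo "no chip model digits found in: $1" >&2
            return 1
            ;;
    esac
    printf 'accelerator-type=module-%sb-%s\n' "$model" "$2"
}
```

Under these assumptions, `accelerator_type_label 910B3 8` yields the label accelerator-type=module-910b-8, which matches the module-{xxx}b-8 pattern in Table 1.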
Table 1 Labels for each node type

| Node type | Product type | Labels |
|---|---|---|
| Management node | - | masterselector=dls-master-node |
| Compute node | Atlas 800 training server (full NPU configuration) | node-role.kubernetes.io/worker=worker<br>workerselector=dls-worker-node<br>host-arch=huawei-arm or host-arch=huawei-x86<br>accelerator=huawei-Ascend910<br>accelerator-type=module<br>(Optional) nodeDEnable=on |
| Compute node | Atlas 800 training server (half NPU configuration) | node-role.kubernetes.io/worker=worker<br>workerselector=dls-worker-node<br>host-arch=huawei-arm or host-arch=huawei-x86<br>accelerator=huawei-Ascend910<br>accelerator-type=half<br>(Optional) nodeDEnable=on |
| Compute node | Atlas 800T A2 training server or Atlas 900 A2 PoD cluster basic unit | node-role.kubernetes.io/worker=worker<br>workerselector=dls-worker-node<br>host-arch=huawei-arm<br>accelerator=huawei-Ascend910<br>accelerator-type=module-{xxx}b-8<br>(Optional) nodeDEnable=on |
| Compute node | Atlas 900 A3 SuperPoD supernode<br>Atlas 9000 A3 SuperPoD cluster computing system<br>Atlas 800T A3 supernode server | node-role.kubernetes.io/worker=worker<br>workerselector=dls-worker-node<br>host-arch=huawei-arm or host-arch=huawei-x86<br>accelerator=huawei-Ascend910<br>(Optional) nodeDEnable=on |
| Compute node | A200T A3 Box8 supernode server<br>Atlas 800I A3 supernode server | node-role.kubernetes.io/worker=worker<br>workerselector=dls-worker-node<br>host-arch=huawei-x86 or host-arch=huawei-arm<br>accelerator=huawei-Ascend910<br>accelerator-type=module-a3-16<br>(Optional) nodeDEnable=on |
| Compute node | Atlas 800I A2 inference server | node-role.kubernetes.io/worker=worker<br>workerselector=dls-worker-node<br>host-arch=huawei-arm<br>accelerator=huawei-Ascend910<br>accelerator-type=module-{xxx}b-8<br>server-usage=infer<br>(Optional) nodeDEnable=on |
| Compute node | A200I A2 Box heterogeneous component | node-role.kubernetes.io/worker=worker<br>workerselector=dls-worker-node<br>host-arch=huawei-x86<br>accelerator=huawei-Ascend910<br>accelerator-type=module-{xxx}b-8<br>server-usage=infer<br>(Optional) nodeDEnable=on |
| Compute node | Atlas 200T A2 Box16 heterogeneous subrack | node-role.kubernetes.io/worker=worker<br>workerselector=dls-worker-node<br>host-arch=huawei-x86<br>accelerator=huawei-Ascend910<br>accelerator-type=module-{xxx}b-16<br>(Optional) nodeDEnable=on |
| Compute node | Training server (with Atlas 300T training cards installed) | node-role.kubernetes.io/worker=worker<br>workerselector=dls-worker-node<br>host-arch=huawei-arm or host-arch=huawei-x86<br>accelerator=huawei-Ascend910<br>accelerator-type=card<br>(Optional) nodeDEnable=on |
| Compute node | Inference server (with Atlas 300I inference cards installed) | node-role.kubernetes.io/worker=worker<br>workerselector=dls-worker-node<br>host-arch=huawei-arm or host-arch=huawei-x86<br>accelerator=huawei-Ascend310<br>(Optional) nodeDEnable=on |
| Compute node | Atlas inference series products (excluding the Atlas 200I SoC A1 core board) | node-role.kubernetes.io/worker=worker<br>workerselector=dls-worker-node<br>host-arch=huawei-arm or host-arch=huawei-x86<br>accelerator=huawei-Ascend310P<br>(Optional) nodeDEnable=on |
| Compute node | Atlas 200I SoC A1 core board | node-role.kubernetes.io/worker=worker<br>workerselector=dls-worker-node<br>host-arch=huawei-arm or host-arch=huawei-x86<br>accelerator=huawei-Ascend310P<br>servertype=soc<br>(Optional) nodeDEnable=on |
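Because each compute node needs every label in its row of Table 1, the labels can be applied in a single kubectl call, which accepts several key=value pairs at once. The sketch below uses the Atlas 800T A2 training server row as an example. The function names and the KUBECTL override variable are hypothetical, `--overwrite` is added here only so that the command can be rerun, and the optional nodeDEnable=on label is left out.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the
# command without touching a cluster. This variable is an assumption
# of this sketch.
KUBECTL=${KUBECTL:-kubectl}

# Apply every label passed after the node name in one kubectl call.
label_node() {
    node=$1
    shift
    "$KUBECTL" label nodes "$node" "$@" --overwrite
}

# Label set taken from the Atlas 800T A2 training server row of Table 1,
# with {xxx} replaced by 910.
label_atlas_800t_a2() {
    label_node "$1" \
        node-role.kubernetes.io/worker=worker \
        workerselector=dls-worker-node \
        host-arch=huawei-arm \
        accelerator=huawei-Ascend910 \
        accelerator-type=module-910b-8
}
```

Running `KUBECTL=echo label_atlas_800t_a2 ubuntu` prints the command that would be issued for the node "ubuntu" without executing it; other product types can follow the same pattern with their own label sets.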