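Troubleshooting
Component Pod status is not Running
Cluster scheduling component Pod is stuck in the ContainerCreating state
User UID or GID is already in use
Cluster scheduling component fails to start and the log prints "get sem errno =13"
Cluster scheduling component cannot connect to K8s
Component startup YAML is applied successfully, but the corresponding Pod cannot be found
Log shows "connecting to container runtime failed"
After MindCluster Volcano is installed manually, the Pod status is CrashLoopBackOff
MindCluster Volcano component malfunctions and the log shows "Failed to get plugin"
MindCluster HCCL Controller log prints "Failed to watch *v1alpha1.Job"
NPU-Exporter fails to check the dynamic path and the log shows "check uid or mode failed"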