MindIE Pod Log Collection

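File Description

  • Collect MindIE Pod console output logs by using K8s commands or a collection script. MindIE Pod logs contain instance node information and are stored uniformly as JSON files.
  • Naming constraint: ${pod_name}.json.
  • Storage path constraints:
    • collection directory/mindie/log/mindie_cluster_log/
    • ${path specified by the --mindie_log parameter}/
    • For details, see the log collection directory structure.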

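Usage Example

  1. Follow the link and write a collection script based on pod_log_collect.sh.
  2. Confirm that the script writes its output to "collection directory/mindie/log/mindie_cluster_log/". The collection command can be run from any directory.

    Output path example:

    log_dir="collection directory/mindie/log/mindie_cluster_log/"

    Command example:

    bash pod_log_collect.sh

    A ${pod_name}.json file is generated in the output path directory.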
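
After collection finishes, it can be useful to confirm which pods actually produced a file. The following is a minimal sketch, not part of pod_log_collect.sh: the function name list_collected_pods is hypothetical, and it relies only on the ${pod_name}.json naming constraint described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pod_log_collect.sh): print the pod name
# for every ${pod_name}.json file found in the given output directory.
list_collected_pods() {
    local log_dir="${1%/}"      # drop one trailing slash, if present
    local f
    for f in "$log_dir"/*.json; do
        [ -e "$f" ] || continue # the glob matched nothing
        basename "$f" .json     # strip the directory and the .json suffix
    done
}
```

Calling list_collected_pods "$log_dir" prints one pod name per line. Empty output suggests that the script wrote to a different directory or did not collect any pod.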

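Collection Methods

The fault diagnosis tool supports collecting MindIE Pod console output logs in the following ways:

  • Script collection. Use the pod_log_collect.sh script within the log collection script to collect MindIE Pod console output logs.
  • Command collection. Collect MindIE Pod console output logs by running a command.

Command Collection

  • After the MindIE service has started and is running stably, run the following command to collect MindIE Pod console output logs.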
    kubectl logs -f -n ${namespace} ${podname} | head -n 1000 > ${log_dir}/${podname}.log 2>&1 &

    View the ${podname}.log file in the ${log_dir} directory.

    The log content is as follows:

    ……
    INFO:root:status of ranktable is not completed, waiting for file update.
    INFO:root:status of ranktable is not completed, waiting for file update.
    INFO:root:status of ranktable is not completed, waiting for file update.
    {"IsMindIEEPJob":true,"status":"completed","server_list":[{"device":[{"device_id":"0","device_ip":"10.0.2.41","super_device_id":"113246208","rank_id":"0"},{"device_id":"1","device_ip":"10.0.3.41","super_device_id":"113311745","rank_id":"1"},{"device_id":"2","device_ip":"10.0.2.42","super_device_id":"113508354","rank_id":"2"},{"device_id":"3","device_ip":"10.0.3.42","super_device_id":"113573891","rank_id":"3"},{"device_id":"4","device_ip":"10.0.2.43","super_device_id":"113770500","rank_id":"4"},{"device_id":"5","device_ip":"10.0.3.43","super_device_id":"113836037","rank_id":"5"},{"device_id":"6","device_ip":"10.0.2.44","super_device_id":"114032646","rank_id":"6"},{"device_id":"7","device_ip":"10.0.3.44","super_device_id":"114098183","rank_id":"7"},{"device_id":"8","device_ip":"10.0.2.45","super_device_id":"114294792","rank_id":"8"},{"device_id":"9","device_ip":"10.0.3.45","super_device_id":"114360329","rank_id":"9"},{"device_id":"10","device_ip":"10.0.2.46","super_device_id":"114556938","rank_id":"10"},{"device_id":"11","device_ip":"10.0.3.46","super_device_id":"114622475","rank_id":"11"},{"device_id":"12","device_ip":"10.0.2.47","super_device_id":"114819084","rank_id":"12"},{"device_id":"13","device_ip":"10.0.3.47","super_device_id":"114884621","rank_id":"13"},{"device_id":"14","device_ip":"10.0.2.48","super_device_id":"115081230","rank_id":"14"},{"device_id":"15","device_ip":"10.0.3.48","super_device_id":"115146767","rank_id":"15"}],"server_id":"141.61.57.128","container_ip":"192.168.247.11"}],"server_count":"1","version":"1.2","super_pod_list":[{"super_pod_id":"1","server_list":[{"server_id":"141.61.57.128"}]}]}
    ……
    • server_list: contains all nodes of the instance that the Pod belongs to
    • container_ip: container IP address
    • device_id: device (card) number
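
The fields listed above can be pulled out of a collected log with standard text tools. The sketch below assumes the ranktable JSON is printed on a single line in compact form ("key":"value", no spaces), as in the sample log. The function names ranktable_line and field_values are hypothetical. A JSON-aware tool would be more robust if one is available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they assume the ranktable JSON is printed on a single
# line in compact form ("key":"value"), as in the sample log.

# Print the last log line that carries a "server_list" field.
ranktable_line() {
    grep '"server_list"' "$1" | tail -n 1
}

# Print every value of a string-valued key, one per line.
# Usage: field_values <log_file> <key>
field_values() {
    ranktable_line "$1" | grep -o "\"$2\":\"[^\"]*\"" | cut -d '"' -f 4
}
```

For example, field_values "${log_dir}/${podname}.log" container_ip prints the container IP address, and passing device_id prints the device numbers one per line. Because the pattern requires a double quote directly before the key, device_id does not match super_device_id.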

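After the service is started, the MindIE Pod log records instance-related entries. Because logs are subject to an aging mechanism, these entries may no longer be present at collection time. If the collected MindIE Pod log does not contain instance-related entries, the component does not support multi-instance fault diagnosis.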
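
A collected log can be checked for instance information before it is handed to the diagnosis tool. This is a minimal sketch under one assumption: that the instance-related entry is the ranktable JSON line from the sample log, that is, a line that contains both "server_list" and "status":"completed". The function name has_instance_info is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if some line of the log contains both a
# "server_list" field and "status":"completed" (the completed ranktable).
has_instance_info() {
    awk '/"server_list"/ && /"status":"completed"/ { found = 1 }
         END { exit !found }' "$1"
}
```

The function returns 0 when such a line exists and a non-zero status otherwise, so it can be used directly in an if statement to flag logs that need to be collected again.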