Creating a Log Directory

Create the parent directory for component logs, along with individual log directories for each component on its respective node, and configure the owner and permissions accordingly.

Procedure

  1. On each node listed in Table 1 (Log paths of cluster scheduling components), run the following commands to create the parent directory for component logs.
    mkdir -m 755 /var/log/mindx-dl
    chown root:root /var/log/mindx-dl
  2. Create the log directory for each component that you deploy, on the target node listed in Table 1.
    Table 1 Log paths of cluster scheduling components

    Component: Ascend Device Plugin
    Target node: Compute node
    Log directory creation command:
        mkdir -m 750 /var/log/mindx-dl/devicePlugin
        chown root:root /var/log/mindx-dl/devicePlugin
    Description: -

    Component: NPU Exporter
    Target node: Compute node
    Log directory creation command:
        mkdir -m 750 /var/log/mindx-dl/npu-exporter
        chown root:root /var/log/mindx-dl/npu-exporter
    Description: -

    Component: NodeD
    Target node: Compute node
    Log directory creation command:
        mkdir -m 750 /var/log/mindx-dl/noded
        chown root:root /var/log/mindx-dl/noded
    Description: -

    Component: Elastic Agent
    Target node: Compute node
    Log directory creation command:
        mkdir -m 750 /var/log/mindx-dl/elastic
        chown <user-defined owner> /var/log/mindx-dl/elastic
        NOTE: Mount the Elastic Agent log directory into the container. For details, see Configuring YAML.
    Description:
        • The directory owner is user-defined. The group of the user who installs Elastic Agent, the group of the user who runs Elastic Agent, and the owner of the host directory that is mounted must all be the same.
        • You can customize the path to which Elastic Agent run logs are written. In this path, you can view the logs of all Elastic Agent nodes without logging in to each node.

    Component: TaskD
    Target node: Compute node
    Log directory creation command:
        mkdir -m 750 <training script directory>/taskd_log
        chown <user-defined owner> <training script directory>/taskd_log
    Description:
        • The directory owner is user-defined.
        • TaskD can create its log directory automatically at run time. The prefix of the log directory is usually the directory in which the job YAML file runs the bash command or starts the training.

    Component: Ascend Operator
    Target node: Management node
    Log directory creation command:
        mkdir -m 750 /var/log/mindx-dl/ascend-operator
        chown hwMindX:hwMindX /var/log/mindx-dl/ascend-operator
    Description: -

    Component: Resilience Controller
    Target node: Management node
    Log directory creation command:
        mkdir -m 750 /var/log/mindx-dl/resilience-controller
        chown hwMindX:hwMindX /var/log/mindx-dl/resilience-controller
    Description: -

    Component: ClusterD
    Target node: Management node
    Log directory creation command:
        mkdir -m 750 /var/log/mindx-dl/clusterd
        chown hwMindX:hwMindX /var/log/mindx-dl/clusterd
    Description: -

    Component: Volcano
    Target node: Management node
    Log directory creation command:
        mkdir -m 750 /var/log/mindx-dl/volcano-controller
        chown hwMindX:hwMindX /var/log/mindx-dl/volcano-controller
        mkdir -m 750 /var/log/mindx-dl/volcano-scheduler
        chown hwMindX:hwMindX /var/log/mindx-dl/volcano-scheduler
    Description: -

    Component: Container Manager
    Target node: Node where the container recovery feature is required
    Log directory creation command:
        mkdir -m 750 /var/log/mindx-dl/container-manager
        chown root:root /var/log/mindx-dl/container-manager
    Description: -
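
The per-component commands in Table 1 all follow the same pattern (create a directory with mode 750, then set its owner), so they could be wrapped in a small helper when several directories are needed on one node. The following is only a sketch under stated assumptions: LOG_ROOT and make_log_dir are hypothetical names introduced here, not part of the product, and the parent directory from step 1 is assumed to exist already. Unlike a bare mkdir -m, the helper can be re-run safely on a directory that already exists.

```shell
#!/bin/sh
# Illustrative helper only: LOG_ROOT and make_log_dir are hypothetical
# names for this sketch, not part of the cluster scheduling components.
LOG_ROOT="${LOG_ROOT:-/var/log/mindx-dl}"

# make_log_dir NAME MODE OWNER
# Creates $LOG_ROOT/NAME if it is missing, then sets its mode and owner.
# The parent directory (step 1) must already exist.
make_log_dir() {
    dir="$LOG_ROOT/$1"
    if [ ! -d "$dir" ]; then
        mkdir "$dir" || return 1
    fi
    chmod "$2" "$dir" || return 1
    # Changing the owner requires root; skip it otherwise so the helper
    # can be tried against a scratch LOG_ROOT without privileges.
    if [ "$(id -u)" -eq 0 ]; then
        chown "$3" "$dir" || return 1
    fi
    # Show the resulting permissions, owner, and path for a visual check.
    ls -ld "$dir"
}
```

With this helper loaded as root, a compute node might run make_log_dir devicePlugin 750 root:root, and a management node might run make_log_dir clusterd 750 hwMindX:hwMindX. The names, modes, and owners are the ones listed in Table 1.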