Creating a Log Directory
Create the parent directory for component logs and a log directory for each component on the node where that component runs, then set the owner and permissions of each directory.
Procedure
- On every node that runs a component listed in Table 1, run the following commands to create the parent directory for component logs.
  mkdir -m 755 /var/log/mindx-dl
  chown root:root /var/log/mindx-dl
- Create the log directory of each component on its target node, as listed in Table 1. Create directories only for the components that are actually deployed.
Table 1 Log paths of cluster scheduling components

Target node: compute node
- Ascend Device Plugin
  mkdir -m 750 /var/log/mindx-dl/devicePlugin
  chown root:root /var/log/mindx-dl/devicePlugin
- NPU Exporter
  mkdir -m 750 /var/log/mindx-dl/npu-exporter
  chown root:root /var/log/mindx-dl/npu-exporter
- NodeD
  mkdir -m 750 /var/log/mindx-dl/noded
  chown root:root /var/log/mindx-dl/noded
- Elastic Agent
  mkdir -m 750 /var/log/mindx-dl/elastic
  chown <user-defined owner> /var/log/mindx-dl/elastic
  NOTE: Mount the Elastic Agent log directory to the container. For details, see Configuring YAML.
  - The directory owner is user-defined. The owner group of the Elastic Agent installation user, the owner group of the user that runs Elastic Agent, and the owner of the host directory mounted to the container must be the same.
  - You can customize the flush path of Elastic Agent run logs. In this path, you can view the logs of all Elastic Agent nodes without logging in to each node.
- TaskD
  mkdir -m 750 <training script directory>/taskd_log
  chown <user-defined owner> <training script directory>/taskd_log
  - The directory owner is user-defined.
  - TaskD can create its log directory automatically at runtime. The log directory is usually placed under the directory in which the bash command is executed, or in which training is started as specified in the job YAML file.

Target node: management node
- Ascend Operator
  mkdir -m 750 /var/log/mindx-dl/ascend-operator
  chown hwMindX:hwMindX /var/log/mindx-dl/ascend-operator
- Resilience Controller
  mkdir -m 750 /var/log/mindx-dl/resilience-controller
  chown hwMindX:hwMindX /var/log/mindx-dl/resilience-controller
- ClusterD
  mkdir -m 750 /var/log/mindx-dl/clusterd
  chown hwMindX:hwMindX /var/log/mindx-dl/clusterd
- Volcano
  mkdir -m 750 /var/log/mindx-dl/volcano-controller
  chown hwMindX:hwMindX /var/log/mindx-dl/volcano-controller
  mkdir -m 750 /var/log/mindx-dl/volcano-scheduler
  chown hwMindX:hwMindX /var/log/mindx-dl/volcano-scheduler

Target node: any node where the container recovery feature is required
- Container Manager
  mkdir -m 750 /var/log/mindx-dl/container-manager
  chown root:root /var/log/mindx-dl/container-manager
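The procedure above can be scripted per node. The following is a minimal sketch, assuming bash and a root shell. The names run, make_log_dir, create_log_dirs, DRY_RUN and the role names compute, management and recovery are hypothetical and not part of MindX DL. It covers only the components whose owner is fixed in Table 1. It also swaps mkdir -m for mkdir -p plus an explicit chmod, so re-running it on an existing directory does not fail and still resets the mode. chown to hwMindX:hwMindX fails unless that user and group already exist on the node.

```shell
#!/usr/bin/env bash
# Minimal sketch: create the MindX DL log directories whose owner is fixed
# in Table 1, for one node role. run, make_log_dir, create_log_dirs, DRY_RUN
# and the role names are hypothetical, not part of MindX DL.

LOG_ROOT=/var/log/mindx-dl

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# make_log_dir DIR MODE OWNER:GROUP
# mkdir -p plus an explicit chmod keeps the step safe to re-run.
make_log_dir() {
  run mkdir -p "$1" && run chmod "$2" "$1" && run chown "$3" "$1"
}

# create_log_dirs ROLE, where ROLE is compute, management or recovery.
create_log_dirs() {
  # Parent directory: mode 755, owner root:root, needed on every node.
  make_log_dir "$LOG_ROOT" 755 root:root || return 1
  case "$1" in
    compute)  # Ascend Device Plugin, NPU Exporter, NodeD
      for d in devicePlugin npu-exporter noded; do
        make_log_dir "$LOG_ROOT/$d" 750 root:root || return 1
      done
      ;;
    management)  # Ascend Operator, Resilience Controller, ClusterD, Volcano
      for d in ascend-operator resilience-controller clusterd \
               volcano-controller volcano-scheduler; do
        make_log_dir "$LOG_ROOT/$d" 750 hwMindX:hwMindX || return 1
      done
      ;;
    recovery)  # Container Manager, on nodes that need container recovery
      make_log_dir "$LOG_ROOT/container-manager" 750 root:root
      ;;
    *)
      echo "unknown role: $1" >&2
      return 1
      ;;
  esac
}
```

After sourcing the script, run create_log_dirs compute on a compute node, or DRY_RUN=1 create_log_dirs management to print the commands without executing them. Remove entries for components that are not deployed, and call the function once per role on a node that has more than one role.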
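The Elastic Agent and TaskD log directories in Table 1 have a user-defined owner, so their owner and group must come from the operator. The following is a minimal sketch, assuming bash. The helpers run and make_user_log_dir and the DRY_RUN switch are hypothetical. The sketch requires an explicit group so that, for Elastic Agent, it can be matched to the group of the installation user and of the running user, as the note in Table 1 requires. It expects the parent directory (/var/log/mindx-dl or the training script directory) to exist already.

```shell
#!/usr/bin/env bash
# Minimal sketch: create a log directory with a user-defined owner, as Table 1
# requires for Elastic Agent and TaskD. run, make_user_log_dir and DRY_RUN are
# hypothetical; DRY_RUN=1 prints the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# make_user_log_dir DIR USER GROUP
# USER and GROUP are chosen by the operator. For Elastic Agent, GROUP should
# match the group of the installation user and of the user that runs it.
make_user_log_dir() {
  if [ $# -ne 3 ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "usage: make_user_log_dir DIR USER GROUP" >&2
    return 1
  fi
  run mkdir -p "$1" && run chmod 750 "$1" && run chown "$2:$3" "$1"
}
```

For example, make_user_log_dir /var/log/mindx-dl/elastic trainer trainer for Elastic Agent, or make_user_log_dir /home/trainer/scripts/taskd_log trainer trainer for TaskD. Here trainer and /home/trainer/scripts are placeholder values.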
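To confirm that each directory ended up with the mode and owner listed in Table 1, you can compare them against the expected values. The following is a minimal sketch, assuming bash and GNU coreutils stat, where %a prints the octal mode and %U and %G print the owner and group names. The helper name check_log_dir is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: check that a log directory has the expected mode and owner.
# check_log_dir is a hypothetical helper; it relies on GNU stat (coreutils).

# check_log_dir DIR MODE USER:GROUP
check_log_dir() {
  local actual
  # Fails (and returns 1) if DIR does not exist.
  actual=$(stat -c '%a %U:%G' "$1") || return 1
  if [ "$actual" = "$2 $3" ]; then
    echo "OK   $1 ($actual)"
  else
    echo "FAIL $1: expected $2 $3, found $actual"
    return 1
  fi
}
```

For example, check_log_dir /var/log/mindx-dl 755 root:root on any node, or check_log_dir /var/log/mindx-dl/clusterd 750 hwMindX:hwMindX on a management node.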