Pre-check

Before performing in-depth analysis, you need to complete the following basic checks:

  • Extra process check: Check whether any background process or plug-in in the operating environment is consuming CPU resources and degrading performance. This check is usually performed by the service scenario owner, and such processes are rarely the main cause.
  • Task load balancing check: Use the profiling tool to compare the computing duration of each card. If the durations are similar across all cards, the task load is balanced (the service scenario owner can further confirm the result). See Figure 1.
    Figure 1 Balanced computing task loads among multiple cards
  • CPU affinity configuration for isolation (A+K scenario only): If the server scheduling capability is limited and CPU core switching or preemption may occur, you are advised to configure CPU affinity to isolate tasks.

    Method: Run the taskset command, or set the CPU_AFFINITY_CONF environment variable by running export CPU_AFFINITY_CONF=1 or export CPU_AFFINITY_CONF=2.

    For details about the CPU_AFFINITY_CONF environment variable, see "Performance Tuning" > "Tuning Methods" > "Scheduling Optimization" > "Core Binding Tuning" in PyTorch Training Model Porting and Tuning Guide.
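
The task load balancing check above can also be expressed as a number once the per-card computing durations have been read from the profiling report. The following is a minimal sketch and not part of the profiling tool: the function names, the sample durations, and the 5% threshold are all assumptions chosen for illustration, and the service scenario owner should decide what difference counts as "obvious".

```python
from statistics import mean


def load_imbalance(durations_ms):
    """Return (max - min) / mean of the per-card computing durations.

    durations_ms maps a card name to its computing duration in milliseconds.
    A value close to 0 means the cards spend a similar amount of time computing.
    """
    values = list(durations_ms.values())
    if not values:
        raise ValueError("no per-card durations supplied")
    avg = mean(values)
    if avg <= 0:
        raise ValueError("durations must be positive")
    return (max(values) - min(values)) / avg


def is_balanced(durations_ms, threshold=0.05):
    """Treat the load as balanced when the relative spread is within threshold.

    The default threshold of 0.05 (5%) is an assumption, not a documented value.
    """
    return load_imbalance(durations_ms) <= threshold


# Hypothetical durations copied by hand from a profiling report.
sample = {"card0": 1020.0, "card1": 1015.0, "card2": 1032.0, "card3": 1018.0}
print(f"imbalance = {load_imbalance(sample):.3f}, balanced = {is_balanced(sample)}")
```

The relative spread is used rather than the absolute difference so that the same threshold can be applied to both short and long steps.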