aclprofCreateConfig

Applicability

Product

Supported

Atlas A3 training products/Atlas A3 inference products

Atlas A2 training products/Atlas A2 inference products

Atlas 200I/500 A2 inference products

Atlas inference products

Atlas training products

Description

Creates data of the aclprofConfig type, which serves as a profiling configuration.

Once created, aclprofConfig data can be reused across multiple calls. You are responsible for keeping the data consistent and correct.

To destroy data of the aclprofConfig type, call aclprofDestroyConfig.

Restrictions

  • Use the aclprofDestroyConfig API to destroy data of the aclprofConfig type. If the data is not destroyed, its memory is not freed.
  • Use this API together with the aclprofDestroyConfig API: call aclprofCreateConfig first, and then aclprofDestroyConfig.

Prototype

aclprofConfig *aclprofCreateConfig(uint32_t *deviceIdList, uint32_t deviceNums, aclprofAicoreMetrics aicoreMetrics, const aclprofAicoreEvents *aicoreEvents, uint64_t dataTypeConfig)

Parameters

Parameter

Input/Output

Description

deviceIdList

Input

Device ID list. Set this parameter to the IDs of the devices actually in use.

deviceNums

Input

Device count. This value must equal the number of device IDs in deviceIdList. Otherwise, exceptions may occur in subsequent operations.

aicoreMetrics

Input

AI Core metric to profile.

aicoreEvents

Input

AI Core events. Set this parameter to NULL.

dataTypeConfig

Input

A bitwise OR of one or more of the following macros (for example, ACL_PROF_ACL_API | ACL_PROF_AICORE_METRICS). Each macro enables collection of a specific type of profile data.

  • ACL_PROF_ACL_API: collects profile data of APIs, including the synchronous/asynchronous memory copy latencies between the host and devices and between devices.
  • ACL_PROF_TASK_TIME: collects operator delivery and execution duration data, as well as basic operator information, to provide more comprehensive performance analysis data.
  • ACL_PROF_TASK_TIME_L0: collects operator delivery and execution duration data. Unlike ACL_PROF_TASK_TIME, ACL_PROF_TASK_TIME_L0 does not collect basic operator information, so collection incurs less performance overhead and the duration statistics are more accurate.
  • ACL_PROF_GE_API_L0: collects time consumption data of dynamic-shape operators in the main host scheduling phase, for accurate time consumption statistics.
  • ACL_PROF_GE_API_L1: collects finer-grained time consumption data of dynamic-shape operators in the host scheduling phase to provide more comprehensive performance analysis data.
  • ACL_PROF_OP_ATTR: collects operator attribute information. Currently, only the aclnn operator is supported.
  • ACL_PROF_AICORE_METRICS: collects AI Core metrics. This macro must be set for the aicoreMetrics parameter to take effect.
  • ACL_PROF_TASK_MEMORY: enables collection of the memory usage of CANN operators, which helps optimize memory usage. In the single-operator scenario, the operator memory size and lifecycle information is collected based on GE component and operator dimensions (the GE component memory is not collected in the single-operator API execution mode). In the static graph and static subgraph scenarios, the operator memory size and lifecycle information is collected based on the operator dimension during the operator compilation phase.
  • ACL_PROF_AICPU: collects traces of AI CPU tasks, including the start and end of each task.
  • ACL_PROF_L2CACHE: collects the L2 cache data and TLB page table cache hit ratio.
  • ACL_PROF_HCCL_TRACE: collects communication data.
  • ACL_PROF_MSPROFTX: collects profile data output by the user program and high-level frameworks. During collection (between the aclprofStart and aclprofStop calls), you can call the msproftx extension interface or the mstx interface to record the time span of a specific event during application execution and write it to the profile data file. You can then use the msprof tool to parse the file and export the performance analysis data.
  • ACL_PROF_TRAINING_TRACE: collects iteration traces.
  • ACL_PROF_RUNTIME_API: collects runtime API profile data.
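
As a minimal sketch of how a dataTypeConfig value might be composed, the snippet below ORs several of the macros listed above into one 64-bit mask. The macro definitions at the top are local stand-ins with placeholder values so that the snippet compiles without the CANN toolkit; in a real application they come from the AscendCL profiling header. MakeDataTypeConfig is a hypothetical helper, not part of the API.

```cpp
#include <cstdint>

// Local stand-ins so the sketch compiles on its own. The numeric values are
// placeholders; real code takes these macros from the AscendCL profiling header.
#ifndef ACL_PROF_ACL_API
#define ACL_PROF_ACL_API        0x0001ULL
#define ACL_PROF_TASK_TIME      0x0002ULL
#define ACL_PROF_AICORE_METRICS 0x0004ULL
#endif

// Hypothetical helper: always collect API and task-time data, and add
// ACL_PROF_AICORE_METRICS only when AI Core metrics are wanted.
constexpr std::uint64_t MakeDataTypeConfig(bool withAicoreMetrics) {
    return ACL_PROF_ACL_API | ACL_PROF_TASK_TIME |
           (withAicoreMetrics ? ACL_PROF_AICORE_METRICS : 0ULL);
}
```

Because aicoreMetrics takes effect only when ACL_PROF_AICORE_METRICS is set, a mask built with withAicoreMetrics = false leaves the aicoreMetrics argument without effect.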

Returns

  • Success: a pointer to data of the aclprofConfig type
  • Failure: nullptr
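
The create/destroy pairing and the nullptr check on failure can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the types and the two functions in the stand-in section are local stubs so that the snippet compiles without the CANN toolkit, the aclprofDestroyConfig signature and the ACL_AICORE_PIPE_UTILIZATION enumerator are assumed, and ProfConfigDeleter, ProfConfigPtr, and CreateProfConfig are hypothetical names.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// ---- Local stand-ins so the sketch compiles without the CANN toolkit. ----
// Real code includes the AscendCL profiling header instead of this section.
typedef int aclError;
struct aclprofAicoreEvents;  // never dereferenced here
enum aclprofAicoreMetrics { ACL_AICORE_PIPE_UTILIZATION = 1 };  // assumed enumerator
struct aclprofConfig {       // opaque in the real API
    std::uint32_t deviceNums;
    std::uint64_t dataTypeConfig;
};

// Stub with the documented prototype. Returning nullptr for an empty list is
// a behavior of this stub only, chosen to exercise the failure path.
aclprofConfig *aclprofCreateConfig(std::uint32_t *deviceIdList, std::uint32_t deviceNums,
                                   aclprofAicoreMetrics /*aicoreMetrics*/,
                                   const aclprofAicoreEvents * /*aicoreEvents*/,
                                   std::uint64_t dataTypeConfig) {
    if (deviceIdList == nullptr || deviceNums == 0) {
        return nullptr;
    }
    return new aclprofConfig{deviceNums, dataTypeConfig};
}

// Stub; the signature is an assumption.
aclError aclprofDestroyConfig(const aclprofConfig *profilerConfig) {
    delete profilerConfig;
    return 0;
}
// ---- End of stand-ins. ----

// Deleter that pairs every successful create with exactly one destroy.
// std::unique_ptr does not invoke it when the held pointer is nullptr.
struct ProfConfigDeleter {
    void operator()(aclprofConfig *cfg) const { (void)aclprofDestroyConfig(cfg); }
};
using ProfConfigPtr = std::unique_ptr<aclprofConfig, ProfConfigDeleter>;

// Hypothetical wrapper: deviceNums is derived from the list itself, so the
// two arguments cannot disagree. aicoreEvents is passed as nullptr.
ProfConfigPtr CreateProfConfig(std::vector<std::uint32_t> &deviceIds,
                               std::uint64_t dataTypeConfig) {
    return ProfConfigPtr(aclprofCreateConfig(
        deviceIds.data(), static_cast<std::uint32_t>(deviceIds.size()),
        ACL_AICORE_PIPE_UTILIZATION, nullptr, dataTypeConfig));
}
```

A caller would compare the returned ProfConfigPtr against nullptr before using it. When the pointer goes out of scope, the deleter calls aclprofDestroyConfig, so the configuration memory is released even on early-return paths.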