aclprofCreateConfig

Description

Creates a profiling configuration of the aclprofConfig type.

The created aclprofConfig data can be reused across multiple calls. The caller is responsible for keeping the data consistent and accurate.

To destroy data of the aclprofConfig type, call aclprofDestroyConfig.

Restrictions

  • Use this API together with the aclprofDestroyConfig API: call aclprofCreateConfig first, then aclprofDestroyConfig. If the created data is not destroyed, its memory cannot be freed.

Prototype

aclprofConfig *aclprofCreateConfig(uint32_t *deviceIdList, uint32_t deviceNums, aclprofAicoreMetrics aicoreMetrics, const aclprofAicoreEvents *aicoreEvents, uint64_t dataTypeConfig)

Parameters

deviceIdList [Input]
Device ID list. Set this parameter based on the actual device IDs.

deviceNums [Input]
Number of devices. Must equal the number of entries in deviceIdList; otherwise, service exceptions may occur later.

aicoreMetrics [Input]
AI Core metrics to profile.

aicoreEvents [Input]
AI Core events. Reserved; set this parameter to NULL.

dataTypeConfig [Input]
Bitwise OR of the following macros (for example, ACL_PROF_ACL_API | ACL_PROF_AICORE_METRICS). Each macro enables collection of a specific kind of profile data.

  • ACL_PROF_ACL_API: collects profile data of AscendCL APIs, including the synchronous/asynchronous memory copy latency between the host and devices as well as between devices.
  • ACL_PROF_TASK_TIME: collects operator delivery and execution duration data, as well as basic operator information, to provide more comprehensive performance analysis data.
  • ACL_PROF_TASK_TIME_L0: collects operator delivery and execution duration data. Unlike ACL_PROF_TASK_TIME, it does not collect basic operator information, so the collection overhead is smaller and the duration statistics are more accurate.
  • ACL_PROF_OP_ATTR: collects operator attribute information. Currently, only the aclnn operator is supported.
  • ACL_PROF_AICORE_METRICS: collects AI Core metrics. Required for aicoreMetrics to take effect.
  • ACL_PROF_TASK_MEMORY: enables collection of CANN operator memory usage, which helps optimize memory consumption. In the single-operator scenario, operator memory size and lifecycle information is collected per GE component and per operator (GE component memory is not collected in the single-operator API execution mode). In the static graph and static subgraph scenarios, operator memory size and lifecycle information is collected per operator during the operator compilation phase.
  • ACL_PROF_AICPU: collects traces of AI CPU tasks, including the start and end of each task.
  • ACL_PROF_L2CACHE: collects L2 Cache data. (It is not supported by the Atlas 200/300/500 Inference Product.)
  • ACL_PROF_HCCL_TRACE: collects HCCL data.
  • ACL_PROF_MSPROFTX: collects profile data output by the user and by high-level framework programs. During collection (between the aclprofStart and aclprofStop calls), call the corresponding marker APIs to record the time span of a specific event during app execution; the recorded data is written to the profile data file, which you then parse with the msprof tool to export and display the profile data.
  • ACL_PROF_TRAINING_TRACE: collects iteration traces.
  • ACL_PROF_RUNTIME_API: collects runtime API profile data.

Returns

  • Success: a pointer to data of the aclprofConfig type
  • Failure: nullptr