PipeUtilization (Percentages of Time Taken by Compute Units and MTEs)

PipeUtilization.csv records the execution time of the compute units and MTEs, together with the share of total cycles each one takes. Use this data to optimize the data transfer logic and improve bandwidth utilization. The fields are described in the tables below.

The unit GB/s means gigabytes of data transferred per second.
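As a rough illustration of the unit, the sketch below converts a byte count and an active time in μs into GB/s. The function name and the sample numbers are hypothetical, and the sketch assumes 1 GB = 10^9 bytes, which the source does not specify.

```python
def active_bandwidth_gbps(bytes_moved: int, active_time_us: float) -> float:
    """Return bandwidth in GB/s.

    Assumption: 1 GB is taken as 10**9 bytes; the profiler may use another
    convention.
    """
    if active_time_us <= 0:
        raise ValueError("active_time_us must be positive")
    seconds = active_time_us * 1e-6  # microseconds to seconds
    return bytes_moved / 1e9 / seconds


# Made-up numbers: 4,000,000 bytes moved during 20 us of active time.
example_bw = active_bandwidth_gbps(4_000_000, 20.0)
```

With these made-up inputs the result is about 200 GB/s, since 0.004 GB divided by 0.00002 s is 200.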

In the field description tables, the total cycles used in each ratio are the cycles counted on the cube core or the vector core. The prefix ai* stands for either aic (cube core) or aiv (vector core).

Atlas A3 Training Products/Atlas A3 Inference Products and Atlas A2 Training Products/Atlas A2 Inference Products

Figure 1 PipeUtilization.csv file

See the following table for more details.

**Table 1** Field description
Field	Description
block_id	Number of running task blocks, which corresponds to the number of cores configured during task running.
sub_block_id	Name and sequence number of each block used for task running.
aic_time(us)	Execution time of each AI Cube Core compute unit after the task is allocated to the unit, in μs.
aic_total_cycles	Total number of cycles executed on each AI Cube Core compute unit after the task is allocated to the unit.
aiv_time(us)	Execution time of each AI Vector Core compute unit after the task is allocated to the unit, in μs.
aiv_total_cycles	Total number of cycles executed on each AI Vector Core compute unit after the task is allocated to the unit.
aiv_vec_time(us)	Time taken to execute Vector instructions.
aiv_vec_ratio	Ratio of cycles taken to execute Vector instructions to the total cycles.
aic_cube_time(us)	Time taken to execute Cube instructions (fp16 and s16).
aic_cube_ratio	Ratio of cycles taken to execute Cube instructions (fp16 and s16) to the total cycles.
ai*_scalar_time(us)	Time taken to execute Scalar instructions.
ai*_scalar_ratio	Ratio of cycles taken to execute Scalar instructions to the total cycles.
aic_fixpipe_time(us)	Time taken to execute fixpipe instructions (L0C-to-GM/L1 movement).
aic_fixpipe_ratio	Ratio of cycles taken to execute fixpipe instructions (L0C-to-GM/L1 movement) to the total cycles.
aic_mte1_time(us)	Time taken to execute MTE1 instructions (L1-to-L0A/L0B movement), excluding the movement wait time.
aic_mte1_ratio	Ratio of cycles taken to execute MTE1 instructions (L1-to-L0A/L0B movement) to the total cycles.
ai*_mte2_time(us)	Time taken to execute MTE2 instructions (GM-to-AI Core movement).
ai*_mte2_ratio	Ratio of cycles taken to execute MTE2 instructions (GM-to-AI Core movement) to the total cycles.
ai*_mte3_time(us)	Time taken to execute MTE3 instructions (AI Core-to-GM movement).
ai*_mte3_ratio	Ratio of cycles taken to execute MTE3 instructions (AI Core-to-GM movement) to the total cycles.
ai*_icache_miss_rate	iCache miss rate, that is, the rate at which instruction fetches miss the L1 cache. The smaller the value, the better.
aic_mte3_active_bw(GB/s)	Active bandwidth of MTE3 instructions (AI Core-to-DDR Cube movement) corresponding to the active cycles.
aiv_mte3_active_bw(GB/s)	Active bandwidth of MTE3 instructions (AI Core-to-DDR AIV movement) corresponding to the active cycles.
aic_fixpipe_active_bw(GB/s)	Active bandwidth of fixpipe instructions (L0C-to-OUT/L1 movement) corresponding to the active cycles.
aiv_mte2_active_bw(GB/s)	Active bandwidth of MTE2 instructions (DDR-to-AI Core AIV movement) corresponding to the active cycles.
aic_mte1_active_bw(GB/s)	Active bandwidth of MTE1 instructions in the Cube unit corresponding to the active cycles, specifically involving L1-to-L0A and L1-to-L0B channels. NOTE: This field is displayed only when dynamic instrumentation is enabled (--aic-metrics=MemoryDetail) for Atlas A3 Training Products/Atlas A3 Inference Products and Atlas A2 Training Products/Atlas A2 Inference Products.
aic_mte2_active_bw(GB/s)	Active bandwidth of MTE2 instructions in the Cube unit corresponding to the active cycles, specifically involving GM-to-L1, GM-to-L0A, and GM-to-L0B channels. NOTE: This field is displayed only when dynamic instrumentation is enabled (--aic-metrics=MemoryDetail) for Atlas A3 Training Products/Atlas A3 Inference Products and Atlas A2 Training Products/Atlas A2 Inference Products.
ai*_scalar_single_time(us)	Instruction time for single-issue Scalar instructions (one instruction issued per cycle).
ai*_scalar_dual_time(us)	Instruction time for dual-issue Scalar instructions (two instructions issued per cycle).
ai*_scalar_wait_time(us)	Blockage time caused by intra-core wait instructions within Scalar operations.
ai*_scalar_wait_id*_time(us)	Blockage time caused by inter-core wait instructions for IDs within Scalar operations. NOTE: id* is a placeholder for any core ID from ID0 to ID15. The inter-core synchronization metrics (ai*_scalar_wait_id0_time to ai*_scalar_wait_id15_time) are displayed only when relevant data is available.
aic_scalar_mte1_stall_time(us)	Scalar instruction blockage time caused by a full MTE1 IQ queue.
ai*_scalar_mte2_stall_time(us)	Scalar instruction blockage time caused by a full MTE2 IQ queue.
ai*_scalar_mte3_stall_time(us)	Scalar instruction blockage time caused by a full MTE3 IQ queue.
aic_scalar_cube_stall_time(us)	Scalar instruction blockage time caused by a full Cube IQ queue.
aic_scalar_vector_stall_time(us)	Scalar instruction blockage time caused by a full Vector IQ queue.
ai*_scalar_wait_ib_time(us)	Time spent by Scalar instructions waiting for iCache via the IB.
aic_scalar_stall_by_ub_time(us)	Scalar instruction blockage time caused by the UB.
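One way to use the ratio fields in Table 1 is to find, for each row, which pipe takes the largest share of the total cycles. The following is a minimal sketch, not part of msprof: the sample rows are invented, the sub_block_id values are placeholders, and it assumes the file is comma-separated with the column names listed above.

```python
import csv
import io
from typing import Optional, Tuple


def dominant_pipe(row: dict) -> Optional[Tuple[str, float]]:
    """Return (column_name, value) of the largest *_ratio field in one row."""
    ratios = {}
    for name, value in row.items():
        if name.endswith("_ratio"):
            try:
                ratios[name] = float(value)
            except (TypeError, ValueError):
                continue  # skip empty or non-numeric cells such as "N/A"
    if not ratios:
        return None
    return max(ratios.items(), key=lambda item: item[1])


# Hypothetical sample data; a real PipeUtilization.csv has more columns.
SAMPLE = """block_id,sub_block_id,aiv_time(us),aiv_vec_ratio,aiv_scalar_ratio,aiv_mte2_ratio,aiv_mte3_ratio
0,vector0,12.5,0.21,0.08,0.63,0.05
1,vector0,12.1,0.55,0.09,0.30,N/A
"""


def summarize(csv_text: str) -> list:
    """Pair each row's block_id with its dominant *_ratio column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["block_id"], dominant_pipe(row)) for row in reader]


summary = summarize(SAMPLE)
for block_id, result in summary:
    print(block_id, result)
```

In this invented sample, the first row is dominated by aiv_mte2_ratio, which would point at GM-to-AI Core data movement as the first place to look. To read a real file, pass its contents to summarize instead of SAMPLE.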

Atlas Inference Products

Figure 2 PipeUtilization.csv file

See the following table for more details.

**Table 2** Field description
Field	Description
aic_time(us)	Execution time of each AI Core compute unit after the task is allocated to the unit, in μs.
aic_total_cycles	Total number of cycles executed on each AI Core compute unit after the task is allocated to the unit.
aic_cube_time(us)	Time taken to execute Cube instructions (fp16 and s16).
aic_cube_ratio	Ratio of cycles taken to execute Cube instructions (fp16 and s16) to the total cycles.
aic_scalar_time(us)	Time taken to execute Scalar instructions.
aic_scalar_ratio	Ratio of cycles taken to execute Scalar instructions to the total cycles.
aic_mte1_time(us)	Time taken to execute MTE1 instructions (L1-to-L0A/L0B movement), excluding the movement wait time.
aic_mte1_ratio	Ratio of cycles taken to execute MTE1 instructions (L1-to-L0A/L0B movement) to the total cycles.
aic_mte2_time(us)	Time taken to execute MTE2 instructions (GM-to-AI Core movement).
aic_mte2_ratio	Ratio of cycles taken to execute MTE2 instructions (GM-to-AI Core movement) to the total cycles.
aic_mte3_time(us)	Time taken to execute MTE3 instructions (AI Core-to-GM movement).
aic_mte3_ratio	Ratio of cycles taken to execute MTE3 instructions (AI Core-to-GM movement) to the total cycles.
aic_icache_miss_rate	iCache miss rate, that is, the rate at which instruction fetches miss the L1 cache. The smaller the value, the better.
aic_vec_time(us)	Time taken to execute Vector instructions.
aic_vec_ratio	Ratio of cycles taken to execute Vector instructions to the total cycles.
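Because each ratio in Table 2 is defined as the cycles spent on one instruction type divided by aic_total_cycles, multiplying a ratio by the total recovers the approximate cycle count for that pipe. The sketch below shows this arithmetic on an invented row; it assumes ratios are stored as fractions between 0 and 1 (not percentages), and the helper name pipe_cycles is hypothetical.

```python
def pipe_cycles(row: dict, prefix: str = "aic") -> dict:
    """Estimate per-pipe cycles as ratio * total_cycles for one CSV row.

    Assumption: *_ratio values are fractions in [0, 1], not percentages.
    """
    total = float(row[f"{prefix}_total_cycles"])
    result = {}
    for name, value in row.items():
        if name.startswith(prefix + "_") and name.endswith("_ratio"):
            # "aic_cube_ratio" -> "cube"
            pipe = name[len(prefix) + 1 : -len("_ratio")]
            result[pipe] = float(value) * total
    return result


# Invented row using field names from Table 2.
sample_row = {
    "aic_time(us)": "10.0",
    "aic_total_cycles": "20000",
    "aic_cube_ratio": "0.5",
    "aic_mte2_ratio": "0.25",
    "aic_icache_miss_rate": "0.01",
}
cycles = pipe_cycles(sample_row)
```

For this row the estimate is 10000 cycles for cube and 5000 cycles for mte2. The aic_icache_miss_rate field is ignored because it is not a cycle ratio.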
