PipeUtilization (Percentages of Time Taken by Compute Units and MTEs)
PipeUtilization.csv records the execution time of each compute unit and MTE, together with the percentage of total cycles that each one takes. Use this data to optimize the data transfer logic and improve bandwidth utilization. The following tables describe the fields.
- GB/s is the bandwidth unit: 1 GB/s means that 1 GB of data is transferred per second.
- In the field description tables, the total cycles used in each ratio are the cycles counted on the cube core or the vector core. ai* stands for either aic or aiv: aic indicates the cube core, and aiv indicates the vector core.
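The `ai*` placeholder can be expanded mechanically when looking up columns in the CSV file. The following is a minimal sketch, assuming that the CSV headers match the field names in the tables exactly; `expand_field` is a hypothetical helper introduced here for illustration and is not part of any tool.

```python
from __future__ import annotations


def expand_field(pattern: str) -> list[str]:
    """Expand the documentation placeholder 'ai*' into concrete column names.

    Hypothetical helper: 'ai*' becomes 'aic' (cube core) and 'aiv' (vector core).
    """
    prefix = "ai*"
    if not pattern.startswith(prefix):
        return [pattern]  # already a concrete name, for example 'aic_cube_ratio'
    suffix = pattern[len(prefix):]
    return ["aic" + suffix, "aiv" + suffix]


print(expand_field("ai*_mte2_ratio"))
print(expand_field("aic_cube_ratio"))
```

The first call yields the cube-core and vector-core variants of the field; the second returns the name unchanged because it is already concrete.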
Atlas A3 Training Products/Atlas A3 Inference Products and Atlas A2 Training Products/Atlas A2 Inference Products
| Field | Description |
|---|---|
| block_id | Number of running task blocks, which corresponds to the number of cores configured during task running. |
| sub_block_id | Name and sequence number of each block used for task running. |
| aic_time(us) | Execution time of each AI Cube Core compute unit after the task is allocated to the unit, in μs. |
| aic_total_cycles | Total number of cycles executed on each AI Cube Core compute unit after the task is allocated to the unit. |
| aiv_time(us) | Execution time of each AI Vector Core compute unit after the task is allocated to the unit, in μs. |
| aiv_total_cycles | Total number of cycles executed on each AI Vector Core compute unit after the task is allocated to the unit. |
| aiv_vec_time(us) | Time taken to execute Vector instructions. |
| aiv_vec_ratio | Ratio of cycles taken to execute Vector instructions to the total cycles. |
| aic_cube_time(us) | Time taken to execute Cube instructions (fp16 and s16). |
| aic_cube_ratio | Ratio of cycles taken to execute Cube instructions (fp16 and s16) to the total cycles. |
| ai*_scalar_time(us) | Time taken to execute Scalar instructions. |
| ai*_scalar_ratio | Ratio of cycles taken to execute Scalar instructions to the total cycles. |
| aic_fixpipe_time(us) | Time taken to execute fixpipe instructions (L0C-to-GM/L1 movement). |
| aic_fixpipe_ratio | Ratio of cycles taken to execute fixpipe instructions (L0C-to-GM/L1 movement) to the total cycles. |
| aic_mte1_time(us) | Time taken to execute MTE1 instructions (L1-to-L0A/L0B movement), excluding the movement wait time. |
| aic_mte1_ratio | Ratio of cycles taken to execute MTE1 instructions (L1-to-L0A/L0B movement) to the total cycles. |
| ai*_mte2_time(us) | Time taken to execute MTE2 instructions (GM-to-AI Core movement). |
| ai*_mte2_ratio | Ratio of cycles taken to execute MTE2 instructions (GM-to-AI Core movement) to the total cycles. |
| ai*_mte3_time(us) | Time taken to execute MTE3 instructions (AI Core-to-GM movement). |
| ai*_mte3_ratio | Ratio of cycles taken to execute MTE3 instructions (AI Core-to-GM movement) to the total cycles. |
| ai*_icache_miss_rate | iCache miss rate, that is, the rate at which instruction fetches miss the L1 cache. The smaller the value, the better. |
| aic_mte3_active_bw(GB/s) | Active bandwidth of MTE3 instructions (AI Core-to-DDR Cube movement) corresponding to the active cycles. |
| aiv_mte3_active_bw(GB/s) | Active bandwidth of MTE3 instructions (AI Core-to-DDR AIV movement) corresponding to the active cycles. |
| aic_fixpipe_active_bw(GB/s) | Active bandwidth of fixpipe instructions (L0C-to-OUT/L1 movement) corresponding to the active cycles. |
| aiv_mte2_active_bw(GB/s) | Active bandwidth of MTE2 instructions (DDR-to-AI Core AIV movement) corresponding to the active cycles. |
| aic_mte1_active_bw(GB/s) | Active bandwidth of MTE1 instructions in the Cube unit corresponding to the active cycles, specifically involving L1-to-L0A and L1-to-L0B channels. NOTE: This field is displayed only when dynamic instrumentation is enabled (--aic-metrics=MemoryDetail) for |
| aic_mte2_active_bw(GB/s) | Active bandwidth of MTE2 instructions in the Cube unit corresponding to the active cycles, specifically involving GM-to-L1, GM-to-L0A, and GM-to-L0B channels. NOTE: This field is displayed only when dynamic instrumentation is enabled (--aic-metrics=MemoryDetail) for |
| ai*_scalar_single_time(us) | Time taken by single-issue Scalar instructions (one instruction issued per cycle). |
| ai*_scalar_dual_time(us) | Time taken by dual-issue Scalar instructions (two instructions issued per cycle). |
| ai*_scalar_wait_time(us) | Blockage time caused by intra-core wait instructions within Scalar operations. |
| ai*_scalar_wait_id*_time(us) | Blockage time caused by inter-core wait instructions for IDs within Scalar operations. NOTE: id* is a placeholder, which can correspond to any core ID from ID0 to ID15. Inter-core synchronization metrics (ai*_scalar_wait_id0_time to ai*_scalar_wait_id15_time) are displayed only when relevant data is available. |
| aic_scalar_mte1_stall_time(us) | Scalar instruction blockage time caused by a full MTE1 IQ queue. |
| ai*_scalar_mte2_stall_time(us) | Scalar instruction blockage time caused by a full MTE2 IQ queue. |
| ai*_scalar_mte3_stall_time(us) | Scalar instruction blockage time caused by a full MTE3 IQ queue. |
| aic_scalar_cube_stall_time(us) | Scalar instruction blockage time caused by a full Cube IQ queue. |
| aic_scalar_vector_stall_time(us) | Scalar instruction blockage time caused by a full Vector IQ queue. |
| ai*_scalar_wait_ib_time(us) | Time spent by Scalar instructions waiting for iCache via the IB. |
| aic_scalar_stall_by_ub_time(us) | Scalar instruction blockage time caused by the UB. |
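
Because every `*_ratio` field is measured against the same total cycles, the ratio columns of one row can be compared directly to see which pipe dominates a block. The following is a minimal sketch, not an official tool: the sample rows and the `sub_block_id` values are invented for illustration, ratios are assumed to be stored as plain decimal numbers, and `dominant_pipe` is a hypothetical helper.

```python
import csv
import io

# Illustrative rows only. A real PipeUtilization.csv has more columns,
# and the value format shown here is an assumption.
SAMPLE = """block_id,sub_block_id,aic_cube_ratio,aic_mte1_ratio,aic_mte2_ratio,aic_fixpipe_ratio
0,cube0,0.35,0.10,0.72,0.05
1,cube0,0.60,0.12,0.30,0.04
"""


def dominant_pipe(row):
    """Return (column, value) for the largest *_ratio column in one CSV row.

    Returns None when the row has no numeric ratio column.
    """
    ratios = {}
    for name, value in row.items():
        if not name.endswith("_ratio"):
            continue  # skip IDs, times, cycle counts, and miss rates
        try:
            ratios[name] = float(value)
        except (TypeError, ValueError):
            continue  # skip empty or non-numeric cells
    if not ratios:
        return None
    return max(ratios.items(), key=lambda item: item[1])


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(row["block_id"], dominant_pipe(row))
```

With the sample rows, block 0 is dominated by `aic_mte2_ratio` and block 1 by `aic_cube_ratio`. A row dominated by an MTE ratio is a candidate for the data transfer optimization mentioned at the start of this section.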
Atlas Inference Products
| Field | Description |
|---|---|
| aic_time(us) | Execution time of each AI Core compute unit after the task is allocated to the unit, in μs. |
| aic_total_cycles | Total number of cycles executed on each AI Core compute unit after the task is allocated to the unit. |
| aic_cube_time(us) | Time taken to execute Cube instructions (fp16 and s16). |
| aic_cube_ratio | Ratio of cycles taken to execute Cube instructions (fp16 and s16) to the total cycles. |
| aic_scalar_time(us) | Time taken to execute Scalar instructions. |
| aic_scalar_ratio | Ratio of cycles taken to execute Scalar instructions to the total cycles. |
| aic_mte1_time(us) | Time taken to execute MTE1 instructions (L1-to-L0A/L0B movement), excluding the movement wait time. |
| aic_mte1_ratio | Ratio of cycles taken to execute MTE1 instructions (L1-to-L0A/L0B movement) to the total cycles. |
| aic_mte2_time(us) | Time taken to execute MTE2 instructions (GM-to-AI Core movement). |
| aic_mte2_ratio | Ratio of cycles taken to execute MTE2 instructions (GM-to-AI Core movement) to the total cycles. |
| aic_mte3_time(us) | Time taken to execute MTE3 instructions (AI Core-to-GM movement). |
| aic_mte3_ratio | Ratio of cycles taken to execute MTE3 instructions (AI Core-to-GM movement) to the total cycles. |
| aic_icache_miss_rate | iCache miss rate, that is, the rate at which instruction fetches miss the L1 cache. The smaller the value, the better. |
| aic_vec_time(us) | Time taken to execute Vector instructions. |
| aic_vec_ratio | Ratio of cycles taken to execute Vector instructions to the total cycles. |