PipeUtilization (Percentages of Time Taken by Compute Units and MTEs)

PipeUtilization.csv records the time taken by each compute unit and MTE, together with the share of total cycles that each one accounts for. Use this data to find where the data transfer logic can be optimized to improve bandwidth utilization. For details, see the field descriptions in the following tables.

Figure 1 PipeUtilization.csv file
See the following table for more details.
Table 1 Field description

| Field | Description |
| --- | --- |
| block_id | Number of running task blocks, which corresponds to the number of cores configured for the task run. |
| sub_block_id | Name and sequence number of each block used to run the task. |
| aic_time(us) | Execution time on each AI Core compute unit after the task is allocated to the unit, in μs. |
| aic_total_cycles | Total number of cycles executed on each AI Core compute unit after the task is allocated to the unit. |
| aiv_time(us) | Execution time on each AI Vector Core compute unit after the task is allocated to the unit, in μs. |
| aiv_total_cycles | Total number of cycles executed on each AI Vector Core compute unit after the task is allocated to the unit. |
| aiv_vec_time(us) | Time taken to execute Vector instructions, in μs. |
| aiv_vec_ratio | Ratio of cycles taken to execute Vector instructions to the total cycles. |
| aic_cube_time(us) | Time taken to execute Cube instructions (fp16 and s16), in μs. |
| aic_cube_ratio | Ratio of cycles taken to execute Cube instructions (fp16 and s16) to the total cycles. |
| ai*_scalar_time(us) | Time taken to execute Scalar instructions, in μs. |
| ai*_scalar_ratio | Ratio of cycles taken to execute Scalar instructions to the total cycles. |
| aic_fixpipe_time(us) | Time taken to execute fixpipe instructions (L0C-to-GM/L1 transfer), in μs. |
| aic_fixpipe_ratio | Ratio of cycles taken to execute fixpipe instructions (L0C-to-GM/L1 transfer) to the total cycles. |
| aic_mte1_time(us) | Time taken to execute MTE1 instructions (L1-to-L0A/L0B transfer), excluding the transfer wait time, in μs. |
| aic_mte1_ratio | Ratio of cycles taken to execute MTE1 instructions (L1-to-L0A/L0B transfer) to the total cycles. |
| ai*_mte2_time(us) | Time taken to execute MTE2 instructions (GM-to-AI Core transfer), in μs. |
| ai*_mte2_ratio | Ratio of cycles taken to execute MTE2 instructions (GM-to-AI Core transfer) to the total cycles. |
| ai*_mte3_time(us) | Time taken to execute MTE3 instructions (AI Core-to-GM transfer), in μs. |
| ai*_mte3_ratio | Ratio of cycles taken to execute MTE3 instructions (AI Core-to-GM transfer) to the total cycles. |
| ai*_icache_miss_rate | Instruction cache (iCache) miss rate, that is, the rate at which instruction fetches miss the L1 cache. The smaller the value, the better. |
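
As an illustration of how the ratio fields above might be used, the following Python sketch picks, for each row, the pipeline with the largest `*_ratio` value, which is a reasonable first candidate when looking for a bottleneck. It is a minimal sketch under stated assumptions, not part of the profiling tool: it assumes the file is comma-separated with a header row that uses the field names in the table, and that `ai*` in those names stands for a concrete prefix such as `aiv` or `aic`. The function names `dominant_pipeline` and `report` and all sample values are hypothetical.

```python
import csv
import io


def dominant_pipeline(row):
    """Return (column_name, value) for the largest *_ratio field in one row.

    Returns None when the row has no numeric *_ratio field.
    """
    ratios = {}
    for name, value in row.items():
        if name is None or not name.endswith("_ratio"):
            continue
        try:
            ratios[name] = float(value)
        except (TypeError, ValueError):
            continue  # skip empty or non-numeric cells
    if not ratios:
        return None
    return max(ratios.items(), key=lambda item: item[1])


# Made-up sample data; the column layout of a real file may differ.
SAMPLE = """block_id,sub_block_id,aiv_time(us),aiv_vec_ratio,aiv_scalar_ratio,aiv_mte2_ratio,aiv_mte3_ratio
0,vector0,10.0,0.20,0.10,0.60,0.05
1,vector0,10.5,0.55,0.10,0.25,0.05
"""


def report(csv_text):
    """Pair each row's block_id with its dominant pipeline."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        results.append((row["block_id"], dominant_pipeline(row)))
    return results


if __name__ == "__main__":
    for block_id, top in report(SAMPLE):
        print(block_id, top)
```

In the sample, the first row is dominated by `aiv_mte2_ratio`, which would point at GM-to-AI Core transfers as the place to start optimizing. To run the sketch on a real file, pass the file contents to `report`, for example `report(open("PipeUtilization.csv", newline="").read())`.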

Figure 2 PipeUtilization.csv file
See the following table for more details.
Table 2 Field description

| Field | Description |
| --- | --- |
| aic_time(us) | Execution time on each AI Core compute unit after the task is allocated to the unit, in μs. |
| aic_total_cycles | Total number of cycles executed on each AI Core compute unit after the task is allocated to the unit. |
| aic_cube_time(us) | Time taken to execute Cube instructions (fp16 and s16), in μs. |
| aic_cube_ratio | Ratio of cycles taken to execute Cube instructions (fp16 and s16) to the total cycles. |
| aic_scalar_time(us) | Time taken to execute Scalar instructions, in μs. |
| aic_scalar_ratio | Ratio of cycles taken to execute Scalar instructions to the total cycles. |
| aic_mte1_time(us) | Time taken to execute MTE1 instructions (L1-to-L0A/L0B transfer), excluding the transfer wait time, in μs. |
| aic_mte1_ratio | Ratio of cycles taken to execute MTE1 instructions (L1-to-L0A/L0B transfer) to the total cycles. |
| aic_mte2_time(us) | Time taken to execute MTE2 instructions (GM-to-AI Core transfer), in μs. |
| aic_mte2_ratio | Ratio of cycles taken to execute MTE2 instructions (GM-to-AI Core transfer) to the total cycles. |
| aic_mte3_time(us) | Time taken to execute MTE3 instructions (AI Core-to-GM transfer), in μs. |
| aic_mte3_ratio | Ratio of cycles taken to execute MTE3 instructions (AI Core-to-GM transfer) to the total cycles. |
| aic_icache_miss_rate | Instruction cache (iCache) miss rate, that is, the rate at which instruction fetches miss the L1 cache. The smaller the value, the better. |
| aic_vec_time(us) | Time taken to execute Vector instructions, in μs. |
| aic_vec_ratio | Ratio of cycles taken to execute Vector instructions to the total cycles. |
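
The time, cycle, and ratio fields are related by simple arithmetic, which the following Python sketch illustrates. Dividing `aic_total_cycles` by `aic_time(us)` gives cycles per microsecond, which is numerically the clock rate in MHz. Dividing a pipeline's time by the total time gives a value that can be compared with the reported ratio. This is a hedged sketch, not the tool's own calculation: it assumes the ratio is stored as a fraction between 0 and 1 and that time and cycles are converted at a constant clock rate. The helper names and the sample values are hypothetical.

```python
import math


def clock_mhz(total_cycles, time_us):
    """Cycles per microsecond, which is numerically the clock rate in MHz."""
    if time_us <= 0:
        raise ValueError("time_us must be positive")
    return total_cycles / time_us


def ratio_from_times(pipe_time_us, total_time_us):
    """Share of the total time spent in one pipeline."""
    if total_time_us <= 0:
        raise ValueError("total_time_us must be positive")
    return pipe_time_us / total_time_us


# Made-up values for one row of the file.
row = {
    "aic_time(us)": 20.0,
    "aic_total_cycles": 37000,
    "aic_cube_time(us)": 12.0,
    "aic_cube_ratio": 0.6,
}

mhz = clock_mhz(row["aic_total_cycles"], row["aic_time(us)"])
derived = ratio_from_times(row["aic_cube_time(us)"], row["aic_time(us)"])
consistent = math.isclose(derived, row["aic_cube_ratio"], rel_tol=0.05)

if __name__ == "__main__":
    print(f"clock: {mhz:.0f} MHz, derived cube ratio: {derived:.2f}, "
          f"matches reported ratio: {consistent}")
```

A large gap between the derived and the reported ratio in real data would suggest that one of the assumptions above does not hold for that file, for example that ratios are reported as percentages.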