ai_core_utilization (AI Core Instruction Proportion)
The timeline of the AI Core instruction proportion data is displayed at the AI Core Utilization level in the msprof*.json file, and the summary is written to the ai_core_utilization_*.csv file.
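For offline analysis, the summary rows can be loaded with Python's standard csv module. The sketch below assumes that each ai_core_utilization_*.csv file is a plain CSV with a single header row and sits somewhere under a profiling output directory. The function name load_ai_core_utilization is hypothetical and not part of msprof.

```python
import csv
import glob
import os


def load_ai_core_utilization(profiling_dir):
    """Collect the rows of every ai_core_utilization_*.csv file found under
    profiling_dir (searched recursively), as dicts keyed by column header.

    Assumption: each file is a plain CSV with a single header row.
    """
    pattern = os.path.join(profiling_dir, "**", "ai_core_utilization_*.csv")
    rows = []
    # sorted() keeps the file order deterministic across runs.
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows
```

For example, rows = load_ai_core_utilization("path/to/profiling_output") returns one dict per CSV row, with every value kept as a string; convert the columns you need with float() before computing on them.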
Availability
AI Core Instruction Proportion Data in msprof*.json
The file content is formatted as follows:

| Field | Description |
|---|---|
| Average | Mean value. |
| Core <id> | Core ID. |
| utilization(%) | Percentage of total execution cycles of a task on the AI Core in the current sampling period, counted from the first operator instruction executed by the AI Core to the completion of the last instruction executed. |
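
The utilization(%) definition can be sketched as a small calculation. This is an illustrative reading, not the profiler's implementation: it assumes all timestamps are cycle counts on a common clock, clips the busy window to the sampling period, and treats Average as the mean over the Core <id> rows. Both function names are hypothetical.

```python
def core_utilization_percent(first_instr_start, last_instr_end,
                             period_start, period_end):
    """Hypothetical helper: percentage of a sampling period during which a
    task kept an AI Core busy, measured from the first operator instruction
    to the completion of the last one (all values in cycles)."""
    period = period_end - period_start
    if period <= 0:
        raise ValueError("sampling period must be positive")
    # Assumption: only the part of the busy window inside the period counts.
    busy_start = max(first_instr_start, period_start)
    busy_end = min(last_instr_end, period_end)
    busy = max(0, busy_end - busy_start)
    return 100.0 * busy / period


def average_utilization(per_core):
    """Mean of per-core utilization values, assumed to match the Average row."""
    return sum(per_core.values()) / len(per_core)
```

For instance, a task that runs from cycle 100 to cycle 600 in a 1000-cycle period gives 50.0, and cores at 50.0 and 100.0 average to 75.0.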
ai_core_utilization_*.csv File
The file content is formatted as follows.
The fields displayed depend on the value of --aic-metrics. The complete set of fields is listed below.

| Field | Description |
|---|---|
| vec_ratio | Ratio of cycles taken to execute Vector instructions to the total cycles. |
| mac_ratio | Ratio of cycles taken to execute Cube instructions to the total cycles. |
| scalar_ratio | Ratio of cycles taken to execute Scalar instructions to the total cycles. |
| mte1_ratio | Ratio of cycles taken to execute MTE1 instructions (L1-to-L0A/L0B transfers) to the total cycles. |
| mte2_ratio | Ratio of cycles taken to execute MTE2 instructions (DDR-to-AI Core transfers) to the total cycles. |
| mte3_ratio | Ratio of cycles taken to execute MTE3 instructions (AI Core-to-DDR transfers) to the total cycles. |
| icache_miss_rate | Instruction cache miss rate. iCache is the L2 cache reserved for instructions. A high icache_miss_rate means that the AI Core fetches instructions inefficiently. |
| memory_bound | AI Core memory bound, calculated as mte2_ratio/max(mac_ratio, vec_ratio). A value less than 1 indicates that no memory bound exists. A value greater than 1 indicates that the AI Core spends more cycles on MTE2 memory transfers than on Cube or Vector computation; the larger the value, the more severe the memory bottleneck. |
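
The memory_bound formula in the table can be reproduced from the three ratios it uses. In this sketch the handling of a zero denominator (no Cube or Vector cycles at all) is an assumption, since the formula is undefined there; the function names are hypothetical.

```python
def memory_bound(mte2_ratio, mac_ratio, vec_ratio):
    """memory_bound = mte2_ratio / max(mac_ratio, vec_ratio)."""
    compute = max(mac_ratio, vec_ratio)
    if compute == 0:
        # Assumption: with no compute cycles, any MTE2 traffic counts as
        # fully memory bound, and no traffic counts as not bound.
        return float("inf") if mte2_ratio > 0 else 0.0
    return mte2_ratio / compute


def is_memory_bound(value):
    """True when MTE2 transfers take more cycles than Cube/Vector compute."""
    return value > 1.0
```

For example, mte2_ratio = 0.5 with mac_ratio = 0.25 and vec_ratio = 0.125 gives 2.0, a memory-bound task; swapping the magnitudes (0.25 against a vec_ratio of 1.0) gives 0.25, which is not memory bound.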

| Field | Description |
|---|---|
| mac_fp16_ratio | Ratio of cycles taken to execute Cube fp16 instructions to the total cycles. |
| mac_int8_ratio | Ratio of cycles taken to execute Cube int8 instructions to the total cycles. |
| vec_fp32_ratio | Ratio of cycles taken to execute Vector fp32 instructions to the total cycles. |
| vec_fp16_ratio | Ratio of cycles taken to execute Vector fp16 instructions to the total cycles. |
| vec_int32_ratio | Ratio of cycles taken to execute Vector int32 instructions to the total cycles. |
| vec_misc_ratio | Ratio of cycles taken to execute Vector misc instructions to the total cycles. |
| cube_fops | Number of Cube floating-point operations (FLOPs, abbreviated as fops in the field name), indicating the amount of computation. This field can be used to gauge the complexity of an algorithm or model. |
| vector_fops | Number of Vector floating-point operations (FLOPs, abbreviated as fops in the field name), indicating the amount of computation. This field can be used to gauge the complexity of an algorithm or model. |
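
Every *_ratio field above follows the same pattern (cycles for one instruction type divided by the total cycles), and cube_fops and vector_fops can be combined to see which unit carries the computation. The sketch assumes you already have per-type cycle counts and FLOP totals; the input keys and function names are hypothetical.

```python
def cycle_ratios(cycles_by_type, total_cycles):
    """Turn per-instruction-type cycle counts into *_ratio values."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return {name: cycles / total_cycles for name, cycles in cycles_by_type.items()}


def fops_share(cube_fops, vector_fops):
    """Fraction of all floating-point operations done by Cube and by Vector."""
    total = cube_fops + vector_fops
    if total == 0:
        return 0.0, 0.0
    return cube_fops / total, vector_fops / total
```

For example, 300 Cube fp16 cycles out of 1000 total cycles give a ratio of 0.3, and 3e9 Cube FLOPs against 1e9 Vector FLOPs give a (0.75, 0.25) split.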

| Field | Description |
|---|---|
| ub_read_bw(GB/s) | UB read bandwidth (GB/s). |
| ub_write_bw(GB/s) | UB write bandwidth (GB/s). |
| l1_read_bw(GB/s) | L1 read bandwidth (GB/s). |
| l1_write_bw(GB/s) | L1 write bandwidth (GB/s). |
| l2_read_bw | L2 read bandwidth (GB/s). It is supported only by the |
| l2_write_bw | L2 write bandwidth (GB/s). It is supported only by the |
| main_mem_read_bw(GB/s) | Main memory read bandwidth (GB/s). |
| main_mem_write_bw(GB/s) | Main memory write bandwidth (GB/s). |
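
The bandwidth fields are expressed in GB/s. As a unit sanity check, the sketch below converts a byte count and a duration into GB/s. It assumes 1 GB = 10^9 bytes and a duration in microseconds; neither convention is stated in this section, and the function name is hypothetical.

```python
def bandwidth_gb_per_s(bytes_transferred, duration_us):
    """Hypothetical helper: bandwidth in GB/s from a byte count and a duration
    in microseconds, assuming 1 GB = 10**9 bytes."""
    if duration_us <= 0:
        raise ValueError("duration_us must be positive")
    # 1 byte per microsecond = 10**6 bytes/s = 10**-3 GB/s, hence the / 1e3.
    return bytes_transferred / duration_us / 1e3
```

For example, 2,000,000 bytes moved in 1000 microseconds corresponds to 2.0 GB/s.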

| Field | Description |
|---|---|
| l0a_read_bw(GB/s) | L0A read bandwidth (GB/s). |
| l0a_write_bw(GB/s) | L0A write bandwidth (GB/s). |
| l0b_read_bw(GB/s) | L0B read bandwidth (GB/s). |
| l0b_write_bw(GB/s) | L0B write bandwidth (GB/s). |
| l0c_read_bw(GB/s) | Bandwidth rate for Vector to read data from L0C, in GB/s. |
| l0c_write_bw(GB/s) | Bandwidth rate for Vector to write data to L0C, in GB/s. |
| l0c_read_bw_cube(GB/s) | Bandwidth rate for Cube to read data from L0C, in GB/s. |
| l0c_write_bw_cube(GB/s) | Bandwidth rate for Cube to write data to L0C, in GB/s. |

| Field | Description |
|---|---|
| ub_read_bw_mte(GB/s) | Bandwidth rate for MTE to read data from UB, in GB/s. It is supported only by the |
| ub_write_bw_mte(GB/s) | Bandwidth rate for MTE to write data to UB, in GB/s. It is supported only by the |
| ub_read_bw_vector(GB/s) | Bandwidth rate for Vector to read data from UB, in GB/s. |
| ub_write_bw_vector(GB/s) | Bandwidth rate for Vector to write data to UB, in GB/s. |
| ub_read_bw_scalar(GB/s) | Bandwidth rate for Scalar to read data from UB, in GB/s. |
| ub_write_bw_scalar(GB/s) | Bandwidth rate for Scalar to write data to UB, in GB/s. |

| Field | Description |
|---|---|
| vec_bankgroup_cflt_ratio | Ratio of Vector stall cycles caused by bank group conflicts (vec_bankgroup_stall_cycles) to the total cycles. Bank group conflicts result from an improperly set block stride in Vector instructions. |
| vec_bank_cflt_ratio | Ratio of Vector stall cycles caused by bank conflicts (vec_bank_stall_cycles) to the total cycles. Bank conflicts result from improper read/write pointer addresses of Vector instruction operands. |
| vec_resc_cflt_ratio | Ratio of Vector stall cycles caused by compute resource conflicts to the total cycles. If an operator uses multiple compute units, ensure that they are scheduled concurrently: when a compute unit is busy but the operator logic keeps issuing instructions to it, the overall computing power is not fully utilized. |
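
When triaging a kernel, it can help to compute each conflict ratio from its stall-cycle count and list the ones worth investigating first. This is a sketch only: the 0.1 threshold is an arbitrary assumption rather than a documented limit, and both function names are hypothetical.

```python
def stall_ratio(stall_cycles, total_cycles):
    """Ratio of stall cycles to total cycles, as used by the *_cflt_ratio fields."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return stall_cycles / total_cycles


def flag_conflicts(ratios, threshold=0.1):
    """Names of conflict ratios above a (hypothetical) threshold, worst first."""
    flagged = [name for name, value in ratios.items() if value > threshold]
    return sorted(flagged, key=lambda name: ratios[name], reverse=True)
```

With vec_bank_cflt_ratio = 0.3, vec_resc_cflt_ratio = 0.2, and vec_bankgroup_cflt_ratio = 0.05, the function returns the bank and resource conflicts, in that order, so the operand addresses would be the first thing to review.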