AI Core Metrics View
Click AI Core Metrics in the data pane at the bottom to view AI Core metrics.
Task-based: Pipeline Utilization

| Field | Description |
|---|---|
| Task ID | Task ID |
| Stream ID | Stream ID |
| Op Name | Operator name |
| OP Type | Operator type |
| Task Start Time | Task start time |
| Task Duration(us) | Task running duration (μs) |
| Task Wait Time(us) | Task waiting time (μs) |
| Aicore Time(us) | AI Core running duration (μs) |
| Total Cycles | Number of cycles taken to execute all task instructions |
| Vec Time(us) | Time (μs) taken to execute Vector instructions |
| Vec Ratio | Percentage of cycles taken to execute Vector instructions |
| Mac Time(us) | Time (μs) taken to execute Cube instructions |
| Mac Ratio | Percentage of cycles taken to execute Cube instructions |
| Scalar Time(us) | Time (μs) taken to execute Scalar instructions |
| Scalar Ratio | Percentage of cycles taken to execute Scalar instructions |
| Mte1 Time(us) | Time (μs) taken to execute MTE1 instructions (L1-to-L0A/L0B movement) |
| Mte1 Ratio | Percentage of cycles taken to execute MTE1 instructions (L1-to-L0A/L0B movement) |
| Mte2 Time(us) | Time (μs) taken to execute MTE2 instructions (DDR-to-AI Core movement) |
| Mte2 Ratio | Percentage of cycles taken to execute MTE2 instructions (DDR-to-AI Core movement) |
| Mte3 Time(us) | Time (μs) taken to execute MTE3 instructions (AI Core-to-DDR movement) |
| Mte3 Ratio | Percentage of cycles taken to execute MTE3 instructions (AI Core-to-DDR movement) |
| Icache Miss Rate | Instruction cache (I-Cache) miss rate. A lower value indicates better performance. |
| Memory Bound | Indicates whether the operator computation on the AI Core is limited by memory access. It is calculated as Mte2 Ratio/max(Mac Ratio, Vec Ratio). A value less than 1 means no memory bottleneck exists; a value greater than 1 means one exists, and the larger the value, the more severe the bottleneck. |
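The Memory Bound field above is a simple ratio, so it can be recomputed from a row's Mte2 Ratio, Mac Ratio, and Vec Ratio values. The sketch below is a minimal illustration, not the tool's own code; the function names are hypothetical, and the handling of a zero denominator and of a value of exactly 1 are assumptions, since the description does not cover those cases. Because the formula is a ratio of ratios, it gives the same result whether the inputs are fractions (0.6) or percentages (60).

```python
def memory_bound(mte2_ratio: float, mac_ratio: float, vec_ratio: float) -> float:
    """Memory Bound = Mte2 Ratio / max(Mac Ratio, Vec Ratio)."""
    compute_ratio = max(mac_ratio, vec_ratio)
    if compute_ratio == 0:
        # Assumption: with no Cube or Vector activity the ratio is undefined,
        # so treat any MTE2 activity as unbounded and no activity as 0.
        return float("inf") if mte2_ratio > 0 else 0.0
    return mte2_ratio / compute_ratio


def classify(bound: float) -> str:
    """Interpret the value using the thresholds given in the field description."""
    if bound > 1:
        return "memory-bound"
    if bound < 1:
        return "not memory-bound"
    # Assumption: a value of exactly 1 is not covered by the description.
    return "borderline"


# Hypothetical row values, for illustration only.
row = {"Mte2 Ratio": 0.6, "Mac Ratio": 0.3, "Vec Ratio": 0.2}
b = memory_bound(row["Mte2 Ratio"], row["Mac Ratio"], row["Vec Ratio"])
print(b, classify(b))  # 2.0 memory-bound
```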
Task-based: Arithmetic Utilization

| Field | Description |
|---|---|
| Task ID | Task ID |
| Stream ID | Stream ID |
| Op Name | Operator name |
| OP Type | Operator type |
| Task Start Time | Task start time |
| Task Duration(us) | Task running duration (μs) |
| Task Wait Time(us) | Task waiting time (μs) |
| Aicore Time(us) | AI Core running duration (μs) |
| Total Cycles | Number of cycles taken to execute all task instructions |
| Mac Fp16 Ratio | Percentage of cycles taken to execute Cube fp16 instructions |
| Mac Int8 Ratio | Percentage of cycles taken to execute Cube int8 instructions |
| Vec Fp32 Ratio | Percentage of cycles taken to execute Vector fp32 instructions |
| Vec Fp16 Ratio | Percentage of cycles taken to execute Vector fp16 instructions |
| Vec Int32 Ratio | Percentage of cycles taken to execute Vector int32 instructions |
| Vec Misc Ratio | Percentage of cycles taken to execute Vector misc instructions |
| Cube Fops | Number of floating-point operations (FLOPs) of the Cube type, that is, the amount of Cube computation. This value can be used to measure the computational complexity of an algorithm or model. |
| Vector Fops | Number of floating-point operations (FLOPs) of the Vector type, that is, the amount of Vector computation. This value can be used to measure the computational complexity of an algorithm or model. |
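Since Cube Fops and Vector Fops in the task-based view describe the total amount of computation for a task, dividing either value by the task's Aicore Time(us) gives a rough average throughput. This derived figure is not a field of the view. The helper below is a hypothetical sketch that assumes the Fops values are raw operation counts and that Aicore Time(us) is an appropriate time base.

```python
def average_tflops(fops: float, aicore_time_us: float) -> float:
    """Hypothetical derived metric: average throughput of a task in TFLOP/s."""
    if aicore_time_us <= 0:
        raise ValueError("Aicore Time(us) must be positive")
    seconds = aicore_time_us * 1e-6  # microseconds -> seconds
    return fops / seconds / 1e12     # FLOP/s -> TFLOP/s


# Hypothetical values: 2.0e9 Cube FLOPs executed in 100 us of AI Core time.
print(average_tflops(2.0e9, 100.0))
```

The same call works for Vector Fops; comparing the result against the hardware's rated peak is one way to judge how well a task uses the compute units, provided the assumptions above hold.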
Task-based: UB/L1/L2/Main Memory Bandwidth

| Field | Description |
|---|---|
| Task ID | Task ID |
| Stream ID | Stream ID |
| Op Name | Operator name |
| OP Type | Operator type |
| Task Start Time | Task start time |
| Task Duration(us) | Task running duration (μs) |
| Task Wait Time(us) | Task waiting time (μs) |
| Aicore Time(us) | AI Core running duration (μs) |
| Total Cycles | Number of cycles taken to execute all task instructions |
| ub_read_bw(GB/s) | UB read bandwidth (GB/s) |
| ub_write_bw(GB/s) | UB write bandwidth (GB/s) |
| l1_read_bw(GB/s) | L1 read bandwidth (GB/s) |
| l1_write_bw(GB/s) | L1 write bandwidth (GB/s) |
| l2_read_bw(GB/s) | L2 read bandwidth (GB/s) |
| l2_write_bw(GB/s) | L2 write bandwidth (GB/s) |
| main_mem_read_bw(GB/s) | Main memory read bandwidth (GB/s) |
| main_mem_write_bw(GB/s) | Main memory write bandwidth (GB/s) |
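To see which level of the memory hierarchy carries the most traffic for a task, one option is to add the read and write bandwidth of each level and compare the totals. The sketch below assumes a row is available as a Python dict keyed by the column names listed above (for example, after exporting the data); how the data is obtained is an assumption, and `busiest_level` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Column-name prefixes for each memory level, as they appear in the table.
LEVELS = ("ub", "l1", "l2", "main_mem")


def busiest_level(row: Dict[str, float]) -> Tuple[str, float]:
    """Return the level with the highest combined read + write bandwidth (GB/s)."""
    totals = {
        level: row[f"{level}_read_bw(GB/s)"] + row[f"{level}_write_bw(GB/s)"]
        for level in LEVELS
    }
    level = max(totals, key=totals.get)
    return level, totals[level]


# Hypothetical row values, for illustration only.
row = {
    "ub_read_bw(GB/s)": 120.0, "ub_write_bw(GB/s)": 80.0,
    "l1_read_bw(GB/s)": 40.0, "l1_write_bw(GB/s)": 30.0,
    "l2_read_bw(GB/s)": 60.0, "l2_write_bw(GB/s)": 10.0,
    "main_mem_read_bw(GB/s)": 25.0, "main_mem_write_bw(GB/s)": 5.0,
}
print(busiest_level(row))  # ('ub', 200.0)
```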
Task-based: L0A/L0B/L0C Memory Bandwidth

| Field | Description |
|---|---|
| Task ID | Task ID |
| Stream ID | Stream ID |
| Op Name | Operator name |
| OP Type | Operator type |
| Task Start Time | Task start time |
| Task Duration(us) | Task running duration (μs) |
| Task Wait Time(us) | Task waiting time (μs) |
| Aicore Time(us) | AI Core running duration (μs) |
| Total Cycles | Number of cycles taken to execute all task instructions |
| scalar_ld_ratio | Percentage of cycles taken to execute Scalar instructions that read from UB |
| scalar_st_ratio | Percentage of cycles taken to execute Scalar instructions that write to UB |
| l0a_read_bw(GB/s) | L0A read bandwidth (GB/s) |
| l0a_write_bw(GB/s) | L0A write bandwidth (GB/s) |
| l0b_read_bw(GB/s) | L0B read bandwidth (GB/s) |
| l0b_write_bw(GB/s) | L0B write bandwidth (GB/s) |
| l0c_read_bw(GB/s) | Bandwidth at which Vector reads data from L0C (GB/s) |
| l0c_write_bw(GB/s) | Bandwidth at which Vector writes data to L0C (GB/s) |
| l0c_read_bw_cube(GB/s) | Bandwidth at which Cube reads data from L0C (GB/s) |
| l0c_write_bw_cube(GB/s) | Bandwidth at which Cube writes data to L0C (GB/s) |
Task-based: UB Memory Bandwidth

| Field | Description |
|---|---|
| Task ID | Task ID |
| Stream ID | Stream ID |
| Op Name | Operator name |
| OP Type | Operator type |
| Task Start Time | Task start time |
| Task Duration(us) | Task running duration (μs) |
| Task Wait Time(us) | Task waiting time (μs) |
| Aicore Time(us) | AI Core running duration (μs) |
| Total Cycles | Number of cycles taken to execute all task instructions |
| ub_read_bw_mte(GB/s) | Bandwidth at which MTE reads data from UB (GB/s). This metric is supported on the Ascend 310 AI Processor. |
| ub_write_bw_mte(GB/s) | Bandwidth at which MTE writes data to UB (GB/s). This metric is supported on the Ascend 310 AI Processor. |
| ub_read_bw_vector(GB/s) | Bandwidth at which Vector reads data from UB (GB/s) |
| ub_write_bw_vector(GB/s) | Bandwidth at which Vector writes data to UB (GB/s) |
| ub_read_bw_scalar(GB/s) | Bandwidth at which Scalar reads data from UB (GB/s) |
| ub_write_bw_scalar(GB/s) | Bandwidth at which Scalar writes data to UB (GB/s) |
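The UB bandwidth fields split UB traffic by the unit that generates it (MTE, Vector, or Scalar). One way to read them together is to express each unit's read + write bandwidth as a share of the total. The sketch below assumes a row is a dict keyed by the column names above; because the MTE columns are described as supported on the Ascend 310 AI Processor, the hypothetical `ub_traffic_share` helper treats a missing column as zero, which is an assumption rather than documented behavior.

```python
from typing import Dict

UNITS = ("mte", "vector", "scalar")


def ub_traffic_share(row: Dict[str, float]) -> Dict[str, float]:
    """Fraction of total UB bandwidth (read + write) attributable to each unit."""
    per_unit = {
        # Assumption: a column absent from the row contributes 0 GB/s.
        unit: row.get(f"ub_read_bw_{unit}(GB/s)", 0.0)
        + row.get(f"ub_write_bw_{unit}(GB/s)", 0.0)
        for unit in UNITS
    }
    total = sum(per_unit.values())
    if total == 0:
        return {unit: 0.0 for unit in UNITS}
    return {unit: bw / total for unit, bw in per_unit.items()}


# Hypothetical row without MTE columns, for illustration only.
row = {
    "ub_read_bw_vector(GB/s)": 300.0, "ub_write_bw_vector(GB/s)": 200.0,
    "ub_read_bw_scalar(GB/s)": 50.0, "ub_write_bw_scalar(GB/s)": 50.0,
}
print({u: round(s, 3) for u, s in ub_traffic_share(row).items()})
# {'mte': 0.0, 'vector': 0.833, 'scalar': 0.167}
```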
Sample-based: Pipeline Utilization

| Field | Description |
|---|---|
| Core ID | AI Core ID |
| Vec Ratio | Percentage of cycles taken to execute Vector instructions |
| Mac Ratio | Percentage of cycles taken to execute Cube instructions |
| Scalar Ratio | Percentage of cycles taken to execute Scalar instructions |
| Mte1 Ratio | Percentage of cycles taken to execute MTE1 instructions (L1-to-L0A/L0B movement) |
| Mte2 Ratio | Percentage of cycles taken to execute MTE2 instructions (DDR-to-AI Core movement) |
| Mte3 Ratio | Percentage of cycles taken to execute MTE3 instructions (AI Core-to-DDR movement) |
| Icache Miss Rate | Instruction cache (I-Cache) miss rate. A lower value indicates better performance. |
| Memory Bound | AI Core memory bound, calculated as Mte2 Ratio/max(Mac Ratio, Vec Ratio). A value less than 1 means no memory bottleneck exists; a value greater than 1 means one exists, and the larger the value, the more severe the bottleneck. |
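In the sample-based view each row describes one AI Core, so the Memory Bound column can be used to pick out the cores that appear memory-bound. The sketch below assumes the rows are available as a list of dicts keyed by the column names above; `memory_bound_cores` is a hypothetical helper that applies the "greater than 1" threshold from the description and orders the cores from most to least severe.

```python
from typing import Dict, List


def memory_bound_cores(rows: List[Dict[str, float]]) -> List[int]:
    """Core IDs whose Memory Bound exceeds 1, most severe first."""
    flagged = [r for r in rows if r["Memory Bound"] > 1]
    flagged.sort(key=lambda r: r["Memory Bound"], reverse=True)
    return [int(r["Core ID"]) for r in flagged]


# Hypothetical per-core rows, for illustration only.
rows = [
    {"Core ID": 0, "Memory Bound": 0.8},
    {"Core ID": 1, "Memory Bound": 1.6},
    {"Core ID": 2, "Memory Bound": 1.2},
]
print(memory_bound_cores(rows))  # [1, 2]
```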
Sample-based: Arithmetic Utilization

| Field | Description |
|---|---|
| Core ID | AI Core ID |
| Mac Fp16 Ratio | Percentage of cycles taken to execute Cube fp16 instructions |
| Mac Int8 Ratio | Percentage of cycles taken to execute Cube int8 instructions |
| Vec Fp32 Ratio | Percentage of cycles taken to execute Vector fp32 instructions |
| Vec Fp16 Ratio | Percentage of cycles taken to execute Vector fp16 instructions |
| Vec Int32 Ratio | Percentage of cycles taken to execute Vector int32 instructions |
| Vec Misc Ratio | Percentage of cycles taken to execute Vector misc instructions |
| Cube Fops | Number of floating-point operations per second of the Cube type |
| Vector Fops | Number of floating-point operations per second of the Vector type |
Sample-based: UB/L1/L2/Main Memory Bandwidth

| Field | Description |
|---|---|
| Core ID | AI Core ID |
| ub_read_bw(GB/s) | UB read bandwidth (GB/s) |
| ub_write_bw(GB/s) | UB write bandwidth (GB/s) |
| l1_read_bw(GB/s) | L1 read bandwidth (GB/s) |
| l1_write_bw(GB/s) | L1 write bandwidth (GB/s) |
| l2_read_bw(GB/s) | L2 read bandwidth (GB/s) |
| l2_write_bw(GB/s) | L2 write bandwidth (GB/s) |
| main_mem_read_bw(GB/s) | Main memory read bandwidth (GB/s) |
| main_mem_write_bw(GB/s) | Main memory write bandwidth (GB/s) |
Sample-based: L0A/L0B/L0C Memory Bandwidth

| Field | Description |
|---|---|
| Core ID | AI Core ID |
| l0a_read_bw(GB/s) | L0A read bandwidth (GB/s) |
| l0a_write_bw(GB/s) | L0A write bandwidth (GB/s) |
| l0b_read_bw(GB/s) | L0B read bandwidth (GB/s) |
| l0b_write_bw(GB/s) | L0B write bandwidth (GB/s) |
| l0c_read_bw(GB/s) | Bandwidth at which Vector reads data from L0C (GB/s) |
| l0c_write_bw(GB/s) | Bandwidth at which Vector writes data to L0C (GB/s) |
| l0c_read_bw_cube(GB/s) | Bandwidth at which Cube reads data from L0C (GB/s) |
| l0c_write_bw_cube(GB/s) | Bandwidth at which Cube writes data to L0C (GB/s) |
Sample-based: UB Memory Bandwidth

| Field | Description |
|---|---|
| Core ID | AI Core ID |
| ub_read_bw_vector(GB/s) | Bandwidth at which Vector reads data from UB (GB/s) |
| ub_write_bw_vector(GB/s) | Bandwidth at which Vector writes data to UB (GB/s) |
| ub_read_bw_scalar(GB/s) | Bandwidth at which Scalar reads data from UB (GB/s) |
| ub_write_bw_scalar(GB/s) | Bandwidth at which Scalar writes data to UB (GB/s) |
| ub_read_bw_mte(GB/s) | Bandwidth at which MTE reads data from UB (GB/s). This metric is supported on the Ascend 310 AI Processor. |
| ub_write_bw_mte(GB/s) | Bandwidth at which MTE writes data to UB (GB/s). This metric is supported on the Ascend 310 AI Processor. |