msKPP (Operator Design)
The msKPP tool is used before operator development begins. It produces operator performance modeling results in seconds, so developers can quickly verify an operator implementation solution before writing the operator itself.
- Configure the msKPP tool by referring to Environment Setup.
- Use the msKPP APIs to model the operator at the instruction level. The following example simulates an add operator implemented in AscendC:
    # Import the msKPP APIs required for add operator modeling.
    from mskpp import vadd, Tensor, Chip


    # Model the process of Data-in -> Computation -> Data-out based on the
    # instruction implementation of the add operator in AI Core.
    def my_vadd(gm_x, gm_y, gm_z):
        # Basic data path of vector Add:
        # Augend x: GM-UB
        # Addend y: GM-UB
        # Result vector z: UB-GM

        # Define and allocate variables on the UB.
        x = Tensor("UB")
        y = Tensor("UB")
        z = Tensor("UB")

        # Move the data on the GM to the memory space corresponding to the UB.
        x.load(gm_x)
        y.load(gm_y)

        # The current data has been loaded to the UB. Call the calculation
        # instruction and save the result to the UB.
        out = vadd(x, y, z)()

        # Move the data on the UB to the address space of the GM variable gm_z.
        gm_z.load(out[0])


    if __name__ == '__main__':
        # xxxyy indicates the type of the processor used by the user.
        # You can run the npu-smi info command to query the processor type.
        with Chip("Ascendxxxyy") as chip:
            chip.enable_trace()
            chip.enable_metrics()

            # Use the operator for AI Core computation.
            in_x = Tensor("GM", "FP16", [32, 48], format="ND")
            in_y = Tensor("GM", "FP16", [32, 48], format="ND")
            in_z = Tensor("GM", "FP16", [32, 48], format="ND")
            my_vadd(in_x, in_y, in_z)
- Run the python3 xxx.py command, where xxx.py is the script from the previous step. A result directory with the following structure is generated in the current directory. For details about the file contents, see Analyzing Operator Computing and Transfer Specifications, Analyzing Extreme Performance, and Preliminary Design of Operator Tiling.
    MSKPP{timestamp}/
    ├── instruction_cycle_consumption.html
    ├── Instruction_statistic.csv
    ├── Pipe_statistic.csv
    └── trace.json
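After several modeling runs, the current directory may hold more than one MSKPP{timestamp} directory. The sketch below is one possible way to pick the most recent one and confirm that it contains the four files from the listing above. It uses only the Python standard library. The helper names find_latest_result_dir and missing_files are hypothetical and are not part of msKPP. Because the timestamp format is not specified here, the sketch compares modification times instead of parsing the directory name.

```python
from pathlib import Path

# File names taken from the result directory listing above.
EXPECTED_FILES = (
    "instruction_cycle_consumption.html",
    "Instruction_statistic.csv",
    "Pipe_statistic.csv",
    "trace.json",
)


def find_latest_result_dir(base="."):
    """Return the most recently modified MSKPP* directory under base, or None.

    Hypothetical helper: selects by modification time, not by parsing the name.
    """
    candidates = [p for p in Path(base).glob("MSKPP*") if p.is_dir()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def missing_files(result_dir):
    """Return the expected result files that are absent from result_dir."""
    root = Path(result_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    latest = find_latest_result_dir()
    if latest is None:
        print("No MSKPP* result directory found in the current directory.")
    else:
        print("Latest result directory:", latest)
        print("Missing files:", missing_files(latest) or "none")
```

Running the script from the directory where the modeling script was executed prints the newest result directory and any expected file that is absent from it.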
Table 1 Modeling result files

| File Name | Function |
| --- | --- |
| Pipe_statistic.csv (transfer pipeline statistics) | Collects statistics on the amount of transferred data, number of operations, and time consumption by pipeline. |
| Instruction_statistic.csv (instruction statistics) | Collects statistics on the total amount of transferred data, number of operations, and time consumption across different instruction dimensions to detect bottlenecks at the instruction layer. |
| instruction_cycle_consumption.html (instruction proportion pie chart) | Collects statistics on time consumption by instruction and displays the statistics in a pie chart. |
| trace.json (instruction pipeline chart) | Displays the time consumption information by instruction in a visual format. |
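For a quick look at the two CSV files without opening a spreadsheet, a short script can print their header and first rows. The sketch below is a minimal example that assumes only that the files are ordinary comma-separated text with a header row and UTF-8 encoding. It does not assume any column names, since those are not listed here. The helper name preview_csv is hypothetical and is not part of msKPP.

```python
import csv
import sys
from itertools import islice
from pathlib import Path


def preview_csv(path, max_rows=5):
    """Return (header, rows) holding at most max_rows data rows of a CSV file.

    Assumption: the first row is a header and the file is UTF-8 text
    (a leading byte-order mark, if present, is tolerated).
    """
    with Path(path).open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = list(islice(reader, max_rows))
    return header, rows


if __name__ == "__main__":
    # Usage: python3 preview.py <path to an MSKPP{timestamp} directory>
    result_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name in ("Pipe_statistic.csv", "Instruction_statistic.csv"):
        csv_path = result_dir / name
        if not csv_path.is_file():
            print(f"{name}: not found in {result_dir}")
            continue
        header, rows = preview_csv(csv_path)
        print(f"{name}: columns = {header}")
        for row in rows:
            print("  ", row)
```

Printing the header first shows which columns the tool actually emitted, which is a reasonable starting point before writing any analysis that depends on specific column names.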
