Quick Start
Overview
MindStudio provides multiple tools, such as msKPP, msOpGen, msOpST, msSanitizer, msDebug, and msProf. This document uses an example to describe the workflow of using them.
The sample uses the single-operator API call mode to describe how to use the operator development tools to design an operator, create an operator project, test operator functions, detect operator exceptions, debug the operator, and tune performance.
Environment Setup
- Prepare a server equipped with the Ascend AI Processor and install the driver and firmware. For details, see Installing the NPU Driver and Firmware.
- Install Ascend-CANN-Toolkit. For details, see Installing the Toolkit Development Kit.
- To use MindStudio Insight, you need to install the MindStudio Insight software package. For details about the download link, see Installation and Uninstallation.
- ${git_clone_path} is the installation path of the sample repository.
- Replace ${INSTALL_DIR} with the actual CANN component directory. If the Ascend-CANN-Toolkit package is installed as the root user, the CANN component directory is /usr/local/Ascend/ascend-toolkit/latest.
- Run the npu-smi info command on the server where the Ascend AI Processor is installed to obtain the Chip Name information. The actual value is Ascend<Chip Name>. For example, if Chip Name is xxxyy, the actual value is Ascendxxxyy. In code sample paths, the lowercase form ascendxxxyy is used.
- If you need the instruction proportion pie chart (instruction_cycle_consumption.html), install the third-party Python library plotly as it is a dependency for generating the pie chart.
pip3 install plotly
msKPP (Operator Design)
The msKPP tool is used before operator development. It allows developers to obtain the operator performance modeling result in seconds and quickly verify the operator implementation solution.
- Configure the msKPP tool by referring to Environment Setup.
- Obtain the Python script for operator modeling (the add operator is used as an example).
from mskpp import vadd, Tensor, Chip

def my_vadd(gm_x, gm_y, gm_z):
    # Basic data path of vector Add:
    #   Addend x:        GM -> UB
    #   Augend y:        GM -> UB
    #   Result vector z: UB -> GM
    # Define and allocate variables on the UB.
    x = Tensor("UB")
    y = Tensor("UB")
    z = Tensor("UB")
    # Move the data on the GM to the memory space corresponding to the UB.
    x.load(gm_x)
    y.load(gm_y)
    # The data has been loaded to the UB. Call the calculation instruction and save the result to the UB.
    out = vadd(x, y, z)()
    # Move the data on the UB to the address space of the GM variable gm_z.
    gm_z.load(out[0])

if __name__ == '__main__':
    with Chip("Ascendxxxyy") as chip:  # xxxyy indicates the type of the processor actually used.
        chip.enable_trace()
        chip.enable_metrics()
        # Use the operator for AI Core computation.
        in_x = Tensor("GM", "FP16", [32, 48], format="ND")
        in_y = Tensor("GM", "FP16", [32, 48], format="ND")
        in_z = Tensor("GM", "FP16", [32, 48], format="ND")
        my_vadd(in_x, in_y, in_z)
- Run the Python script in Step 2. The following result directories are generated in the current directory. For details about the file content, see Analyzing Operator Computing and Transfer Specifications, Analyzing Extreme Performance, and Preliminary Design of Operator Tiling.
Table 1 Modeling result files
- Transfer pipeline statistics (Pipe_statistic.csv): Collects statistics on the amount of transferred data, the number of operations, and the time consumption of each pipeline.
- Instruction statistics (Instruction_statistic.csv): Collects statistics on the total amount of transferred data, the number of operations, and the time consumption across different instruction dimensions to detect bottlenecks at the instruction layer.
- Instruction proportion pie chart (instruction_cycle_consumption.html): Collects statistics on time consumption by instruction and displays them in a pie chart.
- Instruction pipeline chart (trace.json): Displays the time consumption of each instruction in a visual format.
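The trace.json file commonly follows the Chrome trace-event format. Assuming that layout (an assumption — verify against the file your msKPP version actually emits), per-instruction time can be totalled with a short script:

```python
import json
from collections import defaultdict

def time_by_instruction(trace_path):
    """Total the 'dur' field per event name in a trace file.

    Assumes the Chrome trace-event layout: either a top-level list of
    events, or a dict with a "traceEvents" list, each event carrying
    "name" and "dur" fields.
    """
    with open(trace_path) as f:
        data = json.load(f)
    events = data["traceEvents"] if isinstance(data, dict) else data
    totals = defaultdict(int)
    for event in events:
        totals[event.get("name", "?")] += event.get("dur", 0)
    return dict(totals)
```

Sorting the returned mapping by value quickly surfaces the most expensive instruction, complementing the pie chart.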
msOpGen (Operator Project Generation)
The msOpGen tool can be used to generate custom operator projects during operator development. This allows users to focus on the core logic and algorithm implementation of operators without spending a lot of time on repetitive work such as project setup, build, and configuration, greatly improving the development efficiency.
- Generate the operator directory.
- Save the .json operator definition file to the working directory. For details about the configuration parameters of the AddCustom.json file, see Table 1.
[
    {
        "op": "AddCustom",
        "language": "cpp",
        "input_desc": [
            {
                "name": "x",
                "param_type": "required",
                "format": ["ND"],
                "type": ["float16"]
            },
            {
                "name": "y",
                "param_type": "required",
                "format": ["ND"],
                "type": ["float16"]
            }
        ],
        "output_desc": [
            {
                "name": "z",
                "param_type": "required",
                "format": ["ND"],
                "type": ["float16"]
            }
        ]
    }
]
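Before running msopgen, a quick sanity check of the definition file can catch typos early. This sketch only verifies the fields used in AddCustom.json above; msopgen itself performs the authoritative validation:

```python
import json

# Tensor-descriptor keys used in the AddCustom.json sample above.
REQUIRED_TENSOR_KEYS = {"name", "param_type", "format", "type"}

def check_op_def(path):
    """Return a list of problems found in an operator definition file.

    An empty list means the checked fields are present and consistent.
    """
    with open(path) as f:
        ops = json.load(f)
    problems = []
    for op in ops:
        if "op" not in op:
            problems.append("missing 'op' name")
            continue
        for section in ("input_desc", "output_desc"):
            for desc in op.get(section, []):
                missing = REQUIRED_TENSOR_KEYS - desc.keys()
                if missing:
                    problems.append(
                        f"{op['op']}/{section}/{desc.get('name', '?')}: "
                        f"missing {sorted(missing)}")
                elif len(desc["format"]) != len(desc["type"]):
                    problems.append(
                        f"{op['op']}/{section}/{desc['name']}: "
                        "format/type length mismatch")
    return problems
```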
- Run the following command to generate an operator development project. For details about the parameters, see Table 2.
msopgen gen -i AddCustom.json -f tf -c ai_core-ascendxxxyy -lan cpp -out AddCustom # xxxyy indicates the type of the processor used by the user.
- View the generated directory.
tree -C -L 2 AddCustom/
- The following shows the directory of the operator project generated in the specified directory.
AddCustom
├── build.sh
├── cmake
├── CMakeLists.txt
├── CMakePresets.json
├── framework
│   ├── CMakeLists.txt
│   └── tf_plugin
├── op_host
│   ├── add_custom.cpp
│   ├── add_custom_tiling.h
│   └── CMakeLists.txt
├── op_kernel
│   ├── add_custom.cpp
│   └── CMakeLists.txt
└── scripts
    ├── help.info
    ├── install.sh
    └── upgrade.sh
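A small script can confirm that the key files of the generated project exist. The file list below mirrors the tree above and is not an exhaustive contract of the generated layout:

```python
from pathlib import Path

# Key files from the generated AddCustom project tree shown above.
EXPECTED = [
    "build.sh",
    "op_host/add_custom.cpp",
    "op_host/add_custom_tiling.h",
    "op_kernel/add_custom.cpp",
]

def missing_files(project_dir):
    """Return the expected operator-project files that are absent."""
    root = Path(project_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```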
- Click Link to obtain the code samples for operator kernel function development and Tiling implementation. Run the following command to copy the operator implementation files in the sample directory to the directory generated by msOpGen in Step 1:
cp -r ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/AddCustom/* AddCustom/
- After an operator project is created, develop the operator by referring to the Ascend C Operator Development Guide. This step only describes the functions of the operator development tools, so the sample code is used.
- When downloading the code sample, run the following command to specify the branch version:
git clone https://gitee.com/ascend/samples.git -b v0.2-8.0.0.beta1
- Build the operator project.
- In the directory where the custom operator package is stored, run the following command to deploy the operator package:
./build_out/custom_opp_<target_os>_<target_architecture>.run
- Verify the operator function and generate the executable file execute_add_op.
- Switch to the directory of the AclNNInvocation repository.
cd ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/AclNNInvocation
- Run the following command:
./run.sh
- Compare the precision and generate the execute_add_op executable file.
INFO: execute op!
[INFO] Set device[0] success
[INFO] Get RunMode[1] success
[INFO] Init resource success
[INFO] Set input success
[INFO] Copy input[0] success
[INFO] Copy input[1] success
[INFO] Create stream success
[INFO] Execute aclnnAddCustomGetWorkspaceSize success, workspace size 0
[INFO] Execute aclnnAddCustom success
[INFO] Synchronize stream success
[INFO] Copy output[0] success
[INFO] Write output success
[INFO] Run op success
[INFO] Reset Device success
[INFO] Destroy resource success
INFO: acl executable run success!
error ratio: 0.0000, tolerance: 0.0010
test pass
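The final "error ratio" line comes from the sample's verify step, which compares the NPU output against a CPU golden result. A comparison along these lines (illustrative, not the sample's actual script) produces the same kind of pass/fail decision:

```python
def error_ratio(actual, golden, rel_tol=1e-3):
    """Fraction of elements whose relative error exceeds rel_tol.

    Illustrative re-implementation of the check printed above
    ("error ratio: 0.0000, tolerance: 0.0010"); the sample's own
    verify script may differ in detail.
    """
    if len(actual) != len(golden):
        raise ValueError("shape mismatch")
    bad = sum(
        1 for a, g in zip(actual, golden)
        if abs(a - g) > rel_tol * max(abs(g), 1e-9)
    )
    return bad / len(golden)
```

A case passes when the returned ratio stays below the accepted threshold, as in the log above.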
msOpST (Operator Test)
The msOpST tool is used to preliminarily test operator functions after operator development. It can be used to analyze and optimize operator performance more efficiently, improving the operator execution efficiency and reducing the development cost.
This sample generates a single-operator .om file based on the AscendCL API process and executes the file to verify the operator execution result.
- Generate ST cases.
- After step 2 is complete, run the following command, replacing the path with your actual msOpGen operator project directory.
msopst create -i "$HOME/AddCustom/op_host/add_custom.cpp" -out ./st
2024-09-10 19:47:15 (3995495) - [INFO] Start to parse AscendC operator prototype definition in $HOME/AddCustom/op_host/add_custom.cpp.
2024-09-10 19:47:15 (3995495) - [INFO] Start to check valid for op info.
2024-09-10 19:47:15 (3995495) - [INFO] Finish to check valid for op info.
2024-09-10 19:47:15 (3995495) - [INFO] Generate test case file $HOME/AddCustom/st/AddCustom_case_20240910194715.json successfully.
2024-09-10 19:47:15 (3995495) - [INFO] Process finished!
- ST cases are generated in the ./st directory.
- Perform ST.
- Set environment variables based on the CANN package path.
export DDK_PATH=${INSTALL_DIR}
export NPU_HOST_LIB=${INSTALL_DIR}/{arch-os}/devlib
- Perform ST and save the test result to a specified path.
msopst run -i ./st/AddCustom_case_{TIMESTAMP}.json -soc Ascendxxxyy -out ./st/out # xxxyy indicates the actual processor type.
- After the test is successful, the test result is exported to the st.report.json file in the ./st/out/xxxx/ directory.
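Once st.report.json is produced, results can be summarized programmatically. The field names used here ("result", value "pass") are assumptions for illustration — inspect your report file for the real schema:

```python
import json

def summarize_st_report(path):
    """Count passed and failed cases in an ST report.

    Assumes the report is a JSON list of case entries (or a dict with
    a "test_cases" list), each carrying a "result" field; these names
    are illustrative and must be checked against the real file.
    """
    with open(path) as f:
        report = json.load(f)
    cases = report if isinstance(report, list) else report.get("test_cases", [])
    passed = sum(1 for c in cases if str(c.get("result", "")).lower() == "pass")
    return passed, len(cases) - passed
```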
msSanitizer (Operator Anomaly Detection)
The msSanitizer tool is used throughout the operator development cycle to allow developers to ensure the quality and stability of operators. By detecting and fixing exceptions in the early stage, msSanitizer greatly mitigates potential risks and reduces maintenance costs after product rollout.
- After the tool is started, the tool run log file mssanitizer_{TIMESTAMP}_{PID}.log is automatically generated in the current directory. After the user program is executed, an exception report is displayed.
- ${git_clone_path} is the path of the sample repository.
- Run the following command in the ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch directory to generate a custom operator project and implement the operator on the host and kernel:
bash install.sh -v Ascendxxxyy # xxxyy indicates the processor type.
- Run the following command in the ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/CustomOp directory to build and deploy the operator again:
bash build.sh
./build_out/custom_opp_<target_os>_<target_architecture>.run   # name of the .run file in the current directory
- Go to the ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/AclNNInvocation directory, start the operator API running script, and check the memory.
- Enable memory check.
- You can run the following command to explicitly specify memory check types. By default, detection of illegal read/write, multi-core corruption, unaligned access, and illegal release is enabled.
mssanitizer --tool=memcheck bash run.sh
- Run the following command to manually enable memory leak detection:
mssanitizer --tool=memcheck --leak-check=yes bash run.sh
- Locate the memory exception. For details, see Memory Exception Report Parsing in the Operator Development Tool User Guide.
- Perform contention check.
- Run the following commands to enable contention check:
mssanitizer --tool=racecheck bash run.sh
- Locate the memory contention. For details, see Analyzing Contention Check Report in the Operator Development Tool User Guide.
- Perform uninitialization check.
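The memory and contention checks above can be chained from one script. The command lines mirror those shown earlier (including the optional leak check); actually running them requires mssanitizer on the PATH:

```python
import subprocess

# Flags per check type, taken from the commands shown above.
CHECKS = {
    "memcheck": ["--tool=memcheck", "--leak-check=yes"],
    "racecheck": ["--tool=racecheck"],
}

def sanitizer_cmd(tool, target=("bash", "run.sh")):
    """Build the mssanitizer command line for one check type."""
    if tool not in CHECKS:
        raise ValueError(f"unknown check: {tool}")
    return ["mssanitizer", *CHECKS[tool], *target]

def run_all_checks():
    """Run each check in turn; each run writes its own
    mssanitizer_{TIMESTAMP}_{PID}.log in the current directory."""
    for tool in CHECKS:
        subprocess.run(sanitizer_cmd(tool), check=True)
```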
msDebug (Operator Debugging)
msDebug can debug all Ascend operators. You can use different functions as required, such as setting breakpoints, printing variables and memory, single-step debugging, stopping execution, and switching cores.
- Run the following command in the ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch directory to generate a custom operator project and implement the operator on the host and kernel:
bash install.sh -v Ascendxxxyy # xxxyy indicates the processor type.
- In the CMakePresets.json file in the ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/CustomOp directory, change Release to Debug.
"cacheVariables": {
    "CMAKE_BUILD_TYPE": {
        "type": "STRING",
        "value": "Debug"
    },
- Run the following command in the ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/CustomOp directory to build and deploy the operator again:
bash build.sh
./build_out/custom_opp_<target_os>_<target_architecture>.run   # name of the .run file in the current directory
- Switch to the ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/AclNNInvocation directory and run the following command to generate the executable file execute_add_op in the ./output directory:
bash run.sh
cd ./output
- Before debugging, configure the following environment variable to specify the operator loading path and import the debugging information.
export LAUNCH_KERNEL_PATH=${INSTALL_DIR}/opp/vendors/customize/op_impl/ai_core/tbe/kernel/${soc_version}/add_custom/AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o   # soc_version indicates the name of the Ascend AI Processor.
- Specify the path of the dynamic library on which the operator depends and load the .so file of the dynamic library.
export LD_LIBRARY_PATH=$ASCEND_HOME_PATH/opp/vendors/customize/op_api/lib:$LD_LIBRARY_PATH
- Run the msdebug execute_add_op command in the executable file directory to access msDebug.
msdebug execute_add_op
- Set a breakpoint.
(msdebug) b add_custom.cpp:55
- The following command output indicates that the breakpoint is successfully added:
Breakpoint 1: where = AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o`KernelAdd::Compute(int) (.vector) + 68 at add_custom.cpp:55:9, address = 0x00000000000014f4
- Enter the r command to run the operator program and wait until the breakpoint is hit.
(msdebug) r
Process 1454802 launched: '${INSTALL_DIR}/add_cus/AclNNInvocation/output/execute_add_op' (aarch64)
[INFO] Set device[0] success
[INFO] Get RunMode[1] success
[INFO] Init resource success
[INFO] Set input success
[INFO] Copy input[0] success
[INFO] Copy input[1] success
[INFO] Create stream success
[INFO] Execute aclnnAddCustomGetWorkspaceSize success, workspace size 0
[Launch of Kernel AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b on Device 0]
[INFO] Execute aclnnAddCustom success
Process 1454802 stopped
[Switching to focus on Kernel AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b, CoreId 39, Type aiv]
* thread #1, name = 'execute_add_op', stop reason = breakpoint 1.1
    frame #0: 0x00000000000014f4 AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o`KernelAdd::Compute(this=0x00000000003078a8, progress=0) (.vector) at add_custom.cpp:55:9
   52       __aicore__ inline void Compute(int32_t progress)
   53       {
   54           LocalTensor<DTYPE_X> xLocal = inQueueX.DeQue<DTYPE_X>();
-> 55           LocalTensor<DTYPE_Y> yLocal = inQueueY.DeQue<DTYPE_Y>();  // Ensure that the line number at the breakpoint is correct. Other information depends on the actual situation.
   56           LocalTensor<DTYPE_Z> zLocal = outQueueZ.AllocTensor<DTYPE_Z>();
   57           Add(zLocal, xLocal, yLocal, this->tileLength);
   58           outQueueZ.EnQue<DTYPE_Z>(zLocal);
- Keep running.
- Enter the following command to continue the running:
(msdebug) c
- The following shows that the program hits the breakpoint again:
Process 1454802 resuming
Process 1454802 stopped
[Switching to focus on Kernel AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b, CoreId 39, Type aiv]
* thread #1, name = 'execute_add_op', stop reason = breakpoint 1.1
    frame #0: 0x00000000000014f4 AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o`KernelAdd::Compute(this=0x00000000003078a8, progress=0) (.vector) at add_custom.cpp:55:9
   52       __aicore__ inline void Compute(int32_t progress)
   53       {
   54           LocalTensor<DTYPE_X> xLocal = inQueueX.DeQue<DTYPE_X>();
-> 55           LocalTensor<DTYPE_Y> yLocal = inQueueY.DeQue<DTYPE_Y>();  // Ensure that the line number at the breakpoint is correct. Other information depends on the actual situation.
   56           LocalTensor<DTYPE_Z> zLocal = outQueueZ.AllocTensor<DTYPE_Z>();
   57           Add(zLocal, xLocal, yLocal, this->tileLength);
   58           outQueueZ.EnQue<DTYPE_Z>(zLocal);
- Stop debugging.
(msdebug) q
msProf (Operator Tuning)
The msProf tool is mainly used in the performance optimization phase of operator development. By using the msProf tool, developers can ensure that operators can run efficiently on different hardware platforms, thereby improving the overall software performance and user experience.
- Run the following command in the ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch directory to generate a custom operator project and implement the operator on the host and kernel:
bash install.sh -v Ascendxxxyy # xxxyy indicates the processor type.
- Run the following command in the ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/CustomOp directory to build and deploy the operator again:
bash build.sh
./build_out/custom_opp_<target_os>_<target_architecture>.run   # name of the .run file in the current directory
- Switch to the ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/AclNNInvocation directory and run the following command to generate an executable file:
./run.sh
- Specify the path of the dynamic library on which the operator depends and load the .so file of the dynamic library.
export LD_LIBRARY_PATH=$ASCEND_HOME_PATH/opp/vendors/customize/op_api/lib:$LD_LIBRARY_PATH
- Use msprof op to perform board-based tuning.
- Go to the ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/AclNNInvocation/output directory and run the following command to enable board-based tuning:
msprof op --output=./output_data ./execute_add_op
- The following result directories are generated:
OPPROF_20240911145000_YLKFDJDQNXGDTXPH/
├── ArithmeticUtilization.csv
├── dump
├── L2Cache.csv
├── Memory.csv
├── MemoryL0.csv
├── MemoryUB.csv
├── OpBasicInfo.csv
├── PipeUtilization.csv
├── ResourceConflictRatio.csv
└── visualize_data.bin
- Import the visualize_data.bin file to the MindStudio Insight tool to visualize the board-based tuning result. For details, see msprof op.
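The CSV result files above (for example PipeUtilization.csv) can also be inspected without MindStudio Insight. This reader makes no assumption beyond the header row:

```python
import csv

def load_metric_csv(path):
    """Read one msprof op result CSV into a list of row dicts.

    Column names vary by metric file, so rows are keyed by whatever
    the header row of the specific CSV declares.
    """
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```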
- Configure the msprof op simulator by referring to Configurations of msprof op simulator.
- Use the msprof op simulator to perform simulation-based tuning.
- Go to the ${git_clone_path}/samples/operator/ascendc/0_introduction/1_add_frameworklaunch/AclNNInvocation/output directory and run the following command to enable simulation tuning:
msprof op simulator --soc-version=Ascendxxxyy --output=./output_data ./execute_add_op
- The following result directories are generated:
OPPROF_20240911150827_GYCKQHGDUHJFYICF/
├── dump
└── simulator
    ├── core0.veccore0
    ├── core0.veccore1
    ├── core1.veccore0
    ├── core1.veccore1
    ├── core2.veccore0
    ├── core2.veccore1
    ├── core3.veccore0
    ├── core3.veccore1
    ├── trace.json
    └── visualize_data.bin
- Import the trace.json and visualize_data.bin files to the MindStudio Insight tool to visualize the simulation result. For details, see msprof op simulator.