Profile Data Collection
Principles
This section describes the Profiling APIs. Three Profiling methods are provided:
- Profiling pyACL APIs (collecting and flushing profile data)
Write the collected profile data to a file, parse the file using the Profiling tool (see "Performance Data Parsing and Export" in the Performance Tuning Tool User Guide), and view the profile data.
The following two API call modes are available:
- Call acl.prof.init, acl.prof.start, acl.prof.stop, and acl.prof.finalize. You can obtain the pyACL API profile data, the time taken to execute AI Core operators, and AI Core metrics. Currently, these APIs perform process-level control: if they are called in any thread in a process, the calls also take effect in the other threads of the same process.
These APIs can be called repeatedly in a process, allowing a different Profiling configuration with each call.
- Call acl.init. During pyACL initialization, the Profiling configuration is passed in a JSON configuration file. You can obtain the pyACL API profile data, the time taken to execute AI Core operators, and AI Core metrics.
acl.init can be called only once per process. To change the Profiling configuration, modify the JSON configuration file. For details, see the description of the acl.init API.
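For the acl.init call mode, the Profiling settings live in the JSON configuration file passed at initialization. The fragment below is a rough sketch only; the exact field names and supported values must be taken from the acl.init API description, and the output path shown here is an illustrative assumption:

```json
{
    "profiler": {
        "switch": "on",
        "output": "/home/HwHiAiUser/profiling_output"
    }
}
```

In this sketch, "switch" enables collection and "output" sets the data flush path; consult the acl.init documentation for the full list of supported fields.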
- Profiling pyACL APIs for Extension (extension APIs)
When you need to locate a performance bottleneck in your app or in the upper-layer framework program, call the Profiling pyACL extension APIs during profiling (between the acl.prof.start and acl.prof.stop calls). Together, the extension APIs implement the msproftx function, which records the time span of specific events during app running and writes the data to a profile data file. You can then use the Profiling tool to parse the file and export the profile data.
For details about how to parse and export data using the Profiling tool, see "Performance Data Parsing and Export" in the Performance Tuning Tool User Guide.
API calling: acl.prof.create_stamp, acl.prof.push, acl.prof.pop, acl.prof.range_start, acl.prof.range_stop, and acl.prof.destroy_stamp are called between acl.prof.start and acl.prof.stop. These calls capture the events that occur at specific times during app running and record each event's time span.
In a process, these APIs can be called multiple times as required.
- Profiling pyACL APIs for Subscription (subscribing to operator information)
The collected profile data is analyzed and written to a pipe. The user then loads the data into memory and calls the pyACL APIs to obtain the profile data.
API calling: acl.prof.model_subscribe, acl.prof.get*, and acl.prof.model_unsubscribe. The profile data of the operators in a model can be obtained, including the operator name, operator type, and operator execution time.
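Before wiring a reader to the pyACL subscription APIs, it helps to see the pipe-draining pattern on its own. The sketch below uses only the Python standard library: RECORD_SIZE stands in for the per-operator size returned by acl.prof.get_op_desc_size(), N for the number of operators requested per read, and all other names are illustrative, not part of the pyACL API.

```python
import os
import threading

RECORD_SIZE = 8   # stand-in for the per-operator size from acl.prof.get_op_desc_size()
N = 10            # number of operator records requested per read, as in the sample code

def read_records(fd, handle_record):
    """Drain fixed-size records from a pipe until the write end is closed."""
    chunk_size = RECORD_SIZE * N
    while True:
        data = os.read(fd, chunk_size)   # blocks until data arrives
        if len(data) == 0:               # write end closed: stop reading
            break
        # A single read may return fewer bytes than requested; split what we got
        # into whole records.
        for i in range(0, len(data), RECORD_SIZE):
            handle_record(data[i:i + RECORD_SIZE])

# Usage: feed three fake records through a pipe and collect them in a list.
r, w = os.pipe()
records = []
reader = threading.Thread(target=read_records, args=(r, records.append))
reader.start()
for b in (b"A", b"B", b"C"):
    os.write(w, b * RECORD_SIZE)
os.close(w)
reader.join()
os.close(r)
```

In the real subscription flow, `handle_record` corresponds to parsing the operator records with acl.prof.get_op_num, acl.prof.get_op_name, and the other acl.prof.get* APIs.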
Sample Code for Profiling pyACL APIs
The following is a code snippet of key steps only and is not ready to use as-is. Add an exception handling branch after each API call.
For details about the allocation and deallocation of runtime resources, see Runtime Resource Allocation and Runtime Resource Deallocation. For details about the API call sequence for model loading, see API Call Sequence. For details about the API call sequence for model inference and input/output data preparation, see Preparing Input/Output Data Structure for Model Execution.
```python
import acl

# ......
# 1. Allocate runtime resources.
# ......
# 2. Load a model. After the model is successfully loaded, model_id that identifies the model is returned.
# ......
# 3. Create data of type aclmdlDataset to describe the inputs and outputs of the model.
# ......
# 4. Initialize Profiling.
# Set the data flush path.
PROF_INIT_PATH = '...'
ret = acl.prof.init(PROF_INIT_PATH)

# 5. Set Profiling configurations.
device_list = [0]
ACL_PROF_ACL_API = 0x0001
ACL_PROF_TASK_TIME = 0x0002
ACL_PROF_AICORE_METRICS = 0x0004
ACL_PROF_AICPU_TRACE = 0x0008
ACL_PROF_SYS_HARDWARE_MEM_FREQ = 3
# Create the pointer address of the configuration type.
prof_config = acl.prof.create_config(device_list, 0, 0,
                                     ACL_PROF_ACL_API | ACL_PROF_TASK_TIME |
                                     ACL_PROF_AICORE_METRICS | ACL_PROF_AICPU_TRACE)
mem_freq = "15"
ret = acl.prof.set_config(ACL_PROF_SYS_HARDWARE_MEM_FREQ, mem_freq)
ret = acl.prof.start(prof_config)

# 6. Execute the model.
ret = acl.mdl.execute(model_id, input, output)

# 7. Process the model inference result.
# ......
# 8. Destroy allocations such as the model inputs and outputs, free memory, and unload the model.
# ......
# 9. Stop Profiling and destroy the configuration and related resources.
ret = acl.prof.stop(prof_config)
ret = acl.prof.destroy_config(prof_config)
ret = acl.prof.finalize()

# 10. Destroy runtime allocations.
# ......
```
Sample Code for Profiling pyACL API for Extension
The following is a code snippet of key steps only and is not ready to use as-is. Add an exception handling branch after each API call.
For details about the allocation and deallocation of runtime resources, see Runtime Resource Allocation and Runtime Resource Deallocation. For details about the API call sequence for model loading, see API Call Sequence. For details about the API call sequence for model inference and input/output data preparation, see Preparing Input/Output Data Structure for Model Execution.
Example 1 (acl.prof.mark):
```python
import acl

# Collection items: mac_fp16_ratio, mac_int8_ratio, vec_fp32_ratio, vec_fp16_ratio,
# vec_int32_ratio, vec_misc_ratio.
ACL_AICORE_ARITHMETIC_UTILIZATION = 0
# Profiling data type config.
ACL_PROF_ACL_API = 0x00000001
ACL_PROF_TASK_TIME = 0x00000002
ACL_PROF_MSPROFTX = 0x00000080
# Profiling config type.
ACL_PROF_SYS_HARDWARE_MEM_FREQ = 3

# Initialize AscendCL.
# Allocate runtime resources.

# Initialize profiling and set the data flush path.
prof_path = "..."
ret = acl.prof.init(prof_path)
assert ret == 0
device_list = [0]
prof_config = acl.prof.create_config(device_list, ACL_AICORE_ARITHMETIC_UTILIZATION, 0,
                                     ACL_PROF_ACL_API | ACL_PROF_TASK_TIME | ACL_PROF_MSPROFTX)
assert prof_config != 0
mem_freq = "15"
ret = acl.prof.set_config(ACL_PROF_SYS_HARDWARE_MEM_FREQ, mem_freq)
assert ret == 0
ret = acl.prof.start(prof_config)
assert ret == 0

# Load a model. After the model is successfully loaded, model_id that identifies the model is returned.
stamp = acl.prof.create_stamp()
assert stamp != 0
load_msg = "model_load_mark"
ret = acl.prof.set_stamp_trace_message(stamp, load_msg, len(load_msg))
assert ret == 0
ret = acl.prof.mark(stamp)  # Mark the model loading event.
assert ret == 0
acl.prof.destroy_stamp(stamp)

# Create data of type aclmdlDataset to describe the inputs and outputs of the model.

# Execute the model.
stamp = acl.prof.create_stamp()
assert stamp != 0
exec_msg = "model_exec_mark"
ret = acl.prof.set_stamp_trace_message(stamp, exec_msg, len(exec_msg))
assert ret == 0
ret = acl.prof.mark(stamp)  # Mark the model execution event.
assert ret == 0
acl.prof.destroy_stamp(stamp)
ret = acl.mdl.execute(model_id, dataset_input, dataset_output)
assert ret == 0

ret = acl.prof.stop(prof_config)
assert ret == 0
ret = acl.prof.finalize()
assert ret == 0
ret = acl.prof.destroy_config(prof_config)
assert ret == 0

# Deallocate runtime resources.
# Deinitialize AscendCL.
```
Example 2 (acl.prof.mark_ex, marking events before and after model execution):
```python
import acl

# Collection items: mac_fp16_ratio, mac_int8_ratio, vec_fp32_ratio, vec_fp16_ratio,
# vec_int32_ratio, vec_misc_ratio.
ACL_AICORE_ARITHMETIC_UTILIZATION = 0
# Collect the profile data output by the user and upper-layer framework program.
ACL_PROF_MSPROFTX = 0x00000080

# Initialize AscendCL.
# Allocate runtime resources (including the stream used below).

# Initialize profiling and set the data flush path.
prof_path = "..."
ret = acl.prof.init(prof_path)
assert ret == 0
device_list = [0]
prof_config = acl.prof.create_config(device_list, ACL_AICORE_ARITHMETIC_UTILIZATION, 0,
                                     ACL_PROF_MSPROFTX)
assert prof_config != 0
ret = acl.prof.start(prof_config)
assert ret == 0

ret = acl.prof.mark_ex("model execute start", stream)
assert ret == 0
# Execute the model.
ret = acl.prof.mark_ex("model execute stop", stream)
assert ret == 0

ret = acl.prof.stop(prof_config)
assert ret == 0
ret = acl.prof.finalize()
assert ret == 0
ret = acl.prof.destroy_config(prof_config)
assert ret == 0

# Deallocate runtime resources.
# Deinitialize AscendCL.
```
Example 3 (acl.prof.push/acl.prof.pop, applicable to single-thread scenarios):
```python
import acl

# Collection items: mac_fp16_ratio, mac_int8_ratio, vec_fp32_ratio, vec_fp16_ratio,
# vec_int32_ratio, vec_misc_ratio.
ACL_AICORE_ARITHMETIC_UTILIZATION = 0
# Profiling data type config.
ACL_PROF_ACL_API = 0x00000001
ACL_PROF_TASK_TIME = 0x00000002
# Profiling config type.
ACL_PROF_SYS_HARDWARE_MEM_FREQ = 3

# Initialize AscendCL.
# Allocate runtime resources.

# Initialize profiling and set the data flush path.
prof_path = "..."
ret = acl.prof.init(prof_path)
assert ret == 0
device_list = [0]
prof_config = acl.prof.create_config(device_list, ACL_AICORE_ARITHMETIC_UTILIZATION, 0,
                                     ACL_PROF_ACL_API | ACL_PROF_TASK_TIME)
assert prof_config != 0
mem_freq = "15"
ret = acl.prof.set_config(ACL_PROF_SYS_HARDWARE_MEM_FREQ, mem_freq)
assert ret == 0
ret = acl.prof.start(prof_config)
assert ret == 0

# Load a model. After the model is successfully loaded, model_id that identifies the model is returned.
# Create data of type aclmdlDataset to describe the inputs and outputs of the model.

# Execute the model. (The model is executed only in a single thread.)
stamp = acl.prof.create_stamp()
assert stamp != 0
exec_msg = "acl.mdl.execute_duration"
ret = acl.prof.set_stamp_trace_message(stamp, exec_msg, len(exec_msg))
assert ret == 0
ret = acl.prof.push(stamp)
assert ret == 0
ret = acl.mdl.execute(model_id, dataset_input, dataset_output)
assert ret == 0
ret = acl.prof.pop(stamp)
assert ret == 0
acl.prof.destroy_stamp(stamp)

# Process the model inference result.
ret = acl.prof.stop(prof_config)
assert ret == 0
ret = acl.prof.finalize()
assert ret == 0
ret = acl.prof.destroy_config(prof_config)
assert ret == 0

# Deallocate runtime resources.
# Deinitialize AscendCL.
```
Example 4 (acl.prof.range_start/acl.prof.range_stop, applicable to single-thread or cross-thread scenarios):
```python
import acl

# Collection items: mac_fp16_ratio, mac_int8_ratio, vec_fp32_ratio, vec_fp16_ratio,
# vec_int32_ratio, vec_misc_ratio.
ACL_AICORE_ARITHMETIC_UTILIZATION = 0
# Profiling data type config.
ACL_PROF_ACL_API = 0x00000001
ACL_PROF_TASK_TIME = 0x00000002
# Profiling config type.
ACL_PROF_SYS_HARDWARE_MEM_FREQ = 3

# Initialize AscendCL.
# Allocate runtime resources.

# Initialize profiling and set the data flush path.
prof_path = "..."
ret = acl.prof.init(prof_path)
assert ret == 0
device_list = [0]
prof_config = acl.prof.create_config(device_list, ACL_AICORE_ARITHMETIC_UTILIZATION, 0,
                                     ACL_PROF_ACL_API | ACL_PROF_TASK_TIME)
assert prof_config != 0
mem_freq = "15"
ret = acl.prof.set_config(ACL_PROF_SYS_HARDWARE_MEM_FREQ, mem_freq)
assert ret == 0
ret = acl.prof.start(prof_config)
assert ret == 0

# Load a model. After the model is successfully loaded, model_id that identifies the model is returned.
# Create data of type aclmdlDataset to describe the inputs and outputs of the model.

# Execute the model (the model is executed across threads).
stamp = acl.prof.create_stamp()
assert stamp != 0
exec_msg = "acl.mdl.execute_duration"
ret = acl.prof.set_stamp_trace_message(stamp, exec_msg, len(exec_msg))
assert ret == 0
range_id, ret = acl.prof.range_start(stamp)
assert ret == 0
ret = acl.mdl.execute(model_id, dataset_input, dataset_output)
assert ret == 0
ret = acl.prof.range_stop(range_id)
assert ret == 0
acl.prof.destroy_stamp(stamp)

# Process the model inference result.
ret = acl.prof.stop(prof_config)
assert ret == 0
ret = acl.prof.finalize()
assert ret == 0
ret = acl.prof.destroy_config(prof_config)
assert ret == 0

# Deallocate runtime resources.
# Deinitialize AscendCL.
```
Sample Code for Profiling pyACL API for Subscription
The following is a code snippet of key steps only and is not ready to use as-is. Add an exception handling branch after each API call.
```python
import os

import acl
import numpy as np

# ......
# 1. Allocate runtime resources.
# ......
# 2. Load a model. After the model is successfully loaded, model_id that identifies the model is returned.
# ......
# 3. Create data of type aclmdlDataset to describe the inputs and outputs of the model.
# ......
# 4. Create a pipe to read and write the model subscription data.
r, w = os.pipe()

# 5. Create a model subscription configuration and subscribe to the model.
ACL_AICORE_NONE = 0xFF
subscribe_config = acl.prof.create_subscribe_config(1, ACL_AICORE_NONE, w)
# Pass model_id of the model for subscription.
ret = acl.prof.model_subscribe(model_id, subscribe_config)

# 6. Enable the pipe to read subscription data.
# 6.1 Customize a function to read subscription data from the user memory.
def get_model_info(data, data_len):
    # Obtain the number of operators.
    op_number, ret = acl.prof.get_op_num(data, data_len)
    # Iterate over the operator information in the user memory.
    for i in range(op_number):
        # Obtain the model ID of the operator.
        model_id = acl.prof.get_model_id(data, data_len, i)
        # Obtain the operator type.
        op_type, ret = acl.prof.get_op_type(data, data_len, i, 65)
        # Obtain the operator name.
        op_name, ret = acl.prof.get_op_name(data, data_len, i, 275)
        # Obtain the execution start time of the operator.
        op_start = acl.prof.get_op_start(data, data_len, i)
        # Obtain the execution end time of the operator.
        op_end = acl.prof.get_op_end(data, data_len, i)
        # Obtain the time required for executing the operator.
        op_duration = acl.prof.get_op_duration(data, data_len, i)

# 6.2 Customize a function to read data from the pipe to the user memory.
def prof_data_read(args):
    fd, ctx = args
    ret = acl.rt.set_context(ctx)
    # Obtain the operator information buffer size (in bytes) per operator.
    buffer_size, ret = acl.prof.get_op_desc_size()
    # Set the number of operators read from the pipe each time.
    N = 10
    # Calculate the total operator information buffer size.
    data_len = buffer_size * N
    # Read data from the pipe to the allocated memory. The actual size of the
    # read data may be less than buffer_size * N. If there is no data in the
    # pipe, the process is blocked until data is read.
    while True:
        data = os.read(fd, data_len)
        if len(data) == 0:
            break
        np_data = np.frombuffer(data, dtype=np.uint8)
        bytes_data = np_data.tobytes()
        np_data_ptr = acl.util.bytes_to_ptr(bytes_data)
        size = np_data.itemsize * np_data.size
        # Call the function implemented in 6.1 to parse data in the memory.
        get_model_info(np_data_ptr, size)

# 7. Start the thread to read and parse the pipe data.
thr_id, ret = acl.util.start_thread(prof_data_read, [r, context])

# 8. Execute the model.
ret = acl.mdl.execute(model_id, input, output)

# 9. Process the model inference result.
# ......
# 10. Destroy allocations such as the model inputs and outputs, free memory, and unload the model.
# ......
# 11. Unsubscribe from the model and destroy the subscription-related resources.
ret = acl.prof.model_unsubscribe(model_id)
ret = acl.util.stop_thread(thr_id)
os.close(r)
ret = acl.prof.destroy_subscribe_config(subscribe_config)

# 12. Destroy runtime allocations.
# ......
```