Profile Data Collection

Principles

This section describes the Profiling APIs. Three Profiling methods are provided:

  • Profiling AscendCL API (collecting and flushing profile data)

    Write the collected profile data to a file, use the profiling tool to parse the file (see "Performance Data Parsing and Export" in the Performance Tuning Tool User Guide), and view the profile data.

    The following two API calling modes are available:
    • Mode 1 uses calls to the following APIs: aclprofInit, aclprofStart, aclprofStop, and aclprofFinalize. You can obtain the AscendCL API profile data, the time taken to execute AI Core operators, and AI Core metrics. Currently, these APIs take effect at the process level: if they are called in any thread of a process, the calls also take effect in all other threads of that process.

      These APIs can be called repeatedly in a process, allowing for varied Profiling configurations with each call.

    • Mode 2 uses the aclInit call. During AscendCL initialization, the Profiling configuration is passed as a JSON configuration file. You can obtain the AscendCL API profile data, time taken to execute AI Core operators, as well as AI Core metrics.

      aclInit can be called only once per process. To modify the Profiling configuration, modify the JSON configuration file. For details, see the description of the aclInit API.

  • Profiling AscendCL API for Extension (extension APIs)

    When you need to locate the performance bottleneck of your app or the upper-layer framework program, call the Profiling AscendCL APIs for Extension during the profiling process (between the aclprofStart and aclprofStop calls). The extension APIs together achieve the msproftx function, which is used to record the time span of specific events during app running and write data to a profile data file. You can use the profiling tool to parse the file and export the profile data.

    These APIs can be called multiple times in a process. The API calling method is as follows: between aclprofStart and aclprofStop, call aclprofCreateStamp, aclprofPush, aclprofPop, aclprofRangeStart, aclprofRangeStop, and aclprofDestroyStamp. These calls capture the events that occur at specific times during app running and record each event's time span.

    For details about how to parse and export data using the profiling tool, see "Performance Data Parsing and Export" in the Performance Tuning Tool User Guide.

  • Profiling AscendCL API for Subscription (subscription to operator information)

    The collected profile data is analyzed and written to a pipe. You then load the data into memory and call the AscendCL APIs to obtain the profile data.

    API calling: aclprofModelSubscribe, the aclprofGet* APIs, and aclprofModelUnSubscribe are used together. You can obtain the profile data of the operators in a model, including the operator name, operator type name, and operator execution time.

Sample Code for Profiling AscendCL API

Add exception handling branches following the API calls. The following is a code snippet of key steps only, which is not ready to be built or run.

This section describes the code logic for profile data collection. For details about how to initialize and deinitialize AscendCL, see Initializing AscendCL. For details about how to allocate and deallocate runtime resources, see Runtime Resource Allocation and Deallocation. For details about how to load a model, prepare the input/output data of model inference, and execute and unload the model, see Inference with Single-Batch and Static-Shape Inputs.

//1. Initialize AscendCL.

//2. Allocate runtime resources.

//3. Initialize Profiling.
//Set the data flush path.
const char *aclProfPath = "...";
aclprofInit(aclProfPath, strlen(aclProfPath));

//4. Set Profiling configuration.
uint32_t deviceIdList[1] = {0};
//Create a configuration struct.
aclprofConfig *config = aclprofCreateConfig(deviceIdList, 1, ACL_AICORE_ARITHMETIC_UTILIZATION,
    nullptr, ACL_PROF_ACL_API | ACL_PROF_TASK_TIME | ACL_PROF_AICORE_METRICS | ACL_PROF_AICPU | ACL_PROF_L2CACHE | ACL_PROF_HCCL_TRACE | ACL_PROF_MSPROFTX | ACL_PROF_RUNTIME_API);
const char *memFreq = "15";
aclError ret = aclprofSetConfig(ACL_PROF_SYS_HARDWARE_MEM_FREQ, memFreq, strlen(memFreq));
aclprofStart(config);

//5. Load your model. After the model is successfully loaded, modelId that identifies the model is returned.

//6. Create data of type aclmdlDataset to describe the inputs and outputs of your model.
 
//7. Execute your model.
ret = aclmdlExecute(modelId, input, output);

//8. Process the model inference result.

//9. Destroy the model input and output descriptions, free up memory, and unload the model.

//10. Stop Profiling and destroy the configuration and related resources.
aclprofStop(config);
aclprofDestroyConfig(config);
aclprofFinalize();

//11. Deallocate runtime resources.

//12. Deinitialize AscendCL.
// ......

Sample Code for Profiling AscendCL API for Extension

Add exception handling branches following the API calls. The following is a code snippet of key steps only, which is not ready to be built or run.

For details about the profiling msproftx APIs, see the stamp- and mark-related calls (aclprofCreateStamp, aclprofMark, aclprofMarkEx, and so on) in the following examples:

This section describes the code logic for profile data collection. For details about how to initialize and deinitialize AscendCL, see Initializing AscendCL. For details about how to allocate and deallocate runtime resources, see Runtime Resource Allocation and Deallocation. For details about how to load a model, prepare the input/output data of model inference, and execute and unload the model, see Inference with Single-Batch and Static-Shape Inputs.

Example 1 (aclprofMark):

//1. Initialize AscendCL.

//2. Allocate runtime resources.

//3. Initialize Profiling.
//Set the data flush path.
const char *aclProfPath = "...";
aclprofInit(aclProfPath, strlen(aclProfPath));

//4. Configure Profiling.
uint32_t deviceIdList[1] = {0}; //Set this parameter based on the device ID in the actual environment.
//Create a configuration struct.
aclprofConfig *config = aclprofCreateConfig(deviceIdList, 1, ACL_AICORE_ARITHMETIC_UTILIZATION,
    nullptr, ACL_PROF_ACL_API | ACL_PROF_TASK_TIME | ACL_PROF_MSPROFTX);
const char *memFreq = "15";
aclError ret = aclprofSetConfig(ACL_PROF_SYS_HARDWARE_MEM_FREQ, memFreq, strlen(memFreq));
aclprofStart(config);

aclprofStepInfo *stepInfo = aclprofCreateStepInfo();
ret = aclprofGetStepTimestamp(stepInfo, ACL_STEP_START, stream_);

//5. Load your model. After the model is successfully loaded, modelId that identifies the model is returned.
void *stamp = aclprofCreateStamp();
aclprofSetStampTraceMessage(stamp, "model_load_mark", strlen("model_load_mark"));
aclprofMark(stamp);      //Mark the model loading event.
aclprofDestroyStamp(stamp);

//6. Create data of type aclmdlDataset to describe the inputs and outputs of your model.

//7. Execute your model.
stamp = aclprofCreateStamp();
aclprofSetStampTraceMessage(stamp, "model_exec_mark", strlen("model_exec_mark"));
aclprofMark(stamp);      //Mark the model execution event.
aclprofDestroyStamp(stamp);
ret = aclmdlExecute(modelId, input, output);

//8. Process the model inference result.

//9. Destroy the model input and output descriptions, free memory, and unload the model.
ret = aclprofGetStepTimestamp(stepInfo, ACL_STEP_END, stream_);
aclprofDestroyStepInfo(stepInfo);

//10. Stop Profiling and destroy the configuration and related resources.
aclprofStop(config);
aclprofDestroyConfig(config);
aclprofFinalize();

//11. Deallocate runtime resources.

//12. Deinitialize AscendCL.
//......

Example 2 (aclprofMarkEx, marking events before and after model execution):

//The stream_ used here is created by calling aclrtCreateStream.
aclError markRet;
markRet = aclprofMarkEx("model execute start", strlen("model execute start"), stream_);
if (markRet != ACL_ERROR_NONE) {
    ERROR_LOG("mark execute start failed");
}
ret = processModel.Execute();
if (ret != SUCCESS) {
    ERROR_LOG("execute inference failed");
    aclrtFree(picDevBuffer);
    return FAILED;
}
markRet = aclprofMarkEx("model execute stop", strlen("model execute stop"), stream_);
if (markRet != ACL_ERROR_NONE) {
    ERROR_LOG("mark execute stop failed");
}

Example 3 (aclprofPush/aclprofPop, applicable to single-thread scenarios):

//1. Initialize AscendCL.

//2. Allocate runtime resources.

//3. Initialize Profiling.
//Set the data flush path.
const char *aclProfPath = "...";
aclprofInit(aclProfPath, strlen(aclProfPath));

//4. Configure Profiling.
uint32_t deviceIdList[1] = {0};
//Create a configuration struct.
aclprofConfig *config = aclprofCreateConfig(deviceIdList, 1, ACL_AICORE_ARITHMETIC_UTILIZATION,
    nullptr, ACL_PROF_ACL_API | ACL_PROF_TASK_TIME);
const char *memFreq = "15";
aclError ret = aclprofSetConfig(ACL_PROF_SYS_HARDWARE_MEM_FREQ, memFreq, strlen(memFreq));
aclprofStart(config);

aclprofStepInfo *stepInfo = aclprofCreateStepInfo();
ret = aclprofGetStepTimestamp(stepInfo, ACL_STEP_START, stream_);

//5. Load your model. After the model is successfully loaded, modelId that identifies the model is returned.

//6. Create data of type aclmdlDataset to describe the inputs and outputs of your model.

//7. Execute the model (only in a single thread).
void *stamp = aclprofCreateStamp();
aclprofSetStampTraceMessage(stamp, "aclmdlExecute_duration", strlen("aclmdlExecute_duration"));
aclprofPush(stamp);
ret = aclmdlExecute(modelId, input, output);
aclprofPop(stamp);
aclprofDestroyStamp(stamp);

//8. Process the model inference result.

//9. Destroy the model input and output descriptions, free memory, and unload the model.
ret = aclprofGetStepTimestamp(stepInfo, ACL_STEP_END, stream_);
aclprofDestroyStepInfo(stepInfo);

//10. Stop Profiling and destroy the configuration and related resources.
aclprofStop(config);
aclprofDestroyConfig(config);
aclprofFinalize();

//11. Deallocate runtime resources.

//12. Deinitialize AscendCL.
//......

Example 4 (aclprofRangeStart/aclprofRangeStop, applicable to single-thread or cross-thread scenarios):

//1. Initialize AscendCL.

//2. Allocate runtime resources.

//3. Initialize Profiling.
//Set the data flush path.
const char *aclProfPath = "...";
aclprofInit(aclProfPath, strlen(aclProfPath));

//4. Configure Profiling.
uint32_t deviceIdList[1] = {0};
//Create a configuration struct.
aclprofConfig *config = aclprofCreateConfig(deviceIdList, 1, ACL_AICORE_ARITHMETIC_UTILIZATION,
    nullptr, ACL_PROF_ACL_API | ACL_PROF_TASK_TIME);
const char *memFreq = "15";
aclError ret = aclprofSetConfig(ACL_PROF_SYS_HARDWARE_MEM_FREQ, memFreq, strlen(memFreq));
aclprofStart(config);

aclprofStepInfo *stepInfo = aclprofCreateStepInfo();
ret = aclprofGetStepTimestamp(stepInfo, ACL_STEP_START, stream_);

//5. Load your model. After the model is successfully loaded, modelId that identifies the model is returned.

//6. Create data of type aclmdlDataset to describe the inputs and outputs of your model.

//7. Execute the model (across threads).
void *stamp = aclprofCreateStamp();
aclprofSetStampTraceMessage(stamp, "aclmdlExecute_duration", strlen("aclmdlExecute_duration"));
uint32_t rangeId = 0;
aclprofRangeStart(stamp, &rangeId);
ret = aclmdlExecute(modelId, input, output);
aclprofRangeStop(rangeId);
aclprofDestroyStamp(stamp);

//8. Process the model inference result.

//9. Destroy the model input and output descriptions, free memory, and unload the model.
ret = aclprofGetStepTimestamp(stepInfo, ACL_STEP_END, stream_);
aclprofDestroyStepInfo(stepInfo);

//10. Stop Profiling and destroy the configuration and related resources.
aclprofStop(config);
aclprofDestroyConfig(config);
aclprofFinalize();

//11. Deallocate runtime resources.

//12. Deinitialize AscendCL.
//......

Sample Code for Profiling AscendCL API for Subscription

After each API call, add an exception handling branch and record error and info logs. The following is a code snippet of key steps only, which is not ready to be built or run.

This section describes the code logic for profile data collection. For details about how to initialize and deinitialize AscendCL, see Initializing AscendCL. For details about how to allocate and deallocate runtime resources, see Runtime Resource Allocation and Deallocation. For details about how to load a model, prepare the input/output data of model inference, and execute and unload the model, see Inference with Single-Batch and Static-Shape Inputs.

//1. Initialize AscendCL.

//2. Allocate runtime resources.

//3. Load your model. After the model is successfully loaded, modelId that identifies the model is returned.

//4. Create data of type aclmdlDataset to describe the inputs and outputs of your model.

//5. Create a pipe (on UNIX, this requires the POSIX header file unistd.h) to read and write the model subscription data.
int subFd[2];
//After the call, subFd[0] is the read end of the pipe and subFd[1] is the write end.
pipe(subFd);

//6. Create a model subscription configuration and subscribe to the model.
aclprofSubscribeConfig *config = aclprofCreateSubscribeConfig(1, ACL_AICORE_NONE, &subFd[1]);
//Pass modelId of the model to subscribe.
aclprofModelSubscribe(modelId, config);

//7. Read the subscription data from the pipe.
//7.1 Customize a function to parse the subscription data in the user memory.
void getModelInfo(void *data, uint32_t len) {
    uint32_t opNumber = 0;
    //Obtain the number of operators in the buffer.
    aclprofGetOpNum(data, len, &opNumber);
    //Iterate over the operator information in the user memory.
    for (uint32_t i = 0; i < opNumber; i++) {
        //Obtain the modelId of the operator.
        uint32_t modelId = aclprofGetModelId(data, len, i);
        //Obtain the length of the operator type name.
        size_t opTypeLen = 0;
        aclprofGetOpTypeLen(data, len, i, &opTypeLen);
        //Obtain the operator type name.
        char opType[opTypeLen];
        aclprofGetOpType(data, len, i, opType, opTypeLen);
        //Obtain the length of the operator name.
        size_t opNameLen = 0;
        aclprofGetOpNameLen(data, len, i, &opNameLen);
        //Obtain the operator name.
        char opName[opNameLen];
        aclprofGetOpName(data, len, i, opName, opNameLen);
        //Obtain the execution start time of the operator.
        uint64_t opStart = aclprofGetOpStart(data, len, i);
        //Obtain the execution end time of the operator.
        uint64_t opEnd = aclprofGetOpEnd(data, len, i);
        //Obtain the execution duration of the operator.
        uint64_t opDuration = aclprofGetOpDuration(data, len, i);
    }
}

//7.2 Customize a function to read data from the pipe into the user memory.
void *profDataRead(void *fd) {
    //Set the number of operators read from the pipe each time.
    uint64_t N = 10;
    //Obtain the operator information buffer size (in bytes) per operator.
    uint64_t bufferSize = 0;
    aclprofGetOpDescSize(&bufferSize);
    //Calculate the total operator information buffer size and allocate the buffer accordingly.
    uint64_t readbufLen = bufferSize * N;
    char *readbuf = new char[readbufLen];
    //Read data from the pipe into the allocated memory. The actual size of the read data (dataLen) may be less than bufferSize * N. If there is no data in the pipe, the call blocks until data arrives.
    auto dataLen = read(*(int*)fd, readbuf, readbufLen);
    //The data has been read into readbuf.
    while (dataLen > 0) {
        //Call the function implemented in 7.1 to parse the data in the memory.
        getModelInfo(readbuf, dataLen);
        memset(readbuf, 0, readbufLen);
        dataLen = read(*(int*)fd, readbuf, readbufLen);
    }
    delete [] readbuf;
    return nullptr;
}

//8. Start the thread to read and parse data from the pipe.
pthread_t subTid = 0;
pthread_create(&subTid, NULL, profDataRead, &subFd[0]);

//9. Execute your model.
ret = aclmdlExecute(modelId, input, output);

//10. Process the model inference result.

//11. Destroy the model input and output descriptions, free up memory, and unload the model.

//12. Unsubscribe from the model and destroy the subscription-related resources.
aclprofModelUnSubscribe(modelId);
pthread_join(subTid, NULL);
//Close the read end of the pipe.
close(subFd[0]);
//Destroy the config pointer.
aclprofDestroySubscribeConfig(config);

//13. Deallocate runtime resources.

//14. Deinitialize AscendCL.
// ......