Profile Data Collection

Principles

This section describes the Profiling APIs. Three Profiling methods are provided:

  • Collecting and flushing profile data

    Write the collected profile data to a file, use the profiling tool to parse the file (see "Offline Parsing" in Profiling Instructions), and display the profile data.

    The following two API calling modes are available:
    • Mode 1 calls aclprofInit, aclprofStart, aclprofStop, and aclprofFinalize. You can obtain the execution time of AI Core operators, AI Core metrics, and other information. Currently, these APIs apply at process level: a call made in any thread also takes effect in all other threads of the same process.

      These APIs can be called repeatedly in a process, allowing for varied Profiling configurations with each call.

    • Mode 2 uses the aclInit call. During initialization, the Profiling configuration is passed as a JSON configuration file. You can obtain the time taken to execute AI Core operators, AI Core metrics, and other information.

      aclInit can be called only once per process. To modify the Profiling configuration, modify the JSON configuration file. For details, see the description of the aclInit API.

  • Using msproftx extension APIs to collect and flush profile data

    When you need to locate the performance bottleneck of your app or the upper-layer framework program, call msproftx extension APIs during the profiling process (between the aclprofStart and aclprofStop calls). msproftx is used to record the time span of specific events during app running and write data to a profile data file. You can use the profiling tool to parse the file and export the profile data.

    These APIs can be called multiple times in a process. Between aclprofStart and aclprofStop, call aclprofCreateStamp, aclprofPush, aclprofPop, aclprofRangeStart, aclprofRangeStop, and aclprofDestroyStamp to mark events that occur at specific times while the app runs and to record how long they last.

    For details about how to parse and export data using the profiling tool, see "Offline Parsing" in Profiling Instructions.

  • Subscription to operator information

    The collected profile data is parsed and written to a pipe. You then read the data from the pipe into memory and call the aclprofGet* APIs to obtain the profile data.

    API calling: aclprofModelSubscribe, aclprofGet*, and aclprofModelUnSubscribe are used together. The profile data of operators in the model can be obtained, including the operator name, operator type name, and operator execution time.

Collecting and Flushing Profile Data

Add exception handling branches following the API calls. The following is a code snippet of key steps only, which is not ready to be built or run.

This section describes the code logic for profile data collection. For details about initialization and deinitialization, see Initialization and Deinitialization. For details about how to allocate and deallocate runtime resources, see Runtime Resource Allocation and Deallocation. For details about how to load a model, prepare the input/output data of model inference, and execute and unload a model, see Model Inference with Static-Shape Inputs.

//1. Perform initialization.

//2. Allocate runtime resources.

//3. Initialize Profiling.
//Set the data flush path.
const char *aclProfPath = "./output";
aclprofInit(aclProfPath, strlen(aclProfPath));

//4. Set Profiling configuration.
uint32_t deviceIdList[1] = {0};
//Create a configuration struct.
aclprofConfig *config = aclprofCreateConfig(deviceIdList, 1, ACL_AICORE_ARITHMETIC_UTILIZATION,
    nullptr, ACL_PROF_ACL_API | ACL_PROF_TASK_TIME | ACL_PROF_AICORE_METRICS | ACL_PROF_AICPU | ACL_PROF_L2CACHE | ACL_PROF_HCCL_TRACE | ACL_PROF_MSPROFTX | ACL_PROF_RUNTIME_API);
const char *memFreq = "15";
aclError ret = aclprofSetConfig(ACL_PROF_SYS_HARDWARE_MEM_FREQ, memFreq, strlen(memFreq));
aclprofStart(config);

//5. Load your model. After the model is successfully loaded, modelId that identifies the model is returned.

//6. Create data of type aclmdlDataset to describe the inputs and outputs of your model.
 
//7. Execute your model.
ret = aclmdlExecute(modelId, input, output);

//8. Process the model inference result.

//9. Destroy the model input and output descriptions, free up memory, and unload the model.

//10. Stop Profiling and destroy the configuration and related resources.
aclprofStop(config);
aclprofDestroyConfig(config);
aclprofFinalize();

//11. Deallocate runtime resources.

//12. Perform deinitialization.
// ......
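
The note above asks for exception handling branches after each API call. One way to keep the cleanup order of step 10 (stop, destroy the configuration, finalize) intact on every exit path is a scope guard. The following is a minimal sketch under stated assumptions: ScopeGuard and runProfiledSession are hypothetical names, and the log entries only stand in for the real aclprof* calls so that the sketch builds without the AscendCL headers.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Runs the stored callback when the guard goes out of scope.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> onExit) : onExit_(std::move(onExit)) {}
    ~ScopeGuard() { if (onExit_) onExit_(); }
    ScopeGuard(const ScopeGuard &) = delete;
    ScopeGuard &operator=(const ScopeGuard &) = delete;
private:
    std::function<void()> onExit_;
};

// Hypothetical stand-in for a profiled session. Each log entry marks the place
// where the corresponding aclprof* call would go in real code.
// Returns false when the simulated inference step fails.
bool runProfiledSession(bool inferenceOk, std::vector<std::string> &log) {
    log.push_back("init");            // aclprofInit
    ScopeGuard finalize([&log] { log.push_back("finalize"); });

    log.push_back("create_config");   // aclprofCreateConfig
    ScopeGuard destroyConfig([&log] { log.push_back("destroy_config"); });

    log.push_back("start");           // aclprofStart
    ScopeGuard stop([&log] { log.push_back("stop"); });

    if (!inferenceOk) {
        return false;  // The guards still run: stop, destroy_config, finalize.
    }
    log.push_back("execute");         // aclmdlExecute
    return true;
}
```

Guards are destroyed in reverse order of construction, so the cleanup sequence is the same whether the function returns early or runs to the end.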

Using msproftx Extension APIs to Collect and Flush Profile Data

Add exception handling branches following the API calls. The following is a code snippet of key steps only, which is not ready to be built or run.

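The msproftx API calls in the following examples are aclprofCreateStamp, aclprofSetStampTraceMessage, aclprofMark, aclprofMarkEx, aclprofPush, aclprofPop, aclprofRangeStart, aclprofRangeStop, and aclprofDestroyStamp.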

This section describes the code logic for profile data collection. For details about initialization and deinitialization, see Initialization and Deinitialization. For details about how to allocate and deallocate runtime resources, see Runtime Resource Allocation and Deallocation. For details about how to load a model, prepare the input/output data of model inference, and execute and unload a model, see Model Inference with Static-Shape Inputs.

Example 1 (aclprofMark):

//1. Perform initialization.

//2. Allocate runtime resources.

//3. Initialize Profiling.
//Set the data flush path.
const char *aclProfPath = "./output";
aclprofInit(aclProfPath, strlen(aclProfPath));

//4. Configure Profiling.
uint32_t deviceIdList[1] = {0}; //Set this parameter based on the device ID in the actual environment.
//Create a configuration struct.
aclprofConfig *config = aclprofCreateConfig(deviceIdList, 1, ACL_AICORE_ARITHMETIC_UTILIZATION,
    nullptr, ACL_PROF_ACL_API | ACL_PROF_TASK_TIME | ACL_PROF_MSPROFTX);
const char *memFreq = "15";
aclError ret = aclprofSetConfig(ACL_PROF_SYS_HARDWARE_MEM_FREQ, memFreq, strlen(memFreq));
aclprofStart(config);

//Record the step start timestamp. stream_ is created by calling aclrtCreateStream.
aclprofStepInfo *stepInfo = aclprofCreateStepInfo();
ret = aclprofGetStepTimestamp(stepInfo, ACL_STEP_START, stream_);

//5. Load your model. After the model is successfully loaded, modelId that identifies the model is returned.
void *stamp = aclprofCreateStamp();
aclprofSetStampTraceMessage(stamp, "model_load_mark", strlen("model_load_mark"));
aclprofMark(stamp);      //Mark the model loading event.
aclprofDestroyStamp(stamp);

//6. Create data of type aclmdlDataset to describe the inputs and outputs of your model.

//7. Execute your model.
stamp = aclprofCreateStamp();
aclprofSetStampTraceMessage(stamp, "model_exec_mark", strlen("model_exec_mark"));
aclprofMark(stamp);      //Mark the model execution event.
aclprofDestroyStamp(stamp);
ret = aclmdlExecute(modelId, input, output);

//8. Process the model inference result.

//9. Destroy the model input and output descriptions, free memory, and unload the model.
//Record the step end timestamp.
ret = aclprofGetStepTimestamp(stepInfo, ACL_STEP_END, stream_);
aclprofDestroyStepInfo(stepInfo);

//10. Stop Profiling and destroy the configuration and related resources.
aclprofStop(config);
aclprofDestroyConfig(config);
aclprofFinalize();

//11. Deallocate runtime resources.

//12. Perform deinitialization.
//......

Example 2 (aclprofMarkEx, with marks recorded before and after model execution):

//The stream here is created by calling aclrtCreateStream.
aclError markRet;
markRet = aclprofMarkEx("model execute start", strlen("model execute start"), stream_);
if (markRet != ACL_ERROR_NONE) {
    ERROR_LOG("mark execute start failed");
}
ret = processModel.Execute();
if (ret != SUCCESS) {
    ERROR_LOG("execute inference failed");
    aclrtFree(picDevBuffer);
    return FAILED;
}
markRet = aclprofMarkEx("model execute stop", strlen("model execute stop"), stream_);
if (markRet != ACL_ERROR_NONE) {
    ERROR_LOG("mark execute stop failed");
}
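
Example 2 repeats the same check-and-log branch after every call. A small helper can shorten this. The following is a sketch only: StatusCode, kOk, and checkStatus are hypothetical names, and the sketch assumes that 0 is the success code, which you should verify against the error code definitions of your version.

```cpp
#include <cstdio>

// Stand-in for the aclError type. 0 is assumed to mean success in this sketch.
using StatusCode = int;
constexpr StatusCode kOk = 0;

// Logs a message when status signals a failure and reports whether the call succeeded.
inline bool checkStatus(StatusCode status, const char *what) {
    if (status != kOk) {
        std::fprintf(stderr, "[ERROR] %s failed, error code %d\n", what, status);
        return false;
    }
    return true;
}
```

With such a helper, a mark call and its error branch fit on one line, for example checkStatus(aclprofMarkEx(...), "mark execute start"). As in Example 2, a failed mark can be logged without aborting inference, whereas a failed inference call should still return early.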

Example 3 (aclprofPush/aclprofPop, applicable to single-thread scenarios):

//1. Perform initialization.

//2. Allocate runtime resources.

//3. Initialize Profiling.
//Set the data flush path.
const char *aclProfPath = "./output";
aclprofInit(aclProfPath, strlen(aclProfPath));

//4. Configure Profiling.
uint32_t deviceIdList[1] = {0};
//Create a configuration struct.
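//ACL_PROF_MSPROFTX is included, as in Example 1, because this example calls msproftx APIs.
aclprofConfig *config = aclprofCreateConfig(deviceIdList, 1, ACL_AICORE_ARITHMETIC_UTILIZATION,
    nullptr, ACL_PROF_ACL_API | ACL_PROF_TASK_TIME | ACL_PROF_MSPROFTX);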
const char *memFreq = "15";
aclError ret = aclprofSetConfig(ACL_PROF_SYS_HARDWARE_MEM_FREQ, memFreq, strlen(memFreq));
aclprofStart(config);

//Record the step start timestamp. stream_ is created by calling aclrtCreateStream.
aclprofStepInfo *stepInfo = aclprofCreateStepInfo();
ret = aclprofGetStepTimestamp(stepInfo, ACL_STEP_START, stream_);

//5. Load your model. After the model is successfully loaded, modelId that identifies the model is returned.

//6. Create data of type aclmdlDataset to describe the inputs and outputs of your model.

//7. Execute the model (only in a single thread).
void *stamp = aclprofCreateStamp();
aclprofSetStampTraceMessage(stamp, "aclmdlExecute_duration", strlen("aclmdlExecute_duration"));
aclprofPush(stamp);
ret = aclmdlExecute(modelId, input, output);
aclprofPop(stamp);
aclprofDestroyStamp(stamp);

//8. Process the model inference result.

//9. Destroy the model input and output descriptions, free memory, and unload the model.
//Record the step end timestamp.
ret = aclprofGetStepTimestamp(stepInfo, ACL_STEP_END, stream_);
aclprofDestroyStepInfo(stepInfo);

//10. Stop Profiling and destroy the configuration and related resources.
aclprofStop(config);
aclprofDestroyConfig(config);
aclprofFinalize();

//11. Deallocate runtime resources.

//12. Perform deinitialization.
//......

Example 4 (aclprofRangeStart/aclprofRangeStop, applicable to single-thread or cross-thread scenarios):

//1. Perform initialization.

//2. Allocate runtime resources.

//3. Initialize Profiling.
//Set the data flush path.
const char *aclProfPath = "./output";
aclprofInit(aclProfPath, strlen(aclProfPath));

//4. Configure Profiling.
uint32_t deviceIdList[1] = {0};
//Create a configuration struct.
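//ACL_PROF_MSPROFTX is included, as in Example 1, because this example calls msproftx APIs.
aclprofConfig *config = aclprofCreateConfig(deviceIdList, 1, ACL_AICORE_ARITHMETIC_UTILIZATION,
    nullptr, ACL_PROF_ACL_API | ACL_PROF_TASK_TIME | ACL_PROF_MSPROFTX);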
const char *memFreq = "15";
aclError ret = aclprofSetConfig(ACL_PROF_SYS_HARDWARE_MEM_FREQ, memFreq, strlen(memFreq));
aclprofStart(config);

//Record the step start timestamp. stream_ is created by calling aclrtCreateStream.
aclprofStepInfo *stepInfo = aclprofCreateStepInfo();
ret = aclprofGetStepTimestamp(stepInfo, ACL_STEP_START, stream_);

//5. Load your model. After the model is successfully loaded, modelId that identifies the model is returned.

//6. Create data of type aclmdlDataset to describe the inputs and outputs of your model.

//7. Execute the model (across threads).
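void *stamp = aclprofCreateStamp();
aclprofSetStampTraceMessage(stamp, "aclmdlExecute_duration", strlen("aclmdlExecute_duration"));
//rangeId identifies the range so that aclprofRangeStop can close it later.
uint32_t rangeId = 0;
aclprofRangeStart(stamp, &rangeId);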
ret = aclmdlExecute(modelId, input, output);
aclprofRangeStop(rangeId);
aclprofDestroyStamp(stamp);

//8. Process the model inference result.

//9. Destroy the model input and output descriptions, free memory, and unload the model.
//Record the step end timestamp.
ret = aclprofGetStepTimestamp(stepInfo, ACL_STEP_END, stream_);
aclprofDestroyStepInfo(stepInfo);

//10. Stop Profiling and destroy the configuration and related resources.
aclprofStop(config);
aclprofDestroyConfig(config);
aclprofFinalize();

//11. Deallocate runtime resources.

//12. Perform deinitialization.
//......
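
Examples 3 and 4 differ in how a time span is closed: aclprofPop closes the span opened by the latest aclprofPush, while aclprofRangeStop closes the span named by a rangeId. The following toy model illustrates why the first style suits a single thread and the second also works across threads. It is a sketch under stated assumptions, not a description of how the profiling library is implemented: ThreadLocalSpans and RangeRegistry are hypothetical classes.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <stack>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

// Push/pop style: open spans live on a per-thread stack, so a pop always
// closes the most recent push made by the same thread.
class ThreadLocalSpans {
public:
    static void push() { stack().push(Clock::now()); }
    // Returns the elapsed time of the innermost open span, or zero if none is open.
    static Clock::duration pop() {
        if (stack().empty()) return Clock::duration::zero();
        Clock::time_point begin = stack().top();
        stack().pop();
        return Clock::now() - begin;
    }
    static std::size_t depth() { return stack().size(); }
private:
    static std::stack<Clock::time_point> &stack() {
        thread_local std::stack<Clock::time_point> spans;
        return spans;
    }
};

// Range style: start() returns an ID, and any thread that holds the ID can call stop().
class RangeRegistry {
public:
    uint32_t start() {
        std::lock_guard<std::mutex> lock(mutex_);
        uint32_t id = nextId_++;
        open_[id] = Clock::now();
        return id;
    }
    // Returns true and writes the elapsed time if the ID refers to an open range.
    bool stop(uint32_t id, Clock::duration &elapsed) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = open_.find(id);
        if (it == open_.end()) return false;
        elapsed = Clock::now() - it->second;
        open_.erase(it);
        return true;
    }
private:
    std::mutex mutex_;
    std::unordered_map<uint32_t, Clock::time_point> open_;
    uint32_t nextId_ = 1;
};
```

In this model, a pop issued on another thread would find an empty stack, whereas an ID returned by start() can be handed to a worker thread and stopped there.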

Subscription to Operator Information

After APIs are called, you need to add exception handling branches and record error logs and info logs. The following is a code snippet of key steps only, which is not ready to be built or run.

This section describes the code logic for profile data collection. For details about initialization and deinitialization, see Initialization and Deinitialization. For details about how to allocate and deallocate runtime resources, see Runtime Resource Allocation and Deallocation. For details about how to load a model, prepare the input/output data of model inference, and execute and unload a model, see Model Inference with Static-Shape Inputs.

//1. Perform initialization.

//2. Allocate runtime resources.

//3. Load your model. After the model is successfully loaded, modelId that identifies the model is returned.

//4. Create data of type aclmdlDataset to describe the inputs and outputs of your model.

//5. Create a pipe to read and write the model subscription data (pipe, read, and close are POSIX functions declared in unistd.h).
int subFd[2];
//subFd[0] is the read end of the pipe, and subFd[1] is the write end.
pipe(subFd);

//6. Create a model subscription configuration and subscribe to the model.
aclprofSubscribeConfig *config = aclprofCreateSubscribeConfig(1, ACL_AICORE_NONE, &subFd[1]);
//Pass modelId of the model to subscribe.
aclprofModelSubscribe(modelId, config);

//7. Read subscription data from the pipe.
//7.1 Define a function that parses the subscription data in user memory.
void getModelInfo(void *data, uint32_t len) {
    uint32_t opNumber = 0;
    //Read the number of operators.
    aclprofGetOpNum(data, len, &opNumber);
    //Iterate over the operator information in the user memory.
    for (uint32_t i = 0; i < opNumber; i++) {
        //Obtain the modelId of the operator.
        uint32_t modelId = aclprofGetModelId(data, len, i);
        //Obtain the length of the operator type name.
        size_t opTypeLen = 0;
        aclprofGetOpTypeLen(data, len, i, &opTypeLen);
        //Obtain the operator type name (std::vector requires the <vector> header).
        std::vector<char> opType(opTypeLen);
        aclprofGetOpType(data, len, i, opType.data(), opTypeLen);
        //Obtain the length of the operator name.
        size_t opNameLen = 0;
        aclprofGetOpNameLen(data, len, i, &opNameLen);
        //Obtain the operator name.
        std::vector<char> opName(opNameLen);
        aclprofGetOpName(data, len, i, opName.data(), opNameLen);
        //Obtain the execution start time of the operator.
        uint64_t opStart = aclprofGetOpStart(data, len, i);
        //Obtain the execution end time of the operator.
        uint64_t opEnd = aclprofGetOpEnd(data, len, i);
        //Obtain the execution duration of the operator.
        uint64_t opDuration = aclprofGetOpDuration(data, len, i);
    }
}

//7.2 Define a function that reads data from the pipe into user memory.
void *profDataRead(void *fd) {
    //Set the number of operators read from the pipe each time.
    uint64_t N = 10;
    //Obtain the operator information buffer size (in bytes) per operator.
    uint64_t bufferSize = 0;
    aclprofGetOpDescSize(&bufferSize);
    //Calculate the total operator information buffer size and allocate the buffer accordingly.
    uint64_t readbufLen = bufferSize * N;
    char *readbuf = new char[readbufLen];
    //Read data from the pipe into the allocated memory. The actual size of the read data (dataLen) may be less than bufferSize * N. If the pipe is empty, read blocks until data arrives.
    auto dataLen = read(*(int*)fd, readbuf, readbufLen);
    //Data was read into readbuf.
    while (dataLen > 0) {
        //Call the function defined in 7.1 to parse the data in memory.
        getModelInfo(readbuf, dataLen);
        //Clear the whole buffer before the next read.
        memset(readbuf, 0, readbufLen);
        dataLen = read(*(int*)fd, readbuf, readbufLen);
    }
    delete []readbuf;
    return nullptr;
}

//8. Start a thread to read and parse the data from the pipe.
pthread_t subTid = 0;
pthread_create(&subTid, NULL, profDataRead, &subFd[0]);

//9. Execute your model.
aclError ret = aclmdlExecute(modelId, input, output);

//10. Process the model inference result.

//11. Destroy the model input and output descriptions, free up memory, and unload the model.

//12. Unsubscribe from the model and destroy the subscription-related resources.
aclprofModelUnSubscribe(modelId);
pthread_join(subTid, NULL);
//Close the read end of the pipe.
close(subFd[0]);
//Destroy the config pointer.
aclprofDestroySubscribeConfig(config);

//13. Deallocate runtime resources.

//14. Perform deinitialization.
// ......
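
The reader loop in step 7.2 depends on how read behaves on a pipe: it blocks while the pipe is empty, may return fewer bytes than requested, and returns 0 once every write end has been closed. The following self-contained sketch shows these mechanics with plain POSIX calls. DemoRecord and drainRecords are hypothetical; the real operator descriptors are opaque and must be read through the aclprofGet* APIs. Carrying partial records over to the next read is a defensive assumption, not documented behavior of the subscription pipe.

```cpp
#include <unistd.h>   // pipe, read, write, close (POSIX)
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed-size record used only for this demonstration.
struct DemoRecord {
    uint32_t opIndex;
    uint64_t durationNs;
};

// Reads whole records from fd until read reports end-of-file or an error.
// Bytes of an incomplete record are kept and completed by the next read.
std::vector<DemoRecord> drainRecords(int fd) {
    std::vector<DemoRecord> records;
    std::vector<char> buf(sizeof(DemoRecord) * 4);
    size_t pending = 0;  // Bytes in buf that are not yet parsed.
    for (;;) {
        ssize_t n = read(fd, buf.data() + pending, buf.size() - pending);
        if (n <= 0) break;  // 0: every write end is closed; negative: error.
        pending += static_cast<size_t>(n);
        size_t whole = pending / sizeof(DemoRecord);
        for (size_t i = 0; i < whole; ++i) {
            DemoRecord rec;
            std::memcpy(&rec, buf.data() + i * sizeof(DemoRecord), sizeof(rec));
            records.push_back(rec);
        }
        size_t consumed = whole * sizeof(DemoRecord);
        std::memmove(buf.data(), buf.data() + consumed, pending - consumed);
        pending -= consumed;
    }
    return records;
}
```

Because read returns 0 only after all write ends are closed, a reader thread like the one started in step 8 keeps blocking for as long as any write end of the pipe remains open.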