aclmdlExecuteAsync
Description
Executes a model for inference. This API is asynchronous.
Restrictions
- This API is asynchronous: a successful call only delivers the inference task to the stream and does not wait for it to finish. After calling it, call a synchronization API (for example, aclrtSynchronizeStream) to ensure that the task is complete. Otherwise, service exceptions (such as training or inference errors) or unexpected conditions (such as a lost device link or a disconnected card) may occur.
- aclmdlExecuteAsync cannot be used to run inference for the same modelId concurrently on multiple streams. The following incorrect example calls aclmdlExecuteAsync twice for modelId1, once on stream1 and once on stream2, so the two executions run concurrently and an error is reported:
    //......
    aclmdlExecuteAsync(modelId1, input, output, stream1);
    aclmdlExecuteAsync(modelId1, input, output, stream2);
    aclrtSynchronizeStream(stream1);
    aclrtSynchronizeStream(stream2);
    //......
- If the same modelId must be shared by multiple threads, the user threads must hold a lock so that refreshing the input and output memory and executing inference run as one uninterrupted sequence. For example:
    // API call sequence of thread A:
    lock(handle1)
    -> aclrtMemcpyAsync(stream1)                // refreshes the input and output memory
    -> aclmdlExecuteAsync(modelId1, stream1)    // executes inference
    -> unlock(handle1)

    // API call sequence of thread B:
    lock(handle1)
    -> aclrtMemcpyAsync(stream1)                // refreshes the input and output memory
    -> aclmdlExecuteAsync(modelId1, stream1)    // executes inference
    -> unlock(handle1)
- The operations of loading, executing, and unloading a model must be performed in the same context. For details about how to create a context, see aclrtSetDevice or aclrtCreateContext.
- You can call acldvppMalloc, hi_mpi_dvpp_malloc, aclrtMalloc, aclrtMallocHost, or aclrtMallocCached to allocate the device memory for storing the model input and output data.
Notes:
- For details about the application scenarios and restrictions of each memory allocation API, see the description of that API.
- aclrtMallocHost can allocate device memory only in the Ascend RC form.
- acldvppMalloc and hi_mpi_dvpp_malloc are dedicated memory allocation APIs for media data processing. Memory allocated with them lets the output of media data processing be reused directly as the input of model inference, which avoids an extra copy.
- The hardware imposes memory alignment and padding requirements. If you allocate one large memory block with any of these APIs and then divide and manage it yourself, each sub-block must still meet the alignment and padding rules of the API that allocated it. For details, see Secondary Memory Allocation.
- If an external Allocator is used, the stream on which the Allocator is registered must be the same stream that is passed to aclmdlExecuteAsync for model execution.
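The locking rule for a shared modelId can be sketched in C with POSIX threads. This is a minimal sketch under several assumptions: the `STUB` section defines hypothetical stand-ins for aclrtMemcpyAsync, aclmdlExecuteAsync, and aclrtSynchronizeStream so the pattern compiles without the CANN toolkit (the aclrtMemcpyAsync signature is assumed from the ACL runtime API and is not described on this page), and the names `InferJob`, `run_inference_locked`, `infer_thread`, and `g_model1_lock` are illustrative only. Unlike the documented sequence, the sketch also synchronizes before releasing the lock, so each thread can read its output before another thread overwrites the shared buffers.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ---- STUB: hypothetical stand-ins so the sketch builds without CANN.
 *      In a real build, delete this block and #include "acl/acl.h". ---- */
typedef int aclError;
typedef void *aclrtStream;
typedef struct aclmdlDataset aclmdlDataset;   /* opaque, as in ACL */
typedef enum { ACL_MEMCPY_HOST_TO_DEVICE } aclrtMemcpyKind;
#define ACL_SUCCESS 0

static int g_executions = 0;                   /* stub: counts delivered executions */

static aclError aclrtMemcpyAsync(void *dst, size_t destMax, const void *src,
                                 size_t count, aclrtMemcpyKind kind, aclrtStream stream) {
    (void)kind; (void)stream;
    if (count > destMax) return 1;             /* stub error code */
    memcpy(dst, src, count);                   /* stub copies immediately */
    return ACL_SUCCESS;
}
static aclError aclmdlExecuteAsync(uint32_t modelId, const aclmdlDataset *input,
                                   aclmdlDataset *output, aclrtStream stream) {
    (void)modelId; (void)input; (void)output; (void)stream;
    ++g_executions;
    return ACL_SUCCESS;
}
static aclError aclrtSynchronizeStream(aclrtStream stream) {
    (void)stream;
    return ACL_SUCCESS;
}
/* ---- end STUB ---- */

/* Hypothetical: one lock per shared modelId (handle1 in the sequence above). */
static pthread_mutex_t g_model1_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-thread job description. */
typedef struct {
    uint32_t modelId;
    aclrtStream stream;          /* every thread uses the same stream */
    const aclmdlDataset *input;  /* dataset that wraps devInput */
    aclmdlDataset *output;
    void *devInput;              /* device input buffer shared by all threads */
    size_t devInputSize;
    const void *hostInput;       /* this thread's new input data */
    size_t hostInputSize;
    aclError result;             /* filled in by infer_thread */
} InferJob;

/* Refresh the input and execute as one critical section. */
static aclError run_inference_locked(const InferJob *job) {
    aclError ret;
    pthread_mutex_lock(&g_model1_lock);
    ret = aclrtMemcpyAsync(job->devInput, job->devInputSize, job->hostInput,
                           job->hostInputSize, ACL_MEMCPY_HOST_TO_DEVICE, job->stream);
    if (ret == ACL_SUCCESS)
        ret = aclmdlExecuteAsync(job->modelId, job->input, job->output, job->stream);
    if (ret == ACL_SUCCESS)
        ret = aclrtSynchronizeStream(job->stream);  /* sketch choice: wait under the lock */
    /* In real code, copy the output back to the host here, still under the lock. */
    pthread_mutex_unlock(&g_model1_lock);
    return ret;
}

/* pthread entry point: each thread runs one locked inference. */
static void *infer_thread(void *arg) {
    InferJob *job = (InferJob *)arg;
    job->result = run_inference_locked(job);
    return NULL;
}
```

Usage: start one `pthread_create(&t, NULL, infer_thread, &job)` per worker, giving every job the same modelId, stream, and device buffers, then join the threads. Holding one mutex per modelId keeps unrelated models free to run in parallel.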
Prototype
aclError aclmdlExecuteAsync(uint32_t modelId, const aclmdlDataset *input, aclmdlDataset *output, aclrtStream stream)
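The asynchronous contract (deliver the task, then synchronize before reading the output) and the return-value check can be sketched as below. This is a minimal sketch, assuming hypothetical stubs: the stubbed aclmdlExecuteAsync only records the task, and the stubbed aclrtSynchronizeStream "runs" it by writing `output = 2 * input` into a fake `value` field. The real aclmdlDataset is opaque and has no such field. The helper name `infer_and_wait` is illustrative, not part of ACL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* ---- STUB: hypothetical stand-ins so the sketch builds without CANN.
 *      In a real build, delete this block and #include "acl/acl.h". ---- */
typedef int aclError;
typedef void *aclrtStream;
typedef struct aclmdlDataset { float value; } aclmdlDataset;  /* stub-only field */
#define ACL_SUCCESS 0

static const aclmdlDataset *g_pendingIn;  /* task queued on the stub "stream" */
static aclmdlDataset *g_pendingOut;

static aclError aclmdlExecuteAsync(uint32_t modelId, const aclmdlDataset *input,
                                   aclmdlDataset *output, aclrtStream stream) {
    (void)modelId; (void)stream;
    g_pendingIn = input;                  /* only enqueue; nothing is computed yet */
    g_pendingOut = output;
    return ACL_SUCCESS;
}
static aclError aclrtSynchronizeStream(aclrtStream stream) {
    (void)stream;
    if (g_pendingOut != NULL) {           /* stub "model": output = 2 * input */
        g_pendingOut->value = 2.0f * g_pendingIn->value;
        g_pendingOut = NULL;
        g_pendingIn = NULL;
    }
    return ACL_SUCCESS;
}
/* ---- end STUB ---- */

/* Runs one inference and waits for it; returns the first non-zero aclError. */
static aclError infer_and_wait(uint32_t modelId, const aclmdlDataset *input,
                               aclmdlDataset *output, aclrtStream stream) {
    aclError ret = aclmdlExecuteAsync(modelId, input, output, stream);
    if (ret != ACL_SUCCESS) {             /* 0 means success; anything else is failure */
        fprintf(stderr, "aclmdlExecuteAsync failed: %d\n", ret);
        return ret;
    }
    /* The task has only been delivered; output is not valid until the stream drains. */
    ret = aclrtSynchronizeStream(stream);
    if (ret != ACL_SUCCESS)
        fprintf(stderr, "aclrtSynchronizeStream failed: %d\n", ret);
    return ret;
}
```

With the stubs, reading `output` right after aclmdlExecuteAsync returns still shows the old value; only after aclrtSynchronizeStream does it hold the result, which mirrors why the synchronization call is required.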
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| modelId | Input | ID of the model to run. The ID is returned by a successful call to aclmdlLoadFromFile, aclmdlLoadFromMem, aclmdlLoadFromFileWithMem, or aclmdlLoadFromMemWithMem. |
| input | Input | Pointer to the dataset that holds the input data for model inference. |
| output | Output | Pointer to the dataset that receives the output data of model inference. |
| stream | Input | Stream on which the inference task is delivered. |
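The input and output parameters are aclmdlDataset pointers whose buffers point to memory allocated with one of the APIs listed under Restrictions. The sketch below assumes the common ACL pattern of allocating each buffer with aclrtMalloc, wrapping it with aclCreateDataBuffer, and attaching it with aclmdlAddDatasetBuffer. Those functions and aclmdlCreateDataset are not described on this page; their signatures here are assumptions, and the `STUB` bodies (which use host `malloc`) are hypothetical. The helper `build_dataset` is illustrative, and it assumes buffers are added in the model's input or output index order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* ---- STUB: hypothetical stand-ins so the sketch builds without CANN.
 *      In a real build, delete this block and #include "acl/acl.h". ---- */
typedef int aclError;
#define ACL_SUCCESS 0
typedef enum { ACL_MEM_MALLOC_HUGE_FIRST } aclrtMemMallocPolicy;
typedef struct aclDataBuffer { void *data; size_t size; } aclDataBuffer;
#define STUB_MAX_BUFFERS 8
typedef struct aclmdlDataset { aclDataBuffer *bufs[STUB_MAX_BUFFERS]; size_t count; } aclmdlDataset;

static aclError aclrtMalloc(void **devPtr, size_t size, aclrtMemMallocPolicy policy) {
    (void)policy;
    *devPtr = malloc(size);               /* stub: host heap instead of device memory */
    return *devPtr != NULL ? ACL_SUCCESS : 1;
}
static aclmdlDataset *aclmdlCreateDataset(void) {
    return calloc(1, sizeof(aclmdlDataset));
}
static aclDataBuffer *aclCreateDataBuffer(void *data, size_t size) {
    aclDataBuffer *b = malloc(sizeof *b);
    if (b != NULL) { b->data = data; b->size = size; }
    return b;
}
static aclError aclmdlAddDatasetBuffer(aclmdlDataset *ds, aclDataBuffer *buf) {
    if (ds->count == STUB_MAX_BUFFERS) return 1;
    ds->bufs[ds->count++] = buf;
    return ACL_SUCCESS;
}
/* ---- end STUB ---- */

/* Builds a dataset with one freshly allocated buffer per entry of sizes[]. */
static aclmdlDataset *build_dataset(const size_t *sizes, size_t n) {
    aclmdlDataset *ds = aclmdlCreateDataset();
    if (ds == NULL) return NULL;
    for (size_t i = 0; i < n; ++i) {
        void *dev = NULL;
        if (aclrtMalloc(&dev, sizes[i], ACL_MEM_MALLOC_HUGE_FIRST) != ACL_SUCCESS)
            return NULL;                  /* sketch: cleanup on failure omitted */
        aclDataBuffer *buf = aclCreateDataBuffer(dev, sizes[i]);
        if (buf == NULL || aclmdlAddDatasetBuffer(ds, buf) != ACL_SUCCESS)
            return NULL;                  /* sketch: cleanup on failure omitted */
    }
    return ds;
}
```

In real code the sizes would typically come from the loaded model's description rather than being hard-coded, and every buffer and dataset must be released after inference.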
Returns
The value 0 indicates success, and other values indicate failure. For details, see aclError.