aclmdlExecuteAsync

Applicability

Product	Supported
Atlas A3 training products / Atlas A3 inference products	√
Atlas A2 training products / Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	√
Atlas inference products	√
Atlas training products	√

Description

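Runs model inference asynchronously. A successful return indicates only that the inference task has been enqueued on the stream, not that it has finished. Call a synchronization API such as aclrtSynchronizeStream before reading the output.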

Prototype

aclError aclmdlExecuteAsync(uint32_t modelId, const aclmdlDataset *input, aclmdlDataset *output, aclrtStream stream)

Parameters

Parameter	Input/Output	Description
modelId	Input	Model ID for inference. After a model loading API (such as aclmdlLoadFromFile and aclmdlLoadFromMem) is successfully called, a model ID is returned. The ID is used as the input of this API.
input	Input	Pointer to the input data for model inference.
output	Output	Pointer to the output data for model inference.
stream	Input	Stream on which the inference task is enqueued.

Returns

0 on success; any other value indicates failure. For details, see aclError.
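Because the API is asynchronous, the return value and the completion of inference are two separate things. The following is a minimal sketch of that distinction, not the real runtime: ToyStream, ToyExecuteAsync, ToySynchronizeStream, and the error constants are hypothetical stand-ins for aclrtStream, aclmdlExecuteAsync, aclrtSynchronizeStream, and aclError. The real runtime may start work before synchronization; the only point modelled here is that the output is safe to read only after the stream has been synchronized.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

using ToyError = int;                 // 0 means success, as in the Returns section
constexpr ToyError kToySuccess = 0;
constexpr ToyError kToyFailure = 1;   // hypothetical non-zero failure code

// Toy stream (hypothetical): holds tasks that have been enqueued but not yet run.
struct ToyStream {
    std::queue<std::function<void()>> pending;
};

// Stand-in for aclmdlExecuteAsync: enqueues the work and returns immediately.
// The "inference" here just doubles every input value.
ToyError ToyExecuteAsync(const std::vector<float> *input,
                         std::vector<float> *output, ToyStream &stream) {
    if (input == nullptr || output == nullptr) return kToyFailure;
    stream.pending.push([input, output] {
        output->clear();
        for (float v : *input) output->push_back(2.0f * v);
    });
    return kToySuccess;
}

// Stand-in for aclrtSynchronizeStream: runs every pending task, then returns.
ToyError ToySynchronizeStream(ToyStream &stream) {
    while (!stream.pending.empty()) {
        stream.pending.front()();
        stream.pending.pop();
    }
    assert(stream.pending.empty());  // nothing is left in flight after synchronization
    return kToySuccess;
}
```

In real code the same shape applies: check the value returned by aclmdlExecuteAsync, then check the value returned by the synchronization call, and only then read the output dataset.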

Restrictions

A model with a given modelId cannot be executed concurrently on multiple streams through aclmdlExecuteAsync. The following example is incorrect: aclmdlExecuteAsync is called twice with the same modelId on two different streams that run concurrently, so an error is reported.
```
//......
aclmdlExecuteAsync(modelId1, input, output, stream1);
aclmdlExecuteAsync(modelId1, input, output, stream2);
aclrtSynchronizeStream(stream1);
aclrtSynchronizeStream(stream2);
//......
```
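One way to catch this misuse in application code before it reaches the runtime is to track which stream each model is currently in flight on. The sketch below is an assumption-laden illustration, not part of AscendCL: ModelStreamGuard, StreamHandle, TryBeginExecute, and OnStreamSynchronized are hypothetical names, and the class is not thread-safe on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Placeholder for aclrtStream (hypothetical); only pointer identity is used.
using StreamHandle = const void *;

// Records, per model ID, the stream on which that model has pending inference.
class ModelStreamGuard {
public:
    // Call before launching inference. Returns false if the model already has
    // pending work on a different stream, which is the case the restriction forbids.
    bool TryBeginExecute(std::uint32_t model_id, StreamHandle stream) {
        assert(stream != nullptr);
        auto it = in_flight_.find(model_id);
        if (it != in_flight_.end() && it->second != stream) return false;
        in_flight_[model_id] = stream;
        return true;
    }

    // Call after the stream has been synchronized: every model that was pending
    // on it is free to run on another stream again.
    void OnStreamSynchronized(StreamHandle stream) {
        for (auto it = in_flight_.begin(); it != in_flight_.end();) {
            if (it->second == stream) {
                it = in_flight_.erase(it);
            } else {
                ++it;
            }
        }
    }

private:
    std::unordered_map<std::uint32_t, StreamHandle> in_flight_;
};
```

With such a guard, the second call in the incorrect example would be rejected by TryBeginExecute until stream1 has been synchronized. Repeated launches on the same stream are allowed, since tasks on one stream run in order.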

If the same modelId must be shared by multiple threads due to service requirements, the user threads must use a lock so that refreshing the input and output memory and executing inference are performed as one uninterrupted sequence. For example:

// API call sequence of thread A:
lock(handle1) -> aclrtMemcpyAsync(stream1) refreshes the input and output memory -> aclmdlExecuteAsync(modelId1,stream1) executes inference -> unlock(handle1)

// API call sequence of thread B:
lock(handle1) -> aclrtMemcpyAsync(stream1) refreshes the input and output memory -> aclmdlExecuteAsync(modelId1,stream1) executes inference -> unlock(handle1)
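The call sequences above can be sketched with a standard C++ mutex playing the role of handle1. This is a minimal illustration under stated assumptions: RefreshIoStub and ExecuteAsyncStub are hypothetical stand-ins that only record the call order, where real code would call aclrtMemcpyAsync and aclmdlExecuteAsync on stream1. CallLog, RunInferenceLocked, RunTwoThreads, and PairsAreContiguous are likewise names invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Records the order of calls; only touched while the model lock is held.
struct CallLog {
    std::vector<std::string> entries;
};

// Stand-in for aclrtMemcpyAsync(..., stream1) refreshing input/output memory.
void RefreshIoStub(CallLog &log, const std::string &thread_name) {
    log.entries.push_back(thread_name + ":refresh");
}

// Stand-in for aclmdlExecuteAsync(modelId1, input, output, stream1).
void ExecuteAsyncStub(CallLog &log, const std::string &thread_name) {
    log.entries.push_back(thread_name + ":execute");
}

// One request: refresh and execute under the same lock, so no other thread
// can slip in between the two steps.
void RunInferenceLocked(std::mutex &model_lock, CallLog &log,
                        const std::string &thread_name) {
    std::lock_guard<std::mutex> guard(model_lock);  // lock(handle1)
    RefreshIoStub(log, thread_name);
    ExecuteAsyncStub(log, thread_name);
}  // unlock(handle1) happens when guard goes out of scope

// Runs threads A and B, each issuing `rounds` requests against the shared model.
CallLog RunTwoThreads(int rounds) {
    assert(rounds >= 0);
    std::mutex model_lock;
    CallLog log;
    auto worker = [&](const std::string &name) {
        for (int i = 0; i < rounds; ++i) RunInferenceLocked(model_lock, log, name);
    };
    std::thread a(worker, "A");
    std::thread b(worker, "B");
    a.join();
    b.join();
    return log;
}

// True if every refresh is immediately followed by an execute from the same thread.
bool PairsAreContiguous(const CallLog &log) {
    if (log.entries.size() % 2 != 0) return false;
    for (std::size_t i = 0; i < log.entries.size(); i += 2) {
        const std::string &refresh = log.entries[i];
        const std::string &execute = log.entries[i + 1];
        if (refresh.substr(1) != ":refresh") return false;
        if (execute.substr(1) != ":execute") return false;
        if (refresh[0] != execute[0]) return false;
    }
    return true;
}
```

The threads may interleave in any order from run to run, but each refresh/execute pair stays together because both steps happen inside one critical section.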

If an external Allocator is required, the stream used when registering the Allocator must be consistent with the stream used during model execution.
The memory that stores model input and output data must be device memory. You can call APIs such as aclrtMalloc and hi_mpi_dvpp_malloc to allocate device memory.
- For details about the scenarios and restrictions of each memory allocation API, see the API description in Memory Management.
- The hi_mpi_dvpp_malloc API is a dedicated memory allocation API for media data processing. To reduce copying, the output of media data processing can be used directly as the input of model inference, so that the memory is reused.
- The hardware imposes memory alignment and padding requirements. If you allocate one large memory block with one of these APIs and then divide and manage it yourself, each sub-block must satisfy the alignment and padding restrictions of that API. For details, see Secondary Memory Allocation.
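As a rough illustration of dividing one large block while honoring alignment and padding, the sketch below hands out offsets from a block using a simple bump allocator. It rests on assumptions: AlignUp, SubAllocator, and kNoSpace are hypothetical names, the alignment and padding values are parameters rather than the real figures (take those from the description of the allocation API you use), and the rule "round the size up to the alignment, then add the padding" is one plausible reading, not a statement of the actual requirement.

```cpp
#include <cassert>
#include <cstddef>

// Returned by Allocate when the block cannot hold the request.
constexpr std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Rounds value up to the next multiple of alignment (alignment must be a power of two).
constexpr std::size_t AlignUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hands out offsets into one large block that was allocated elsewhere.
class SubAllocator {
public:
    SubAllocator(std::size_t block_size, std::size_t alignment, std::size_t padding)
        : block_size_(block_size), alignment_(alignment), padding_(padding) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    }

    // Returns the aligned start offset of a new sub-block, or kNoSpace.
    std::size_t Allocate(std::size_t size) {
        const std::size_t start = AlignUp(next_, alignment_);
        // Assumed rule: reserve the size rounded up to the alignment, plus padding.
        const std::size_t reserved = AlignUp(size, alignment_) + padding_;
        if (start > block_size_ || reserved > block_size_ - start) return kNoSpace;
        next_ = start + reserved;
        return start;
    }

private:
    std::size_t block_size_;
    std::size_t alignment_;
    std::size_t padding_;
    std::size_t next_ = 0;  // first byte not yet reserved
};
```

The returned offsets would be added to the base address of the large block. The base address itself must already meet the alignment requirement for the resulting sub-block addresses to be aligned.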