Function: execute_async

Applicability

Product	Supported (√/x)
Atlas A3 training products / Atlas A3 inference products	√
Atlas A2 training products / Atlas A2 inference products	√
Atlas training products	√
Atlas inference products	√
Atlas 200I/500 A2 inference products	√

Function Usage

Executes model inference. It is an asynchronous interface.

Prototype

C Prototype

        
             aclError aclmdlExecuteAsync(uint32_t modelId, const aclmdlDataset *input, aclmdlDataset *output, aclrtStream stream)

Python Function

        
             ret  = acl.mdl.execute_async(model_id, input, output, stream)

Parameter Description

Parameter	Description
model_id	Int, ID of the model to be executed for inference. The model ID is returned after the model is successfully loaded by one of the following APIs: acl.mdl.load_from_file, acl.mdl.load_from_mem, acl.mdl.load_from_file_with_mem, or acl.mdl.load_from_mem_with_mem.
input	Int, pointer address of the input data for model inference. For details, see aclmdlDataset.
output	Int, pointer address of the output data of model inference. For details, see aclmdlDataset.
stream	Int, pointer address of the created stream. To specify a new stream, you can create and obtain the pointer address of the stream by calling acl.rt.create_stream.

Return Value Description

Return Value	Description
ret	Int, error code: 0 on success; any other value indicates failure.

Restrictions

This API is asynchronous. Calling it only delivers the task; it does not wait for the task to execute. After calling this API, call a synchronization API (for example, acl.rt.synchronize_stream) to ensure that the task is complete.
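The deliver-then-synchronize pattern described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: run_inference is a hypothetical helper name, the acl module is passed in as an argument (an assumption made so that the helper has no import-time dependency on the Ascend runtime), and the model, datasets, and stream are assumed to have been created already.

```python
def run_inference(acl, model_id, input_dataset, output_dataset, stream):
    """Deliver an inference task, then block until the stream has finished it.

    `run_inference` is a hypothetical helper, not part of pyACL. `acl` is the
    imported pyACL module; the other arguments are the values described in the
    parameter table (model ID, dataset addresses, stream address).
    """
    # Delivers the task to the stream; returns before inference completes.
    ret = acl.mdl.execute_async(model_id, input_dataset, output_dataset, stream)
    if ret != 0:
        raise RuntimeError(f"acl.mdl.execute_async failed, ret={ret}")

    # Blocks the calling thread until all tasks delivered to the stream are done.
    ret = acl.rt.synchronize_stream(stream)
    if ret != 0:
        raise RuntimeError(f"acl.rt.synchronize_stream failed, ret={ret}")
```

The output data should be read only after the synchronization call has returned 0.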
A model identified by a given model_id cannot be executed concurrently on multiple streams with acl.mdl.execute_async.

If the same model_id must be shared by multiple threads due to service requirements, the user threads must use a common lock so that refreshing the input and output memory and executing inference run as one uninterrupted sequence. For example:

// API call sequence of thread A:
lock(handle1) -> acl.rt.memcpy_async(stream1) (Refresh the input and output memory) -> acl.mdl.execute_async(modelId1,stream1) (Execute inference) -> unlock(handle1)

// API call sequence of thread B:
lock(handle1) -> acl.rt.memcpy_async(stream1) (Refresh the input and output memory) -> acl.mdl.execute_async(modelId1,stream1) (Execute inference) -> unlock(handle1)
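The locking sequence above might look like this in Python. This is a sketch under assumptions, not a definitive implementation: model_lock and locked_refresh_and_execute are hypothetical names, refresh_io is a hypothetical caller-supplied callback that issues the acl.rt.memcpy_async calls on the given stream, and the acl module is passed in as an argument.

```python
import threading

# Hypothetical lock shared by every thread that uses the same model_id.
model_lock = threading.Lock()


def locked_refresh_and_execute(acl, refresh_io, model_id,
                               input_dataset, output_dataset, stream):
    """Refresh the I/O memory and deliver inference as one locked sequence.

    `refresh_io(stream)` is a hypothetical callback that refreshes the input
    and output memory, for example by calling acl.rt.memcpy_async on `stream`.
    """
    with model_lock:
        # Refresh the input and output memory.
        refresh_io(stream)
        # Deliver the inference task before any other thread can refresh.
        ret = acl.mdl.execute_async(model_id, input_dataset,
                                    output_dataset, stream)
    if ret != 0:
        raise RuntimeError(f"acl.mdl.execute_async failed, ret={ret}")
```

The lock only orders task delivery between threads; a synchronization API such as acl.rt.synchronize_stream is still needed before the output is read.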

The operations of loading, executing, and unloading a model must be performed in the same context. For details about how to create a context, see acl.rt.create_context.
You can call acl.rt.malloc, acl.rt.malloc_host, acl.rt.malloc_cached, acl.media.dvpp_malloc, or acl.himpi.dvpp_malloc to allocate the memory for storing the model input and output data.
The acl.media.dvpp_malloc and acl.himpi.dvpp_malloc APIs are dedicated memory allocation APIs for media data processing. To reduce copying, the output of media data processing can be used directly as the input of model inference, so that the same memory is reused.

The hardware imposes memory alignment and padding requirements. If you use one of these APIs to allocate a large memory block and then divide and manage that block yourself, each sub-block must still meet the alignment and padding restrictions of the corresponding API. For details, see Secondary Memory Allocation.

Reference

For the API call sequence, see API Call Sequence.
For an API call example, see Sample Code.
For details about stream creation and management, see Stream Management.
