aclmdlExecute

Description

Executes model inference synchronously. This API blocks until the inference result is returned.

Restrictions

  • If the same modelId is shared by multiple threads due to service requirements, a lock must be used across the user threads to ensure that the operations of refreshing the input and output memory and executing inference run as one uninterrupted sequence. For example:
    // API call sequence of thread A:
    lock(handle1) -> aclrtMemcpy refreshes the input and output memory -> aclmdlExecute executes inference -> unlock(handle1)
    
    // API call sequence of thread B:
    lock(handle1) -> aclrtMemcpy refreshes the input and output memory -> aclmdlExecute executes inference -> unlock(handle1)
  • The operations of loading, executing, and unloading a model must be performed in the same context. For details about how to create a context, see aclrtSetDevice or aclrtCreateContext.
  • You can call acldvppMalloc, hi_mpi_dvpp_malloc, aclrtMalloc, aclrtMallocHost, or aclrtMallocCached to allocate the device memory for storing the model input and output data.

    Notes:

    • For details about the application scenarios and restrictions of each memory allocation API, see the description of that API.
    • The aclrtMallocHost API can allocate device memory only in the Ascend RC form.

    • The acldvppMalloc and hi_mpi_dvpp_malloc APIs are dedicated memory allocation APIs for media data processing. To reduce data copies, the output memory of media data processing can be used directly as the input memory of model inference, implementing memory reuse.
    • The hardware has memory alignment and padding requirements. If you use one of these APIs to allocate a large memory block and then divide and manage it yourself, the alignment and padding restrictions of the corresponding API must be met. For details, see Secondary Memory Allocation.

Prototype

aclError aclmdlExecute(uint32_t modelId, const aclmdlDataset *input, aclmdlDataset *output)
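A minimal synchronous call sketch, assuming the model has already been loaded (for example, via aclmdlLoadFromFile) and that the caller has already allocated device buffers of the correct sizes with aclrtMalloc; error handling is abbreviated:

```cpp
#include "acl/acl.h"  // AscendCL header; requires the CANN toolkit.

// Assumed to exist already: modelId from a model loading API, and
// devInput/devOutput device buffers sized from the model description.
aclError runInference(uint32_t modelId,
                      void *devInput, size_t inputSize,
                      void *devOutput, size_t outputSize) {
    // Wrap the device buffers in datasets, as aclmdlExecute expects.
    aclmdlDataset *input = aclmdlCreateDataset();
    aclmdlDataset *output = aclmdlCreateDataset();
    aclDataBuffer *inBuf = aclCreateDataBuffer(devInput, inputSize);
    aclDataBuffer *outBuf = aclCreateDataBuffer(devOutput, outputSize);
    aclmdlAddDatasetBuffer(input, inBuf);
    aclmdlAddDatasetBuffer(output, outBuf);

    // Blocks until the inference result has been written to devOutput.
    aclError ret = aclmdlExecute(modelId, input, output);

    // Release the wrappers; the device memory itself stays owned by the caller.
    aclDestroyDataBuffer(inBuf);
    aclDestroyDataBuffer(outBuf);
    aclmdlDestroyDataset(input);
    aclmdlDestroyDataset(output);
    return ret;
}
```

This sketch assumes a single input and a single output; a model with several inputs or outputs needs one aclDataBuffer per index, added in index order.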

Parameters

Parameter | Input/Output | Description
--------- | ------------ | -----------
modelId | Input | Inference model ID. After a model loading API (such as aclmdlLoadFromFile or aclmdlLoadFromMem) is successfully called, a model ID is returned. Pass that ID to this API.
input | Input | Pointer to the input data for model inference.
output | Output | Pointer to the output data for model inference.

When calling aclCreateDataBuffer to create the aclDataBuffer that stores the output data of a given index, you can pass nullptr as the data parameter and set size to 0 to create an empty aclDataBuffer. During model execution, the system then automatically calculates the required size and allocates the output buffer for that index. This approach saves memory, but you must free the memory and reset the aclDataBuffer after using the data. In addition, a memory copy is involved when the system allocates the memory, which may reduce performance.

The sample code for freeing the memory and resetting the aclDataBuffer is as follows:

aclDataBuffer *dataBuffer = aclmdlGetDatasetBuffer(output, 0); // Obtain the data buffer of the corresponding index.
void *data = aclGetDataBufferAddr(dataBuffer); // Obtain the device pointer to the data.
aclrtFree(data); // Free the device memory.
aclUpdateDataBuffer(dataBuffer, nullptr, 0); // Reset the data buffer for the next inference.
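For completeness, the empty output dataset described above would be prepared before execution roughly as follows; this is a sketch assuming a model with a single output at index 0:

```cpp
#include "acl/acl.h"  // AscendCL header; requires the CANN toolkit.

// Build an output dataset whose buffer the system allocates during execution.
aclmdlDataset *createAutoAllocatedOutput() {
    aclmdlDataset *output = aclmdlCreateDataset();
    // nullptr data and size 0: the system computes and allocates the
    // output memory for this index when aclmdlExecute runs.
    aclDataBuffer *emptyBuf = aclCreateDataBuffer(nullptr, 0);
    aclmdlAddDatasetBuffer(output, emptyBuf);
    return output;
}
```

After each aclmdlExecute call, free the system-allocated memory and reset the buffer with aclUpdateDataBuffer as shown above before reusing the dataset.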

Returns

The value 0 indicates success, and other values indicate failure. For details, see aclError.

See Also

For details about the API call sequence and sample code, see Running a Model.