aclSetAclOpExecutorRepeatable

Function Usage

Enables reuse of an aclOpExecutor. To reuse an existing aclOpExecutor, call this API immediately after the first-phase API aclxxXxxGetWorkspaceSize has been executed. You can then call the second-phase API aclxxXxx multiple times to execute the operator.

aclOpExecutor is an operator executor defined by the framework. It is the container in which operator computation is executed. You can use it directly without knowing its internal implementation.

Prototype

aclnnStatus aclSetAclOpExecutorRepeatable(aclOpExecutor *executor)

Parameters

Parameter    Input/Output    Description
executor     Input           aclOpExecutor to be reused.

Returns

Returns 0 on success and a nonzero error code on failure. For details about the return codes, see Common APIs and Return Codes.

Possible causes:

  • If error code 561103 is returned, executor is a null pointer.
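The return value can be checked right after the call. The following is a minimal sketch only: so that it compiles without the CANN headers, aclOpExecutor, aclnnStatus, and aclSetAclOpExecutorRepeatable are declared locally as stand-ins, and the stub mimics nothing beyond the documented behavior (0 on success, 561103 for a null executor). The names kNullExecutorError and enableReuse are hypothetical and not part of the API.

```cpp
#include <cstdint>
#include <cstdio>

// --- Stand-ins so the sketch compiles without the CANN headers (assumptions). ---
struct aclOpExecutor {};        // opaque type in the real framework
using aclnnStatus = int32_t;    // assumed integer status type

// Stub mimicking only the documented behavior: 0 on success,
// 561103 when executor is a null pointer.
aclnnStatus aclSetAclOpExecutorRepeatable(aclOpExecutor *executor) {
    return executor == nullptr ? 561103 : 0;
}
// --- End of stand-ins. ---

// Hypothetical constant for the documented null-pointer error code.
constexpr aclnnStatus kNullExecutorError = 561103;

// Hypothetical helper: enables reuse and reports why the call failed.
bool enableReuse(aclOpExecutor *executor) {
    const aclnnStatus ret = aclSetAclOpExecutorRepeatable(executor);
    if (ret == 0) {
        return true;
    }
    if (ret == kNullExecutorError) {
        std::fprintf(stderr, "executor is a null pointer (error %d)\n",
                     static_cast<int>(ret));
    } else {
        std::fprintf(stderr, "aclSetAclOpExecutorRepeatable failed (error %d)\n",
                     static_cast<int>(ret));
    }
    return false;
}
```

In real code, drop the stand-ins, include the framework headers, and pass the executor returned by the first-phase API.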

Constraints

  • Currently, operators that use AI CPU and AI Core compute units support aclOpExecutor reuse.
  • When a single-operator API is called, aclOpExecutor reuse cannot be enabled in the following scenarios:
    • If L0 APIs related to host-to-device and device-to-device copy are used, such as CopyToNpu, CopyNpuToNpu, and CopyToNpuSync, aclOpExecutor cannot be reused.
    • If the L0 ViewCopy API is used and the source address and destination address of ViewCopy are the same, aclOpExecutor cannot be reused.

    For details about L0 APIs, see Basic Tensor Operation APIs.

  • When a single-operator API is called, a device tensor cannot be created in the operator API. Only external tensors can be used.
  • An aclOpExecutor that is set to the reusable state does not release its resources after the second-phase API is executed. You must call aclDestroyAclOpExecutor to release them.
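Because a reusable executor is no longer released automatically, one way to avoid leaking it is to tie aclDestroyAclOpExecutor to scope exit. The sketch below is an illustration under assumptions, not the framework's prescribed pattern: aclOpExecutor, aclnnStatus, and aclDestroyAclOpExecutor are local stand-ins so that the code compiles without the CANN headers, and ExecutorDeleter, ExecutorGuard, g_destroyCalls, and destroyCallsAfterScope are hypothetical names.

```cpp
#include <cstdint>
#include <memory>

// --- Stand-ins so the sketch compiles without the CANN headers (assumptions). ---
struct aclOpExecutor {};        // opaque type in the real framework
using aclnnStatus = int32_t;    // assumed integer status type

static int g_destroyCalls = 0;  // counts stub invocations, for illustration only

// Stub: records the call and reports success.
aclnnStatus aclDestroyAclOpExecutor(aclOpExecutor *executor) {
    (void)executor;
    ++g_destroyCalls;
    return 0;
}
// --- End of stand-ins. ---

// Hypothetical deleter that releases a reusable executor.
struct ExecutorDeleter {
    void operator()(aclOpExecutor *executor) const {
        aclDestroyAclOpExecutor(executor);
    }
};

// Hypothetical guard type: destroys the executor when it goes out of scope.
using ExecutorGuard = std::unique_ptr<aclOpExecutor, ExecutorDeleter>;

// Demonstrates that the executor is destroyed exactly once at scope exit.
int destroyCallsAfterScope() {
    static aclOpExecutor fake;  // stands in for the executor from the first-phase API
    g_destroyCalls = 0;
    {
        ExecutorGuard guard(&fake);
        // ... call the second-phase API any number of times with guard.get() ...
    }
    return g_destroyCalls;
}
```

With the real headers, the guard would wrap the executor right after aclSetAclOpExecutorRepeatable succeeds, so that early returns on error paths still release it.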

Examples

The following code examples are for reference only and are not intended for direct copying and execution:

// Create the input and output aclTensor and aclTensorList.
std::vector<int64_t> shape = {1, 2, 3};
aclTensor *tensor1 = aclCreateTensor(shape.data(), shape.size(), aclDataType::ACL_FLOAT,
    nullptr, 0, aclFormat::ACL_FORMAT_ND, shape.data(), shape.size(), nullptr);
aclTensor *tensor2 = aclCreateTensor(shape.data(), shape.size(), aclDataType::ACL_FLOAT,
    nullptr, 0, aclFormat::ACL_FORMAT_ND, shape.data(), shape.size(), nullptr);
aclTensor *tensor3 = aclCreateTensor(shape.data(), shape.size(), aclDataType::ACL_FLOAT,
    nullptr, 0, aclFormat::ACL_FORMAT_ND, shape.data(), shape.size(), nullptr);
aclTensor *output = aclCreateTensor(shape.data(), shape.size(), aclDataType::ACL_FLOAT,
    nullptr, 0, aclFormat::ACL_FORMAT_ND, shape.data(), shape.size(), nullptr);
aclTensor *list[] = {tensor1, tensor2};
auto tensorList = aclCreateTensorList(list, 2);
uint64_t workspaceSize = 0;
aclOpExecutor *executor = nullptr;
// The AddCustom operator has two inputs (aclTensorList and aclTensor) and one output (aclTensor).
// Call the first-phase API.
aclnnAddCustomGetWorkspaceSize(tensorList, tensor3, output, &workspaceSize, &executor);
// Set the executor to be reusable.
aclSetAclOpExecutorRepeatable(executor);  
void *addr = nullptr;  // Placeholder: point this at the new device memory before updating addresses.
// Update the device address of the first aclTensor in the input tensor list.
aclSetDynamicInputTensorAddr(executor, 0, 0, tensorList, addr);
// Update the device address of the second aclTensor in the input tensor list.
aclSetDynamicInputTensorAddr(executor, 0, 1, tensorList, addr);
...
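// Call the second-phase API. With a reusable executor, this call can be repeated.
// workspace and stream are assumed to have been prepared in the elided code.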
aclnnAddCustom(workspace, workspaceSize, executor, stream);
// Clear the executor.
aclDestroyAclOpExecutor(executor);