aclopCompileAndExecuteV2

Description

Compiles and executes an operator. Currently, only static-shape operators are supported. The compilation options are set by using the aclSetCompileopt call.

Restrictions

In multithreaded scenarios, do not pass the same stream from more than one thread and do not use the default stream when calling this API. Otherwise, task execution may be abnormal.
Because each operator has its own inputs, outputs, and attributes, the application must pass them in strict accordance with the operator definition. When aclopCompileAndExecuteV2 is called, AscendCL searches for the corresponding task based on opType, the input tensor descriptions, the output tensor descriptions, and the attributes before compiling and running the operator.
If an operator with an unused optional input is compiled and executed:
- Create data of the aclTensorDesc type by using the aclCreateTensorDesc(ACL_DT_UNDEFINED, 0, nullptr, ACL_FORMAT_UNDEFINED) call, indicating that the data type is ACL_DT_UNDEFINED, the format is ACL_FORMAT_UNDEFINED, and the shape is nullptr.
- Create data of the aclDataBuffer type by using the aclCreateDataBuffer(nullptr, 0) call, where aclDataBuffer does not need to be freed since it is a null pointer.
Before compiling and executing an operator with constant input, call aclSetTensorConst to set the constant input.
If an operator has a constant input but aclSetTensorConst has not been called to set the constant input, call aclSetTensorPlaceMent to set the placement attribute of TensorDesc and set memType to the host memory.
Typically, the input and output tensor data for running a single operator (for example, the add operator) is best stored in device memory. Some operators, however, take tensor data not only in device memory (such as feature maps and weights) but also in host memory (such as a tensor shape or a learning rate). In this case, you do not need to copy that tensor data from the host to the device manually. Call aclSetTensorPlaceMent to set the placement attribute of the corresponding TensorDesc to host memory, which instructs AscendCL to transfer the tensor data from the host to the device at operator runtime.
For an operator with dynamic shape enabled, call aclopInferShape to obtain the output shape.
- If the accurate output shape can be obtained, use it to construct an outputDesc and pass that outputDesc to aclopCompileAndExecuteV2. In this scenario, aclopCompileAndExecuteV2 is asynchronous: the call only delivers the task and returns without waiting for the task to execute. After the call, call a synchronization API (for example, aclrtSynchronizeStream) to ensure that the task is complete.
- If the accurate output shape cannot be obtained and only the shape range can be obtained, use the maximum value of the range to construct an outputDesc and pass that outputDesc to aclopCompileAndExecuteV2. In this scenario, the system calculates the accurate output shape while executing the operator and returns it through the outputDesc argument. In this case, aclopCompileAndExecuteV2 is a synchronous API.
- (Reserved) If neither the accurate output shape nor the shape range can be obtained, estimate a maximum shape to construct an outputDesc and pass that outputDesc to aclopCompileAndExecuteV2. In this scenario, the system calculates the accurate output shape while executing the operator and returns it through the outputDesc argument. In this case, aclopCompileAndExecuteV2 is a synchronous API.
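The range-based case above can be sketched in plain C++. This is a minimal illustration under stated assumptions, not AscendCL code: DimRange, MaxShapeFromRanges, BufferBytes, OutputShapeInfo, and NeedsStreamSync are hypothetical names introduced here, and the (min, max) pairs are assumed to have already been read from the inferred output description. The sketch picks the upper bound of each dimension, sizes the output buffer for that shape, and encodes the rule that only the accurate-shape case is asynchronous.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical type: one (min, max) pair per output dimension.
using DimRange = std::pair<std::int64_t, std::int64_t>;

// Hypothetical classification of what shape inference could determine.
enum class OutputShapeInfo { kExact, kRangeOnly, kUnknown };

// Take the upper bound of every dimension, giving a shape large enough
// for any result inside the range. Ranges without a usable upper bound
// are rejected; that situation matches the "estimate a maximum shape" case.
std::vector<std::int64_t> MaxShapeFromRanges(const std::vector<DimRange> &ranges) {
    std::vector<std::int64_t> shape;
    shape.reserve(ranges.size());
    for (const DimRange &r : ranges) {
        if (r.first < 0 || r.second < r.first) {
            throw std::invalid_argument("dimension range has no usable upper bound");
        }
        shape.push_back(r.second);
    }
    return shape;
}

// Number of bytes needed to hold a tensor of the given shape.
std::size_t BufferBytes(const std::vector<std::int64_t> &shape, std::size_t elemSize) {
    std::size_t count = 1;
    for (std::int64_t d : shape) {
        count *= static_cast<std::size_t>(d);
    }
    return count * elemSize;
}

// Per the list above: only the accurate-shape case is asynchronous and
// therefore needs a stream synchronization after the call.
bool NeedsStreamSync(OutputShapeInfo info) {
    return info == OutputShapeInfo::kExact;
}
```

For a float output whose inferred ranges are (1, 8) and (16, 16), this gives the shape {8, 16} and a 512-byte buffer. After the call returns, the accurate shape would be read back from outputDesc.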

Prototype

aclError aclopCompileAndExecuteV2(const char *opType,
                                  int numInputs,
                                  aclTensorDesc *inputDesc[],
                                  aclDataBuffer *inputs[],
                                  int numOutputs,
                                  aclTensorDesc *outputDesc[],
                                  aclDataBuffer *outputs[],
                                  aclopAttr *attr,
                                  aclopEngineType engineType,
                                  aclopCompileType compileFlag,
                                  const char *opPath,
                                  aclrtStream stream)

Parameters

Parameter	Input/Output	Description
opType	Input	Pointer to the operator type name.
numInputs	Input	Number of input tensors.
inputDesc	Input	Pointer array of the input tensor description. Call aclCreateTensorDesc to create data of the aclTensorDesc type in advance. The array length is consistent with numInputs. The elements in the inputs array match those in the inputDesc array with ordering preserved.
inputs	Input	Pointer array of the input tensors. Call aclCreateDataBuffer to create data of the aclDataBuffer type in advance. The array length is consistent with numInputs. The elements in the inputs array match those in the inputDesc array with ordering preserved.
numOutputs	Input	Number of output tensors.
outputDesc	Input/Output	Pointer array of the output tensor description. Call aclCreateTensorDesc to create data of the aclTensorDesc type in advance. The array length is consistent with numOutputs. The elements in the outputs array match those in the outputDesc array with ordering preserved.
outputs	Input/Output	Pointer array of the output tensors. Call aclCreateDataBuffer to create data of the aclDataBuffer type in advance. The array length is consistent with numOutputs. The elements in the outputs array match those in the outputDesc array with ordering preserved.
attr	Input	Pointer to the operator attributes. Call aclopCreateAttr to create data of the aclopAttr type in advance.
engineType	Input	Operator execution engine.
compileFlag	Input	Operator compile flag.
opPath	Input	Pointer to the path of the operator implementation file (.py), excluding the file name. This parameter is reserved. Currently, this parameter can only be set to nullptr.
stream	Input	Target stream of the operator.
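
The pairing rule in the table (each element of inputs matches the element of inputDesc at the same index, and both array lengths equal numInputs; likewise for the outputs) can be enforced by construction. The following is a minimal sketch that assumes a hypothetical TensorArgs helper, which is not part of AscendCL. It is a template so that it compiles without the SDK; in real code, Desc and Buf would presumably be aclTensorDesc and aclDataBuffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an AscendCL type): stores each tensor description
// and its data buffer at the same index, so the two arrays always have the
// same length and the same ordering.
template <typename Desc, typename Buf>
class TensorArgs {
public:
    // Append one description/buffer pair. The order of Add calls is the
    // order in which the operator receives its tensors.
    void Add(Desc *desc, Buf *buf) {
        descs_.push_back(desc);
        bufs_.push_back(buf);
    }

    // Value to pass as numInputs or numOutputs.
    int Count() const { return static_cast<int>(descs_.size()); }

    // Pointer arrays to pass as inputDesc/outputDesc and inputs/outputs.
    Desc **Descs() { return descs_.data(); }
    Buf **Buffers() { return bufs_.data(); }

private:
    std::vector<Desc *> descs_;
    std::vector<Buf *> bufs_;
};
```

With one TensorArgs instance for the inputs and one for the outputs, the call would receive Count(), Descs(), and Buffers() from each. The helper does not own the pointers, so the descriptions and buffers still have to be destroyed by the caller.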

Returns

The value 0 indicates success, and other values indicate failure. For details, see aclError.
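
Because every nonzero return value indicates failure, callers often wrap the check in one place. The following is a minimal sketch, assuming that aclError converts to int; ThrowIfFailed is a hypothetical helper name and is not part of AscendCL.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of AscendCL): converts a nonzero return code
// into an exception that names the failing call and carries the code.
// Assumes the return type of the API converts to int.
inline void ThrowIfFailed(int ret, const std::string &call) {
    if (ret != 0) {
        throw std::runtime_error(call + " failed with error code " + std::to_string(ret));
    }
}
```

A call site would pass the return value of aclopCompileAndExecuteV2 together with its name, and look up the reported code in aclError.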
