aclopCompileAndExecute

Applicability

Product

Supported

Atlas A3 training products/Atlas A3 inference products

Atlas A2 training products/Atlas A2 inference products

Atlas 200I/500 A2 inference products

Atlas inference products

Atlas training products

Description

Builds and executes an operator. Currently, only static-shape operators are supported. This API is asynchronous.

Operators differ in their inputs, outputs, and attributes. This API searches for the matching task based on opType, the input tensor descriptions, the output tensor descriptions, and the attributes, and then delivers the task for execution. You therefore need to describe each operator in strict accordance with its inputs, outputs, and attributes when calling this API.

Set the build options by calling aclSetCompileopt.

Prototype

aclError aclopCompileAndExecute(const char *opType,
int numInputs,
const aclTensorDesc *const inputDesc[],
const aclDataBuffer *const inputs[],
int numOutputs,
const aclTensorDesc *const outputDesc[],
aclDataBuffer *const outputs[],
const aclopAttr *attr,
aclopEngineType engineType,
aclopCompileType compileFlag,
const char *opPath,
aclrtStream stream)

Parameters

Parameter

Input/Output

Description

opType

Input

Pointer to the operator type name.

numInputs

Input

Number of input tensors.

inputDesc

Input

Array of pointers to the input tensor descriptions.

Call aclCreateTensorDesc to create data of the aclTensorDesc type in advance.

The array length must equal numInputs. The elements of the inputs array correspond one-to-one, in the same order, to the elements of the inputDesc array.

inputs

Input

Array of pointers to the input tensors.

Call aclCreateDataBuffer to create data of the aclDataBuffer type in advance.

The array length must equal numInputs. The elements of the inputs array correspond one-to-one, in the same order, to the elements of the inputDesc array.

numOutputs

Input

Number of output tensors.

outputDesc

Input

Array of pointers to the output tensor descriptions.

Call aclCreateTensorDesc to create data of the aclTensorDesc type in advance.

The array length must equal numOutputs. The elements of the outputs array correspond one-to-one, in the same order, to the elements of the outputDesc array.

outputs

Input/Output

Array of pointers to the output tensors.

Call aclCreateDataBuffer to create data of the aclDataBuffer type in advance.

The array length must equal numOutputs. The elements of the outputs array correspond one-to-one, in the same order, to the elements of the outputDesc array.

attr

Input

Pointer to the operator attributes.

Call aclopCreateAttr to create data of the aclopAttr type in advance.

engineType

Input

Operator execution engine.

compileFlag

Input

Operator build flag.

opPath

Input

Pointer to the path of the operator implementation file (.py), excluding the file name. This parameter is reserved. Currently, this parameter can only be set to nullptr.

stream

Input

Target stream of the operator.
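
The ordering rule for inputDesc/inputs and outputDesc/outputs can be enforced mechanically. The following is a minimal sketch, not part of AscendCL: TensorArgs is a hypothetical helper that records each descriptor together with its buffer and then exposes two parallel arrays in insertion order. It is written as a template so that it compiles without the AscendCL headers; in real use Desc would presumably be aclTensorDesc and Buf would be aclDataBuffer (or const aclDataBuffer for inputs).

```cpp
#include <vector>

// Hypothetical helper (not an AscendCL API): stores descriptor/buffer pairs
// and hands them back as two parallel arrays with identical ordering.
template <typename Desc, typename Buf>
class TensorArgs {
public:
    // Append one tensor; its descriptor and buffer always share an index.
    void add(const Desc *desc, Buf *buf) {
        descs_.push_back(desc);
        bufs_.push_back(buf);
    }

    // Value to pass as numInputs or numOutputs.
    int count() const { return static_cast<int>(descs_.size()); }

    // Pointer shape matches a "const Desc *const []" parameter.
    const Desc *const *descs() const { return descs_.data(); }

    // Pointer shape matches a "Buf *const []" parameter.
    Buf *const *bufs() const { return bufs_.data(); }

private:
    std::vector<const Desc *> descs_;
    std::vector<Buf *> bufs_;
};
```

With an instance named in, a call would pass in.count(), in.descs(), and in.bufs() as numInputs, inputDesc, and inputs. Because both arrays grow in the same add call, their lengths and ordering cannot drift apart.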

Returns

Returns 0 on success; any other value indicates failure. For details, see aclError.

Restrictions

  • In multi-threaded scenarios, do not pass the same stream from more than one thread and do not use the default stream when calling this API. Otherwise, task execution may be abnormal.
  • Because each operator has different inputs, outputs, and attributes, the app needs to describe each operator in strict accordance with its inputs, outputs, and attributes. When aclopCompileAndExecute is called, the API searches for the matching task based on opType, the input tensor descriptions, the output tensor descriptions, and the attributes before building and running the operator.
  • If an operator with an unused optional input is compiled and executed:
    • Create the aclTensorDesc for that input by calling aclCreateTensorDesc(ACL_DT_UNDEFINED, 0, nullptr, ACL_FORMAT_UNDEFINED), indicating that the data type is ACL_DT_UNDEFINED, the format is ACL_FORMAT_UNDEFINED, and the shape is nullptr.
    • Create the aclDataBuffer for that input by calling aclCreateDataBuffer(nullptr, 0). This aclDataBuffer does not need to be freed since it is a null pointer.
  • Before building and executing an operator with constant input, call the aclSetTensorConst API to set the constant input.

    If an operator has a constant input but aclSetTensorConst has not been called for it, call aclSetTensorPlaceMent to set the placement attribute of the corresponding aclTensorDesc, with memType set to host memory.

  • Typically, the input and output tensor data for running a single operator (for example, the add operator) is best stored in device memory. Some operators, however, take as inputs not only tensor data in device memory (such as the feature map and weights) but also tensor data in host memory (such as a tensor shape or learning rate). In this case, you do not need to manually transfer such tensor data from the host to the device. Call aclSetTensorPlaceMent to set the placement attribute of the corresponding aclTensorDesc to host memory; the API then transfers the tensor data from the host to the device at operator runtime.
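
The multi-thread restriction at the top of this list can be checked before worker threads are started. The following is a minimal sketch under two assumptions that are not stated on this page: that aclrtStream is an opaque pointer type, and that a null handle selects the default stream. streamsSafeForThreads is a hypothetical helper, not an AscendCL API.

```cpp
#include <set>
#include <vector>

// Hypothetical pre-flight check (not an AscendCL API).
// Each element is the stream handle that one worker thread intends to pass
// to aclopCompileAndExecute. Assumption: a null handle means "default stream".
inline bool streamsSafeForThreads(const std::vector<void *> &streams) {
    std::set<void *> seen;
    for (void *stream : streams) {
        if (stream == nullptr) {
            return false;  // a thread would use the default stream
        }
        if (!seen.insert(stream).second) {
            return false;  // two threads would share one stream
        }
    }
    return true;  // every thread has its own explicit stream
}
```

A caller could create one stream per worker thread, collect the handles in a vector, and refuse to start the workers if this check returns false.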