Dynamic Input Operator Development

A dynamic input operator is an operator whose number of inputs is not fixed. For example, AddN accumulates N input tensors into one output tensor, where N varies from call to call. Dynamic input operators differ from ordinary operators in how input data is constructed and parsed: the kernel function receives its inputs through the ListTensorDesc structure, so the caller must construct a TensorList structure to carry the parameter information. The following describes the development process for the kernel launch and project-based operator development modes.

The following lists only code snippets. For complete examples, see the dynamic input operator sample (project-based operator development) and the dynamic input operator sample (kernel launch).

  • Kernel launch
    • Define the ListTensorDesc and TensorDesc structs by referring to the ListTensorDesc struct description, and save the actual input data in the ListTensorDesc struct. The following is an example:
      In this example, ptrOffset is the byte offset from the start address of ListTensorDesc to the data pointer array dataPtr, tensorDesc stores the descriptions of the two input tensors, and dataPtr holds the pointers to the addresses where the input data is stored.
          constexpr uint32_t SHAPE_DIM = 2;
          struct TensorDesc {
              uint32_t dim{SHAPE_DIM};                // number of dimensions
              uint32_t index;                         // position of the tensor in the input list
              uint64_t shape[SHAPE_DIM] = {8, 2048};  // tensor shape
          };
      
          TensorDesc xDesc;
          xDesc.index = 0;
          TensorDesc yDesc;
          yDesc.index = 1;
      
          constexpr uint32_t TENSOR_DESC_NUM = 2;
          struct ListTensorDesc {
              uint64_t ptrOffset;                      // byte offset from the struct start to dataPtr
              TensorDesc tensorDesc[TENSOR_DESC_NUM];  // descriptions of the input tensors
              uintptr_t dataPtr[TENSOR_DESC_NUM];      // device addresses of the input data
          } inputDesc;
      ...
      // ptrOffset = header (1 uint64_t) + TENSOR_DESC_NUM descriptors of (1 + SHAPE_DIM) uint64_t each
      inputDesc = {(1 + (1 + SHAPE_DIM) * TENSOR_DESC_NUM) * sizeof(uint64_t),
                   {xDesc, yDesc},
                   {(uintptr_t)xDevice, (uintptr_t)yDevice}};
      
    • When launching the kernel, copy the ListTensorDesc data to the device and pass its device address to the kernel function. The following is an example:
          void *inputDescInDevice = nullptr;
          // Allocate device memory for the descriptor and copy it from the host.
          CHECK_ACL(aclrtMalloc((void **)&inputDescInDevice, sizeof(ListTensorDesc), ACL_MEM_MALLOC_HUGE_FIRST));
          CHECK_ACL(aclrtMemcpy(inputDescInDevice, sizeof(ListTensorDesc), &inputDesc, sizeof(ListTensorDesc),
                                ACL_MEMCPY_HOST_TO_DEVICE));
      
          // Pass the device address of the descriptor as the kernel input.
          ACLRT_LAUNCH_KERNEL(addn_custom)(blockDim, stream, inputDescInDevice, zDevice);
      
    • In the kernel-side operator implementation, use the APIs provided by AscendC::ListTensorDesc and AscendC::TensorDesc to parse and process the input information. The following is an example:
        uint64_t buf[SHAPE_DIM] = {0};
        AscendC::TensorDesc<int32_t> tensorDesc;
        tensorDesc.SetShapeAddr(buf);           // buffer that receives the shape values
        listTensorDesc.GetDesc(tensorDesc, 0);  // parse the description of input 0
        uint64_t totalLength = tensorDesc.GetShape(0) * tensorDesc.GetShape(1);  // total element count
        __gm__ uint8_t *x = listTensorDesc.GetDataPtr<__gm__ uint8_t>(0);  // data pointer of input 0
        __gm__ uint8_t *y = listTensorDesc.GetDataPtr<__gm__ uint8_t>(1);  // data pointer of input 1
      
  • Project-based operator development
    • Construct a tensor list and pass it to the function when calling the single-operator API.

      After creating tensors using aclCreateTensor, call aclCreateTensorList to combine the created tensors into a list.

      inputTensorList = aclCreateTensorList(inputTensor_.data(), inputTensor_.size());
      

      The API for obtaining the workspace size used by the operator also takes an aclTensorList parameter, which is used to calculate the workspace size. The following is an example:

      // Obtain the size of the workspace used by the operator.
      aclnnStatus aclnnAddNCustomGetWorkspaceSize(const aclTensorList *srcList, const aclTensor *out, uint64_t *workspaceSize, aclOpExecutor **executor);
      
    • In the operator prototype definition, set the parameter type of the input to DYNAMIC. The following is an example:
      this->Input("srcList")
          .ParamType(DYNAMIC)
      
    • During the operator implementation on the host, obtain dynamic input information by using the corresponding dynamic APIs.

      For example, the Tiling and InferShape functions use the GetDynamicInputShape API to obtain the shape of a dynamic input, and the InferDataType function uses the GetDynamicInputDataType API to obtain its data type. The following is an example:

      namespace ge {
      static graphStatus InferShape(gert::InferShapeContext *context)
      {
          // Obtain the shape of instance 0 of dynamic input 0.
          const gert::Shape *x1_shape = context->GetDynamicInputShape(0, 0);
          gert::Shape *y_shape = context->GetOutputShape(0);
          *y_shape = *x1_shape;  // the output shape equals the input shape
          return GRAPH_SUCCESS;
      }
      
      static graphStatus InferDataType(gert::InferDataTypeContext *context)
      {
          // Obtain the data type of instance 0 of dynamic input 0.
          const auto inputDataType = context->GetDynamicInputDataType(0, 0);
          context->SetOutputDataType(0, inputDataType);
          return GRAPH_SUCCESS;
      }
      } // namespace ge
      
    • During operator implementation on the kernel, receive the dynamic structure data through an input parameter and parse it with the AscendC::ListTensorDesc struct.

      The dynamic structure data must be passed through an input parameter of the kernel function, for example, GM_ADDR srcList. The following is an example:

      extern "C" __global__ __aicore__ void addn_custom(GM_ADDR srcList, GM_ADDR z, GM_ADDR workspace, GM_ADDR tiling)
      

      Parse the input parameter srcList using the AscendC::ListTensorDesc struct to obtain detailed information about each tensor. The following is an example:

      AscendC::ListTensorDesc keyListTensorDescInit((__gm__ void*)srcList);  // parse the serialized tensor list
      GM_ADDR x = (__gm__ uint8_t*)keyListTensorDescInit.GetDataPtr<__gm__ uint8_t>(0);  // data pointer of input 0
      GM_ADDR y = (__gm__ uint8_t*)keyListTensorDescInit.GetDataPtr<__gm__ uint8_t>(1);  // data pointer of input 1