aclSetDynamicOutputTensorAddr
Function Usage
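Updates the device memory address recorded for an aclTensor in an output aclTensorList. After aclOpExecutor reuse has been enabled by calling aclSetAclOpExecutorRepeatable, call this API whenever the device memory address of an output changes, so that the address recorded in the output aclTensorList matches the new memory.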
Prototype
aclnnStatus aclSetDynamicOutputTensorAddr(aclOpExecutor *executor, size_t irIndex, const size_t relativeIndex, aclTensorList *tensors, void *addr)
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| executor | Input | aclOpExecutor that has been set to the reusable state. |
| irIndex | Input | Index of the output aclTensorList to be updated, as numbered in the operator IR prototype definition, starting from 0. |
| relativeIndex | Input | Index of the aclTensor to be updated within the aclTensorList. If the aclTensorList contains N tensors, the value range is [0, N - 1]. |
| tensors | Input | Pointer to the aclTensorList to be updated. |
| addr | Input | New device memory address to assign to the specified aclTensor. |
Returns
0 on success; any other value indicates failure.
Examples
```cpp
// Create the input aclTensor objects and the output aclTensorList.
std::vector<int64_t> shape = {1, 2, 3};
aclTensor *tensor1 = aclCreateTensor(shape.data(), shape.size(), aclDataType::ACL_FLOAT, nullptr, 0, aclFormat::ACL_FORMAT_ND, shape.data(), shape.size(), nullptr);
aclTensor *tensor2 = aclCreateTensor(shape.data(), shape.size(), aclDataType::ACL_FLOAT, nullptr, 0, aclFormat::ACL_FORMAT_ND, shape.data(), shape.size(), nullptr);
aclTensor *tensor3 = aclCreateTensor(shape.data(), shape.size(), aclDataType::ACL_FLOAT, nullptr, 0, aclFormat::ACL_FORMAT_ND, shape.data(), shape.size(), nullptr);
aclTensor *tensor4 = aclCreateTensor(shape.data(), shape.size(), aclDataType::ACL_FLOAT, nullptr, 0, aclFormat::ACL_FORMAT_ND, shape.data(), shape.size(), nullptr);
aclTensor *list[] = {tensor3, tensor4};
auto tensorList = aclCreateTensorList(list, 2);

uint64_t workspace_size = 0;
aclOpExecutor *executor = nullptr;

// The AddCustom operator has two inputs (aclTensor) and one output (aclTensorList).
// Call the first-phase API.
aclnnAddCustomGetWorkspaceSize(tensor1, tensor2, tensorList, &workspace_size, &executor);
// Set the executor to be reusable.
aclSetAclOpExecutorRepeatable(executor);

// Allocate new device memory for each output tensor (1 x 2 x 3 floats each).
size_t outputSize = 1 * 2 * 3 * sizeof(float);
void *addr0 = nullptr;
void *addr1 = nullptr;
aclrtMalloc(&addr0, outputSize, ACL_MEM_MALLOC_HUGE_FIRST);
aclrtMalloc(&addr1, outputSize, ACL_MEM_MALLOC_HUGE_FIRST);
// Update the device address of the first aclTensor in the output tensor list (irIndex 0, relativeIndex 0).
aclSetDynamicOutputTensorAddr(executor, 0, 0, tensorList, addr0);
// Update the device address of the second aclTensor in the output tensor list (irIndex 0, relativeIndex 1).
aclSetDynamicOutputTensorAddr(executor, 0, 1, tensorList, addr1);
// .......
// Call the second-phase API.
aclnnAddCustom(workspace, workspace_size, executor, stream);
// Destroy the executor.
aclDestroyAclOpExecutor(executor);
```
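When every tensor in an output list moves to new device memory, the per-index calls can be folded into a loop over relativeIndex in [0, N - 1] that stops at the first non-zero status. The sketch below is a hypothetical helper, not part of the CANN API: `UpdateDynamicOutputAddrs` and the caller-supplied `setAddr` callable are illustrative names, and the callable is assumed to wrap aclSetDynamicOutputTensorAddr and return its status as an `int`. Taking the setter as a parameter keeps the helper free of CANN headers so it can be exercised with a fake setter.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a CANN API): sets the device address of every
// aclTensor in one output aclTensorList by calling setAddr once per
// relativeIndex in [0, N - 1], where N == newAddrs.size().
// setAddr(irIndex, relativeIndex, addr) is assumed to wrap
// aclSetDynamicOutputTensorAddr and return its status (0 on success).
template <typename SetAddrFn>
int UpdateDynamicOutputAddrs(SetAddrFn setAddr, size_t irIndex,
                             const std::vector<void *> &newAddrs) {
    for (size_t relativeIndex = 0; relativeIndex < newAddrs.size(); ++relativeIndex) {
        int status = setAddr(irIndex, relativeIndex, newAddrs[relativeIndex]);
        if (status != 0) {
            return status;  // Stop at the first failure and report its status.
        }
    }
    return 0;  // All tensors updated (or the list was empty).
}

// Possible use with the real API (addr0/addr1 are device buffers allocated
// beforehand, as in the example above):
//
//   int ret = UpdateDynamicOutputAddrs(
//       [&](size_t irIndex, size_t relativeIndex, void *addr) {
//           return static_cast<int>(aclSetDynamicOutputTensorAddr(
//               executor, irIndex, relativeIndex, tensorList, addr));
//       },
//       0, {addr0, addr1});
```

Returning the first failing status, rather than continuing, avoids launching aclnnAddCustom with a partially updated output list.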