Operator Execution
The computation algorithm of an operator is implemented through Ascend C APIs, while loading and invoking the operator are handled by runtime APIs. This section describes the common runtime APIs used to run Ascend C operators in the CANN software stack when the operator is invoked through a kernel function call. For more information about runtime APIs, see "acl API (C&C++)".
Code Loading and Running
- Use aclInit for initialization.
- Use aclrtSetDevice and aclrtCreateStream to allocate runtime resources for the device and stream, respectively.
- Use aclrtMallocHost to allocate host memory and initialize data.
- Use aclrtMalloc to allocate device memory and use aclrtMemcpy to copy data from the host to the device for kernel function computation.
- Use <<<>>> to call the operator kernel function.
- Use aclrtSynchronizeStream to block the host until the kernel function execution is complete.
- After the kernel function is executed, use aclrtMemcpy to copy the computation result on the device back to the host.
- Call aclrtDestroyStream and aclrtResetDevice to release the stream and device runtime resources, respectively.
- Use aclFinalize for deinitialization.
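The steps above can be sketched as host-side code. This is a minimal sketch under stated assumptions, not the official sample. The wrapper launch_add_custom, the kernel name add_custom, the macro USE_REAL_ACL, and the helper run_add are hypothetical names introduced for illustration. When USE_REAL_ACL is not defined, the acl* functions are host-only stand-ins that mirror the names and parameter order of the runtime APIs so that the flow can be run without CANN. The sketch waits on the stream before copying the result back, so the data is complete when the host reads it.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <cstdlib>
#include <cstring>

#ifdef USE_REAL_ACL
#include "acl/acl.h"  // real runtime APIs from the CANN toolkit
// Hypothetical wrapper, assumed to be defined in the kernel source file as:
//   add_custom<<<blockDim, nullptr, stream>>>(x, y, z, n);
void launch_add_custom(uint32_t blockDim, aclrtStream stream,
                       int32_t *x, int32_t *y, int32_t *z, uint32_t n);
#else
// Host-only stand-ins (NOT the real implementation): "device" memory is
// ordinary host memory and the "kernel" runs synchronously on the CPU.
using aclError = int;
using aclrtStream = void *;
constexpr aclError ACL_SUCCESS = 0;
enum aclrtMemMallocPolicy { ACL_MEM_MALLOC_HUGE_FIRST };
enum aclrtMemcpyKind { ACL_MEMCPY_HOST_TO_DEVICE = 1, ACL_MEMCPY_DEVICE_TO_HOST = 2 };
static int g_dummy_stream;
static aclError aclInit(const char *) { return ACL_SUCCESS; }
static aclError aclrtSetDevice(int32_t) { return ACL_SUCCESS; }
static aclError aclrtCreateStream(aclrtStream *s) { *s = &g_dummy_stream; return ACL_SUCCESS; }
static aclError aclrtMallocHost(void **p, size_t n) { *p = std::malloc(n); return *p ? ACL_SUCCESS : 1; }
static aclError aclrtMalloc(void **p, size_t n, aclrtMemMallocPolicy) { *p = std::malloc(n); return *p ? ACL_SUCCESS : 1; }
static aclError aclrtMemcpy(void *dst, size_t dstMax, const void *src, size_t n, aclrtMemcpyKind) {
    if (n > dstMax) return 1;
    std::memcpy(dst, src, n);
    return ACL_SUCCESS;
}
static aclError aclrtSynchronizeStream(aclrtStream) { return ACL_SUCCESS; }
static aclError aclrtFree(void *p) { std::free(p); return ACL_SUCCESS; }
static aclError aclrtFreeHost(void *p) { std::free(p); return ACL_SUCCESS; }
static aclError aclrtDestroyStream(aclrtStream) { return ACL_SUCCESS; }
static aclError aclrtResetDevice(int32_t) { return ACL_SUCCESS; }
static aclError aclFinalize() { return ACL_SUCCESS; }
static void launch_add_custom(uint32_t, aclrtStream, int32_t *x, int32_t *y, int32_t *z, uint32_t n) {
    for (uint32_t i = 0; i < n; ++i) z[i] = x[i] + y[i];
}
#endif

// Sketch-level error handling: an early return skips cleanup, which
// production code should not do.
#define CHECK(call) do { if ((call) != ACL_SUCCESS) return false; } while (0)

// Computes z = x + y for n elements; returns true if every step succeeded.
bool run_add(const int32_t *x, const int32_t *y, int32_t *z, uint32_t n) {
    const int32_t deviceId = 0;
    const size_t bytes = n * sizeof(int32_t);
    aclrtStream stream = nullptr;
    void *host[3] = {nullptr, nullptr, nullptr};  // x, y, z on the host
    void *dev[3] = {nullptr, nullptr, nullptr};   // x, y, z on the device

    CHECK(aclInit(nullptr));                      // initialize
    CHECK(aclrtSetDevice(deviceId));              // device resources
    CHECK(aclrtCreateStream(&stream));            // stream resources
    for (int i = 0; i < 3; ++i) {                 // host and device memory
        CHECK(aclrtMallocHost(&host[i], bytes));
        CHECK(aclrtMalloc(&dev[i], bytes, ACL_MEM_MALLOC_HUGE_FIRST));
    }
    std::memcpy(host[0], x, bytes);               // initialize host input data
    std::memcpy(host[1], y, bytes);
    CHECK(aclrtMemcpy(dev[0], bytes, host[0], bytes, ACL_MEMCPY_HOST_TO_DEVICE));
    CHECK(aclrtMemcpy(dev[1], bytes, host[1], bytes, ACL_MEMCPY_HOST_TO_DEVICE));

    // Kernel launch: asynchronous on a real device.
    launch_add_custom(1, stream, static_cast<int32_t *>(dev[0]),
                      static_cast<int32_t *>(dev[1]), static_cast<int32_t *>(dev[2]), n);
    CHECK(aclrtSynchronizeStream(stream));        // wait for the kernel to finish
    CHECK(aclrtMemcpy(host[2], bytes, dev[2], bytes, ACL_MEMCPY_DEVICE_TO_HOST));
    std::memcpy(z, host[2], bytes);

    for (int i = 0; i < 3; ++i) {                 // release memory
        CHECK(aclrtFree(dev[i]));
        CHECK(aclrtFreeHost(host[i]));
    }
    CHECK(aclrtDestroyStream(stream));            // release stream and device
    CHECK(aclrtResetDevice(deviceId));
    CHECK(aclFinalize());                         // deinitialize
    return true;
}
```

To build against the real runtime, define USE_REAL_ACL, compile the kernel source file with the BiSheng compiler, and provide launch_add_custom there. The block dimension of 1 and device ID 0 are placeholder values.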

More Methods for Kernel Loading and Execution
Kernel loading and execution can also be implemented through binary loading, which uses the underlying APIs directly; the kernel launch symbol <<<...>>> is a wrapper around these APIs. To use an operator this way, compile the operator source file into a binary .o file using the BiSheng command line, and then call kernel loading and execution APIs such as aclrtLaunchKernelWithConfig to invoke the operator.
- For details about the kernel loading and execution APIs, see "Kernel Loading and Execution".
- For details about the BiSheng command-line compilation options, see "Common Compilation Options".
- For a complete example, see the sample of loading and executing a kernel (loading a binary file).
The kernel function is called asynchronously: after the call, control returns to the host immediately. You can call aclrtSynchronizeStream to make the host program wait until all kernel functions on the stream have finished executing.
    aclError aclrtSynchronizeStream(aclrtStream stream);