Compilation, Link, and Library Usage
Sample Code
This section uses the following executable test case as an example. Before using the code, configure the environment as described in Heterogeneous Compilation Procedure.
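// A device file, which contains the kernel function code.
// test.cce
// Each block writes its own index (block_idx) to Out[block_idx * Stride],
// so different blocks write to different locations.
__global__ [aicore] void foo(__gm__ uint8_t *Out, int Stride) {
  Out[block_idx * Stride] = block_idx;
}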
// A host file, which calls the kernel code heterogeneously through <<<>>>.
// test_main.cce
#include "acl/acl.h"
#include <stdio.h>
#include <stdlib.h>
#define BLOCKS 4
#define CACHELINE_SZ 64
extern void __global__ [aicore] foo(__gm__ uint8_t *Out, int Stride);
int main(int argc, char *argv[]) {
  aclInit(nullptr);
  aclrtSetDevice(0);
  aclrtStream stream;
  aclrtCreateStream(&stream);
  uint8_t ExpectedValue[] = {0, 1, 2, 3};
  // Each block writes to its own cache line, so the device buffer must hold
  // BLOCKS * CACHELINE_SZ bytes, not just BLOCKS bytes.
  uint8_t *OutputValue = nullptr;
  aclrtMalloc((void **)&OutputValue, BLOCKS * CACHELINE_SZ, ACL_MEM_MALLOC_NORMAL_ONLY);
  // Zero-initialize the whole device buffer.
  uint8_t InitValue[BLOCKS * CACHELINE_SZ] = {0};
  aclrtMemcpyAsync((void *)OutputValue, sizeof(InitValue), InitValue,
                   sizeof(InitValue), ACL_MEMCPY_HOST_TO_DEVICE, stream);
  aclrtSynchronizeStream(stream);
  // Invoke the kernel on BLOCKS blocks; block i writes i to OutputValue[i * CACHELINE_SZ].
  foo<<<BLOCKS, nullptr, stream>>>(OutputValue, CACHELINE_SZ);
  // Copy the whole buffer back to the host and wait for the copy to finish.
  uint8_t *OutHost = nullptr;
  aclrtMallocHost((void **)&OutHost, BLOCKS * CACHELINE_SZ);
  aclrtMemcpyAsync(OutHost, BLOCKS * CACHELINE_SZ, OutputValue,
                   BLOCKS * CACHELINE_SZ, ACL_MEMCPY_DEVICE_TO_HOST, stream);
  aclrtSynchronizeStream(stream);
  for (int I = 0; I < BLOCKS; I++) {
    printf("i%d\t Expect: 0x%04x\t\t\t\tResult: 0x%04x\n", I, ExpectedValue[I],
           OutHost[I * CACHELINE_SZ]);
  }
  // Release resources in reverse order of creation.
  aclrtFreeHost(OutHost);
  aclrtFree(OutputValue);
  aclrtDestroyStream(stream);
  aclrtResetDevice(0);
  aclFinalize();
  return 0;
}
This example consists of two files: test.cce contains only the device code, and test_main.cce contains the host code that launches the kernel heterogeneously. Splitting the code into a device file and a host file is the typical layout and covers most use cases. For brevity, the example does not use a separate header file; header files are used the same way as in other C-like languages.
In this example, the development environment and the runtime environment are the same, so the code is built and run locally.
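To see how large the output buffer in this sample has to be, the kernel's indexing can be simulated on the host. The following is a minimal host-only C++ sketch, not CCE code: it assumes block_idx takes the values 0 to BLOCKS - 1, replaces the parallel <<<>>> launch with a sequential loop, and uses the hypothetical names kBlocks, kCachelineSz, SimulateFooBlock, and SimulateLaunch for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kBlocks = 4;        // mirrors BLOCKS in test_main.cce
constexpr int kCachelineSz = 64;  // mirrors CACHELINE_SZ in test_main.cce

// Host-side stand-in for one block of foo: the block with index block_idx
// writes its own index at byte offset block_idx * stride.
void SimulateFooBlock(std::vector<std::uint8_t>& out, int block_idx, int stride) {
  std::size_t offset = static_cast<std::size_t>(block_idx) * stride;
  // Fires if the buffer is too small for this block's write.
  assert(offset < out.size() && "buffer too small for this block's write");
  out[offset] = static_cast<std::uint8_t>(block_idx);
}

// Runs every block sequentially (in place of the parallel launch) on a
// zero-initialized buffer of buffer_bytes bytes and returns its contents.
std::vector<std::uint8_t> SimulateLaunch(int blocks, int stride, std::size_t buffer_bytes) {
  std::vector<std::uint8_t> out(buffer_bytes, 0);
  for (int b = 0; b < blocks; ++b) {
    SimulateFooBlock(out, b, stride);
  }
  return out;
}
```

With buffer_bytes = kBlocks * kCachelineSz (256), out[i * kCachelineSz] equals i for i = 0 to 3, matching ExpectedValue. With buffer_bytes = kBlocks (4), the assertion fires on block 1, whose write at offset 64 corresponds to an out-of-bounds write on the device when only BLOCKS bytes are allocated.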
Generating and Using a Dynamic Link Library
# Compile and link the device file into a dynamic library. libabcd.so is a host dynamic library with the device code embedded.
$ bisheng -O2 -fPIC --npu-arch=dav-2201 test.cce --shared -o libabcd.so
# Compile the host file into an executable and link it against the dynamic library.
$ bisheng test_main.cce -I$RT_INC -L$RT_LIB -lascendcl -L ./ -labcd -lruntime -o test
# At run time, the dynamic loader must be able to find libabcd.so, for example:
$ export LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH
Generating and Using a Static Link Library
# Compile and link the device file into the static library libstaticabcd.a.
$ bisheng -O2 -fPIC --npu-arch=dav-2201 -I$RT_INC test.cce --cce-build-static-lib -o libstaticabcd.a
# Compile the host file into an executable and link it against the static library.
$ bisheng test_main.cce -I$RT_INC -L$RT_LIB -L ./ -lstaticabcd -lruntime -lascendcl -lstdc++ -o test
You can also use --cce-build-static-lib to compile multiple source files and pack them into a single host static library with the linked device code embedded. Because the device code is already linked in this mode, the resulting library is not suitable for linking with other object files that embed relocatable device objects.
Case Execution
Both scenarios produce the same execution result:
$ ./test
i0   Expect: 0x0000    Result: 0x0000
i1   Expect: 0x0001    Result: 0x0001
i2   Expect: 0x0002    Result: 0x0002
i3   Expect: 0x0003    Result: 0x0003
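The sample only prints expected and actual values, so the comparison has to be done by eye. One way to turn it into a pass/fail result is to count mismatches on the host after the results are copied back. The following is a minimal plain C++ sketch; CountMismatches is a hypothetical helper, not part of AscendCL, and it assumes the same layout as the sample, one result value every stride bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: compares the strided results copied back from the
// device with the expected values, prints one line per mismatch, and
// returns the number of mismatches (0 means the case passed).
int CountMismatches(const std::uint8_t* expected, const std::uint8_t* out_host,
                    int blocks, int stride) {
  assert(expected != nullptr && out_host != nullptr);
  int mismatches = 0;
  for (int i = 0; i < blocks; ++i) {
    // Block i's result sits at byte offset i * stride.
    std::uint8_t got = out_host[static_cast<std::size_t>(i) * stride];
    if (got != expected[i]) {
      std::printf("i%d mismatch: expect 0x%04x, got 0x%04x\n", i, expected[i], got);
      ++mismatches;
    }
  }
  return mismatches;
}
```

In main, a line such as `int Rc = CountMismatches(ExpectedValue, OutHost, BLOCKS, CACHELINE_SZ) == 0 ? 0 : 1;` placed before the cleanup calls, together with `return Rc;` at the end, would let scripts check the case through the process exit status.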