Compilation, Linking, and Library Usage

Sample Code

This section uses the following executable test case as its example. Configure the environment before building the code. For details, see Heterogeneous Compilation Procedure.

// A device file, which contains the kernel function code.
// test.cce
__global__ [aicore] void foo(__gm__ uint8_t *Out, int Stride) {
  Out[block_idx * Stride] = block_idx;
}
// A host file, which launches the kernel on the device through the <<<>>> syntax.
// test_main.cce
#include "acl/acl.h"
#include <stdio.h>
#include <stdlib.h>

#define BLOCKS 4
#define CACHELINE_SZ 64

extern void __global__ [aicore] foo(__gm__ uint8_t *Out, int Stride);

int main(int argc, char *argv[]) {
  aclInit(nullptr);
  aclrtSetDevice(0);
  aclrtStream stream;
  aclrtCreateStream(&stream);

  uint8_t ExpectedValue[] = {0, 1, 2, 3};
  uint8_t *OutputValue = nullptr;
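  // Each block writes at byte offset block_idx * CACHELINE_SZ, so the device
  // buffer must span BLOCKS * CACHELINE_SZ bytes.
  aclrtMalloc((void **)&OutputValue, BLOCKS * CACHELINE_SZ,
              ACL_MEM_MALLOC_NORMAL_ONLY);

  uint8_t InitValue[BLOCKS * CACHELINE_SZ] = {0};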
  aclrtMemcpyAsync((void *)OutputValue, sizeof(InitValue), InitValue,
                   sizeof(InitValue), ACL_MEMCPY_HOST_TO_DEVICE, stream);
  aclrtSynchronizeStream(stream);

  // Invoke a kernel
  foo<<<BLOCKS, nullptr, stream>>>(OutputValue, CACHELINE_SZ);

  uint8_t *OutHost = nullptr;
  aclrtMallocHost((void **)&OutHost, BLOCKS * CACHELINE_SZ);
  aclrtMemcpyAsync(OutHost, BLOCKS * CACHELINE_SZ, OutputValue,
                   BLOCKS * CACHELINE_SZ, ACL_MEMCPY_DEVICE_TO_HOST, stream);
  aclrtSynchronizeStream(stream);

  for (int I = 0; I < sizeof(ExpectedValue) / sizeof(uint8_t); I++) {
    printf("i%d\t Expect: 0x%04x\t\t\t\tResult: 0x%04x\n", I, ExpectedValue[I],
           OutHost[I * CACHELINE_SZ]);
  }

  aclrtFreeHost(OutHost);
  aclrtFree(OutputValue);

  aclrtDestroyStream(stream);
  aclrtResetDevice(0);
  aclFinalize();
  return 0;
}

This example consists of two files: test.cce contains only device code, and test_main.cce contains the host code that launches the kernel. This split is typical and covers most scenarios. For brevity, the example does not use a separate header file. Header files work the same way as in other C-like languages.

In this example, the development environment and the operating environment are the same: the code is built and run on the local machine.
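
To make the indexing in foo concrete, the following host-only C++ sketch mimics what the kernel does: each block stores its own index at byte offset block_idx * Stride. It is not CCE code and does not use the runtime. The names fooOneBlock, simulateFoo, and minBufferBytes are hypothetical helpers introduced here for illustration only, and the blocks run sequentially rather than in parallel as they would on the device.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-only stand-in for the device kernel foo (hypothetical helper).
// On the device, block_idx is supplied by the runtime; here it is an
// ordinary parameter. at() throws if the write would fall outside the buffer.
void fooOneBlock(std::vector<std::uint8_t> &out, int blockIdx, int stride) {
  const std::size_t offset =
      static_cast<std::size_t>(blockIdx) * static_cast<std::size_t>(stride);
  out.at(offset) = static_cast<std::uint8_t>(blockIdx);
}

// Runs every block in turn on a zero-initialized buffer of blocks * stride
// bytes and returns the buffer (hypothetical helper).
std::vector<std::uint8_t> simulateFoo(int blocks, int stride) {
  std::vector<std::uint8_t> out(
      static_cast<std::size_t>(blocks) * static_cast<std::size_t>(stride), 0);
  for (int b = 0; b < blocks; ++b) {
    fooOneBlock(out, b, stride);
  }
  return out;
}

// Smallest buffer size, in bytes, that contains every offset written by
// blocks 0 .. blocks - 1 (hypothetical helper). Assumes blocks >= 1.
std::size_t minBufferBytes(int blocks, int stride) {
  return static_cast<std::size_t>(blocks - 1) *
             static_cast<std::size_t>(stride) + 1;
}
```

With BLOCKS = 4 and CACHELINE_SZ = 64, simulateFoo(4, 64) returns a 256-byte buffer whose bytes at offsets 0, 64, 128, and 192 hold 0, 1, 2, and 3. These are the positions that the host loop reads through OutHost[I * CACHELINE_SZ]. minBufferBytes(4, 64) is 193, so any buffer that foo writes to must span at least that many bytes.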

Generating and Using a Dynamic Link Library

Compile and link the device file into a dynamic link library in one step. Then use BiSheng to compile the host file and link it against that library to produce an executable file.
# Compilation command:
# Compile and link the device file into a dynamic link library. libabcd.so is an ordinary dynamic library on the host, with the device code embedded in it.
$bisheng -O2 -fPIC --npu-arch=dav-2201 test.cce --shared -o libabcd.so
# Compile the host file into an executable file and link the dynamic link library during compilation.
$bisheng test_main.cce -I$RT_INC -L$RT_LIB -lascendcl -L ./ -labcd -lruntime -o test

Generating and Using a Static Link Library

Compile the device file into a static link library. Then use BiSheng to compile the host file and link it against that library to produce an executable file.
# Compilation command:
# Compile and link the device file into the static link library libstaticabcd.a.
$bisheng -O2 -fPIC --npu-arch=dav-2201 -I$RT_INC test.cce --cce-build-static-lib -o libstaticabcd.a
# Compile the host file into an executable file and link the static link library during compilation.
$bisheng test_main.cce -I$RT_INC -L$RT_LIB -L ./ -lstaticabcd -lruntime -lascendcl -lstdc++ -o test

You can pass multiple files to --cce-build-static-lib to compile them and pack them into a single static library on the host, with the linked device code embedded. Note that the device code produced in this mode is already linked, so the library is not suitable for linking with other object files that embed relocatable device code.

Running the Test Case

Both of the preceding scenarios produce the following output:

$./test
i0	 Expect: 0x0000				Result: 0x0000
i1	 Expect: 0x0001				Result: 0x0001
i2	 Expect: 0x0002				Result: 0x0002
i3	 Expect: 0x0003				Result: 0x0003