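MindIE Torch registers a device named "npu" with native Torch, so Tensors can be copied synchronously or asynchronously through Torch's to interface.
The "npu" device supports only data copies between Host and Device; no other operations are supported. A Tensor whose device is "npu" must be copied back to the CPU before it can be used in computation or printed. To make sure the "npu" device resources allocated during a copy are released properly, wrap the copy code in try/catch so that exceptions are caught and the program exits normally.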
Synchronous copy
- Synchronous copy, C++ pseudocode:
| auto tensorCpu = at::randn({ 10, 10, 10 }, torch::kFloat);
auto tensorNpu = tensorCpu.to("npu:0"); // copy data from cpu to npu:0
auto tensorCpuNew = tensorNpu.to("cpu"); // copy data from npu:0 to cpu
|
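- Synchronous copy, Python pseudocode:
| import torch
import mindietorch  # MindIE Torch, which provides the "npu" device
tensor_cpu = torch.randn((10, 10, 10), dtype=torch.float)
tensor_npu = tensor_cpu.to("npu:0")  # copy data from cpu to npu:0
tensor_cpu_new = tensor_npu.to("cpu")  # copy data from npu:0 to cpu
|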
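The recommendation to catch exceptions around a copy can be sketched in Python on top of the synchronous round trip above. This is a minimal sketch, not part of MindIE Torch: roundtrip_via_npu is a hypothetical helper name, and it assumes that a failed copy surfaces as RuntimeError, as most Torch runtime errors do. It relies only on the to method used above, so it accepts any Tensor-like object.

```python
import sys


def roundtrip_via_npu(tensor_cpu, device="npu:0"):
    """Copy a CPU tensor to `device` and back; return None if a copy fails.

    Hypothetical helper: assumes copy failures are raised as RuntimeError.
    """
    try:
        tensor_npu = tensor_cpu.to(device)  # Host -> Device
        return tensor_npu.to("cpu")         # Device -> Host
    except RuntimeError as err:
        # Report the failure and let the caller decide how to exit cleanly.
        print(f"copy via {device} failed: {err}", file=sys.stderr)
        return None


# Intended usage (requires torch and mindietorch to be installed):
#   import torch, mindietorch
#   result = roundtrip_via_npu(torch.randn((10, 10, 10), dtype=torch.float))
#   if result is None:
#       sys.exit(1)
```

Returning None instead of re-raising keeps the decision about the exit path with the caller, which can then finish normally with a non-zero status.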
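Asynchronous copy
For an asynchronous copy, the CPU Tensor must be allocated in pinned (page-locked) memory, that is, pinned_memory(true) in the C++ TensorOptions or pin_memory=True in Python. Otherwise the copy does not actually run asynchronously.
- Asynchronous copy, C++ pseudocode: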
| auto optionCpu = torch::TensorOptions().device(at::Device("cpu")).layout(torch::kStrided).pinned_memory(true);
auto tensorCpu = at::randn({ 100, 1024, 1024 }, optionCpu);
auto tensorCpuNew = at::empty({ 100, 1024, 1024 }, optionCpu);
auto npu = at::Device("npu:0");
// create stream
c10::Stream stream = c10::Stream(c10::Stream::DEFAULT, npu);
c10::StreamGuard streamGuard(stream); // set stream
// copy data from cpu to npu:0
auto tensorDevice = tensorCpu.to(npu, /*non_blocking=*/true);
stream.synchronize();
// copy data from npu:0 to cpu
tensorCpuNew.copy_(tensorDevice, /*non_blocking=*/true);
stream.synchronize();
|
- Asynchronous copy, Python pseudocode:
| import torch
import mindietorch
# pinned host buffers, required for the copies to be asynchronous
input_cpu = torch.rand((1, 100, 1024, 1024), pin_memory=True)
output_cpu = torch.empty((1, 100, 1024, 1024), pin_memory=True)
# create stream
stream = mindietorch.npu.Stream("npu:0")
# copy data from cpu to npu:0
with mindietorch.npu.stream(stream):
    output_npu = input_cpu.to("npu:0", non_blocking=True)
stream.synchronize()
# copy data from npu:0 to cpu
with mindietorch.npu.stream(stream):
    output_cpu.copy_(output_npu, non_blocking=True)
stream.synchronize()
|
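Because a non-blocking copy only takes effect for a pinned CPU Tensor, it can help to check the pinned state before requesting one. The sketch below is an illustration under assumptions, not part of MindIE Torch: to_device_maybe_async is a hypothetical helper name, and it assumes that the Tensor's is_pinned() method reports the pinned state correctly for Tensors used with the "npu" device.

```python
def to_device_maybe_async(tensor_cpu, device="npu:0"):
    """Copy to `device`, requesting a non-blocking copy only for pinned tensors.

    Hypothetical helper. Returns (tensor_on_device, used_non_blocking).
    The caller must still synchronize the stream before using the result
    when used_non_blocking is True.
    """
    pinned = tensor_cpu.is_pinned()
    tensor_dev = tensor_cpu.to(device, non_blocking=pinned)
    return tensor_dev, pinned


# Intended usage (requires torch and mindietorch to be installed):
#   import torch, mindietorch
#   x = torch.rand((1, 100, 1024, 1024), pin_memory=True)
#   x_npu, is_async = to_device_maybe_async(x)
```

Returning the flag alongside the Tensor lets the caller skip the stream synchronization when the copy was performed as a blocking one.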