Functional Verification
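- Build the custom operator package.
Prepare the environment as described in "Environment Preparation", then run the following commands to rebuild and reinstall the torch_npu package that contains the custom operator torch.ops.npu.my_op. The Python version passed to the build script must match the Python version of the current runtime environment. Python 3.8 is used as an example: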
    bash ci/build.sh --python=3.8
    pip3 install dist/torch*.whl --force-reinstall --no-deps
- Verify that the custom operator works correctly in Eager mode, TorchAir reduce-overhead mode, and TorchAir max-autotune mode.
    import torch
    import torch_npu
    import torchair


    # Eager mode: call the custom operator directly.
    def test_eager(x, y, z, attr1, attr2):
        return torch.ops.npu.my_op(x, y, z, attr1, attr2)


    config = torchair.CompilerConfig()
    config.mode = "reduce-overhead"  # aclgraph mode

    @torch.compile(backend=torchair.get_npu_backend(compiler_config=config))
    def test_torchair_reduce_overhead(x, y, z, attr1, attr2):
        return torch.ops.npu.my_op(x, y, z, attr1, attr2)


    config = torchair.CompilerConfig()
    config.mode = "max-autotune"  # Ascend IR mode

    @torch.compile(backend=torchair.get_npu_backend(compiler_config=config))
    def test_torchair_max_autotune(x, y, z, attr1, attr2):
        return torch.ops.npu.my_op(x, y, z, attr1, attr2)


    # Sample inputs: a tensor, an omitted optional input, a tensor list, and two attributes.
    x = torch.ones(4, 8).npu()
    y = None
    z = [torch.ones(4, 8).npu(), torch.ones(4, 8).npu()]
    attr1 = 2.0
    attr2 = 5

    test_eager(x, y, z, attr1, attr2)
    torch.npu.synchronize()
    print("Eager ok")

    test_torchair_reduce_overhead(x, y, z, attr1, attr2)
    torch.npu.synchronize()
    print("TorchAir-reduce-overhead ok")

    test_torchair_max_autotune(x, y, z, attr1, attr2)
    torch.npu.synchronize()
    print("TorchAir-max-autotune ok")
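The verification script stops at the first exception, so a failure in Eager mode hides the result of the two TorchAir modes. The following is a minimal sketch of a small harness that runs every mode and reports each result separately. The names `run_mode_checks` and `format_report` are hypothetical helpers introduced for illustration and are not part of torch_npu or TorchAir.

```python
from typing import Callable, Dict


def run_mode_checks(cases: Dict[str, Callable[[], object]]) -> Dict[str, str]:
    """Run every case and record "ok" or the error, without stopping at the first failure."""
    results = {}
    for name, fn in cases.items():
        try:
            fn()
            results[name] = "ok"
        except Exception as exc:  # keep going so that every mode gets reported
            results[name] = f"FAILED: {type(exc).__name__}: {exc}"
    return results


def format_report(results: Dict[str, str]) -> str:
    """Render one "<mode>: <status>" line per case, in insertion order."""
    return "\n".join(f"{name}: {status}" for name, status in results.items())
```

Assuming the functions and inputs from the verification script are in scope, each case could be a zero-argument function that calls one of `test_eager`, `test_torchair_reduce_overhead`, or `test_torchair_max_autotune` with `(x, y, z, attr1, attr2)` and then calls `torch.npu.synchronize()`, so that asynchronous device errors are attributed to the mode that caused them.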
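For the build step, the value passed to `--python` has to match the interpreter that will import torch_npu. As a small sketch, the flag can be derived from the running interpreter instead of being typed by hand. The helper name `python_flag` is hypothetical, and the flag format is copied from the build command shown earlier.

```python
import sys


def python_flag(version_info=sys.version_info) -> str:
    """Return the build flag for an interpreter version, in the form --python=<major>.<minor>."""
    return f"--python={version_info[0]}.{version_info[1]}"


# Prints the flag for the interpreter that runs this script.
print(python_flag())
```

Running this with the same `python3` that will later import torch_npu gives the value to pass to `ci/build.sh`.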