Functional Verification

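Build the custom operator package.
Set up the environment as described in Environment Preparation, then run the following commands to rebuild and reinstall the torch_npu package that contains the custom operator torch.ops.npu.my_op. The Python version passed to the build script must match the Python version of the current runtime environment. Taking Python 3.8 as an example: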
```
bash ci/build.sh --python=3.8
pip3 install dist/torch*.whl --force-reinstall --no-deps
```
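
Because the wheel must match the Python version of the runtime environment, it can help to check the built file name before installing it. The following is a minimal sketch, not part of the torch_npu tooling: the helper names `wheel_python_tags` and `matches_interpreter` and the sample file name are hypothetical. It relies only on the standard wheel naming convention, in which the Python tag is the third field from the end.

```python
import sys


def wheel_python_tags(wheel_name):
    """Return the list of Python tags (e.g. ['cp38']) encoded in a wheel file name."""
    if not wheel_name.endswith(".whl"):
        return []
    # Wheel names: name-version[-build]-pythontag-abitag-platformtag.whl
    parts = wheel_name[: -len(".whl")].split("-")
    if len(parts) < 5:
        return []
    # A compressed tag such as "cp38.cp39" lists several interpreters.
    return parts[-3].split(".")


def matches_interpreter(wheel_name, version_info=sys.version_info):
    """Check whether the wheel was built for the given CPython major.minor version."""
    expected = "cp{}{}".format(version_info[0], version_info[1])
    return expected in wheel_python_tags(wheel_name)


if __name__ == "__main__":
    # Hypothetical file name, used for illustration only.
    sample = "torch_npu-2.1.0-cp38-cp38-linux_aarch64.whl"
    print(sample, "matches this interpreter:", matches_interpreter(sample))
```

In practice the file name would come from the contents of the dist/ directory produced by the build script.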

Verify that the custom operator works correctly in Eager mode, TorchAir reduce-overhead mode, and TorchAir max-autotune mode:

import torch
import torch_npu
import torchair

def test_eager(x, y, z, attr1, attr2):
    return torch.ops.npu.my_op(x, y, z, attr1, attr2)

config = torchair.CompilerConfig()
config.mode = "reduce-overhead"        # aclgraph mode
@torch.compile(backend=torchair.get_npu_backend(compiler_config=config))
def test_torchair_reduce_overhead(x, y, z, attr1, attr2):
    return torch.ops.npu.my_op(x, y, z, attr1, attr2)

config = torchair.CompilerConfig()
config.mode = "max-autotune"          # Ascend IR mode
@torch.compile(backend=torchair.get_npu_backend(compiler_config=config))
def test_torchair_max_autotune(x, y, z, attr1, attr2):
    return torch.ops.npu.my_op(x, y, z, attr1, attr2)

x = torch.ones(4, 8).npu()
y = None
z = [torch.ones(4, 8).npu(), torch.ones(4, 8).npu()]
attr1 = 2.0
attr2 = 5

test_eager(x, y, z, attr1, attr2)
torch.npu.synchronize()
print("Eager ok")
test_torchair_reduce_overhead(x, y, z, attr1, attr2)
torch.npu.synchronize()
print("TorchAir-reduce-overhead ok")
test_torchair_max_autotune(x, y, z, attr1, attr2)
torch.npu.synchronize()
print("TorchAir-max-autotune ok")
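
The script above only confirms that each mode runs to completion. A stricter check would also compare the compiled results against the Eager result. The following is a minimal sketch of such a comparison harness. The helper name `compare_modes` is hypothetical, and the demo uses plain floats with `math.isclose` so that it runs without an NPU.

```python
import math


def compare_modes(reference, candidates, args, is_close):
    """Run every candidate on the same args and return the names that disagree with the reference."""
    expected = reference(*args)
    mismatches = []
    for name, fn in candidates.items():
        if not is_close(fn(*args), expected):
            mismatches.append(name)
    return mismatches


if __name__ == "__main__":
    # Stand-in functions for illustration; they are not the real operator.
    def eager(a, b):
        return a + b

    candidates = {
        "mode_a": lambda a, b: b + a,        # agrees with the reference
        "mode_b": lambda a, b: a + b + 1.0,  # deliberately wrong
    }
    print(compare_modes(eager, candidates, (1.0, 2.0), math.isclose))
```

With the functions from the script, one could pass `test_eager` as the reference, the two compiled functions as candidates, and `torch.allclose` as the comparator. This assumes that my_op returns a single tensor, which the source does not state.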
