Compute-Communication Parallelism

Overview

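In deployments where a large model is sharded across multiple devices, this feature splits the AllReduce communication operators in the network. It also splits the adjacent compute operators in their context that can be split contiguously. Once both are split into chunks, the AllReduce of one chunk can run while the next chunk is still being computed. Overlapping communication with computation in this way speeds up distributed inference.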
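The splitting idea above can be illustrated with a toy, pure-Python model. It is not torchair's implementation, and every name in it (`matmul`, `all_reduce`, `chunked_all_reduce`, `serial_time`, `overlapped_time`) is hypothetical. It makes three assumptions: a row-parallel matmul whose per-rank partial outputs are summed by an AllReduce, equal-sized chunks split along the token (row) dimension, and zero launch or synchronization overhead. The model shows two things. First, reducing the output chunk by chunk gives the same result as one AllReduce over the whole output. Second, with separate compute and communication streams, chunk i's AllReduce can overlap chunk i+1's matmul.

```python
# Toy model of compute-communication overlap (NOT torchair's implementation).
# Assumptions (hypothetical): row-parallel matmul whose partial outputs are
# summed by an AllReduce; equal-sized chunks; no launch/sync overhead.
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def all_reduce(parts: List[Matrix]) -> Matrix:
    """Simulated AllReduce: element-wise sum of every rank's partial result."""
    return [[sum(p[i][j] for p in parts) for j in range(len(parts[0][0]))]
            for i in range(len(parts[0]))]


def chunked_all_reduce(xs: List[Matrix], ws: List[Matrix], n_chunks: int) -> Matrix:
    """Split each rank's input rows (tokens) into chunks, compute and reduce
    chunk by chunk, then concatenate the reduced chunks."""
    rows = len(xs[0])
    step = rows // n_chunks  # assumes rows is divisible by n_chunks
    out: Matrix = []
    for c in range(n_chunks):
        lo, hi = c * step, (c + 1) * step
        parts = [matmul(x[lo:hi], w) for x, w in zip(xs, ws)]
        out.extend(all_reduce(parts))
    return out


def serial_time(compute: float, comm: float) -> float:
    """Unsplit: the AllReduce starts only after all computation finishes."""
    return compute + comm


def overlapped_time(compute: float, comm: float, n_chunks: int) -> float:
    """Two-stream schedule: chunk i's AllReduce starts once its matmul is
    done and the previous chunk's AllReduce has finished."""
    c, m = compute / n_chunks, comm / n_chunks
    comm_done = 0.0
    for i in range(n_chunks):
        compute_done = (i + 1) * c  # compute stream runs back to back
        comm_done = max(compute_done, comm_done) + m
    return comm_done


# Two ranks, each holding one slice of the hidden dimension.
xs = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
ws = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
print(chunked_all_reduce(xs, ws, 2))  # → [[11.0, 14.0], [17.0, 20.0]]
print(serial_time(8.0, 8.0), overlapped_time(8.0, 8.0, 4))  # → 16.0 10.0
```

With equal compute and communication cost split into 4 chunks, the modeled time drops from 16 to 10 units. For per-chunk costs c and m, this two-stage pipeline takes c + m + (n - 1)·max(c, m). The gain therefore shrinks when one side dominates, and real per-chunk overhead, which the model ignores, reduces it further.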

Usage

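Enable the feature with the following switch when building the compiler config. It defaults to False; set it to True to turn the feature on.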

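import torch
import torch_npu
import torchair as tng

config = tng.CompilerConfig()
# Switch for the compute-communication parallel feature (default: False)
config.experimental_config.cc_parallel_enable = True
# Build the NPU backend from the config above
npu_backend = tng.get_npu_backend(compiler_config=config)
...
model = Model()
# Compile the model with the torchair NPU backend, using static shapes
model = torch.compile(model, backend=npu_backend, dynamic=False)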
