Operator Fusion Rule Configuration (optimization_switch)
Overview
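During operator compilation, operator fusion can be configured flexibly to suit the workload, which reduces network inference time and improves whole-network performance.
This feature is a control switch for operator fusion rules. It is similar to the operator fusion rule configuration feature (fusion_switch_file), with the following differences:
config.fusion_config.fusion_switch_file can only disable graph fusion and UB fusion rules, and it requires a separate JSON file. This feature can specify any fusion rule and does not require a separate JSON file. If both features are configured and both set the same fusion rule, the setting from this feature takes precedence.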
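Usage
Configure this feature through compiler_config in torchair.get_npu_backend. The following example is for reference only and cannot be copied and run directly. See Table 1 for the parameter description.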
```python
import torch
import torch_npu
import torchair

config = torchair.CompilerConfig()
# Switch for operator fusion rules
config.ge_config.optimization_switch = "Passname1:on;Passname2:off"
npu_backend = torchair.get_npu_backend(compiler_config=config)
# `model` is a placeholder for the user's own torch.nn.Module
opt_model = torch.compile(model, backend=npu_backend)
```
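Table 1 Parameter description

| Parameter | Description |
|---|---|
| optimization_switch | Control switch for fusion rules during operator compilation. The value is a set of key-value pairs such as "Passname1:on;Passname2:off", where the key is the pass name and the value is on (enabled) or off (disabled). Case-insensitive matching is not supported. Separate multiple entries with an ASCII semicolon. For the fusion rules that can be configured, see the fusion rule list. |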
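The value format described above can be assembled programmatically instead of by hand. The following is a minimal sketch, not part of torchair: `build_optimization_switch` is a hypothetical helper name, and the check that rejects `:` and `;` inside a pass name is an assumption based on those characters serving as separators.

```python
def build_optimization_switch(passes):
    """Join a {pass_name: enabled} mapping into "Name1:on;Name2:off".

    Hypothetical helper for illustration; it is not a torchair API.
    """
    parts = []
    for name, enabled in passes.items():
        # Assumption: separators cannot appear inside a pass name.
        if not name or ":" in name or ";" in name:
            raise ValueError(f"invalid pass name: {name!r}")
        parts.append(f"{name}:{'on' if enabled else 'off'}")
    return ";".join(parts)


switch = build_optimization_switch(
    {"AddNFusionPass": True, "CastRemoveFusionPass": False}
)
print(switch)  # AddNFusionPass:on;CastRemoveFusionPass:off
```

The resulting string can then be assigned to config.ge_config.optimization_switch as in the usage example.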
Fusion rule list
- AABiasaddConvFusion
- AddNFusionPass
- AddRmsNormFusionGraphPass
- ADeformableConv2dPass
- ADepthwiseFusionPass
- ALSTMFusionPass
- ApplyAddOutputPass
- AReduceMeanFusionPass
- AReduceSumFusionPass
- ArgMaxWithFusionPass
- AvgPool3DFusionPass
- AvgPool3DGradFusionPass
- AvgPoolGradFusionPass
- AvgPoolQuantProcessFusionPass
- BatchMatMulFusionPass
- BatchMatmulV2QuantProcessFusionPass
- BatchNormBnInferFusionPass
- BatchNormGradBnInferGradFusion
- BatchNormGradInfGradFusion
- CastRemoveFusionPass
- clip_by_norm_nodivsquaresum
- CommonLSTMFusionPass
- CommonSubexpressionEliminationPass: GE common subexpression elimination pass
- ConstToAttrGatherV2Fusion
- ConstToAttrPass
- ConstToAttrReduceSumFusion
- ConstToAttrResizeNearestNeighborGradFusion
- ConstToAttrStridedSliceV2Fusion
- Conv2DQuantProcessFusionPass
- Conv2DTDQuantProcessFusionPass
- COPYPass
- DeConvQuantProcessFusionPass
- DeformableOffsetsFusionPass
- DepthwiseDfFusionPass
- DepthwiseDwMulFusionPass
- DepthwiseFusionPass
- DepthwiseInsertTransDataFusionPass
- DepthwiseToConv2dFusionPass
- DreluFusionPass
- DWConv2DQuantProcessFusionPass
- DynamicGRUV2GradFusionPass
- DynamicRNNFusionPass
- DynamicRNNGradAFusionPass
- DynamicRNNGradAlignFusionPass
- DynamicRNNGradDAlignFusionPass
- DynamicRNNGradDFusionPass
- DynamicRNNGradFusionPass
- DynamicRNNInsertTransposePass
- DynamicRNNSeqFusionPass
- EinsumPass
- FCQuantProcessFusionPass
- FixPipeAbilityProcessPass
- FlattenV2Pass
- FusedBatchNormBertFusionPass
- FusedBatchnormFusionPass
- FusedBatchNormGradFusionPass
- Globalavgpoolpass
- GroupConv2DQuantProcessFusionPass
- HostBNFusionPass
- HostShapeOptimizationPass: AI CPU pipeline-break pass
- InplaceAddRmsNormFusionPass
- MapIndexFusionPass
- MatMul2MatMulV2FusionPass
- MatmulV2QuantProcessFusionPass
- MaxPoolWithArgmaxFusionPass
- NormalizeFusionPass
- PackFusionPass
- PassThroughFusionPass
- PassThroughSecondFusionPass
- PermuteFusionPass
- PoolingQuantProcessFusionPass
- PReluGradFusionPass
- PriorBoxPass
- ProposalFusionPass
- RNNFusionPass
- sedBatchNormGradFusionPass
- SingleBatchNormFusion
- SoftmaxGradExtFusion
- SpatialTransformerDPass
- SPPPass
- TbeAntiquantMaxpoolingFusionPass
- TbeConvCommonRules0FusionPass
- TbeConvCommonRules2FusionPass
- TbeConvDequantS16FusionPass
- TbeConvRequantFusionPass
- TfMergeConv2DBackpropInputFusionPass
- TfMergeSubFusionPass
- TfTagNoConstFoldingFusionPass
- TransdataCastFusionPass
- TransposedUpdateFusionPass
- UnpackFusionPass
- V100NotRequantFusionPass
- V200NotRequantFusionPass
- WeightQuantBatchMatmulV2TransposeFusionPass
- YoloPass
- YoloV2DetectionOutputPass
- YoloV3DetectionOutputV2Pass
- ZConcatDFusionPass
- ZConcatExt2FusionPass
- ZConfusionSoftmaxGradFusionPass
- ZSplitVDFusionPass
- ZSplitVFusionPass
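Because case-insensitive matching is not supported, a misspelled or wrongly cased pass name is easy to miss. The sketch below checks a switch string against the list above before it is handed to the compiler. It is an illustration only: `parse_optimization_switch` and `KNOWN_PASSES` are hypothetical names, `KNOWN_PASSES` holds just a short excerpt of the list, and rejecting empty entries and uppercase `ON`/`OFF` is an assumption about how strictly the value is interpreted.

```python
# Excerpt of the fusion rule list; extend it with the names you actually use.
KNOWN_PASSES = {"AddNFusionPass", "CastRemoveFusionPass", "EinsumPass"}


def parse_optimization_switch(value, known_passes=KNOWN_PASSES):
    """Parse "Name1:on;Name2:off" into {name: bool}, failing on bad entries.

    Hypothetical helper for illustration; it is not a torchair API.
    """
    result = {}
    for item in value.split(";"):
        name, sep, state = item.partition(":")
        # Assumption: only lowercase "on"/"off" are accepted.
        if not sep or state not in ("on", "off"):
            raise ValueError(f"malformed entry: {item!r}")
        # Exact, case-sensitive comparison against the known names.
        if name not in known_passes:
            raise ValueError(f"unknown pass name: {name!r}")
        result[name] = state == "on"
    return result


print(parse_optimization_switch("AddNFusionPass:on;EinsumPass:off"))
```

An entry such as `addnfusionpass:on` raises ValueError here, which surfaces the typo before compilation starts.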