--precision_mode_v2

Description

Sets the precision mode of a model.

See Also

  • This option cannot be used together with --precision_mode. You are advised to use --precision_mode_v2, which is added in the new version and whose option values have clearer, easier-to-understand semantics.
  • When this option is set to mixed_float16, you can adjust the precision selection made by the built-in tuning policy by specifying which operators allow precision reduction and which do not. For details, see --modify_mixlist.
  • In inference scenarios, --precision_mode_v2 sets the global precision mode of a network model, which may cause performance or accuracy problems on particular operators. In that case, you can use --keep_dtype to keep the computation precision of those operators unchanged during the build of the original network model. Note that --keep_dtype does not take effect when --precision_mode_v2 is set to origin.

Argument

  • fp16:

    Forces float16 for operators supporting both float16 and float32.

  • origin:

    Retains the original precision.

    • If the precision of an operator in the original graph is float16, and the AI Core implementation of the operator does not support float16 but supports only float32 and bfloat16, the system automatically uses the high-precision float32 implementation.
    • If the precision of an operator in the original graph is float16, and the AI Core implementation of the operator supports only bfloat16, the float16 AI CPU operator is used. If no such AI CPU operator exists, an error is reported.
    • If the precision of an operator in the original graph is float32, and the AI Core implementation of the operator supports only float16, the float32 AI CPU operator is used. If no such AI CPU operator exists, an error is reported.
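The fallback rules above can be sketched as a small decision helper. This is a hypothetical illustration, not the toolchain's actual code: the real selection logic is internal to the compiler, and the dtype sets passed in are assumptions.

```python
def resolve_origin_precision(graph_dtype, aicore_supported, aicpu_supported):
    """Pick an execution path for one operator under --precision_mode_v2=origin.

    graph_dtype:      dtype of the operator in the original graph
    aicore_supported: dtypes the AI Core implementation supports (assumed input)
    aicpu_supported:  dtypes the AI CPU implementation supports (assumed input)
    """
    if graph_dtype in aicore_supported:
        # Original precision is retained on the AI Core.
        return ("aicore", graph_dtype)
    if graph_dtype == "float16" and "float32" in aicore_supported:
        # float16 graph, AI Core offers float32: promote to high precision.
        return ("aicore", "float32")
    if graph_dtype in aicpu_supported:
        # Otherwise fall back to the AI CPU operator of the original dtype.
        return ("aicpu", graph_dtype)
    raise RuntimeError(f"no implementation available for {graph_dtype}")
```

For example, a float16 operator whose AI Core implementation supports only bfloat16 falls back to the float16 AI CPU operator, matching the second rule above.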
  • cube_fp16in_fp32out:

    For operators supporting both float16 and float32, the system selects a processing mode based on the operator type.

    • For cube operators, the system selects the input and output data types based on what the operator implementation supports, in the following order of preference:
      1. Preferably, the input data type is float16 and the output data type is float32.
      2. If float16 input with float32 output is not supported, both the input and output data types are set to float32.
      3. If float32 input and output are not supported, both the input and output data types are set to float16.
      4. If float16 input and output are not supported either, an error is reported.
    • For vector operators, float32 is forcibly selected for operators supporting both float16 and float32, even if the original precision is float16.

      This setting does not take effect for operators that do not support float32, for example, operators supporting only float16; for such operators, float16 is retained. If an operator that does not support float32 is configured on the precision reduction blocklist (by setting precision_reduce to false), the counterpart float32 AI CPU operator is used. If no such AI CPU operator exists, an error is reported.
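The cube-operator order of preference above can be sketched as follows. The helper and its input format are hypothetical, for illustration only; the real implementation is internal to the toolchain.

```python
def select_cube_dtypes(supported_pairs):
    """Pick (input_dtype, output_dtype) for a cube operator under
    cube_fp16in_fp32out, following the documented order of preference.

    supported_pairs: set of (input_dtype, output_dtype) combinations the
    operator implementation supports (an assumption for illustration).
    """
    preference = (
        ("float16", "float32"),  # 1. preferred: float16 in, float32 out
        ("float32", "float32"),  # 2. fall back to float32 in and out
        ("float16", "float16"),  # 3. fall back to float16 in and out
    )
    for pair in preference:
        if pair in supported_pairs:
            return pair
    raise RuntimeError("no supported input/output data type combination")  # 4.
```

An implementation that supports only float32-in/float32-out and float16-in/float16-out, for instance, ends up on the second preference, float32 for both input and output.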

  • mixed_float16:

    Mixed precision of float16 and float32 is used for neural network processing: float16 is automatically used in place of float32 for certain operators according to the built-in tuning policies. This improves system performance and reduces memory footprint with minimal accuracy degradation.

    If this mode is used, you can view the value of the precision_reduce option in the built-in tuning policy file ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/<soc_version>/aic-<soc_version>-ops-info.json in the OPP installation directory.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 to float16.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 to float16.
    • If an operator does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
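The trustlist/blocklist/graylist classification can be inspected programmatically. The sketch below assumes each entry in the ops-info JSON file maps an operator name to a dict that may contain a precision_reduce flag; the exact file layout may differ by version, so treat the nested structure as an assumption.

```python
import json  # used when loading aic-<soc_version>-ops-info.json from disk

def classify_ops(ops_info):
    """Split operators into trustlist/blocklist/graylist by precision_reduce.

    ops_info: dict as loaded from the tuning policy JSON file; the nested
    {"precision_reduce": {"flag": "true"/"false"}} layout is an assumption.
    """
    lists = {"trustlist": [], "blocklist": [], "graylist": []}
    for op_name, op_info in ops_info.items():
        flag = op_info.get("precision_reduce", {}).get("flag")
        if flag == "true":
            lists["trustlist"].append(op_name)  # may reduce float32 -> float16
        elif flag == "false":
            lists["blocklist"].append(op_name)  # keeps float32
        else:
            lists["graylist"].append(op_name)   # follows the upstream operator
    return lists

# Usage (path is a placeholder for the ops-info file on your system):
# classify_ops(json.load(open(path)))
```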
  • mixed_hif8:

    Enables automatic mixed precision: hifloat8 (for details about this data type, see Link), float16, and float32 are used together for neural network processing. In this mode, hifloat8 is automatically used for certain float16 and float32 operators based on the built-in tuning policies. This improves system performance and reduces memory footprint with minimal precision degradation. The current version does not support this option.

    If this mode is used, you can view the value of precision_reduce in the built-in tuning policy file ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/<soc_version>/aic-<soc_version>-ops-info.json.

    • If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float16/float32 to hifloat8.
    • If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float16/float32 to hifloat8.
    • If an operator does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
  • cube_hif8:

    The hifloat8 data type is forcibly used if a cube operator in the network model supports both hifloat8 and float16/float32. The current version does not support this option.

Default: fp16

Suggestions and Benefits

The accuracy and performance of the network model vary according to the configured precision mode.

Sorted by accuracy: origin > mixed_float16 > fp16. Sorted by performance: fp16 ≥ mixed_float16 > origin.

Example

--precision_mode_v2=fp16

Applicability

Atlas 200/300/500 Inference Product

Atlas Training Series Product

Restrictions

In mixed precision scenarios, if inference performance deteriorates after a version upgrade, you are advised to re-run optimization with the AOE tool. After the optimization is complete, load the path of the custom repository with the --op_bank_path option, and then convert the model again.

For details about operator tuning, see AOE Instructions.