--precision_mode

Applicability

Product	Supported
Atlas A3 training products / Atlas A3 inference products	√
Atlas A2 training products / Atlas A2 inference products	√
Atlas 200I/500 A2 inference products	√
Atlas inference products	√
Atlas training products	√

Description

Sets the precision mode of a model.

Argument

Values:

force_fp32/cube_fp16in_fp32out:
force_fp32 and cube_fp16in_fp32out have the same effect. When an operator implemented in the AI Core supports both float32 and float16, the system selects a processing mode based on the operator type. cube_fp16in_fp32out was added in a later version; for cube operators, its name describes the behavior more clearly.
- For cube operators, the system selects data types according to the operator implementation, in the following order:
  1. float16 input and float32 output are preferred.
  2. If float16 input with float32 output is not supported, float32 is used for both input and output.
  3. If float32 input and output are not supported, float16 is used for both input and output.
  4. If float16 input and output are not supported either, an error is reported.
- For vector operators, if the operator precision in the original graph is float16 or bfloat16, float32 is forcibly selected.
  This option does not take effect for an operator whose AI Core implementation does not support float32 (for example, an operator that supports only float16); float16 is retained for that operator. If such an operator is also on the precision reduction blocklist (precision_reduce set to false), the corresponding AI CPU operator that supports float32 is used instead. If the AI CPU operator does not support float32, an error is reported.
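The selection order above can be summarized as a short decision procedure. The following Python sketch is a simplified, hypothetical model for illustration only, not the ATC implementation: the function names, the string dtype names, and passing an operator's capabilities as sets are all assumptions. The vector-operator function covers only an original dtype of float16.

```python
# Simplified, hypothetical model of force_fp32 / cube_fp16in_fp32out.
# Not the ATC implementation: dtypes are plain strings and an operator's
# capabilities are passed in as sets.

# Preference order for cube operators: (input dtype, output dtype).
CUBE_PREFERENCE = [
    ("float16", "float32"),  # step 1: float16 in, float32 out
    ("float32", "float32"),  # step 2: float32 in and out
    ("float16", "float16"),  # step 3: float16 in and out
]


def select_cube_dtypes(supported_pairs):
    """Return the (input, output) dtype pair chosen for a cube operator."""
    for pair in CUBE_PREFERENCE:
        if pair in supported_pairs:
            return pair
    # Step 4: none of the allowed combinations is supported.
    raise ValueError("cube operator supports no allowed dtype combination")


def select_vector_dtype(aicore_dtypes, aicpu_dtypes, on_blocklist):
    """Return (dtype, engine) for a vector operator whose original dtype is float16."""
    if "float32" in aicore_dtypes:
        return "float32", "AI Core"  # float32 is forcibly selected
    if on_blocklist:  # precision_reduce is false for this operator
        if "float32" in aicpu_dtypes:
            return "float32", "AI CPU"
        raise ValueError("no AI CPU operator supports float32")
    return "float16", "AI Core"  # option does not take effect; float16 retained
```

For example, a cube operator that supports only float32-in/float32-out and float16-in/float16-out resolves to float32 for both, matching step 2.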
force_fp16 (default):
float16 is forcibly selected when the operator precision in the original graph is float16, bfloat16, or float32.
allow_fp32_to_fp16:
- For matrix operators:
  - If the operator precision in the original graph is float32, the precision is preferably reduced to float16. If the operator in the AI Core does not support float16, float32 is used. If the operator in the AI Core does not support float32, the AI CPU operator is used for computation. If the AI CPU operator also does not support float32, an error is reported during execution.
  - If the operator precision in the original graph is bfloat16, the precision of the original graph is preferably used. If the operator in the AI Core does not support bfloat16, float32 is used. If the operator in the AI Core does not support float32, the precision is directly reduced to float16. If the operator in the AI Core does not support float16, the AI CPU operator is used for computation. If the AI CPU operator also does not support float16, an error is reported during execution.
- For vector operators, the original precision is preferably retained.
  - If the operator precision in the original graph is float32, the precision of the original graph is preferably used. If the operator in the AI Core does not support float32, the precision is directly reduced to float16. If the operator in the AI Core does not support float16, the AI CPU operator is used for computation. If the AI CPU operator also does not support float16, an error is reported during execution.
  - If the operator precision in the original graph is bfloat16, the precision of the original graph is preferably used. If the operator in the AI Core does not support bfloat16, float32 is used. If the operator in the AI Core does not support float32, the precision is directly reduced to float16. If the operator in the AI Core does not support float16, the AI CPU operator is used for computation. If the AI CPU operator also does not support float16, an error is reported during execution.
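The four fallback chains of allow_fp32_to_fp16 differ only in the order of data types tried, so they can be written as a lookup table. This Python sketch is a hypothetical model (the table, the function name, and the set-based capability arguments are assumptions), and it raises an exception immediately where the text says the error is reported during execution.

```python
# Hypothetical model of the allow_fp32_to_fp16 fallback chains.
# Key: (operator kind, original dtype).
# Value: (AI Core dtypes tried in order, dtype required from the AI CPU operator).
FALLBACK_CHAINS = {
    ("matrix", "float32"): (["float16", "float32"], "float32"),
    ("matrix", "bfloat16"): (["bfloat16", "float32", "float16"], "float16"),
    ("vector", "float32"): (["float32", "float16"], "float16"),
    ("vector", "bfloat16"): (["bfloat16", "float32", "float16"], "float16"),
}


def resolve_allow_fp32_to_fp16(op_kind, origin_dtype, aicore_dtypes, aicpu_dtypes):
    """Return (dtype, engine) for one operator under allow_fp32_to_fp16."""
    aicore_chain, aicpu_dtype = FALLBACK_CHAINS[(op_kind, origin_dtype)]
    for dtype in aicore_chain:
        if dtype in aicore_dtypes:
            return dtype, "AI Core"
    if aicpu_dtype in aicpu_dtypes:
        return aicpu_dtype, "AI CPU"
    # Stands in for the error that is reported during execution.
    raise RuntimeError(f"no AI CPU operator supports {aicpu_dtype}")
```

The table makes the asymmetry visible: a float32 matrix operator tries float16 first, while a float32 vector operator keeps float32 when it can.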
must_keep_origin_dtype:
The original precision is retained.
- If the precision of an operator in the original graph is float16, and the AI Core implementation of the operator does not support float16 but supports only float32 and bfloat16, the system automatically uses the higher-precision float32.
- If the precision of an operator in the original graph is float16, and the AI Core implementation of the operator does not support float16 but supports only bfloat16, the float16 AI CPU operator is used. If no such AI CPU operator is available, an error is reported.
- If the precision of an operator in the original graph is float32, and the AI Core implementation of the operator does not support float32 but supports only float16, the float32 AI CPU operator is used. If no such AI CPU operator is available, an error is reported.
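A hypothetical Python model of the must_keep_origin_dtype rules follows. It assumes that float16 is promoted whenever the AI Core implementation offers float32, which slightly generalizes the float32-and-bfloat16 case stated above; the function name and set-based arguments are illustrative only.

```python
def resolve_keep_origin(origin_dtype, aicore_dtypes, aicpu_dtypes):
    """Return (dtype, engine) for one operator under must_keep_origin_dtype.

    Hypothetical model: dtypes are plain strings, capabilities are sets.
    """
    if origin_dtype in aicore_dtypes:
        return origin_dtype, "AI Core"  # original precision retained
    if origin_dtype == "float16" and "float32" in aicore_dtypes:
        return "float32", "AI Core"  # promoted to higher-precision float32
    if origin_dtype in aicpu_dtypes:
        return origin_dtype, "AI CPU"  # AI CPU operator of the original dtype
    raise RuntimeError(f"no AI CPU operator supports {origin_dtype}")
```

Note that precision is never reduced in this mode: the only change of dtype is the float16 to float32 promotion.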
allow_mix_precision/allow_mix_precision_fp16:
allow_mix_precision and allow_mix_precision_fp16 have the same effect: mixed precision of float16, bfloat16, and float32 is used for neural network processing. allow_mix_precision_fp16 was added in a later version; its name states the target data type more clearly.

Based on the built-in tuning policy, float16 is automatically used for certain float32 and bfloat16 operators in the original model. This improves system performance and reduces memory usage with minimal precision degradation.

In this mode, you can check the precision_reduce value of each operator in the built-in tuning policy file ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.
- If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 and bfloat16 to float16.
- If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 and bfloat16 to float16. In this case, the operator still uses the precision of float32 or bfloat16.
- If an operator in the network model does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
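To list which operators of a model fall on each list, you can parse the tuning policy file. The sketch below assumes the file maps each operator type to an object containing "precision_reduce": {"flag": "true"} or {"flag": "false"}; verify this layout against the file installed for your product before relying on it. The function name and the operator names in the sample are hypothetical.

```python
import json

# Assumed file layout (verify against your installed file):
#   {"OpType": {"precision_reduce": {"flag": "true"}, ...}, ...}


def classify_ops(ops_info, op_types):
    """Sort the operator types of a model into trustlist, blocklist and graylist."""
    lists = {"trustlist": [], "blocklist": [], "graylist": []}
    for op_type in op_types:
        raw = ops_info.get(op_type, {}).get("precision_reduce", {}).get("flag")
        flag = str(raw).lower()  # accepts "true"/"false" strings or JSON booleans
        if flag == "true":
            lists["trustlist"].append(op_type)  # reduced to float16
        elif flag == "false":
            lists["blocklist"].append(op_type)  # keeps float32 or bfloat16
        else:
            lists["graylist"].append(op_type)  # follows the upstream operator
    return lists


# Hypothetical sample; for a real file use: with open(path) as f: json.load(f)
SAMPLE = json.loads("""
{
  "OpA": {"precision_reduce": {"flag": "true"}},
  "OpB": {"precision_reduce": {"flag": "false"}},
  "OpC": {}
}
""")
print(classify_ops(SAMPLE, ["OpA", "OpB", "OpC"]))
```

An operator type that is missing from the file entirely is also treated as graylist here, because it has no precision_reduce setting.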
allow_mix_precision_bf16:
Mixed precision of bfloat16 and float32 is used for neural network processing. In this mode, bfloat16 is automatically used for certain float32 operators in the original model based on the built-in tuning policy. This improves system performance and reduces memory usage with minimal precision degradation. If the AI Core operator supports neither bfloat16 nor float32, the AI CPU operator is used for computation. If the AI CPU operator also supports neither bfloat16 nor float32, an error is reported during execution.

In this mode, you can check the precision_reduce value of each operator in the built-in tuning policy file ${INSTALL_DIR}/opp/built-in/op_impl/ai_core/tbe/config/xxx/aic-xxx-ops-info-*.json.
- If it is set to true, the operator is on the mixed precision trustlist and its precision will be reduced from float32 to bfloat16.
- If it is set to false, the operator is on the mixed precision blocklist and its precision will not be reduced from float32 to bfloat16.
- If an operator in the network model does not have the precision_reduce option configured, the operator is on the graylist and will follow the same precision processing as the upstream operator.
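The effect of the trustlist, blocklist, and graylist on a simple chain of operators can be modeled as follows. In this hypothetical Python sketch, a graylist operator inherits the reduce decision of its immediate upstream operator, and a graylist operator with no upstream operator keeps its precision; both are simplifying assumptions, and real graphs are not restricted to chains. The target argument selects between the float16 modes (allow_mix_precision, allow_mix_precision_fp16) and allow_mix_precision_bf16.

```python
# Hypothetical model of mixed precision on a simple chain of operators.
# Each op is (name, original dtype, precision_reduce), where precision_reduce
# is True (trustlist), False (blocklist) or None (graylist).

REDUCIBLE = {
    "float16": {"float32", "bfloat16"},  # allow_mix_precision / allow_mix_precision_fp16
    "bfloat16": {"float32"},  # allow_mix_precision_bf16
}


def apply_mixed_precision(ops, target):
    """Return {name: dtype} after applying mixed precision with the target dtype."""
    reducible = REDUCIBLE[target]
    result = {}
    upstream_reduce = False  # assumption: a leading graylist op is not reduced
    for name, origin, precision_reduce in ops:
        # Graylist ops follow the decision made for the upstream op.
        do_reduce = upstream_reduce if precision_reduce is None else precision_reduce
        result[name] = target if do_reduce and origin in reducible else origin
        upstream_reduce = do_reduce
    return result
```

For example, in a float32 chain marked trustlist, graylist, blocklist, graylist, the first two operators are reduced to the target type and the last two stay float32.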
allow_fp32_to_bf16:
- If the operator precision in the original graph is float32, the precision of the original graph is preferably used. If the operator in the AI Core does not support float32, the precision is reduced to bfloat16. If the operator in the AI Core does not support bfloat16, the AI CPU operator is used for computation. If the AI CPU operator also does not support bfloat16, an error is reported during execution.
- If the operator precision in the original graph is bfloat16, the precision of the original graph is preferably used. If the operator in the AI Core does not support bfloat16, float32 is used. If the operator in the AI Core does not support float32, the AI CPU operator is used for computation. If the AI CPU operator also does not support float32, an error is reported during execution.
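The two allow_fp32_to_bf16 rules are also fallback chains. This hypothetical Python sketch encodes them as a table; the table, the function name, and the set-based capability arguments are assumptions, and the exception stands in for the error reported during execution.

```python
# Hypothetical model of allow_fp32_to_bf16 for one operator.
# Value: (AI Core dtypes tried in order, dtype required from the AI CPU operator).
BF16_CHAINS = {
    "float32": (["float32", "bfloat16"], "bfloat16"),
    "bfloat16": (["bfloat16", "float32"], "float32"),
}


def resolve_allow_fp32_to_bf16(origin_dtype, aicore_dtypes, aicpu_dtypes):
    """Return (dtype, engine) for one operator under allow_fp32_to_bf16."""
    aicore_chain, aicpu_dtype = BF16_CHAINS[origin_dtype]
    for dtype in aicore_chain:
        if dtype in aicore_dtypes:
            return dtype, "AI Core"
    if aicpu_dtype in aicpu_dtypes:
        return aicpu_dtype, "AI CPU"
    raise RuntimeError(f"no AI CPU operator supports {aicpu_dtype}")
```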

In the preceding paths, replace ${INSTALL_DIR} with the CANN component installation directory. For example, if the installation is performed by the root user, the default installation directory is /usr/local/Ascend/cann. xxx varies depending on the product.

Restrictions:

The bfloat16 data type is supported only on the following products:
- Atlas A2 training products / Atlas A2 inference products
- Atlas A3 training products / Atlas A3 inference products
- Atlas 200I/500 A2 inference products
The default value of this option prioritizes performance, so accuracy issues such as overflow may occur during subsequent inference. If an accuracy issue occurs during inference, locate the fault by referring to "Accuracy Improvement Suggestions for Model Inference".
To avoid accuracy issues, set the option to a value other than the default, for example, must_keep_origin_dtype.

Suggestions and Benefits

The accuracy and performance of the network model vary according to the configured precision mode.

Accuracy ranked from high to low: force_fp32 > must_keep_origin_dtype > allow_fp32_to_fp16 > allow_mix_precision > force_fp16

Performance ranked from high to low: force_fp16 >= allow_mix_precision > allow_fp32_to_fp16 > must_keep_origin_dtype > force_fp32

Example

--precision_mode=force_fp16
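When model conversion is scripted, the option value can be validated before the ATC command is assembled. This Python sketch only builds the argument list and does not run atc; the function name is hypothetical. The --model, --framework, --output, and --soc_version options are standard ATC options not described on this page; --framework=5 selects ONNX, and the file names and SoC version in the usage line are placeholders.

```python
import shlex

# Values documented for --precision_mode on this page.
PRECISION_MODES = {
    "force_fp32", "cube_fp16in_fp32out", "force_fp16",
    "allow_fp32_to_fp16", "must_keep_origin_dtype",
    "allow_mix_precision", "allow_mix_precision_fp16",
    "allow_mix_precision_bf16", "allow_fp32_to_bf16",
}


def build_atc_args(model, output, soc_version, precision_mode="force_fp16"):
    """Return an argv list for converting an ONNX model with ATC."""
    if precision_mode not in PRECISION_MODES:
        raise ValueError(f"unsupported --precision_mode value: {precision_mode}")
    return [
        "atc",
        f"--model={model}",
        "--framework=5",  # 5 selects ONNX as the source framework
        f"--output={output}",
        f"--soc_version={soc_version}",
        f"--precision_mode={precision_mode}",
    ]


if __name__ == "__main__":
    # Placeholder values; pass the list to subprocess.run to actually convert.
    print(shlex.join(build_atc_args("model.onnx", "model", "<soc_version>")))
```

Rejecting an unknown value in the script surfaces a typo such as force_fp61 before a long conversion job is started.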

Restrictions

In mixed precision scenarios, if inference performance degrades after a version upgrade, you are advised to tune the model again with the AOE tool. After tuning is complete, use the --op_bank_path option to load the path of the custom knowledge base, and then convert the model again.

For details about operator tuning, see AOE Instructions.
