Case Study

Description

For the Faster R-CNN network, the model is converted with the default high-performance implementation mode and force_fp16 precision mode retained. However, the inference result is wrong: the mAP is 0.

Then, precision_mode=allow_fp32_to_fp16 is configured in the atc command line to generate a high-precision offline model. The inference accuracy with this new offline model is satisfactory.
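An atc invocation along the following lines could produce such a high-precision model; the model, weight, output, and soc_version values here are placeholders, not taken from the original case:

```shell
# Hypothetical paths and SoC version; the relevant option is --precision_mode.
atc --model=faster_rcnn.prototxt \
    --weight=faster_rcnn.caffemodel \
    --framework=0 \
    --output=faster_rcnn_allow_fp32_to_fp16 \
    --soc_version=Ascend310 \
    --precision_mode=allow_fp32_to_fp16
```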

Analysis

  1. During model conversion, retain the default high-performance mode and force_fp16 precision mode. Then, run inference on the new offline model to obtain dump data.
  2. During model re-conversion, set the high-precision mode (precision_mode=allow_fp32_to_fp16). Then, run inference again on the new offline model to obtain dump data.
  3. Use the Model Accuracy Analyzer to compare the dump data obtained in steps 1 and 2.

    The following is an example of the comparison result.

  4. The CosineSimilarity column shows the cosine similarity comparison result. The value range is [–1, +1]: a value closer to 1 indicates higher similarity, and a value closer to –1 indicates lower similarity. For most operators, a cosine similarity lower than 0.95 indicates an accuracy issue.

    As shown in the preceding figure, the cosine similarity of the AddN operator is as low as 0.72. Further analyze the dump data file of output 0 of this operator in high-precision mode (obtained in step 2).
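    The cosine similarity metric itself is straightforward to reproduce outside the analyzer; a minimal NumPy sketch (the tensors here are illustrative, not the real dump data):

    ```python
    import numpy as np

    def cosine_similarity(a, b):
        """Cosine similarity of two tensors, flattened to 1-D.

        Returns a value in [-1, 1]; identical tensors give a value of ~1.0,
        and values well below 0.95 hint at an accuracy issue.
        """
        a = np.asarray(a, dtype=np.float64).ravel()
        b = np.asarray(b, dtype=np.float64).ravel()
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    x = np.array([1.0, 2.0, 3.0])
    print(cosine_similarity(x, x))                            # ~1.0
    print(cosine_similarity(x, x + np.array([0.0, 5.0, -3.0])))  # below 1.0
    ```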

  5. Because the dump data file cannot be opened directly in a text editor, first convert it into a NumPy file and then convert the NumPy file into a TXT file. For details, see "Viewing Dump Files" in Accuracy Debugging Tool Guide.

    When converting a NumPy file into a TXT file, you can obtain the maximum and minimum values of output 0 of the AddN operator. The following is a command example (****.npy indicates the path of the NumPy file):

    $ python3
    Python 3 (default, Mar  5 2020, 16:07:54)
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numpy as np
    >>> a = np.load("****.npy")
    >>> a.max()
    109508.0
    >>> a.min()
    70683.0
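    The NumPy-to-TXT step in this item can also be done with plain NumPy; a minimal sketch using a synthetic tensor in place of the real dump (file names and shapes are placeholders):

    ```python
    import numpy as np

    # Synthetic stand-in for the converted dump data; in practice load the
    # real .npy file produced from the dump file instead.
    a = np.random.default_rng(0).uniform(70000.0, 110000.0, size=(1, 16, 8, 8))

    # np.savetxt accepts at most 2-D input, so flatten the tensor first.
    np.savetxt("addn_output0.txt", a.ravel(), fmt="%.6f")

    # The extreme values are available directly, without opening the TXT file.
    print(a.max(), a.min())
    ```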
  6. According to step 5, the maximum value of output tensor 0 of the AddN operator in high-precision mode is 109508.0, whereas in high-performance mode it is 65504.0, because the range of values representable by fp16 is –65504 to +65504. The output of the AddN operator therefore exceeds the representable range of fp16 (saturating at 65504.0), so this operator must be implemented in high-precision mode. For details, see Preserving the Precision of Selected Operators.
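    The fp16 limit quoted above can be confirmed with NumPy: the largest finite half-precision value is 65504, and casting the observed maximum 109508.0 down to fp16 overflows (NumPy rounds it to infinity, while the dump in this case shows the device output capped at 65504.0):

    ```python
    import numpy as np

    # Largest finite value representable in IEEE 754 half precision.
    print(np.finfo(np.float16).max)           # 65504.0

    # The AddN output observed in high-precision mode does not fit in fp16;
    # NumPy rounds the cast to infinity.
    overflowed = np.float16(109508.0)
    print(overflowed, np.isfinite(overflowed))
    ```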