INT8Flat

INT8Flat and SQ8 differ in where quantization takes place. With INT8Flat, features are quantized externally, so the index takes INT8 features as input. With SQ8, features are quantized inside the index, so the index takes Float32 features as input.
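As an illustration of what "quantized externally" means, the following minimal sketch converts Float32 features to the INT8 range before they would be handed to an INT8Flat index. The function name quantize_int8, the fixed scale factor, and the round-and-clamp scheme are assumptions made for this example only; the quantization method your application should use is not specified here.

```python
def quantize_int8(vector, scale):
    """Map float features to integers in [-128, 127] using a fixed scale.

    Hypothetical helper: the scale and the rounding scheme are illustrative
    assumptions, not a method prescribed by the index.
    """
    quantized = []
    for value in vector:
        q = round(value * scale)
        # Clamp to the representable INT8 range.
        quantized.append(max(-128, min(127, q)))
    return quantized


features = [0.5, -1.0, 0.25, 2.0]
int8_features = quantize_int8(features, scale=100.0)
print(int8_features)  # [50, -100, 25, 127]
```

With SQ8, this step would not be needed on the caller side, because the index accepts the Float32 features directly.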

Usage

python3 int8flat_generate_model.py -d <dim> --cores <core_num> -p <process_id> -pool <pool_size> -t <npu_type> -code <code_num>

Parameters

<dim>: feature vector dimension. The default value is 512.

<core_num>: number of AI Cores of the Ascend AI Processor. The default value is 2, and you do not need to set this parameter.

<process_id>: ID of the process for multi-process scheduling of operators generated in batches. The default value is 0, and you do not need to set this parameter.

<pool_size>: size of the process pool for multi-process scheduling of operators generated in batches. The default value is 10.

<npu_type>: hardware form.
  • For the Atlas 200/300/500 inference product and Atlas inference product, run the npu-smi info command on the server where the AI processor is installed, then remove the last digit from the Name value. The result is the value of npu_type.
  • For the Atlas 800I A2 inference server, run the npu-smi info command on the server where the AI processor is installed. The value of Name is the value of npu_type.
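The two rules above can be sketched as a small helper. This is only an illustration: the function name npu_type_from_name and the sample Name strings are hypothetical, and "delete the last digit" is interpreted here as dropping a trailing digit character. Always read the real Name value from the npu-smi info output on your own server.

```python
def npu_type_from_name(name, drop_last_digit):
    """Derive npu_type from the Name value reported by npu-smi info.

    drop_last_digit=True  follows the Atlas 200/300/500 rule above.
    drop_last_digit=False follows the Atlas 800I A2 rule (Name used as is).
    """
    name = name.strip()
    if drop_last_digit and name and name[-1].isdigit():
        return name[:-1]
    return name


# Sample Name strings below are illustrative, not taken from real output.
print(npu_type_from_name("310P3", drop_last_digit=True))   # 310P
print(npu_type_from_name("910B4", drop_last_digit=False))  # 910B4
```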

<code_num>: database block size used when the operator is called. The default value is 262144. If this parameter is not set, operators are generated with the default code_num value.

--help | -h: help information.

Description

Running the command generates a group of operator model files. Replace the placeholder parameters in the command with values that match your environment.

Restrictions

  • dim ∈ {64, 128, 256, 384, 512, 768, 1024}
  • 0 ≤ pool_size ≤ 32
  • code_num ∈ {16384, 32768, 65536, 131072, 262144}
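To catch invalid values before the script is launched, the restrictions above can be checked in a small wrapper. The sketch below is a hypothetical helper (build_command is not part of the tool); it only assembles the argument list using the flags shown under Usage, and it leaves out --cores and -p because those parameters do not need to be set.

```python
ALLOWED_DIMS = {64, 128, 256, 384, 512, 768, 1024}
ALLOWED_CODE_NUMS = {16384, 32768, 65536, 131072, 262144}


def build_command(npu_type, dim=512, pool_size=10, code_num=262144):
    """Validate the parameters and return the command as an argument list."""
    if dim not in ALLOWED_DIMS:
        raise ValueError(f"dim must be one of {sorted(ALLOWED_DIMS)}, got {dim}")
    if not 0 <= pool_size <= 32:
        raise ValueError(f"pool_size must be between 0 and 32, got {pool_size}")
    if code_num not in ALLOWED_CODE_NUMS:
        raise ValueError(
            f"code_num must be one of {sorted(ALLOWED_CODE_NUMS)}, got {code_num}"
        )
    return [
        "python3", "int8flat_generate_model.py",
        "-d", str(dim),
        "-pool", str(pool_size),
        "-t", npu_type,
        "-code", str(code_num),
    ]


# "<npu_type>" is a placeholder; substitute the value obtained from npu-smi info.
command = build_command("<npu_type>", dim=256, code_num=65536)
print(" ".join(command))
```

The returned list can be passed to subprocess.run as is, which avoids shell quoting issues.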