What Do I Do If an Error Is Reported Due to a Large Data Volume in Quantization Calibration?

Symptom

  • Scenario 1: The accumulated calibration data is too large and overflows TensorFlow's tensor size limit (kint32max, 2147483647).
    The failure information is as follows:
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Check failed: size <= tensorflow::kint32max (2684354560 vs. 2147483647)
    
  • Scenario 2: Memory/Video Memory Overflow

    AMCT allocates extra memory or video RAM during quantization. If these resources are insufficient, the allocation fails and the system displays a message indicating a resource allocation failure.

    The following shows the common system resource allocation failure information:

    In the CPU operating environment, the following information is displayed when memory allocation fails:

    MemoryError: Unable to allocate array with shape (1048576, 102400) and data type float32
    

    In the GPU operating environment, the following information is displayed when video RAM allocation fails:

    ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,1073741824] and type int32 on /job:localhost/replica:0/task:0/device:GPU:0 by allocator G
             [[node TopKV2_1 (defined at test_input_big_tensor.py:30)  = TopKV2[T=DT_FLOAT, sorted=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Reshape, Cast_1/_5)]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
             [[{{node QuantIfmr/_17}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_55_QuantIfmr", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    
  • Scenario 3: The quantization process occupies too much memory. The system memory becomes insufficient, an "Out of memory" condition occurs, and the process is terminated by the system's OOM killer; an error message similar to "Killed" is displayed.

    Run the following command and search the system log for the "oom-kill" error message:

    vi /var/log/messages

    The error message in the system log is as follows:

    Feb 27 03:30:38 kernel: [2599806.621086] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/user.slice/user-0.slice/session-7474.scope,task=amct_tensorflow,pid=678157,uid=0
    Feb 27 03:30:38 kernel: [2599809.519800] oom_reaper: reaped process 678157 (amct_tensorflow), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
    
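Instead of paging through the whole log, you can filter for the relevant kernel messages with grep. The sketch below uses a sample file so it is self-contained; on a real system, point LOG at /var/log/messages (or /var/log/syslog on Debian/Ubuntu-based distributions).

```shell
# Sketch: filter OOM killer messages out of a kernel log.
# A sample file is written here for illustration; on a real system set
# LOG=/var/log/messages (or /var/log/syslog, depending on the distribution).
LOG=./sample_messages
printf '%s\n' \
  'Feb 27 03:30:38 kernel: oom-kill:constraint=CONSTRAINT_NONE,task=amct_tensorflow,pid=678157,uid=0' \
  'Feb 27 03:30:38 kernel: oom_reaper: reaped process 678157 (amct_tensorflow)' \
  > "$LOG"

# -E enables extended regex (alternation), -i ignores case.
grep -Ei 'oom-kill|oom_reaper' "$LOG"
```

If only the most recent occurrence matters, pipe the result through `tail -n 20`.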
    

Solution

You are advised to reduce the batch_num value used for quantization. Because hardware resources differ, allocation may still fail after batch_num is reduced; in that case, also reduce the size of each batch, or perform quantization on a platform with more hardware resources.
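To illustrate why reducing batch_num and batch size helps, the sketch below splits a calibration set into a few small batches. The helper name, shapes, and parameters are hypothetical, for illustration only, and are not part of the AMCT API; AMCT simply needs to be fed fewer and smaller calibration batches.

```python
import numpy as np

def calibration_batches(dataset, batch_size, batch_num):
    """Yield at most `batch_num` calibration batches of `batch_size` samples.

    Hypothetical helper (not an AMCT API): it only demonstrates slicing a
    large calibration set into small batches to cap peak memory use.
    """
    for i in range(batch_num):
        start = i * batch_size
        batch = dataset[start:start + batch_size]
        if len(batch) == 0:
            break
        yield batch

# Example: calibrate on 2 batches of 4 samples instead of 1 batch of 32,
# cutting peak per-batch memory by 8x.
data = np.zeros((32, 224, 224, 3), dtype=np.float32)
batches = list(calibration_batches(data, batch_size=4, batch_num=2))
print(len(batches), batches[0].shape)  # → 2 (4, 224, 224, 3)
```

Each batch here occupies 4 × 224 × 224 × 3 × 4 bytes ≈ 2.3 MB instead of the full set's ≈ 18.4 MB, and the same proportional saving applies to the intermediate tensors AMCT accumulates during calibration.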

If the problem persists, try the HFMG algorithm: construct a simplified configuration file that contains the HFMG algorithm by following the instructions in Simplified PTQ Configuration File, and then perform quantization again.