Batch Size Tuning

Background

For some small operators, increasing the batch size reduces the proportion of execution time spent on header overhead. A larger batch also lets small operators occupy more cores, which improves overall performance. As a result, the computing resources of the NPU cores are better utilized and repeated weight transfers are reduced.
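The amortization effect described above can be illustrated with a minimal sketch. It assumes a simple cost model in which each operator launch pays a fixed overhead plus a per-sample compute cost. The function name `overhead_fraction` and all numbers are hypothetical, chosen only for illustration and not measured on an NPU.

```python
def overhead_fraction(batch_size, fixed_overhead_us, per_sample_us):
    """Share of one launch spent on fixed overhead (assumed linear cost model)."""
    total_us = fixed_overhead_us + batch_size * per_sample_us
    return fixed_overhead_us / total_us


if __name__ == "__main__":
    # Hypothetical costs: 20 us fixed overhead, 0.5 us of compute per sample.
    for batch_size in (1, 8, 64):
        share = overhead_fraction(batch_size, 20.0, 0.5)
        print(f"batch={batch_size:3d}  overhead share={share:.3f}")
```

Under this model the overhead share falls steadily as the batch grows, which is why small operators tend to benefit the most.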

Restrictions

  1. Increasing the batch size increases the number of cores required for operator computation. If the batch size is increased while the core division function is also in use, latency can deteriorate severely. The optimal batch size must therefore be verified against the measured latency.
  2. A larger batch size lowers the cache hit ratio.
  3. In general, the batch size is determined by the customer's service. A larger batch size can be evaluated with a local stress test script. However, when the batch size in the actual service is not fixed and padding may occur, the performance achieved there falls short of the best result measured locally.
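The padding effect in the last restriction can be sketched as follows. This is an illustration only: it assumes requests are grouped into batches of a fixed size and the final batch is padded up to that size. The helper name `batch_utilization` is hypothetical and not part of any NPU toolkit.

```python
import math


def batch_utilization(num_requests, batch_size):
    """Fraction of batch slots that carry real requests after padding."""
    if num_requests <= 0:
        return 0.0
    # Assumption: the last, partially filled batch is padded to batch_size.
    num_batches = math.ceil(num_requests / batch_size)
    return num_requests / (num_batches * batch_size)


if __name__ == "__main__":
    for num_requests in (100, 128, 129):
        util = batch_utilization(num_requests, 128)
        print(f"requests={num_requests}  utilization={util:.3f}")
```

Multiplying a locally measured throughput by this utilization gives a rough estimate of the useful throughput when traffic does not fill every batch.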

Cases

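The following table lists data measured on a real model. Throughput increases with the batch size, but the gain flattens at large batch sizes: going from 256 to 512 raises throughput by about 11% while the inference duration grows by about 79%.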

Table 1 Model data

Batch Size    Inference Duration (ms)    Throughput
86            8.5                        10117
128           10.4                       12258
256           14.9                       17231
512           26.7                       19195
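One way to use such measurements is to pick the batch size with the highest throughput that still meets a latency budget. The sketch below uses the Table 1 figures. The function name `best_batch_under_budget` and the budget values are hypothetical. The table does not state the unit of throughput, but the figures are roughly consistent with batch size divided by inference duration, that is, samples per second.

```python
# (batch size, inference duration in ms, throughput) taken from Table 1.
ROWS = [
    (86, 8.5, 10117),
    (128, 10.4, 12258),
    (256, 14.9, 17231),
    (512, 26.7, 19195),
]


def best_batch_under_budget(rows, latency_budget_ms):
    """Return the batch size with the highest throughput within the budget."""
    eligible = [row for row in rows if row[1] <= latency_budget_ms]
    if not eligible:
        return None  # No measured batch size meets the latency budget.
    return max(eligible, key=lambda row: row[2])[0]


def derived_throughput(batch_size, duration_ms):
    """Samples per second implied by one batch per inference (an assumption)."""
    return batch_size / duration_ms * 1000.0


if __name__ == "__main__":
    # Hypothetical latency budgets in milliseconds.
    for budget_ms in (5.0, 15.0, 30.0):
        print(budget_ms, best_batch_under_budget(ROWS, budget_ms))
```

With a tight budget the largest batch size is excluded even though it has the highest throughput, which is why the optimal value has to be checked against the latency requirement of the service.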