Performance Fluctuates When the Number of Queries Exceeds 1000
Symptom
When the number of queries exceeds 1000, retrieval performance fluctuates.
Possible Cause
During concurrent processing on the host, the retrieval application is scheduled onto CPU cores that have no affinity with the NPU, that is, cores outside the NUMA node that the NPU belongs to. As a result, processing time increases.
Solution
Bind the retrieval application to the CPU cores of the NUMA node that the NPU belongs to, as follows:
- Obtain the NUMA node information. As shown in Figure 1, the queried NPU belongs to NUMA node 0.
- Run the lscpu command to view the CPU cores on NUMA node 0. As shown in Figure 2, the CPU cores of NUMA node 0 are 0-13 and 28-41.
- Bind the retrieval application to the confirmed CPU cores.
  taskset -c 0-13,28-41 ./mxIndexApp
  mxIndexApp indicates the retrieval application to be bound. Replace it with the actual application name.
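The lookup and binding steps above can also be scripted. The following is a minimal sketch, not part of the product: the helper name numa_cpus is hypothetical, and it assumes that lscpu prints lines of the form "NUMA node0 CPU(s): 0-13,28-41" (the English-locale format). The NUMA node number still has to be determined for your NPU as described in the first step.

```shell
#!/bin/sh
# numa_cpus NODE
# Hypothetical helper: reads lscpu-style text on stdin and prints the CPU
# list of the given NUMA node, for example "0-13,28-41".
# Assumes lines of the form "NUMA node<N> CPU(s):   <list>".
numa_cpus() {
    awk -F: -v key="NUMA node$1 CPU(s)" '
        $1 == key {
            gsub(/[ \t]/, "", $2)   # drop the padding around the CPU list
            print $2
        }
    '
}

# Example use (node 0 is taken from Figure 1; replace ./mxIndexApp with the
# actual application name):
#   cpus=$(LC_ALL=C lscpu | numa_cpus 0)
#   taskset -c "$cpus" ./mxIndexApp
```

To check the result, taskset -cp followed by the process ID prints the current CPU affinity list of the running application, which should match the range reported for the NUMA node.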

