Function Description
AscendIndexIVF is the base class for IVF-based indexes in FeatureRetrieval. It defines the APIs used by the other IVF-based indexes in FeatureRetrieval.
For IVF algorithms, how close performance scaling on Atlas 300I Duo inference cards comes to linear depends on the share of the total search workload that is spent on distance calculation. Distance calculation is the only part of the search whose workload can be evenly distributed across multiple computation units. As a result, scaling is closer to linear in scenarios with large batch sizes and large nprobe values, and the advantage diminishes as batch sizes and nprobe values get smaller.
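One way to reason about this dependence is an Amdahl's-law-style estimate. The sketch below is an illustrative model only, not a measured characteristic of the hardware or of FeatureRetrieval. It assumes that distance calculation is perfectly parallel across the computation units and that everything else is serial. The function name estimated_speedup and its parameters are hypothetical.

```python
def estimated_speedup(distance_fraction, units):
    """Amdahl's-law estimate of search speedup on `units` computation units.

    distance_fraction: assumed share of the search time spent on distance
        calculation when running on a single unit (0.0 to 1.0).
    units: number of computation units the distance work is spread over.
    """
    if not 0.0 <= distance_fraction <= 1.0:
        raise ValueError("distance_fraction must be between 0.0 and 1.0")
    if units < 1:
        raise ValueError("units must be at least 1")
    # Only the distance-calculation share is divided across the units;
    # the remaining share is assumed to stay serial.
    serial_fraction = 1.0 - distance_fraction
    return 1.0 / (serial_fraction + distance_fraction / units)


if __name__ == "__main__":
    # Hypothetical fractions: a larger distance share gives a speedup
    # closer to the number of units.
    for fraction in (0.5, 0.9, 0.99):
        print(fraction, estimated_speedup(fraction, units=2))
```

Under this model, the speedup approaches the number of units only when distance calculation dominates the search time. This matches the statement that large batch sizes and large nprobe values scale best.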
The IVF algorithms must satisfy the memory rule nlist × 2 MB + resourceSize < NPU memory to prevent memory allocation failures at run time. Here, resourceSize is the size of the shared memory specified in AscendIndexIVFConfig; its default value is 128 MB. For example, if the NPU memory is 64 GB, nlist must be less than 32768, because 32768 × 2 MB already equals 64 GB before resourceSize is added. If nlist is 32768 or greater, the memory required by the running program may exceed the NPU memory size. The reason is that memory allocation for retrieval is currently aligned to a 2 MB granularity because of the huge-page-first policy, so any nlist bucket that contains data occupies hardware-allocated memory that is 2 MB-aligned.
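The memory rule can be checked with a few lines of arithmetic before an index is created. The following is a minimal sketch, assuming 1 MB = 2^20 bytes and 1 GB = 2^30 bytes. The helper names fits_in_npu_memory and max_nlist are hypothetical and are not part of FeatureRetrieval.

```python
MB = 1 << 20
GB = 1 << 30

ALIGNMENT = 2 * MB                # per-bucket allocation granularity from the rule
DEFAULT_RESOURCE_SIZE = 128 * MB  # default resourceSize stated in the text


def fits_in_npu_memory(nlist, npu_memory, resource_size=DEFAULT_RESOURCE_SIZE):
    """Return True if nlist * 2 MB + resourceSize < NPU memory (all in bytes)."""
    return nlist * ALIGNMENT + resource_size < npu_memory


def max_nlist(npu_memory, resource_size=DEFAULT_RESOURCE_SIZE):
    """Return the largest nlist that satisfies the strict inequality.

    Returns 0 when resourceSize alone already uses up the NPU memory.
    """
    available = npu_memory - resource_size
    if available <= 0:
        return 0
    # Subtract one byte so that an exact multiple of 2 MB is excluded,
    # matching the strict "<" in the rule.
    return (available - 1) // ALIGNMENT


if __name__ == "__main__":
    print(max_nlist(64 * GB))                  # default resourceSize
    print(max_nlist(64 * GB, resource_size=0)) # ignoring resourceSize
```

With 64 GB of NPU memory and the default 128 MB resourceSize, this sketch gives 32703 as the largest nlist that satisfies the rule. That is slightly below the bound of 32768 obtained when resourceSize is ignored.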