ComputeDistanceByThreshold

API definition

APP_ERROR ComputeDistanceByThreshold(int n, const float16_t *queries, float threshold, int *num, idx_t *indices, float *distances, unsigned int tableLen = 0, const float *table = nullptr);

Function

Adds threshold filtering on top of ComputeDistance and returns only the distances that meet the threshold condition. If a valid mapping table is passed (tableLen > 0 and table is a non-null pointer), the distances are first mapped to scores, and the mapped values are then filtered by the threshold and returned.

Input

int n: number of feature vectors to be queried.

const float16_t *queries: feature vectors to be queried. The length is n × dim, where dim is the vector dimension.

float threshold: threshold for filtering. The API does not restrict the value range. If a mapping table is passed, the API maps each distance to a score and then filters the scores against threshold.

unsigned int tableLen: length of the mapping table. The default value is 0, indicating that no mapping is performed. Currently, the supported mapping table length is 10000.

const float *table: pointer to the mapping table, which holds tableLen valid mapping values. Currently, the supported redundancy length is 48, so the space that table points to must be 10048 × sizeof(float) bytes.
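The sizing rule for table can be sketched as follows. This is a minimal host-side illustration, not SDK code: the helper name MakeMappingTable, the constants, and the linear placeholder scores are all assumptions made for the example. Only the lengths 10000 and 48 come from the descriptions above, and this page does not say what the redundancy entries should contain, so the sketch leaves them at zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lengths taken from the parameter descriptions above.
constexpr unsigned int kTableLen = 10000;      // currently supported mapping table length
constexpr unsigned int kTableRedundancy = 48;  // currently supported redundancy length

// Hypothetical helper: builds a buffer of tableLen + 48 floats.
// The linear fill (score = i / tableLen) is a placeholder assumption;
// a real table would hold application-specific scores.
std::vector<float> MakeMappingTable(unsigned int tableLen = kTableLen)
{
    assert(tableLen > 0);
    std::vector<float> table(static_cast<std::size_t>(tableLen) + kTableRedundancy, 0.0f);
    for (unsigned int i = 0; i < tableLen; ++i) {
        table[i] = static_cast<float>(i) / static_cast<float>(tableLen);
    }
    return table;  // table.data() and tableLen would be passed to the API
}
```

With the default length, the buffer holds 10048 floats, which matches the 10048 × sizeof(float) bytes required above.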

Output

int *num: number of base vectors that meet the threshold condition for each query vector. The length is n.

idx_t *indices: indexes of the base vectors that meet the threshold. The number of matching indexes varies from query to query. For each query, the valid indexes are recorded in sequence at the start of a block of ntotalPad elements, and the rest of the block is padding. The total length of indices is n × ntotalPad, where ntotalPad = (ntotal + 15)/16 × 16, that is, ntotal rounded up to a multiple of 16.

float *distances: distances between the query vector and the base vectors that meet the threshold condition. The layout and space size are the same as those of indices.
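The padded output layout can be read back as in the following sketch. It assumes the results are already in host-accessible memory and that the division in the ntotalPad formula is integer division. The idx_t alias is a stand-in for the SDK type, and PadTo16, Hit, and HitsForQuery are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using idx_t = std::uint32_t;  // assumption: stand-in for the SDK's idx_t

// ntotalPad = (ntotal + 15)/16 × 16, using integer division.
std::size_t PadTo16(std::size_t ntotal)
{
    return (ntotal + 15) / 16 * 16;
}

struct Hit {
    idx_t index;
    float distance;
};

// Collects the valid results of query i. Only the first num[i] entries of
// the i-th block of ntotalPad elements are meaningful; the rest is padding.
std::vector<Hit> HitsForQuery(std::size_t i, std::size_t ntotal, const int *num,
                              const idx_t *indices, const float *distances)
{
    assert(num != nullptr && indices != nullptr && distances != nullptr);
    const std::size_t base = i * PadTo16(ntotal);
    std::vector<Hit> hits;
    for (int k = 0; k < num[i]; ++k) {
        hits.push_back({indices[base + k], distances[base + k]});
    }
    return hits;
}
```

For example, with ntotal = 17 each query owns a block of 32 elements, so the results of query 1 start at offset 32 in both indices and distances.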

Return value

APP_ERROR: return status. For details, see Return Code Reference.

Restrictions

  • n: The value range is [0, capacity].
  • indices: The length of the space to be provided is n × ntotalPad, where ntotalPad = (ntotal + 15)/16 × 16, that is, ntotal rounded up to a multiple of 16. After the ith query is filtered, its valid indexes are stored in the first *(num + i) elements of its block of ntotalPad elements. The padding values carry no meaning.
  • distances: The length of the space to be provided is n × ntotalPad.
  • aclrtMalloc is recommended for allocating indices and distances, because it allocates all of the physical memory up front, which helps reduce latency.
  • If both tableLen and table meet the requirements when the parameters are passed, the API maps the calculated distances as follows.

    First, the distance value is normalized to a floating-point number f1 in the range [0, 1]. Then f1 is multiplied by tableLen and rounded up to obtain an integer index in the range [0, tableLen]. This index is used as an offset to read the corresponding score from the memory that table points to. The score is then saved to distances in place of the distance.

    The index mapping formula is as follows: index = ((CosDistance + 1)/2) × tableLen

  • queries, num, indices, and distances must be non-null pointers, and the buffers they point to must have the required lengths. Otherwise, out-of-bounds reads or writes may occur and crash the program.
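The distance-to-score mapping described in the restrictions above can be sketched on the host as follows. This illustrates the stated formula and is not the device implementation. The function names are hypothetical, clamping the input to [-1, 1] is an added safeguard, and std::ceil is an assumption based on the phrase "rounded up"; the actual rounding mode may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// index = ((CosDistance + 1)/2) × tableLen, rounded up (assumed: ceiling).
unsigned int MappingIndex(float cosDistance, unsigned int tableLen)
{
    assert(tableLen > 0);
    // Added safeguard (assumption): keep the input inside [-1, 1].
    const float clamped = std::min(1.0f, std::max(-1.0f, cosDistance));
    const float f1 = (clamped + 1.0f) / 2.0f;  // normalized to [0, 1]
    return static_cast<unsigned int>(std::ceil(f1 * static_cast<float>(tableLen)));
}

// Reads the score at the computed offset. The index can equal tableLen,
// so the buffer must include the redundancy space described for table.
float MapToScore(float cosDistance, const float *table, unsigned int tableLen)
{
    assert(table != nullptr);
    return table[MappingIndex(cosDistance, tableLen)];
}
```

With tableLen = 10000, a cosine distance of -1 maps to index 0, 0 maps to index 5000, and 1 maps to index 10000.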