Similarity Cache

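The similarity cache matches queries by semantic similarity rather than by exact text. It uses two storage layers: SQLite, which holds the cached questions and answers, and a vector database (Faiss, npu_faiss, or Milvus), which is searched for similar questions.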

During a query, the system embeds the user's question, retrieves the top-k most similar entries from the vector database, and fetches the corresponding cached questions and answers from SQLite. It then re-ranks the cached questions against the user's question and returns the most similar result. A hit does not require an exact text match; the query hits the cache whenever a cached question is semantically similar enough.

Figure 1 Similarity cache structure
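
The lookup flow described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation: the class name SimilarityCache, the helpers embed, cosine, and rerank, and the top_k and threshold parameters are all hypothetical. A toy bag-of-words embedding stands in for a real embedding model, a brute-force scan over an in-memory dictionary stands in for the vector database (Faiss, npu_faiss, or Milvus), and token-overlap (Jaccard) scoring stands in for the re-ranker.

```python
import math
import re
import sqlite3
from collections import Counter


def tokens(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a real model)."""
    return Counter(tokens(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank(query, cached_question):
    """Toy re-ranker (assumption): Jaccard overlap of the two token sets."""
    q, c = set(tokens(query)), set(tokens(cached_question))
    return len(q & c) / len(q | c) if q | c else 0.0


class SimilarityCache:
    """Hypothetical sketch: SQLite for text, a dict as the vector store."""

    def __init__(self, top_k=3, threshold=0.6):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE cache (id INTEGER PRIMARY KEY, question TEXT, answer TEXT)"
        )
        self.vectors = {}           # row id -> embedding (vector DB stand-in)
        self.top_k = top_k
        self.threshold = threshold  # assumed cut-off for counting as a hit

    def put(self, question, answer):
        cur = self.db.execute(
            "INSERT INTO cache (question, answer) VALUES (?, ?)", (question, answer)
        )
        self.vectors[cur.lastrowid] = embed(question)

    def get(self, query):
        q_vec = embed(query)
        # Step 1: top-k nearest cached questions from the vector store.
        candidates = sorted(
            self.vectors, key=lambda i: cosine(q_vec, self.vectors[i]), reverse=True
        )[: self.top_k]
        # Step 2: fetch question/answer text from SQLite and re-rank.
        best_score, best_answer = 0.0, None
        for row_id in candidates:
            question, answer = self.db.execute(
                "SELECT question, answer FROM cache WHERE id = ?", (row_id,)
            ).fetchone()
            score = rerank(query, question)
            if score > best_score:
                best_score, best_answer = score, answer
        # Step 3: return the best answer only if it is similar enough.
        return best_answer if best_score >= self.threshold else None


cache = SimilarityCache()
cache.put("How do I reset my password?", "Use the reset link.")
cache.put("What is the capital of France?", "Paris")
print(cache.get("how do i reset my password please"))  # Use the reset link.
print(cache.get("What is the weather today?"))         # None
```

The first query differs from the cached question in wording but still hits, while the second shares only a few common words with any cached question and falls below the assumed threshold. In a real deployment the threshold and the value of k would need tuning against the chosen embedding model and re-ranker.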