Overview

MxRAGCache is built on the open-source component GPTCache and supports the following cache functions:

Cache initialization
Cache update
Cache aging
Cache query
Cache cascading

Compared with GPTCache, MxRAGCache provides the following extended functions:

Semantic-similarity cache vector retrieval supports FAISS_NPU retrieval (Index SDK)
Semantic-similarity cache embedding supports the RAG-optimized TEI Embedding
Semantic-similarity cache similarity calculation supports the RAG-optimized TEI Reranker
RAG SDK chains can be cached (image-to-image, text-to-text, and text-to-image)
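
To make the idea of a semantic-similarity cache concrete, the following is a minimal sketch, not the MxRAGCache or GPTCache API. It assumes the usual three steps: embed the question, find the closest stored entry, and accept it only if the similarity score passes a threshold. The names `toy_embed`, `cosine`, `ToySemanticCache`, and the threshold value 0.8 are hypothetical, and the bag-of-words embedding and linear scan merely stand in for a real embedding model, vector index, and reranker.

```python
import math
from collections import Counter


def toy_embed(text):
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySemanticCache:
    """Hypothetical cache that matches questions by similarity, not equality."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cut-off for accepting a hit
        self.entries = []           # list of (embedding, answer) pairs

    def put(self, question, answer):
        """Cache update: store the question embedding with its answer."""
        self.entries.append((toy_embed(question), answer))

    def get(self, question):
        """Cache query: return the best match if it is similar enough."""
        query_vec = toy_embed(question)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:  # linear scan instead of a vector index
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None
```

In this sketch, a question that shares most of its words with a stored question is treated as a hit, while an unrelated question falls below the threshold and returns `None`.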

The QA cache is inserted into the original RAG SDK process ahead of knowledge document retrieval. If a query hits the cache, the cached answer is returned and both knowledge document retrieval and inference are skipped. This removes their latency from the request and improves end-to-end performance. Compared with a cache miss, a cache hit improves performance by a factor of 50.
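
The control flow described above can be sketched as follows. This is an illustrative example under stated assumptions, not RAG SDK code: `ToyRagPipeline`, `_retrieve`, `_infer`, and `ask` are hypothetical names, the cache is a plain exact-match dictionary, and the retrieval and inference steps are placeholders that only count how often they are called.

```python
class ToyRagPipeline:
    """Hypothetical pipeline that checks a QA cache before retrieval."""

    def __init__(self):
        self.cache = {}            # exact-match QA cache (assumption)
        self.retrieval_calls = 0   # counts knowledge document retrievals
        self.inference_calls = 0   # counts model inference runs

    def _retrieve(self, query):
        """Placeholder for knowledge document retrieval."""
        self.retrieval_calls += 1
        return ["placeholder document for: " + query]

    def _infer(self, query, documents):
        """Placeholder for model inference over the retrieved documents."""
        self.inference_calls += 1
        return "answer to '{}' using {} document(s)".format(query, len(documents))

    def ask(self, query):
        # Cache query comes first: on a hit, retrieval and inference are skipped.
        cached = self.cache.get(query)
        if cached is not None:
            return cached
        # Cache miss: run the full pipeline, then update the cache.
        documents = self._retrieve(query)
        answer = self._infer(query, documents)
        self.cache[query] = answer
        return answer
```

Calling `ask` twice with the same query runs retrieval and inference only once; the second call is served from the cache.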
