Overview

MxRAGCache is built on the open-source component GPTCache and supports the following cache functions:

  • Cache initialization
  • Cache update
  • Cache aging
  • Cache query
  • Cache cascading
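
The cache functions listed above can be illustrated with a minimal sketch. This is not the MxRAGCache or GPTCache API: the ToyCache class, its ttl_seconds, next_cache, and clock parameters, and the exact-string lookup are all hypothetical. The sketch also assumes that aging means time-based expiry and that cascading means falling through to a next-level cache on a miss.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ToyCache:
    """Hypothetical in-memory cache for illustration; not the MxRAGCache API."""

    def __init__(
        self,
        ttl_seconds: float,
        next_cache: Optional["ToyCache"] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Initialization: set the aging policy and an optional next-level cache.
        self.ttl_seconds = ttl_seconds
        self.next_cache = next_cache
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def update(self, query: str, answer: str) -> None:
        # Update: store the answer together with the time it was written.
        self._entries[query] = (answer, self._clock())

    def query(self, query: str) -> Optional[str]:
        # Query: exact-match lookup (an assumption; the real cache matches
        # on semantic similarity).
        entry = self._entries.get(query)
        if entry is not None:
            answer, stored_at = entry
            if self._clock() - stored_at <= self.ttl_seconds:
                return answer
            # Aging: drop an entry once it is older than the TTL.
            del self._entries[query]
        # Cascading: on a miss, fall through to the next cache level, if any.
        if self.next_cache is not None:
            return self.next_cache.query(query)
        return None
```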

Compared with GPTCache, MxRAGCache provides the following extended functions:

  • Vector retrieval for the semantic-similarity cache supports FAISS_NPU retrieval (Index SDK)
  • Embedding for the semantic-similarity cache supports the RAG-optimized TEI Embedding
  • Similarity calculation for the semantic-similarity cache supports the RAG-optimized TEI Reranker
  • RAG SDK chains can be cached (image-to-image, text-to-text, and text-to-image)

The QA cache is inserted into the original RAG SDK process ahead of knowledge document retrieval. If a query hits the cache, both knowledge document retrieval and inference are skipped, which removes their latency and improves end-to-end performance. Compared with a cache miss, a cache hit improves performance by a factor of 50.
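
The cache-first flow described above can be sketched as follows. This is a simplified illustration, not the RAG SDK implementation: answer_with_cache, retrieve, and generate are hypothetical names, and a plain dictionary with exact-string matching stands in for the semantic-similarity cache.

```python
from typing import Callable, Dict, List


def answer_with_cache(
    query: str,
    cache: Dict[str, str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Check the QA cache before running retrieval and inference."""
    # Cache hit: return the stored answer; retrieval and inference do not run.
    if query in cache:
        return cache[query]
    # Cache miss: run the full RAG path, then store the answer for next time.
    documents = retrieve(query)
    answer = generate(query, documents)
    cache[query] = answer
    return answer
```

On the first call for a given query, both callbacks run and the answer is stored. A repeated call with the same query returns the stored answer without invoking either callback, which is where the latency saving comes from.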