General Description

Sample Overview

This section uses an Atlas 800I A2 inference server as an example to describe how to use the RAG SDK Python APIs to develop a question answering (QA) system based on a knowledge base. Figure 1 shows the RAG SDK process, which consists of two stages: knowledge base building and QA retrieval.

This example demonstrates a text-to-text scenario using the FLAT:L2 retrieval method. In the process diagram, [xxx] in each step denotes the applicable method class. The recommended LLM is Llama3-8B-Chinese-Chat, the embedding model is acge_text_embedding, and the reranker (optional) is bge-reranker-large.

Figure 1 Knowledge base-based QA process
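
To make the FLAT:L2 retrieval method more concrete, the following is a minimal conceptual sketch of what a flat (exhaustive) L2 search does: every stored vector is compared with the query vector, and the chunks with the smallest Euclidean distance are returned. It is not the RAG SDK API. The class name FlatL2Index, the helper l2_distance, and the two-dimensional sample vectors are hypothetical and exist only for illustration; in a real system the vectors would come from the embedding model.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class FlatL2Index:
    """Toy exhaustive index (hypothetical, not part of RAG SDK).

    Knowledge base building: add() stores a vector next to its text chunk.
    QA retrieval: search() scans every stored vector and ranks by L2 distance.
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._vectors: List[List[float]] = []
        self._chunks: List[str] = []

    def add(self, vector: Sequence[float], chunk: str) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"expected dimension {self.dim}, got {len(vector)}")
        self._vectors.append(list(vector))
        self._chunks.append(chunk)

    def search(self, query: Sequence[float], top_k: int = 3) -> List[Tuple[float, str]]:
        """Return up to top_k (distance, chunk) pairs, nearest first."""
        scored = [
            (l2_distance(query, vector), chunk)
            for vector, chunk in zip(self._vectors, self._chunks)
        ]
        scored.sort(key=lambda pair: pair[0])
        return scored[:top_k]


# Illustrative data only: real vectors would be produced by the embedding model.
index = FlatL2Index(dim=2)
index.add([0.0, 0.0], "chunk A")
index.add([3.0, 4.0], "chunk B")
index.add([1.0, 1.0], "chunk C")

hits = index.search([0.9, 1.2], top_k=2)
for distance, chunk in hits:
    print(f"{distance:.3f}  {chunk}")
```

A flat index trades speed for exactness: it performs no clustering or graph traversal, so results are exact, but the search cost grows linearly with the number of stored vectors.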

Prerequisites

You have downloaded and run Llama3-8B-Chinese-Chat in the MindIE container. (You can download the model from here.)
You have completed containerized deployment on the host by referring to "Installing MindIE" > "Mode 3: Container Installation" in the MindIE Installation Guide, and you have started the service by referring to "Quick Start" > "Service Startup" in the MindIE Motor Development Guide.
You have completed operations in Installing RAG SDK.
You have downloaded acge_text_embedding and bge-reranker-large and saved them to the model storage directory that was configured when the container was started in 2.a. Model download links:
- acge_text_embedding
- bge-reranker-large

TEI Serving Description

The embedding model and the reranker can also be run as services. To use TEI serving, follow the instructions in this link to start the embedding and reranker services.
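
As a rough illustration of how a client might talk to such services, the sketch below builds HTTP requests for an embedding call and a rerank call. It assumes that TEI here refers to Hugging Face Text Embeddings Inference, whose REST interface exposes POST /embed (body with an "inputs" field) and POST /rerank (body with "query" and "texts" fields). The base URL, the helper names, and the sample texts are assumptions for illustration; the embedding and reranker services normally run as separate instances, so each would have its own address. Check the deployed service's own API description before relying on these paths.

```python
import json
import urllib.request
from typing import Any, List

# Assumption: placeholder address; replace with where your TEI service listens.
TEI_BASE_URL = "http://127.0.0.1:8080"


def _json_post(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url=url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_embed_request(texts: List[str], base_url: str = TEI_BASE_URL) -> urllib.request.Request:
    """Request embeddings for a batch of texts via the /embed route."""
    return _json_post(f"{base_url}/embed", {"inputs": texts})


def build_rerank_request(
    query: str, texts: List[str], base_url: str = TEI_BASE_URL
) -> urllib.request.Request:
    """Request relevance scores of candidate texts for a query via the /rerank route."""
    return _json_post(f"{base_url}/rerank", {"query": query, "texts": texts})


def send(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the requests needs no running service; send() does.
embed_request = build_embed_request(["What is a knowledge base?"])
rerank_request = build_rerank_request(
    "What is a knowledge base?",
    ["A knowledge base stores documents.", "Unrelated text."],
)
print(embed_request.full_url)
print(rerank_request.full_url)
```

Separating request construction from sending keeps the payload format easy to inspect and test without a running service; call send(embed_request) only after the embedding service has started.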
