General Description
Sample Overview
This section uses an Atlas 800I A2 inference server as an example to describe how to use the RAG SDK Python APIs to develop a question answering (QA) system based on a knowledge base. Figure 1 shows the RAG SDK process, which consists of two stages: knowledge base building and QA retrieval.
This example demonstrates a text-to-text scenario using the FLAT:L2 retrieval method. In the process diagram, the bracketed label [xxx] in each step denotes the applicable method class. The recommended LLM is Llama3-8B-Chinese-Chat, the embedding model is acge_text_embedding, and the optional reranker is bge-reranker-large.
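To make the retrieval method concrete, the following minimal sketch shows the idea generally meant by flat L2 retrieval: the query vector is compared with every stored vector by Euclidean (L2) distance, and the closest entries are returned. It is written in plain Python for illustration only and does not use the RAG SDK APIs. The names squared_l2 and flat_l2_search and the toy vectors are hypothetical; in the real workflow the vectors would come from the embedding model.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    top_k: int,
) -> List[Tuple[int, float]]:
    """Exhaustively compare the query with every stored vector.

    Returns (index, squared distance) pairs, nearest first.
    Hypothetical helper for illustration; not part of the RAG SDK.
    """
    scored = ((squared_l2(query, v), i) for i, v in enumerate(vectors))
    return [(i, d) for d, i in heapq.nsmallest(top_k, scored)]


# Toy 3-dimensional vectors standing in for chunk embeddings (assumption:
# a real embedding model produces much higher-dimensional vectors).
chunk_vectors = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [3.0, 3.0, 3.0],
]
query_vector = [0.9, 0.1, 0.0]

# The two stored vectors closest to the query, nearest first.
results = flat_l2_search(query_vector, chunk_vectors, top_k=2)
print(results)
```

Because every stored vector is scanned, the result is exact, but the cost grows linearly with the size of the knowledge base. Ranking by squared distance gives the same order as ranking by the distance itself, so the square root is skipped.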
Prerequisites
- You have downloaded and run Llama3-8B-Chinese-Chat in the MindIE container. (You can download the model from here.)
- You have completed containerized deployment on the host by referring to "Installing MindIE" > "Mode 3: Container Installation" in the MindIE Installation Guide, and have started the service by referring to "Quick Start" > "Service Startup" in the MindIE Motor Development Guide.
- You have completed the operations in Installing RAG SDK.
- You have downloaded acge_text_embedding and bge-reranker-large and saved them to the model storage directory configured when the container is run in 2.a. Model download links:
