Development Workflow
Figure 1 shows the complete RAG SDK development workflow. The steps below describe each stage and point to the sections that document the corresponding APIs.
At run time, execute the related test cases as the HwHiAiUser user.
Knowledge base building and online QA can be performed concurrently. See the corresponding demo for implementation details.
- Build a knowledge base.
- Upload, load, and split domain-specific documents. Initialize the document processing tools: based on the type of the uploaded file, register a document parser (see Document Parsing, LangChain document parsing APIs, or LangChain-based custom APIs) and a document splitter (see LangChain-based document splitting API or custom API). The supported document types include Docx, Excel, PDF, and PowerPoint. Finally, load the parsing and splitting functions as required to output the document chunks.
- Vectorize the text. Load the embedding model (see Vectorization) and set its parameters according to the model path. The document chunks produced by splitting are vectorized and then stored in a vector database for knowledge base management.
- Initialize knowledge base management (see Documentation Management in a Knowledge Base), which includes initializing the relational database and the vector database (see Relational Database and Vector Databases).
The split document chunks are stored in the relational database, and the vectorized chunks are stored in the vector database.
- Perform online QA.
- (Optional) Initialize the cache (see Cache Module). RAG SDK supports cache configuration and proximity searches. During QA, the system first searches the cache for the answer. On a cache hit, the system returns the cached answer directly. If no cache is configured or the question misses the cache, the system proceeds with the inference steps below.
- Initialize the chain (see Model Chain). The chain connects the LLM with the retrieval and re-ranking modules for QA. You can select chains such as text-to-text, text-to-image, and image-to-image. Multi-turn dialogs and parallel retrieval and inference are supported.
- Initialize the retrieval mode (see Retrieval). You can define the retrieval mode, such as proximity retrieval or query rewriting. After the embedding model vectorizes a question, retrieval finds the relevant context in the knowledge base for further processing.
- (Optional) Use the reranker to refine the ordering of the retrieved context and improve retrieval quality (see Re-ranking).
- Assemble the question and context into a prompt, pass the prompt to an LLM (see Large Language Models) for inference, and return the answer to the user. If a cache is configured, the QA pair is written to the cache after QA is complete, so that a later hit on the same question is answered faster.
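The two phases above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the RAG SDK API: every name here (`split_document`, `embed`, `KnowledgeBase`, `rerank`, `fake_llm`, `cache_lookup`, `answer`) is hypothetical, the "embedding" is a toy bag-of-words count, the relational and vector databases are in-memory dictionaries, and the LLM is a stub that echoes the top-ranked context.

```python
import math
from collections import Counter

# All names below are hypothetical stand-ins, NOT RAG SDK APIs.

def split_document(text):
    """Stand-in for a document splitter: one chunk per sentence."""
    return [s.strip() for s in text.split(".") if s.strip()]

def embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class KnowledgeBase:
    def __init__(self):
        self.chunks = {}   # stand-in for the relational database: id -> chunk text
        self.vectors = {}  # stand-in for the vector database: id -> vector

    def add_document(self, text):
        """Split the document, then store each chunk and its vector."""
        for chunk in split_document(text):
            chunk_id = len(self.chunks)
            self.chunks[chunk_id] = chunk
            self.vectors[chunk_id] = embed(chunk)

    def retrieve(self, question, top_k=2):
        """Proximity retrieval: return the top_k chunks closest to the question."""
        q = embed(question)
        ranked = sorted(self.vectors,
                        key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return [self.chunks[i] for i in ranked[:top_k]]

def rerank(question, contexts):
    """Optional re-ranking step; this toy version re-sorts by the same score."""
    q = embed(question)
    return sorted(contexts, key=lambda c: cosine(q, embed(c)), reverse=True)

def fake_llm(prompt):
    """Stand-in for LLM inference: returns the first context line of the prompt."""
    return prompt.split("\n")[1]

def cache_lookup(cache, question, threshold=0.9):
    """Proximity search in the cache: hit if a cached question is similar enough."""
    q = embed(question)
    for cached_question, cached_answer in cache.items():
        if cosine(q, embed(cached_question)) >= threshold:
            return cached_answer
    return None

def answer(question, kb, cache):
    """Cache lookup, then retrieve, re-rank, prompt, infer, and update the cache."""
    cached = cache_lookup(cache, question)
    if cached is not None:
        return cached, True
    contexts = rerank(question, kb.retrieve(question))
    prompt = "Context:\n" + "\n".join(contexts) + "\nQuestion: " + question
    result = fake_llm(prompt)
    cache[question] = result  # store the QA pair for later hits
    return result, False

# Build the knowledge base, then ask the same question twice.
kb = KnowledgeBase()
kb.add_document("The cache stores question and answer pairs. "
                "The reranker reorders retrieved chunks.")
cache = {}
first, first_hit = answer("What does the cache store?", kb, cache)
second, second_hit = answer("What does the cache store?", kb, cache)
```

The first call runs the full retrieval and inference path and writes the QA pair to the cache; the second call is served from the cache. In a real deployment each stand-in would be replaced by the corresponding component described in the referenced sections.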
