Class Introduction

Function

Uses an LLM to extract keywords from the input query, then uses BM25 to retrieve the top k matching documents. This class inherits from langchain_core.retrievers.BaseRetriever, and retrieval is performed by calling the invoke method of the base class. The input query cannot exceed 1 million characters.
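To make the retrieval step described above concrete, the following is a minimal sketch of BM25 top k ranking in plain Python. It is not the mx_rag implementation: the names tokenize and bm25_top_k, the whitespace tokenizer, and the k1 and b values are assumptions chosen for illustration, and the keyword list stands in for what the LLM would return.

```python
import math
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately naive, assumed tokenizer)."""
    return text.lower().split()


def bm25_top_k(keywords: List[str], docs: List[str], k: int = 1,
               k1: float = 1.5, b: float = 0.75) -> List[Tuple[float, str]]:
    """Score every document against the keywords with Okapi BM25; return the k best."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / max(n_docs, 1)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scored = []
    for text, tokens in zip(docs, tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in (kw.lower() for kw in keywords):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, text))
    # Highest score first; documents with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


docs = [
    "Install CANN before you install MindIE",
    "MindStudio container image build steps",
    "Release notes and known issues",
]
keywords = ["MindIE", "install"]  # stand-in for keywords extracted by the LLM
top = bm25_top_k(keywords, docs, k=2)
for score, text in top:
    print(f"{score:.3f}  {text}")
```

Because this sketch tokenizes on whitespace, a multi-word keyword such as "container image" would never match. This is one reason to keep extracted keywords to single words.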

Prototype

from mx_rag.retrievers.bm_retriever import BMRetriever
# All parameters must be passed as keyword arguments.
BMRetriever(docs, llm, k, llm_config, prompt, preprocess_func)

Parameters

Parameter

Data Type

Required/Optional

Description

docs

List[Document]

Required

List of documents to search. The number of documents must be in the range [0, 1000].

llm

Text2TextLLM

Required

LLM object instance. For details, see Text2TextLLM.

k

Integer

Optional

Number of top-ranked results to return. The value range is [1, 10000]. The default value is 1.

llm_config

LLMParameterConfig

Optional

Parameters used when calling the LLM. In this class, the default value of temperature is 0.5 and that of top_p is 0.95. For details about other parameters, see LLMParameterConfig.

prompt

langchain_core.prompts.PromptTemplate

Optional

Prompt template used for keyword extraction. The input variable name question is fixed: it represents the entered question, cannot be changed, and must be contained in prompt.input_variables. prompt.template is the prompt text, and its length must be in the range (0, 1 × 1024 × 1024]. The text sent to the LLM is the prompt combined with the question, and its maximum valid length depends on the MindIE configuration. For details, see the description of maxSeqLen in "Core Concepts and Configurations" > "Configuration Parameters (Serving)" in MindIE LLM Development Guide. Write the prompt in the same language as the question, or specify the language of the LLM answer in the prompt. Otherwise, answer quality may be affected. The default value is as follows:

PromptTemplate(
input_variables=["question"],
template="""A maximum of 10 keywords can be extracted based on a question. Keywords should be divided into separate words such as verbs, nouns, or adjectives.
Do not use long phrases (to better match and retrieve semantically related materials with different expressions). Extract keywords based on the given reference. Use commas (,) to separate keywords, for example, {{keyword 1,keyword 2}}
Question: How do I install CANN?
Keywords: CANN, installation, install

Question: How do I create a MindStudio container image?
Keywords: MindStudio, container image, Docker build

Question: {question}
Keywords:
""")

preprocess_func

Callable[[str], List[str]]

Optional

Preprocessing function applied before BM25 retrieval. It splits the text string returned by the LLM into a keyword list. By default, the string is split on commas (,).
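The prompt and preprocess_func rows above can be illustrated with a small plain-Python sketch. It assumes the template follows Python str.format brace rules, so {question} is substituted and {{ }} renders as literal braces. TEMPLATE is a shortened stand-in for the default prompt, and render_prompt and split_keywords are hypothetical helpers, not part of mx_rag.

```python
from typing import List

# Shortened stand-in for the default template (assumption, not the full prompt).
TEMPLATE = (
    "Use commas (,) to separate keywords, for example, {{keyword 1,keyword 2}}\n"
    "Question: {question}\n"
    "Keywords:"
)


def render_prompt(question: str) -> str:
    """Fill the fixed 'question' variable using str.format brace rules."""
    return TEMPLATE.format(question=question)


def split_keywords(llm_output: str) -> List[str]:
    """Hypothetical preprocess_func: split on ASCII or full-width commas,
    trim whitespace, and drop empty or repeated keywords."""
    seen = set()
    keywords = []
    for part in llm_output.replace("，", ",").split(","):
        word = part.strip()
        if word and word.lower() not in seen:
            seen.add(word.lower())
            keywords.append(word)
    return keywords


prompt_text = render_prompt("How do I install MindIE?")
keywords = split_keywords("MindIE, installation, install,, MindIE ")
print(prompt_text)
print(keywords)
```

Since split_keywords matches the documented type Callable[[str], List[str]], a function of this shape could presumably be supplied as preprocess_func=split_keywords when constructing BMRetriever.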

Example

from langchain_text_splitters import RecursiveCharacterTextSplitter
from mx_rag.chain import SingleText2TextChain
from mx_rag.document.loader import DocxLoader
from mx_rag.llm import Text2TextLLM
from mx_rag.retrievers.bm_retriever import BMRetriever
from mx_rag.utils import ClientParam

# Load the document and split it into overlapping chunks.
loader = DocxLoader("/path/to/MindIE.docx")
docs = loader.load_and_split(RecursiveCharacterTextSplitter(chunk_size=750, chunk_overlap=150))

# Create the LLM client that extracts keywords and generates the answer.
client_param = ClientParam(ca_file="/path/to/ca.crt")
llm = Text2TextLLM(base_url="https://ip:port/v1/chat/completions", model_name="qianwen-7b", client_param=client_param)

# Build the retriever, which returns up to 10 chunks per query.
bm_retriever = BMRetriever(docs=docs, llm=llm, k=10)

# Run a question through the chain, which retrieves context with bm_retriever.
text2text_chain = SingleText2TextChain(llm=llm, retriever=bm_retriever)
res = text2text_chain.query("How do I install MindIE?")
print(res)