Class Introduction
Function
Performs top-k retrieval on the input query using BM25. This class inherits from langchain_core.retrievers.BaseRetriever, and retrieval is triggered through the invoke method of the base class. The length of the input query cannot exceed 1,000,000.
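To make the ranking step concrete, the following is a minimal, self-contained sketch of BM25 top-k scoring over an in-memory list of strings. It is not the implementation used by FullTextRetriever or by the underlying document store: the function name bm25_top_k, the whitespace tokenizer, and the parameter defaults k1=1.5 and b=0.75 are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_top_k(query, documents, k=1, k1=1.5, b=0.75):
    """Return the k highest-scoring (index, score) pairs for the query.

    Illustrative only: hypothetical helper, not part of mx_rag.
    """
    if not documents:
        return []

    # Naive whitespace tokenization; a real engine would use a proper analyzer.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for idx, toks in enumerate(tokenized):
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            # Term-frequency saturation with document-length normalization.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append((idx, score))

    # Sort by descending score and keep the top k.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


corpus = [
    "Java is a programming language",
    "Python is popular for data science",
    "Coffee from Java island is famous",
]
top = bm25_top_k("java language", corpus, k=2)
print(top)
```

In this sketch the first document ranks highest because it matches both query terms, and the second document is excluded from the top two because it matches neither.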
Prototype
from mx_rag.retrievers.full_text_retriever import FullTextRetriever
FullTextRetriever(document_store, k)
Parameters
| Parameter | Data Type | Required/Optional | Description |
|---|---|---|---|
| document_store | MilvusDocstore or OpenGaussDocstore | Required | Document store (database) instance that holds the document corpus to be retrieved. Currently, only MilvusDocstore and OpenGaussDocstore are supported. |
| k | Integer | Optional | Number of top-ranked MxDocument objects returned after retrieval. The value range is [1, 10000]. The default value is 1. |
| filter_dict | Dict | Optional | Dictionary of retrieval criteria. Currently, filtering is supported only by document ID. The document IDs are passed as a list whose length cannot exceed 1000 × 1000. The default value is {}. For example, to filter the documents whose IDs are 1, 2, and 4, pass {"document_id": [1, 2, 4]}. |
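The documented limits can be checked before a retriever is constructed or invoked. The sketch below is a hypothetical pre-flight helper, not part of mx_rag: the name check_retriever_inputs and its error messages are assumptions, and the library performs its own validation, whose exact behavior is not described here.

```python
# Limits taken from the parameter descriptions in this section.
MAX_QUERY_LENGTH = 1_000_000
MAX_ID_LIST_LENGTH = 1000 * 1000
K_MIN, K_MAX = 1, 10000


def check_retriever_inputs(query, k=1, filter_dict=None):
    """Return a list of problems found; an empty list means the inputs look valid.

    Hypothetical helper for illustration only.
    """
    filter_dict = {} if filter_dict is None else filter_dict
    problems = []

    if len(query) > MAX_QUERY_LENGTH:
        problems.append("query length exceeds 1,000,000")

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(k, bool) or not isinstance(k, int) or not K_MIN <= k <= K_MAX:
        problems.append("k must be an integer in [1, 10000]")

    # Only the "document_id" key is documented as supported.
    unsupported = set(filter_dict) - {"document_id"}
    if unsupported:
        problems.append(f"unsupported filter keys: {sorted(map(str, unsupported))}")

    ids = filter_dict.get("document_id", [])
    if not isinstance(ids, list) or len(ids) > MAX_ID_LIST_LENGTH:
        problems.append("document_id must be a list of at most 1000 x 1000 IDs")

    return problems


print(check_retriever_inputs("Java", k=3, filter_dict={"document_id": [1, 2, 4]}))
```

Returning a list of problems rather than raising on the first one lets a caller report every violated limit at once.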
Example
import getpass
from pymilvus import MilvusClient
from langchain_text_splitters import RecursiveCharacterTextSplitter
from mx_rag.document.loader import DocxLoader
from mx_rag.storage.document_store import MxDocument, MilvusDocstore
from mx_rag.retrievers import FullTextRetriever
# Connect to Milvus over TLS; the password is read interactively.
client = MilvusClient(
    "https://x.x.x.x:port",
    user="xxx",
    password=getpass.getpass(),
    secure=True,
    client_pem_path="path_to/client.pem",
    client_key_path="path_to/client.key",
    ca_pem_path="path_to/ca.pem",
    server_name="localhost",
)
chunk_store = MilvusDocstore(client)
# Load the .docx file and split it into overlapping chunks.
docs = DocxLoader("test.docx").load_and_split(
    RecursiveCharacterTextSplitter(chunk_size=750, chunk_overlap=100))
mxdocs = [MxDocument(page_content=doc.page_content, metadata=doc.metadata, document_name="test.docx") for doc in docs]
chunk_store.add(mxdocs, 1)
# Retrieve the top 3 chunks for the query using BM25.
full_retriever = FullTextRetriever(document_store=chunk_store, k=3)
print(full_retriever.invoke("Java"))