Class Introduction

Function

Uses BM25 to perform top-k retrieval on the input query. This class inherits from langchain_core.retrievers.BaseRetriever, and retrieval is triggered by calling the invoke method provided by the base class. The input query cannot exceed 1,000,000 characters in length.
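To make the ranking idea concrete, here is a minimal, self-contained sketch of BM25 top-k scoring in plain Python. It is an illustration only and is not the FullTextRetriever implementation: the function name bm25_top_k, the whitespace tokenizer, and the parameter defaults k1=1.5 and b=0.75 are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_top_k(query, documents, k=1, k1=1.5, b=0.75):
    """Return the k highest-scoring (index, score) pairs for the query.

    Illustrative only: whitespace tokenization, assumed k1 and b defaults.
    """
    if not documents:
        return []

    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = query.lower().split()
    scores = []
    for index, tokens in enumerate(tokenized):
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Length normalization: longer documents are penalized.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append((index, score))

    # Highest score first, then keep only the top k.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


corpus = [
    "Java is a statically typed language",
    "Python is a dynamically typed language",
    "Java and Kotlin both run on the JVM",
]
top = bm25_top_k("java", corpus, k=2)
for index, score in top:
    print(index, round(score, 3), corpus[index])
```

Both documents that mention "java" are returned, and the shorter one ranks first because of BM25's length normalization. The k argument here plays the same role as the k parameter of FullTextRetriever: it caps how many ranked results come back.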

Prototype

from mx_rag.retrievers.full_text_retriever import FullTextRetriever
FullTextRetriever(document_store, k, filter_dict)

Parameters

| Parameter | Data Type | Required/Optional | Description |
|---|---|---|---|
| document_store | MilvusDocstore or OpenGaussDocstore | Required | Database instance that holds the document corpus to be retrieved. Currently, only MilvusDocstore and OpenGaussDocstore are supported. |
| k | Integer | Optional | Number of top-ranked MxDocument objects returned by retrieval. The value range is [1, 10000]. The default value is 1. |
| filter_dict | Dict | Optional | Dictionary of retrieval criteria. Currently, filtering is supported only by document ID. The document IDs to filter on are passed as a list, which cannot contain more than 1,000,000 (1000 × 1000) entries. The default value is {}. For example, to restrict retrieval to the documents whose IDs are 1, 2, and 4, pass {"document_id": [1, 2, 4]}. |
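The documented limits on k and filter_dict can be checked before a retriever is constructed. The sketch below is a hypothetical pre-check written for illustration: the helper name check_retriever_args and its error messages are assumptions and are not part of mx_rag, which may validate its arguments differently.

```python
MAX_K = 10_000                 # documented upper bound for k
MAX_FILTER_IDS = 1000 * 1000   # documented upper bound for the ID list


def check_retriever_args(k=1, filter_dict=None):
    """Hypothetical helper: validate k and filter_dict against the documented limits."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(k, bool) or not isinstance(k, int) or not 1 <= k <= MAX_K:
        raise ValueError(f"k must be an integer in [1, {MAX_K}]")

    filter_dict = {} if filter_dict is None else filter_dict
    if not isinstance(filter_dict, dict):
        raise ValueError("filter_dict must be a dict")

    # Only filtering by document ID is documented as supported.
    unknown = set(filter_dict) - {"document_id"}
    if unknown:
        raise ValueError(f"unsupported filter keys: {sorted(map(str, unknown))}")

    ids = filter_dict.get("document_id", [])
    if not isinstance(ids, list):
        raise ValueError("document_id must be a list of IDs")
    if len(ids) > MAX_FILTER_IDS:
        raise ValueError(f"document_id list cannot exceed {MAX_FILTER_IDS} entries")

    return k, filter_dict


k, filter_dict = check_retriever_args(k=3, filter_dict={"document_id": [1, 2, 4]})
print(k, filter_dict)
```

The returned values could then be passed to the constructor as FullTextRetriever(document_store=..., k=k, filter_dict=filter_dict).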

Example

import getpass
from pymilvus import MilvusClient
from langchain_text_splitters import RecursiveCharacterTextSplitter
from mx_rag.document.loader import DocxLoader
from mx_rag.storage.document_store import MxDocument, MilvusDocstore
from mx_rag.retrievers import FullTextRetriever


# Connect to Milvus over TLS; the password is read interactively.
client = MilvusClient(
    "https://x.x.x.x:port",
    user="xxx",
    password=getpass.getpass(),
    secure=True,
    client_pem_path="path_to/client.pem",
    client_key_path="path_to/client.key",
    ca_pem_path="path_to/ca.pem",
    server_name="localhost",
)

# Load the document, split it into chunks, and store the chunks.
chunk_store = MilvusDocstore(client)
docs = DocxLoader("test.docx").load_and_split(
    RecursiveCharacterTextSplitter(chunk_size=750, chunk_overlap=100))
mxdocs = [MxDocument(page_content=doc.page_content, metadata=doc.metadata, document_name="test.docx") for doc in docs]
chunk_store.add(mxdocs, 1)

# Retrieve the top 3 chunks for the query using BM25.
full_retriever = FullTextRetriever(document_store=chunk_store, k=3)
print(full_retriever.invoke("Java"))