Class Introduction

Function

Uses BM25 to perform top-k retrieval for the input query. This class inherits from langchain_core.retrievers.BaseRetriever, and retrieval is performed by calling the invoke method inherited from that base class. The input query cannot exceed 1 million characters in length.
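To make the BM25 top-k idea concrete, the following is a minimal, self-contained sketch of generic Okapi BM25 scoring in plain Python. It is not the mx_rag implementation, and how FullTextRetriever or its backing document store actually computes scores is not described here. The function name bm25_top_k, the whitespace tokenizer, and the parameter values k1=1.5 and b=0.75 are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def bm25_top_k(query, documents, k=1, k1=1.5, b=0.75):
    """Return the k highest-scoring (index, score) pairs for the query.

    Hypothetical helper: generic Okapi BM25 with naive whitespace
    tokenization, not the scoring used inside mx_rag.
    """
    if not documents:
        return []
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Average document length; fall back to 1 if every document is empty.
    avg_len = sum(len(toks) for toks in tokenized) / n_docs or 1
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for idx, toks in enumerate(tokenized):
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term frequency saturation with document length normalization.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append((idx, score))

    # Highest score first, then keep only the top k entries.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


corpus = [
    "Java is a programming language",
    "Python is popular for data science",
    "Coffee from Java island is famous",
]
top = bm25_top_k("Java language", corpus, k=2)
print(top)
```

In this sketch the first document ranks highest because it matches both query terms, and the k argument plays the same role as the k parameter of FullTextRetriever: it caps how many results are returned.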

Prototype

from mx_rag.retrievers.full_text_retriever import FullTextRetriever
FullTextRetriever(document_store, k, filter_dict)

Parameters

Parameter	Data Type	Required/Optional	Description
document_store	MilvusDocstore or OpenGaussDocstore	Required	Database instance that holds the document corpus to be searched. Currently, only MilvusDocstore and OpenGaussDocstore are supported.
k	Integer	Optional	Number of top-ranked MxDocument objects to return. Value range: [1, 10000]. Default: 1.
filter_dict	Dict	Optional	Dictionary of retrieval filter criteria. Currently, filtering is supported only by document ID, with the IDs passed as a list of at most 1,000,000 (1000 × 1000) entries. Default: {}. For example, to filter on the documents whose IDs are 1, 2, and 4, pass {"document_id": [1, 2, 4]}.
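The documented limits can be checked before a retriever is constructed or invoked. The sketch below is a hypothetical pre-check written for illustration: the helper check_retriever_args and its error messages are assumptions, not part of mx_rag, and the library may validate its inputs differently. It encodes only the constraints stated on this page (query length, the range of k, and the document_id list in filter_dict).

```python
MAX_QUERY_LENGTH = 1_000_000       # documented query length limit
MAX_ID_LIST_LENGTH = 1000 * 1000   # documented document ID list limit


def check_retriever_args(query, k=1, filter_dict=None):
    """Raise ValueError if an argument violates a documented limit.

    Hypothetical helper, not an mx_rag API.
    """
    filter_dict = {} if filter_dict is None else filter_dict

    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError("query is longer than 1 million characters")

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(k, bool) or not isinstance(k, int) or not 1 <= k <= 10000:
        raise ValueError("k must be an integer in [1, 10000]")

    # Only the "document_id" key is documented as a filter criterion.
    unknown_keys = set(filter_dict) - {"document_id"}
    if unknown_keys:
        raise ValueError(f"unsupported filter keys: {sorted(unknown_keys)}")

    ids = filter_dict.get("document_id", [])
    if not isinstance(ids, list) or len(ids) > MAX_ID_LIST_LENGTH:
        raise ValueError("document_id must be a list of at most 1,000,000 IDs")

    return True


# Matches the example from the parameter table.
ok = check_retriever_args("Java", k=3, filter_dict={"document_id": [1, 2, 4]})
print(ok)
```

Running such a check on the caller's side gives an early, readable error before any request reaches the document store.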

Example

import getpass
from pymilvus import MilvusClient
from langchain_text_splitters import RecursiveCharacterTextSplitter
from mx_rag.document.loader import DocxLoader
from mx_rag.storage.document_store import MxDocument, MilvusDocstore
from mx_rag.retrievers import FullTextRetriever

# Connect to Milvus over TLS. Replace the placeholder address, user, and certificate paths.
client = MilvusClient(
    "https://x.x.x.x:port",
    user="xxx",
    password=getpass.getpass(),
    secure=True,
    client_pem_path="path_to/client.pem",
    client_key_path="path_to/client.key",
    ca_pem_path="path_to/ca.pem",
    server_name="localhost",
)

# Load test.docx and split it into overlapping chunks.
chunk_store = MilvusDocstore(client)
docs = DocxLoader("test.docx").load_and_split(
    RecursiveCharacterTextSplitter(chunk_size=750, chunk_overlap=100))
mxdocs = [MxDocument(page_content=doc.page_content, metadata=doc.metadata, document_name="test.docx") for doc in docs]
# Write the chunks to the document store.
chunk_store.add(mxdocs, 1)

# Retrieve the three best-matching chunks for the query "Java".
full_retriever = FullTextRetriever(document_store=chunk_store, k=3)
print(full_retriever.invoke("Java"))
