Class Introduction

Function

Provides a Milvus-based document store that holds the chunk information produced by document splitting.

Prototype

from mx_rag.storage.document_store import MilvusDocstore
MilvusDocstore(client, collection_name, enable_bm25, bm25_k1, bm25_b, auto_flush, encrypt_fn, decrypt_fn)

Parameters

| Parameter | Data Type | Required/Optional | Description |
|---|---|---|---|
| client | MilvusClient | Required | MilvusClient instance. For details, see MilvusClient. NOTE: The MilvusClient instance is created and managed by the user. Use a secure connection mode. |
| collection_name | String | Optional | Collection name. It cannot be empty, and its maximum length is 1024 characters. The default value is doc_store. |
| enable_bm25 | Bool | Optional | Whether to enable BM25 retrieval. The default value is True. If this parameter is set to False, full-text retrieval is unavailable: the full_text_search method always returns an empty list ([]). |
| bm25_k1 | Float | Optional | Term frequency saturation for BM25 retrieval. A larger value gives term frequency more weight in document ranking. The value range is [1.2, 2.0]. The default value is 1.2. For details, see Milvus Full Text Search. |
| bm25_b | Float | Optional | Degree to which document length is normalized in BM25 retrieval. The value range is [0, 1]. The default value is 0.75. For details, see Milvus Full Text Search. |
| auto_flush | Bool | Optional | Whether to automatically update in-memory data when data changes. The default value is True. |
| encrypt_fn | Callable[[str], str] | Optional | Encryption callback. It must return a string whose length does not exceed 128 × 1024 × 1024. This parameter takes effect only when enable_bm25 is set to False. When add or update is called, encrypt_fn encrypts page_content, and the encrypted content is stored in the database. NOTICE: If the file to be uploaded contains personal data such as bank account numbers, ID card numbers, passport numbers, or passwords, set this parameter to protect that data. |
| decrypt_fn | Callable[[str], str] | Optional | Decryption callback. It must return a string whose length does not exceed 16 × 1024 × 1024. This parameter takes effect only when enable_bm25 is set to False. When a query API is called, decrypt_fn decrypts page_content, and the decrypted data is returned. |
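
To make the roles of bm25_k1 and bm25_b more concrete, the following is a minimal sketch of the textbook Okapi BM25 score for a single query term. It is an illustration only: the helper names (bm25_term_score, tf_gain, length_penalty) and the sample corpus statistics are assumptions made for this sketch, and the scoring that Milvus performs internally may differ in its details.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook Okapi BM25 score of one query term for one document."""
    # Inverse document frequency: rarer terms weigh more.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how strongly the document length is normalized.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # k1 controls how quickly repeated occurrences of the term saturate.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


def tf_gain(k1):
    """Score ratio of tf=10 to tf=1 in an average-length document."""
    common = dict(doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=50, k1=k1)
    return bm25_term_score(tf=10, **common) / bm25_term_score(tf=1, **common)


def length_penalty(b):
    """Score of a document 5x the average length, relative to an average-length one."""
    common = dict(tf=3, avg_doc_len=100, n_docs=1000, doc_freq=50, b=b)
    return bm25_term_score(doc_len=500, **common) / bm25_term_score(doc_len=100, **common)


# A larger k1 lets term frequency keep contributing for longer.
for k1 in (1.2, 2.0):
    print(f"k1={k1}: tf=10 scores {tf_gain(k1):.3f}x the tf=1 score")

# b=0 ignores document length; b=1 applies full length normalization.
for b in (0.0, 0.75, 1.0):
    print(f"b={b}: long document keeps {length_penalty(b):.3f} of the score")
```

In this sketch, raising k1 from 1.2 to 2.0 widens the gap between frequent and rare occurrences of a term, and raising b increases the penalty on long documents, which matches the parameter descriptions above.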

Example

import getpass
from pymilvus import MilvusClient
from mx_rag.storage.document_store import MxDocument, MilvusDocstore

# Connect over TLS; replace the placeholder address, user, and certificate paths.
client = MilvusClient(
    "https://x.x.x.x:port",
    user="xxx",
    password=getpass.getpass(),
    secure=True,
    client_pem_path="path_to/client.pem",
    client_key_path="path_to/client.key",
    ca_pem_path="path_to/ca.pem",
    server_name="localhost",
)

chunk_store = MilvusDocstore(client)
text = ["Example", "Text"]
metadata_list = [{} for _ in text]
doc = [MxDocument(page_content=t, metadata=m, document_name="1.docx") for t, m in zip(text, metadata_list)]
document_id = 1
chunk_store.add(doc, document_id)

# Fetch one stored chunk by its chunk ID.
ids = chunk_store.get_all_chunk_id()
document = chunk_store.search(ids[0])
print(document.page_content)

# Full-text search filtered by document ID: first an ID not used in this example, then the one just added.
print(chunk_store.full_text_search("Text", filter_dict={"document_id": [0]}))
print(chunk_store.full_text_search("Text", filter_dict={"document_id": [document_id]}))

# Update chunks, delete by document ID, then query by document ID.
chunk_store.update([0, 1], ["text1", "text2"])
print(chunk_store.delete(document_id))
chunk_store.search_by_document_id(document_id)
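
The example above uses the default settings, so encrypt_fn and decrypt_fn are not involved. The following is a minimal sketch of a callback pair that matches the Callable[[str], str] shape and the length limits given in the parameter table. Note the assumptions: base64 is used only as a reversible stand-in and is NOT encryption, so a real deployment would substitute an authenticated cipher; the constant names are made up for this sketch; and the constructor call is shown as a comment because it needs a live Milvus server and assumes the parameters can be passed as keyword arguments.

```python
import base64

# Length limits taken from the parameter table.
MAX_ENCRYPTED_LEN = 128 * 1024 * 1024  # limit on the string returned by encrypt_fn
MAX_DECRYPTED_LEN = 16 * 1024 * 1024   # limit on the string returned by decrypt_fn


def encrypt_fn(plain: str) -> str:
    """Placeholder for an encryption callback: str in, str out."""
    # PLACEHOLDER: base64 is an encoding, not encryption. Swap in a real cipher.
    token = base64.urlsafe_b64encode(plain.encode("utf-8")).decode("ascii")
    if len(token) > MAX_ENCRYPTED_LEN:
        raise ValueError("encrypted content exceeds the allowed length")
    return token


def decrypt_fn(token: str) -> str:
    """Inverse of encrypt_fn: recovers the original page_content."""
    plain = base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")
    if len(plain) > MAX_DECRYPTED_LEN:
        raise ValueError("decrypted content exceeds the allowed length")
    return plain


# The two callbacks must round-trip any page_content unchanged.
sample = "Example text"
stored = encrypt_fn(sample)
print(stored != sample, decrypt_fn(stored) == sample)

# Hypothetical wiring (assumes keyword arguments and a connected client):
# chunk_store = MilvusDocstore(client, enable_bm25=False,
#                              encrypt_fn=encrypt_fn, decrypt_fn=decrypt_fn)
```

Because the parameter table states that the callbacks take effect only when enable_bm25 is False, the commented call disables BM25, which also means full_text_search returns an empty list.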