Class Introduction

Function

Provides an openGauss-backed document store that holds the chunk information produced by document splitting.

Prototype

from mx_rag.storage.document_store import OpenGaussDocstore
OpenGaussDocstore(engine, encrypt_fn, decrypt_fn, enable_bm25, index_name)

Parameters

Parameter

Data Type

Required/Optional

Description

engine

Engine

Required

SQLAlchemy Engine instance. For details, see Engine. Only the openGauss dialect is allowed.

NOTE:

The engine is created and managed by the user. Use a secure connection mode.

encrypt_fn

Callable[[str], str]

Optional

Callback that encrypts the chunk content of ChunkModel. It must return a string whose length does not exceed 128 × 1024 × 1024. When the add operation is performed, the chunks stored in the database are those processed by encrypt_fn.

NOTICE:

If the file to be uploaded contains personal data such as bank account numbers, ID card numbers, passport numbers, and passwords, set this parameter to ensure personal data security.

decrypt_fn

Callable[[str], str]

Optional

Callback that decrypts the chunk content of ChunkModel. It must return a string whose length does not exceed 16 × 1024 × 1024. After the search operation is performed, the returned chunks are those processed by decrypt_fn.

enable_bm25

bool

Optional

Whether to enable BM25 retrieval. The default value is True. If this parameter is set to False, full-text retrieval is unavailable and the full_text_search method always returns an empty list ([]).

index_name

str

Optional

Name of the BM25 index to create. It must match the regular expression ^[a-zA-Z0-9_-]{6,64}$. That is, the value can contain only letters, digits, underscores (_), and hyphens (-), and must be 6 to 64 characters long. The default value is chunks_content_bm25.
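The index_name pattern above can be checked before the store is constructed. The following is a minimal sketch that uses only the Python standard library. The helper name is_valid_index_name is hypothetical and is not part of mx_rag; the pattern itself is quoted from the index_name description.

```python
import re

# Pattern quoted from the index_name description.
INDEX_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]{6,64}$")


def is_valid_index_name(name: str) -> bool:
    """Return True if name satisfies the documented index_name pattern."""
    return isinstance(name, str) and INDEX_NAME_PATTERN.fullmatch(name) is not None


# The default value satisfies the pattern; a four-character name is too short.
print(is_valid_index_name("chunks_content_bm25"))
print(is_valid_index_name("bm25"))
```

Validating the name up front gives an early, local failure for a bad value instead of relying on an error raised during construction.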

Example

import getpass

from sqlalchemy import URL, create_engine
from mx_rag.storage.document_store import MxDocument, OpenGaussDocstore


def encrypt_fn(value):
    # Replace with a secure encryption method
    return value


def decrypt_fn(value):
    # Replace with a secure decryption method
    return value


# Connection settings; replace the placeholders with real values
username = "<username>"
host = "<host>"
port = "<port>"
database = "<database>"
url = URL.create(
    "opengauss+psycopg2",
    username=username,
    password=getpass.getpass(),
    host=host,
    port=port,
    database=database
)
connect_args = {
    'sslmode': 'verify-full',
    'sslrootcert': "path_to root cert",
    'sslkey': "path_to key",
    'sslcert': "path_to cert",
    'sslpassword': getpass.getpass(prompt="cert key password:")
}
engine = create_engine(url, connect_args=connect_args)
# Create the store and add two chunks under document_id 1
chunk_store = OpenGaussDocstore(engine=engine, encrypt_fn=encrypt_fn, decrypt_fn=decrypt_fn)
texts = ["Example", "Text"]
metadatas = [{} for _ in texts]
doc = [MxDocument(page_content=t, metadata=m, document_name="1.docx") for t, m in zip(texts, metadatas)]
document_id = 1
chunk_store.add(doc, document_id)

# Look up a single chunk by its chunk ID
idx = chunk_store.get_all_chunk_id()
document = chunk_store.search(idx[0])
print(document.page_content)

# Full-text (BM25) search, filtered by document ID
print(chunk_store.full_text_search("Text", filter_dict={"document_id": [0]}))
print(chunk_store.full_text_search("Text", filter_dict={"document_id": [document_id]}))

# Update the content of the first two chunks
chunk_store.update(idx[:2], ["text1", "text2"])

# Delete by document ID, then query the same document ID
print(chunk_store.delete(document_id))
chunk_store.search_by_document_id(document_id)
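
The encrypt_fn and decrypt_fn placeholders in the example return their input unchanged. As a minimal sketch of the documented callback contract (a Callable[[str], str] whose result is a string no longer than 128 × 1024 × 1024 for encrypt_fn and 16 × 1024 × 1024 for decrypt_fn), the wrapper below checks a callback's result before returning it. The names with_length_limit, MAX_ENCRYPTED_LEN, MAX_DECRYPTED_LEN, safe_encrypt, and safe_decrypt are hypothetical and are not part of mx_rag. Measuring length with len() is an assumption, and the identity functions only stand in for a real cipher.

```python
from typing import Callable

# Limits taken from the parameter descriptions; the constant names are illustrative.
MAX_ENCRYPTED_LEN = 128 * 1024 * 1024
MAX_DECRYPTED_LEN = 16 * 1024 * 1024


def with_length_limit(fn: Callable[[str], str], max_len: int) -> Callable[[str], str]:
    """Wrap a callback so that a non-string or over-long result raises an error."""

    def wrapper(value: str) -> str:
        result = fn(value)
        if not isinstance(result, str):
            raise TypeError("callback must return a str")
        # Assumption: the documented limit is compared against len(result).
        if len(result) > max_len:
            raise ValueError(f"callback result is longer than {max_len}")
        return result

    return wrapper


def encrypt_fn(value: str) -> str:
    # Placeholder only: replace with a secure encryption method.
    return value


def decrypt_fn(value: str) -> str:
    # Placeholder only: replace with a secure decryption method.
    return value


safe_encrypt = with_length_limit(encrypt_fn, MAX_ENCRYPTED_LEN)
safe_decrypt = with_length_limit(decrypt_fn, MAX_DECRYPTED_LEN)

# Round trip through both wrapped callbacks.
print(safe_decrypt(safe_encrypt("Example")))
```

The wrapped callables keep the Callable[[str], str] shape, so they could be passed as encrypt_fn=safe_encrypt and decrypt_fn=safe_decrypt in place of the raw functions.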