Class Introduction
Function
Provides an openGauss-based knowledge base that stores the chunk information produced by document splitting.
Prototype
from mx_rag.storage.document_store import OpenGaussDocstore
OpenGaussDocstore(engine, encrypt_fn, decrypt_fn, enable_bm25, index_name)
Parameters
| Parameter | Data Type | Required/Optional | Description |
|---|---|---|---|
| engine | Engine | Required | Engine instance. For details, see Engine. Only the openGauss dialect is allowed. NOTE: The engine is controlled by the user. Use a secure connection mode. |
| encrypt_fn | Callable[[str], str] | Optional | Callback that encrypts the chunk content of ChunkModel and returns a string whose length cannot exceed 128 × 1024 × 1024. When the add operation is performed, the database stores the chunks as processed by encrypt_fn. NOTICE: If the file to be uploaded contains personal data such as bank account numbers, ID card numbers, passport numbers, or passwords, set this parameter to ensure personal data security. |
| decrypt_fn | Callable[[str], str] | Optional | Callback that decrypts the chunk content of ChunkModel and returns a string whose length cannot exceed 16 × 1024 × 1024. After the search operation is performed, the returned data are the chunks as processed by decrypt_fn. |
| enable_bm25 | Bool | Optional | Whether to enable BM25 retrieval. The default value is True. If this parameter is set to False, full-text retrieval is unavailable, that is, the full_text_search method always returns []. |
| index_name | String | Optional | Name of the BM25 index to create. It must match the regular expression ^[a-zA-Z0-9_-]{6,64}$, that is, it can contain only letters, digits, underscores (_), and hyphens (-), and must be 6 to 64 characters long. The default value is chunks_content_bm25. |
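The index_name constraint can be checked on the caller's side before the store is constructed. The following is a minimal sketch using only the pattern quoted in the table; the helper name is_valid_index_name is hypothetical and is not part of mx_rag, and how the library itself validates or reports an invalid name is not shown here.

```python
import re

# Pattern quoted from the index_name description in the parameter table.
INDEX_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]{6,64}$")


def is_valid_index_name(name: str) -> bool:
    """Hypothetical helper: True if name satisfies the documented pattern."""
    # fullmatch must consume the whole string, so a trailing newline is rejected.
    return isinstance(name, str) and INDEX_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_index_name("chunks_content_bm25"))  # the documented default
print(is_valid_index_name("bm25"))                 # shorter than 6 characters
print(is_valid_index_name("my index"))             # contains a space
```

Running such a check up front lets an application reject a bad name with its own error message instead of relying on whatever the constructor does.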
Example
import getpass
from sqlalchemy import URL, create_engine
from mx_rag.storage.document_store import MxDocument, OpenGaussDocstore
def encrypt_fn(value):
    # Secure encryption method
    return value

def decrypt_fn(value):
    # Secure decryption method
    return value
username = "<username>"
host = "<host>"
port = "<port>"
database = "<database>"
url = URL.create(
    "opengauss+psycopg2",
    username=username,
    password=getpass.getpass(),
    host=host,
    port=port,
    database=database
)
connect_args = {
    'sslmode': 'verify-full',
    'sslrootcert': "path_to root cert",
    'sslkey': "path_to key",
    'sslcert': "path_to cert",
    'sslpassword': getpass.getpass(prompt="cert key password:")
}
engine = create_engine(url, connect_args=connect_args)
chunk_store = OpenGaussDocstore(engine=engine, encrypt_fn=encrypt_fn, decrypt_fn=decrypt_fn)
# Build two chunks that belong to the same source file
texts = ["Example", "Text"]
metadatas = [{} for _ in texts]
doc = [MxDocument(page_content=t, metadata=m, document_name="1.docx") for t, m in zip(texts, metadatas)]
document_id = 1
# Store the chunks under document_id
chunk_store.add(doc, document_id)
# Fetch a single chunk by its chunk ID
idx = chunk_store.get_all_chunk_id()
document = chunk_store.search(idx[0])
print(document.page_content)
# Full-text search, filtered by document ID
print(chunk_store.full_text_search("Text", filter_dict={"document_id": [0]}))
print(chunk_store.full_text_search("Text", filter_dict={"document_id": [document_id]}))
# Update the content of the first two chunks
chunk_store.update(idx[:2], ["text1", "text2"])
# Delete by document ID, then query the same document ID again
print(chunk_store.delete(document_id))
chunk_store.search_by_document_id(document_id)