Class Introduction

Function

Provides an openGauss-based vector database.

Prototype

from mx_rag.storage.vectorstore import OpenGaussDB
OpenGaussDB(engine, collection_name, search_mode, index_type, metric_type)

Parameters

| Parameter | Data Type | Required/Optional | Description |
|---|---|---|---|
| engine | Engine | Required | Engine instance. For details, see Engine. Only the openGauss dialect is allowed. NOTE: The engine is created and controlled by the user. Use a secure connection mode. |
| collection_name | String | Optional | Collection name. It cannot be empty, can contain at most 1024 characters, and must be a valid Python identifier. The default value is vectorstore. |
| search_mode | SearchMode | Optional | Retrieval mode. Three modes are currently supported: DENSE for dense retrieval (default), SPARSE for sparse retrieval, and HYBRID for hybrid retrieval. For details, see SearchMode. |
| index_type | String | Optional | Vector index type. IVFFLAT and HNSW (default) are currently supported. This parameter applies only to dense vectors in the dense and hybrid retrieval modes. Sparse vector retrieval always uses HNSW, which cannot be changed. |
| metric_type | String | Optional | Vector distance metric: IP (inner product, default), L2, or COSINE. |
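
The constraints in the parameter descriptions above can be checked before an OpenGaussDB object is constructed. The following is a minimal sketch, not part of mx_rag: the helper names (check_collection_name, check_options, effective_index_types) are hypothetical, the search mode is passed as a plain string rather than a SearchMode member, and exact uppercase spelling of the option values is an assumption.

```python
# Hypothetical pre-flight checks that mirror the documented parameter rules.
# None of these names belong to mx_rag; they are for illustration only.
ALLOWED_INDEX_TYPES = {"IVFFLAT", "HNSW"}
ALLOWED_METRIC_TYPES = {"IP", "L2", "COSINE"}
MAX_COLLECTION_NAME_LEN = 1024


def check_collection_name(name: str) -> None:
    """Raise ValueError if the name breaks one of the documented rules."""
    if not isinstance(name, str) or not name:
        raise ValueError("collection_name cannot be empty")
    if len(name) > MAX_COLLECTION_NAME_LEN:
        raise ValueError("collection_name exceeds 1024 characters")
    if not name.isidentifier():
        raise ValueError("collection_name must be a valid Python identifier")


def check_options(index_type: str = "HNSW", metric_type: str = "IP") -> None:
    """Raise ValueError for an index or metric type outside the documented set."""
    if index_type not in ALLOWED_INDEX_TYPES:
        raise ValueError(f"unsupported index_type: {index_type!r}")
    if metric_type not in ALLOWED_METRIC_TYPES:
        raise ValueError(f"unsupported metric_type: {metric_type!r}")


def effective_index_types(search_mode: str, index_type: str = "HNSW") -> dict:
    """Return the index used for each kind of vector in the given mode."""
    dense = {"dense": index_type}   # index_type applies to dense vectors only
    sparse = {"sparse": "HNSW"}     # sparse vectors always use HNSW
    if search_mode == "DENSE":
        return dense
    if search_mode == "SPARSE":
        return sparse
    if search_mode == "HYBRID":
        return {**dense, **sparse}
    raise ValueError(f"unsupported search_mode: {search_mode!r}")


check_collection_name("vectorstore")
check_options(index_type="IVFFLAT", metric_type="COSINE")
print(effective_index_types("HYBRID", "IVFFLAT"))
```

In this sketch, a hybrid store configured with IVFFLAT still reports HNSW for its sparse vectors, which reflects the rule that index_type has no effect on sparse retrieval.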

Return Value

| Data Type | Description |
|---|---|
| OpenGaussDB | OpenGaussDB object. |

Example

import getpass
import numpy as np
from mx_rag.storage.vectorstore import OpenGaussDB, SearchMode
from sqlalchemy import URL, create_engine

# openGauss connection settings
username = "demo"
password = getpass.getpass()
host = "<host here>"
port = "<port here>"
database = "testdb"

# vector configuration: embedding dimension and number of embeddings
dim = 128
n_emb = 1000

url = URL.create(
    "opengauss+psycopg2",
    username=username,
    password=password,
    host=host,
    port=port,
    database=database
)
connect_args = {
    'sslmode': 'verify-full',
    'sslrootcert': "path_to root cert",
    'sslkey': "path_to key",
    'sslcert': "path_to cert",
    'sslpassword': getpass.getpass(prompt="cert key password:")
}

# create an engine
engine = create_engine(url, pool_size=20, max_overflow=10, pool_pre_ping=True, connect_args=connect_args)
# search_mode defaults to DENSE
# index_type defaults to HNSW and metric_type defaults to IP
dense_store = OpenGaussDB.create(
    engine=engine,
    dense_dim=dim
)

# add vectors
dense_embeddings = np.random.randn(n_emb, dim)
ids = list(range(n_emb))
dense_store.add(ids, dense_embeddings)

# search vectors
res = dense_store.search(dense_embeddings[:3].tolist(), k=3)
print(res)

# update a vector while it still exists in the collection
dense_store.update([1], dense_embeddings[:1])

# delete vectors
count = dense_store.delete(ids)
print(count)

# drop table
dense_store.drop_collection()
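
The search call in the example ranks results according to metric_type. As a rough illustration of how the three documented metrics differ, the following standalone sketch computes them in plain Python and ranks a few toy vectors. It shows only the standard mathematical definitions; how openGauss computes, scales, or reports scores is not described here, and the helper names and sample data are hypothetical.

```python
import math


def inner_product(a, b):
    """IP: larger values mean more similar."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """L2 (Euclidean distance): smaller values mean more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """COSINE: inner product of the length-normalized vectors."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return inner_product(a, b) / norm


def rank(query, candidates, score, larger_is_better):
    """Return candidate ids ordered from best to worst match."""
    return sorted(
        candidates,
        key=lambda vec_id: score(query, candidates[vec_id]),
        reverse=larger_is_better,
    )


# Toy data (illustrative only): ids mapped to 2-dimensional vectors.
query = [1.0, 0.0]
candidates = {0: [3.0, 0.0], 1: [0.0, 3.0], 2: [1.0, 1.0]}

print("IP    :", rank(query, candidates, inner_product, True))
print("L2    :", rank(query, candidates, l2_distance, False))
print("COSINE:", rank(query, candidates, cosine_similarity, True))
```

With this data, IP and COSINE both place vector 0 first, while L2 prefers vector 2 because it lies closer to the query even though it points in a different direction. The choice of metric_type can therefore change which neighbors a search returns.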