Class Introduction
Function
KnowledgeDB is the entry class for knowledge base management. It provides the document management functions: adding documents, deleting documents, and obtaining all documents from a knowledge base.
Prototype
from mx_rag.knowledge import KnowledgeDB
KnowledgeDB(knowledge_store, chunk_store, vector_store, knowledge_name, white_paths, max_file_count, user_id, lock)
Parameters
| Parameter | Data Type | Required/Optional | Description |
|---|---|---|---|
| knowledge_store | KnowledgeStore | Required | Saves the names of uploaded documents for knowledge base management. For details about the data type, see KnowledgeStore. |
| chunk_store | Docstore | Required | Stores the list of document chunks. For details about the data type, see Docstore. |
| vector_store | VectorStore | Required | Vector database storage object. For details about the data type, see VectorStore. |
| knowledge_name | String | Required | Knowledge base name, which can be customized based on the knowledge base theme. The length range is [1, 1024]. |
| white_paths | List[str] | Required | Trustlist of paths from which documents can be uploaded. The trustlist length and the length of each path must both be in the range [1, 1024]. A path cannot be a soft link and cannot contain two consecutive dots (..). A file can be uploaded only when its file path is in the trustlist. |
| max_file_count | Integer | Optional | Maximum number of documents that can be uploaded. The value range is [1, 8000]. You are advised not to set this parameter to a large value. The default value is 1000. |
| user_id | String | Required | User ID, which is used to distinguish different knowledge bases. It must match the regular expression ^[a-zA-Z0-9_-]{6,64}$. |
| lock | multiprocessing.synchronize.Lock or _thread.LockType | Optional | Lock to pass when this API is called from multiple processes or threads. Use a lock of one of the types listed in the Data Type column. The default value is None. |

Note: Data consistency must be ensured between chunk_store and vector_store. For example, the relational database file and the vector database file need to be generated at the same time.
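The constraints in the parameter table can be checked before a KnowledgeDB object is constructed. The following is a minimal sketch using only the Python standard library. The helper name `check_knowledge_db_args` and its error messages are hypothetical and are not part of mx_rag; the library performs its own validation, which may differ in detail.

```python
import os
import re

# Pattern for user_id, copied from the parameter table.
USER_ID_PATTERN = re.compile(r"^[a-zA-Z0-9_-]{6,64}$")


def check_knowledge_db_args(knowledge_name, white_paths, user_id, max_file_count=1000):
    """Hypothetical pre-check (not an mx_rag API) mirroring the documented limits.

    Returns a list of human-readable problems; an empty list means no problem was found.
    """
    errors = []

    # knowledge_name: string whose length is in [1, 1024].
    if not isinstance(knowledge_name, str) or not 1 <= len(knowledge_name) <= 1024:
        errors.append("knowledge_name length must be in [1, 1024]")

    # white_paths: 1 to 1024 entries, each 1 to 1024 characters long,
    # not a soft link, and without two consecutive dots.
    if not isinstance(white_paths, list) or not 1 <= len(white_paths) <= 1024:
        errors.append("white_paths must contain 1 to 1024 paths")
    else:
        for path in white_paths:
            if not isinstance(path, str) or not 1 <= len(path) <= 1024:
                errors.append(f"path length must be in [1, 1024]: {path!r}")
            elif ".." in path:
                errors.append(f"path must not contain '..': {path!r}")
            elif os.path.islink(path):
                errors.append(f"path must not be a soft link: {path!r}")

    # user_id: must match the documented regular expression in full.
    if not isinstance(user_id, str) or not USER_ID_PATTERN.fullmatch(user_id):
        errors.append("user_id must match ^[a-zA-Z0-9_-]{6,64}$")

    # max_file_count: integer in [1, 8000]; bool is rejected although it subclasses int.
    if (isinstance(max_file_count, bool) or not isinstance(max_file_count, int)
            or not 1 <= max_file_count <= 8000):
        errors.append("max_file_count must be an integer in [1, 8000]")

    return errors
```

Calling the helper with the values that are later passed to KnowledgeDB gives a caller a chance to report all problems at once instead of failing on the first invalid argument.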
Example
import pathlib
from paddle.base import libpaddle
from mx_rag.embedding.local import TextEmbedding
from mx_rag.knowledge import KnowledgeStore, KnowledgeDB
from mx_rag.storage.document_store import SQLiteDocstore
from mx_rag.storage.vectorstore import MindFAISS
# Set the NPU used for vector retrieval.
dev = 0
# Load the embedding model.
embed_func = TextEmbedding("/path/to/model", dev_id=dev)
# Initialize the vector database.
vector_store = MindFAISS(x_dim=1024, devs=[dev],
                         load_local_index="./faiss.index", auto_save=True)
# Initialize the relational database for document chunks.
chunk_store = SQLiteDocstore(db_path="./sql.db")
# Initialize the relational database for knowledge management.
knowledge_store = KnowledgeStore(db_path="./sql.db")
# Add a knowledge base and its administrator.
knowledge_store.add_knowledge(knowledge_name="test", user_id='Default', role='admin')
# Initialize knowledge management.
knowledge_db = KnowledgeDB(knowledge_store=knowledge_store, chunk_store=chunk_store, vector_store=vector_store,
                           knowledge_name="test", user_id="Default", white_paths=["/home/"])
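# Add a document together with its text chunks and per-chunk metadata.
# The file path must be in the white_paths trustlist.
file_path = pathlib.Path("./gaokao.txt")
knowledge_db.add_file(file=file_path,
                      texts=["test1", "test2"],
                      embed_func={"dense": embed_func.embed_documents},
                      metadatas=[{"source": "./gaokao.txt"}, {"source": "./gaokao.txt"}])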
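# List the names of all documents in the knowledge base.
documents = [document.document_name for document in knowledge_db.get_all_documents()]
print(documents)
# Check whether the document exists in the knowledge base.
print(knowledge_db.check_document_exist(doc_name=file_path.name))
# Delete the document by name.
knowledge_db.delete_file(doc_name=file_path.name)
# Delete all documents from the knowledge base.
knowledge_db.delete_all()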