Class Introduction

Function

Manages the registration of document loaders and text splitters. A custom text loader registered with this class must inherit from langchain_core.document_loaders.base.BaseLoader and implement its interface. Similarly, a custom text splitter must inherit from langchain_text_splitters.base.TextSplitter and implement its interface.

The document to be parsed must be UTF-8 encoded; otherwise, parsing may fail.
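
One way to catch encoding problems before parsing is to check the file up front. The helper below is a sketch using only the Python standard library; the name is_utf8_text is hypothetical and not part of mx_rag. It assumes the check is only meaningful for plain-text inputs such as .txt or .md files, since binary container formats such as .xlsx are not raw UTF-8 text.

```python
from pathlib import Path


def is_utf8_text(path: str) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A caller might skip or re-encode any file for which this returns False before handing it to a loader.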

Prototype

from mx_rag.document import LoaderMng
LoaderMng()

Example

from mx_rag.document.loader import ExcelLoader
from mx_rag.document import LoaderMng
from langchain.text_splitter import RecursiveCharacterTextSplitter
loader_mng = LoaderMng()
# Register ExcelLoader as the loader for .xlsx files.
loader_mng.register_loader(ExcelLoader, [".xlsx"])
# Register the splitter for .xlsx and .docx files, along with its constructor parameters.
loader_mng.register_splitter(RecursiveCharacterTextSplitter, [".xlsx", ".docx"],
                             {"chunk_size": 4000, "chunk_overlap": 20, "keep_separator": False})
# Look up the loader registered for .xlsx and instantiate it for a specific file.
loader_info = loader_mng.get_loader(".xlsx")
loader = loader_info.loader_class(file_path="/path/data/test.xlsx", **loader_info.loader_params)
# Look up the splitter registered for .xlsx and instantiate it with the registered parameters.
splitter_info = loader_mng.get_splitter(".xlsx")
splitter = splitter_info.splitter_class(**splitter_info.splitter_params)
# Load the file and split it into chunks.
docs = loader.load_and_split(splitter)
print(docs)
# Unregister the loader class.
loader_mng.unregister_loader(ExcelLoader)
# Unregister the splitter class.
loader_mng.unregister_splitter(RecursiveCharacterTextSplitter)