Class Functionality

Function Description

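Manages the registration of document loading and splitting functions. A custom text loader registered by the user must inherit from and implement langchain_community.document_loaders.base.BaseLoader; a custom text splitter registered by the user must inherit from and implement langchain_text_splitters.base.TextSplitter.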

Documents to be parsed must be UTF-8 encoded; otherwise, parsing may fail.
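Because a non-UTF-8 file may fail during parsing, it can help to check the encoding before handing the file to a loader. The helper below is a sketch using only the Python standard library; is_utf8_file is a hypothetical name, not an mx_rag function. It is assumed to apply to plain-text inputs only, since binary container formats such as .xlsx are zip archives and would not pass a byte-level UTF-8 check.

```python
import codecs


def is_utf8_file(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` decodes cleanly as UTF-8.

    The file is read in chunks through an incremental decoder, so a
    multi-byte character that straddles a chunk boundary is handled
    correctly and large files are not loaded into memory at once.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                decoder.decode(chunk)
        # final=True raises if the file ends in the middle of a character.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True
```

A caller could skip or re-encode any file for which this returns False before loading it.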

Function Prototype

from mx_rag.document import LoaderMng
LoaderMng()

Usage Example

from mx_rag.document.loader import ExcelLoader
from mx_rag.document import LoaderMng
from langchain.text_splitter import RecursiveCharacterTextSplitter
loader_mng = LoaderMng()
# Call register_loader: register ExcelLoader for .xlsx files
loader_mng.register_loader(ExcelLoader, [".xlsx"])
# Call register_splitter: register the splitter and its parameters for .xlsx and .docx files
loader_mng.register_splitter(RecursiveCharacterTextSplitter, [".xlsx", ".docx"],
                             {"chunk_size": 4000, "chunk_overlap": 20, "keep_separator": False})
# Call get_loader: look up the loader registered for .xlsx and instantiate it
loader_info = loader_mng.get_loader(".xlsx")
loader = loader_info.loader_class(file_path="/path/data/test.xlsx", **loader_info.loader_params)
# Call get_splitter: look up the splitter registered for .xlsx and instantiate it
splitter_info = loader_mng.get_splitter(".xlsx")
splitter = splitter_info.splitter_class(**splitter_info.splitter_params)
docs = loader.load_and_split(splitter)
print(docs)
# Call unregister_loader: remove the ExcelLoader registration
loader_mng.unregister_loader(ExcelLoader)
# Call unregister_splitter: remove the RecursiveCharacterTextSplitter registration
loader_mng.unregister_splitter(RecursiveCharacterTextSplitter)