Class Function
Function Description
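Manages the functions used to load and split documents. If you register a custom text loader, it must inherit from and implement langchain_community.document_loaders.base.BaseLoader; if you register a custom text splitter, it must inherit from and implement langchain_text_splitters.base.TextSplitter.
Documents to be parsed must be UTF-8 encoded; otherwise parsing may fail.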
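Because parsing may fail on documents that are not UTF-8 encoded, it can help to check a plain-text file before handing it to a loader. The following is a minimal sketch using only the Python standard library; the helper name is_utf8_file is hypothetical and is not part of mx_rag. It only applies to text files, not to binary formats such as .xlsx or .docx.

```python
from pathlib import Path


def is_utf8_file(path):
    """Return True if the entire file content decodes as UTF-8.

    Hypothetical helper for illustration; not an mx_rag API.
    """
    try:
        # Read the raw bytes and attempt a strict UTF-8 decode.
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A caller could skip or re-encode any file for which this check returns False before loading it.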
Function Prototype
from mx_rag.document import LoaderMng
LoaderMng()
Usage Example
from mx_rag.document.loader import ExcelLoader
from mx_rag.document import LoaderMng
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader_mng = LoaderMng()

# Call register_loader: associate ExcelLoader with .xlsx files
loader_mng.register_loader(ExcelLoader, [".xlsx"])

# Call register_splitter: associate the splitter and its parameters with .xlsx and .docx files
loader_mng.register_splitter(RecursiveCharacterTextSplitter, [".xlsx", ".docx"],
                             {"chunk_size": 4000, "chunk_overlap": 20, "keep_separator": False})

# Call get_loader: look up the loader registered for .xlsx and instantiate it
loader_info = loader_mng.get_loader(".xlsx")
loader = loader_info.loader_class(file_path="/path/data/test.xlsx", **loader_info.loader_params)

# Call get_splitter: look up the splitter registered for .xlsx and instantiate it
splitter_info = loader_mng.get_splitter(".xlsx")
splitter = splitter_info.splitter_class(**splitter_info.splitter_params)

# Load the document and split it into chunks
docs = loader.load_and_split(splitter)
print(docs)

# Call unregister_loader
loader_mng.unregister_loader(ExcelLoader)

# Call unregister_splitter
loader_mng.unregister_splitter(RecursiveCharacterTextSplitter)
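In the example above, get_loader and get_splitter are keyed by a file suffix with a leading dot, such as ".xlsx". The sketch below shows one way to derive that key from a file path with the standard library. The helper name suffix_key is hypothetical, and lowercasing the suffix is an assumption; the source does not state whether mx_rag matches suffixes case-sensitively.

```python
from pathlib import Path


def suffix_key(file_path):
    """Return the file suffix with its leading dot, e.g. ".xlsx".

    Hypothetical helper for illustration; not an mx_rag API.
    Lowercasing is an assumption about how suffixes were registered.
    Returns an empty string when the path has no suffix.
    """
    return Path(file_path).suffix.lower()
```

The result could then be passed to the lookup calls, for example loader_mng.get_loader(suffix_key("/path/data/test.xlsx")).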