register_splitter
Function
Registers a document splitter. A maximum of 1,000 splitters can be loaded.
Prototype
def register_splitter(splitter_class, file_types, splitter_params)
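The prototype suggests that a registration binds a set of file extensions to a splitter class and the keyword arguments used to construct it. The following is a minimal in-memory sketch of such a registry under that assumption. `SplitterRegistry`, `MAX_SPLITTERS`, `create_for`, case-insensitive extension matching, and "one call counts as one loaded splitter" are all hypothetical illustrations of the 1,000-splitter limit, not the library's actual implementation.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical cap mirroring the documented limit of 1,000 loaded splitters.
MAX_SPLITTERS = 1000


class SplitterRegistry:
    """Hypothetical registry mapping file extensions to a splitter class and its kwargs."""

    def __init__(self) -> None:
        self._by_ext: Dict[str, Tuple[type, Dict[str, Any]]] = {}
        self._count = 0  # number of register_splitter calls accepted so far

    def register_splitter(self, splitter_class: type, file_types: List[str],
                          splitter_params: Optional[Dict[str, Any]] = None) -> None:
        # Assumption: every successful call loads one splitter toward the cap.
        if self._count >= MAX_SPLITTERS:
            raise RuntimeError(f"at most {MAX_SPLITTERS} splitters can be loaded")
        params = dict(splitter_params or {})  # copy so later caller edits do not leak in
        for ext in file_types:
            # Assumption: extensions match case-insensitively, and a later
            # registration for the same extension replaces the earlier one.
            self._by_ext[ext.lower()] = (splitter_class, params)
        self._count += 1

    def create_for(self, filename: str) -> Any:
        """Instantiate the splitter registered for filename's extension."""
        dot = filename.rfind(".")
        ext = filename[dot:].lower() if dot != -1 else ""
        if ext not in self._by_ext:
            raise KeyError(f"no splitter registered for {ext!r}")
        splitter_class, params = self._by_ext[ext]
        return splitter_class(**params)
```

With LangChain installed, `registry.register_splitter(RecursiveCharacterTextSplitter, [".txt"], {"chunk_size": 4000})` followed by `registry.create_for("notes.txt")` would return a configured splitter instance. Storing the class and its arguments, rather than a ready-made instance, lets each document get a fresh splitter.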
Parameters
| Parameter | Data Type | Required/Optional | Description |
|---|---|---|---|
| splitter_class | TextSplitter | Required | Document splitter class. It must be a subclass of LangChain's TextSplitter. |
| file_types | List[str] | Required | List of file name extensions handled by the splitter, for example [".txt", ".docx"]. The list must contain 1 to 32 entries, and each extension must be 1 to 32 characters long. .jpg and .png files are not supported. |
| splitter_params | Dict[str, Any] | Optional | Keyword arguments passed to the splitter class. The default value is None. The length of a parameter string cannot exceed 1,024 characters, the length of the dictionary cannot exceed 1,024 characters, and dictionaries can be nested at most 2 levels deep. For example, when splitter_class is LangChain's RecursiveCharacterTextSplitter, splitter_params can be {"chunk_size": 4000, "chunk_overlap": 20, "keep_separator": False}. chunk_size sets the maximum size of each chunk, chunk_overlap sets how much adjacent chunks overlap, and keep_separator controls whether the separators are kept in the chunks (the default separators are ["\n\n", "\n", " ", ""]). |