split_text
Function
Implements the split_text method of the LangChain base class to split the input text string.
- If the text length is less than or equal to chunk_size, the text is not split.
- If the text length is greater than chunk_size, the splitting method of the parent class MarkdownHeaderTextSplitter is called to split the text according to header_level to generate multiple title chunks.
- If a title chunk's length is less than or equal to chunk_size, the title chunks that follow it are merged into it until the cumulative length is close to chunk_size or no further chunks are available. The output is a single consolidated title chunk that combines the titles and content of all merged chunks. If two merged chunks have different titles at the same hierarchical level, that level and all lower levels are discarded, and only the identical higher-level titles are retained as shared context.
- If a title chunk's length is greater than chunk_size, the splitting method of the parent class RecursiveCharacterTextSplitter is called to recursively split the content of that title chunk. Each output combines all of the chunk's titles with the content of a resulting sub-chunk.
Note: The length of a combined title chunk may be greater than chunk_size.
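The merge rule for short title chunks can be sketched as follows. This is only an illustration under stated assumptions, not the library's implementation: the names `shared_headers` and `merge_small_chunks` are hypothetical, each title chunk is modeled as a pair of an ordered title list (highest level first) and a content string, length is measured in characters with `len`, and a chunk is absorbed only while the combined content still fits in chunk_size. Recursive splitting of oversized chunks is left out.

```python
from typing import List, Tuple

# Hypothetical model of a title chunk: (titles from highest level down, content).
Chunk = Tuple[List[str], str]


def shared_headers(a: List[str], b: List[str]) -> List[str]:
    """Keep only the leading titles that are identical in both chunks.

    At the first level where the titles differ, that level and every
    lower level are dropped.
    """
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


def merge_small_chunks(chunks: List[Chunk], chunk_size: int) -> List[Chunk]:
    """Greedily merge consecutive title chunks while the content fits."""
    merged: List[Chunk] = []
    i = 0
    while i < len(chunks):
        headers, content = chunks[i]
        i += 1
        # Absorb the following chunks while the combined content still fits
        # (assumption: "close to chunk_size" is read as "does not exceed it").
        while i < len(chunks) and len(content) + 1 + len(chunks[i][1]) <= chunk_size:
            headers = shared_headers(headers, chunks[i][0])
            content = content + "\n" + chunks[i][1]
            i += 1
        merged.append((headers, content))
    return merged


if __name__ == "__main__":
    sample = [
        (["Guide", "Install"], "aaa"),
        (["Guide", "Usage"], "bbb"),
        (["API"], "c" * 20),
    ]
    for titles, body in merge_small_chunks(sample, chunk_size=10):
        print(titles, repr(body))
```

In the sample, the first two chunks share only the top-level title "Guide", so the merged chunk keeps that title alone. The third chunk is longer than chunk_size and is passed through unchanged here, whereas split_text would hand it to the recursive splitter.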
Prototype
def split_text(text)
Parameters
| Parameter | Data Type | Required/Optional | Description |
|---|---|---|---|
| text | String | Required | String to be split. The text length cannot exceed 100 MB. |
Return Value
| Data Type | Description |
|---|---|
| List[str] | List of split chunks |
Parent topic: MarkdownTextSplitter