merge_text_summarize

Function

Because an LLM limits the number of input tokens, a long text must be divided into multiple shorter texts. Each shorter text is summarized into a sub-summary, and the sub-summaries are then combined and summarized again by the LLM. The final summary is produced over multiple iterations, up to a maximum of 10.
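The iteration described above can be sketched as follows. This is a minimal illustration and not the library's implementation: the helper names `split_into_batches` and `merge_summaries_sketch`, the greedy grouping strategy, the newline used to join texts, and the `summarize` callback (a stand-in for the LLM call) are all assumptions.

```python
from typing import Callable, List

MAX_ITERATIONS = 10  # iteration cap stated in the function description


def split_into_batches(texts: List[str], merge_threshold: int) -> List[List[str]]:
    """Greedily group texts so that each group's total length is <= merge_threshold.

    Assumption: a single text longer than the threshold is kept as its own group.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_len = 0
    for text in texts:
        # Start a new group when adding this text would exceed the threshold.
        if current and current_len + len(text) > merge_threshold:
            batches.append(current)
            current, current_len = [], 0
        current.append(text)
        current_len += len(text)
    if current:
        batches.append(current)
    return batches


def merge_summaries_sketch(
    texts: List[str],
    summarize: Callable[[str], str],  # hypothetical stand-in for the LLM call
    merge_threshold: int = 4 * 1024,
    not_summarize_threshold: int = 30,
) -> str:
    """Repeatedly merge and summarize until one summary remains or the cap is hit."""
    current = list(texts)
    for _ in range(MAX_ITERATIONS):
        next_round: List[str] = []
        for batch in split_into_batches(current, merge_threshold):
            joined = "\n".join(batch)
            if len(joined) <= not_summarize_threshold:
                # Short text: keep it unchanged instead of calling the model.
                next_round.append(joined)
            else:
                next_round.append(summarize(joined))
        current = next_round
        if len(current) == 1:
            break
    # Assumption: if several summaries remain after the cap, join them.
    return "\n".join(current)
```

With a stub such as `summarize=lambda t: t[:10]`, four 50-character texts and `merge_threshold=100` are merged in two rounds: two model calls in the first round, then a second round in which the combined text is short enough to be returned unchanged.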

Prototype

def merge_text_summarize(texts, merge_threshold, not_summarize_threshold, prompt)

Parameters

| Parameter | Data Type | Required/Optional | Description |
|---|---|---|---|
| texts | List[str] | Required | List of sub-summary texts to merge. The total length of all texts in the list must be in the range (0, 1024 × 1024], and the list length must be in the range (0, 1024]. |
| merge_threshold | Integer | Optional | Because of the LLM's token limit, the sub-summary list is split into smaller lists that are sent to the model separately for merging. This parameter sets the splitting threshold: the total length of each split list is less than or equal to the threshold. Default value: 4 × 1024. Value range: [1024, 1024 × 1024]. The value of merge_threshold must be greater than that of not_summarize_threshold. |
| not_summarize_threshold | Integer | Optional | An LLM may fail to summarize a very short text or may summarize it incorrectly. This parameter sets the text length at or below which no summarization is performed: if the text length is less than or equal to not_summarize_threshold, the model does not summarize the text, and the original text is used as the summary. Default value: 30. Value range: (0, 1024 × 1024]. |
| prompt | langchain_core.prompts.PromptTemplate | Optional | Prompt template used for merging. The value of input_variables in prompt must be ["text"], and the template length must be in the range (0, 1024 × 1024]. The query sent to the LLM is the prompt combined with a text, and the valid length of this query depends on the MindIE configuration. For details, see the description of maxSeqLen in "Core Concepts and Configurations" > "Configuration Parameters (Serving)" in MindIE LLM Development Guide. The prompt should be written in the same language as the text, or should specify the language of the LLM's answer; otherwise, answer quality may be affected. |

PromptTemplate(
    input_variables=["text"],
    template="""Use simple Chinese to combine multiple abstracts into one abstract, including as much key information as possible. The output contains only content information.\n\n{text}""")
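The documented value ranges can be checked before calling the function. The sketch below is illustrative only: the helper name `check_arguments` and the choice of `ValueError` are assumptions, and the library may validate its arguments differently.

```python
from typing import List

MAX_LEN = 1024 * 1024  # upper bound shared by several documented ranges
MAX_ITEMS = 1024       # documented upper bound on the length of `texts`


def check_arguments(
    texts: List[str],
    merge_threshold: int = 4 * 1024,
    not_summarize_threshold: int = 30,
) -> None:
    """Raise ValueError if an argument violates a documented range (hypothetical helper)."""
    if not 0 < len(texts) <= MAX_ITEMS:
        raise ValueError("texts must contain between 1 and 1024 items")
    total_length = sum(len(text) for text in texts)
    if not 0 < total_length <= MAX_LEN:
        raise ValueError("total length of texts must be in (0, 1024 * 1024]")
    if not 1024 <= merge_threshold <= MAX_LEN:
        raise ValueError("merge_threshold must be in [1024, 1024 * 1024]")
    if not 0 < not_summarize_threshold <= MAX_LEN:
        raise ValueError("not_summarize_threshold must be in (0, 1024 * 1024]")
    if merge_threshold <= not_summarize_threshold:
        raise ValueError("merge_threshold must be greater than not_summarize_threshold")
```

For example, `check_arguments(["hello"])` passes with the default thresholds, while `merge_threshold=512` is rejected because it falls below the documented minimum of 1024.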

Return Value

| Data Type | Description |
|---|---|
| String | Final summary after summary merging. |