Class Functionality
Function Description
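Uses a reranking model to compute a relevance score between the question (the instruction for summarizing the long text) and each chunk of the context (the long text to be compressed). Based on the configured compression-rate threshold, the chunks with the highest relevance are retained first, which compresses the long text effectively.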
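The idea described above can be illustrated with a minimal, self-contained sketch. It is not the mx_rag implementation: the word-overlap scorer `overlap_score` is a toy stand-in for a reranking model, the newline split stands in for a TextSplitter, and the function name `compress_by_rerank` is hypothetical. It also assumes that the compression rate means the fraction of chunks to remove, which the text above does not state explicitly.

```python
from typing import Callable, List


def overlap_score(question: str, chunk: str) -> float:
    """Toy relevance score: fraction of question words found in the chunk.

    Stand-in for a real reranking model; not how mx_rag scores text.
    """
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    if not q_words:
        return 0.0
    return len(q_words & c_words) / len(q_words)


def compress_by_rerank(
    context: str,
    question: str,
    compress_rate: float,
    score_fn: Callable[[str, str], float] = overlap_score,
) -> str:
    """Keep the most relevant chunks and drop the rest.

    Assumption: compress_rate is the fraction of chunks to remove.
    """
    if not 0.0 <= compress_rate < 1.0:
        raise ValueError("compress_rate must be in [0, 1)")
    # Split into chunks; a plain newline split stands in for a TextSplitter.
    chunks: List[str] = [line for line in context.split("\n") if line.strip()]
    if not chunks:
        return ""
    # Number of chunks to keep, never fewer than one.
    keep_count = max(1, round(len(chunks) * (1.0 - compress_rate)))
    # Rank chunk indices by descending score; the sort is stable,
    # so chunks with equal scores stay in their original order.
    ranked = sorted(range(len(chunks)), key=lambda i: -score_fn(question, chunks[i]))
    # Restore the original document order for the chunks that survive.
    kept = sorted(ranked[:keep_count])
    return "\n".join(chunks[i] for i in kept)


context = (
    "The cat sat on the mat.\n"
    "Stock prices rose sharply today.\n"
    "A cat chased the mouse.\n"
    "Rain is expected tomorrow."
)
question = "what did the cat do"
result = compress_by_rerank(context, question, 0.5)
print(result)
```

With a compression rate of 0.5, two of the four chunks are kept: the two that mention the cat, in their original order. Restoring document order after ranking is a design choice of this sketch, made so that the compressed text still reads coherently.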
Function Prototype

    from mx_rag.compress.rerank_compressor import RerankCompressor
    class RerankCompressor(reranker, splitter)
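Input Parameters

Parameter | Data type | Optional/Required | Description |
---|---|---|---|
reranker | Reranker | Required | Reranking model instance used to rerank the text chunks. Must be a Reranker object from mx_rag.reranker; see Reranker for details. |
splitter | TextSplitter | Optional | Document splitter. Must be an instance of a subclass of langchain's TextSplitter. Defaults to langchain.text_splitter's RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=0, separators=["\n", ""], keep_separator=True). |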
Usage Example

    from mx_rag.compress.rerank_compressor import RerankCompressor
    from mx_rag.reranker.local import LocalReranker
    from mx_rag.reranker.service import TEIReranker
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from mx_rag.utils import ClientParam

    context = """the prompt text to be compressed"""
    question = "Please give the above content a title"

    # Set to True to use a TEI reranking service instead of a local model.
    tei_reranker = False
    if tei_reranker:
        reranker = TEIReranker.create(url="https://ip:port/rerank",
                                      client_param=ClientParam(ca_file="/path/to/ca.crt"))
    else:
        reranker = LocalReranker(model_path="reranker_path", dev_id=0)

    # Same settings as the default splitter, passed explicitly here.
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=0,
                                                   separators=["\n", ""], keep_separator=True)
    compressor = RerankCompressor(reranker=reranker, splitter=text_splitter)
    # Compress the context with respect to the question; 0.3 is the compression rate.
    res = compressor.compress_texts(context, question, 0.3)
    print(res)
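To check how much a call like the one above actually shortened the input, a small helper can compare the lengths before and after. This is only a sketch under two assumptions that the page does not confirm: that `compress_texts` returns the compressed text as a string, and that measuring in characters is an acceptable proxy for the compression rate. The helper name `achieved_compression` is hypothetical and not part of mx_rag.

```python
def achieved_compression(original: str, compressed: str) -> float:
    """Return the fraction of characters removed, in the range [0, 1].

    Assumption: both arguments are plain strings; an empty original
    is treated as "nothing removed".
    """
    if not original:
        return 0.0
    removed = len(original) - len(compressed)
    # Clamp at zero in case the output is not shorter than the input.
    return max(0.0, removed / len(original))


# Stand-in strings; in practice these would be `context` and `res`.
ratio = achieved_compression("a" * 100, "a" * 70)
print(f"{ratio:.2f}")
```

Comparing this value with the rate passed to the compressor gives a quick sanity check; the two need not match exactly, since whole chunks are kept or dropped.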