Class Introduction

Function

Uses a reranker to calculate a relevance score between the question and each chunk of the context, then preferentially retains the most relevant chunks according to the preset compression rate, which effectively compresses long texts. The question and the context together make up the prompt, for example a prompt for summarizing a long text.
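
The idea described above can be illustrated with a minimal, self-contained sketch. It is not the mx_rag implementation: the names compress_by_relevance and overlap_score are hypothetical, a toy word-overlap score stands in for a real reranker, the context is split on newlines only, and the compression rate is assumed to be the fraction of chunks to drop.

```python
import math
import re
from typing import Callable, List


def overlap_score(question: str, chunk: str) -> float:
    """Toy stand-in for a reranker: fraction of question words found in the chunk."""
    q_words = set(re.findall(r"\w+", question.lower()))
    c_words = set(re.findall(r"\w+", chunk.lower()))
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


def compress_by_relevance(
    context: str,
    question: str,
    compress_rate: float,
    score_fn: Callable[[str, str], float] = overlap_score,
) -> str:
    """Keep the chunks most relevant to the question, in their original order."""
    # Simplified splitting: one chunk per non-empty line.
    chunks: List[str] = [line for line in context.split("\n") if line.strip()]
    if not chunks:
        return ""
    # Assumption: compress_rate is the fraction of chunks to drop.
    keep = max(1, math.ceil(len(chunks) * (1 - compress_rate)))
    # Rank chunk indices by relevance score, highest first.
    ranked = sorted(
        range(len(chunks)),
        key=lambda i: score_fn(question, chunks[i]),
        reverse=True,
    )
    # Restore document order for the retained chunks.
    kept = sorted(ranked[:keep])
    return "\n".join(chunks[i] for i in kept)


demo_context = (
    "The reranker assigns a score to each chunk.\n"
    "Bananas are yellow.\n"
    "A higher score means the chunk is more relevant.\n"
    "The weather is nice today."
)
print(compress_by_relevance(demo_context, "Which chunk gets a higher reranker score?", 0.5))
```

Retained chunks are joined in their original order so that the compressed text still reads coherently; whether RerankCompressor does the same is not stated here.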

Prototype

from mx_rag.compress.rerank_compressor import RerankCompressor
class RerankCompressor(reranker, splitter)

Parameters

| Parameter | Data Type | Required/Optional | Description |
|---|---|---|---|
| reranker | Reranker | Required | Reranker instance used to re-rank document chunks. It must be a Reranker object from mx_rag.reranker. For details, see Reranker. |
| splitter | TextSplitter | Optional | Document splitter. It must be an object of a subclass of LangChain's TextSplitter. The default value is RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=0, separators=["\n", ""], keep_separator=True) from langchain.text_splitter. |

Example

from langchain.text_splitter import RecursiveCharacterTextSplitter
from mx_rag.compress.rerank_compressor import RerankCompressor
from mx_rag.reranker.local import LocalReranker
from mx_rag.reranker.service import TEIReranker
from mx_rag.utils import ClientParam

# Long text to compress, and the question used to score each chunk.
context = """Prompt text to be compressed"""
question = "Provide a title for the above content."

# Set to True to use a TEI reranker service instead of a local model.
tei_reranker = False
if tei_reranker:
    reranker = TEIReranker.create(
        url="https://ip:port/rerank",
        client_param=ClientParam(ca_file="/path/to/ca.crt"),
    )
else:
    reranker = LocalReranker(model_path="reranker_path", dev_id=0)

# Same settings as the default splitter.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512, chunk_overlap=0, separators=["\n", ""], keep_separator=True
)
compressor = RerankCompressor(reranker=reranker, splitter=text_splitter)

# The third argument (0.3) is the compression rate.
res = compressor.compress_texts(context, question, 0.3)
print(res)