QAGenerationConfig

Function

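Configuration class for QA pair generation. It holds the titles, contents, tokenizer, LLM, and limits that QAGenerate uses to generate QA pairs.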

Prototype

from mx_rag.cache import QAGenerationConfig
QAGenerationConfig(titles, contents, tokenizer, llm, max_tokens, qas_num)

Parameters

Parameter

Data Type

Required/Optional

Description

titles

List[str]

Required

Title list. Each title corresponds to one item in contents. The list length range is [1, 10000], and the length range of each string is [1, 100].

contents

List[str]

Required

Content list. Each content item corresponds to one title in titles. The list length range is [1, 10000], and the length range of each string is [1, 1048576].

tokenizer

transformers.PreTrainedTokenizerBase

Required

Tokenizer instance, loaded through AutoTokenizer.from_pretrained. Loading an external model poses security risks, so set local_files_only to True.

llm

Text2TextLLM

Required

LLM object instance. For details, see Text2TextLLM.

max_tokens

int

Optional

Maximum number of tokens kept when truncating each content item. Tokens beyond this limit are discarded. The value range is [500, 100000], and the default value is 1000.

The value that actually takes effect depends on the MindIE configuration. For details, see the description of maxSeqLen in "Core Concepts and Configurations" > "Configuration Parameters (Serving)" in the MindIE LLM Development Guide.

qas_num

int

Optional

Number of QA pairs to generate. The value range is [1, 10], and the default value is 5.
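The limits in the parameter table can be checked before a QAGenerationConfig is constructed. The following is a minimal sketch only: check_qa_inputs is a hypothetical helper, not part of mx_rag, and the requirement that titles and contents have equal length is an assumption inferred from the one-to-one correspondence described above. How the library itself validates its arguments is not specified here.

```python
from typing import List, Tuple

# Limits taken from the parameter table above.
LIST_LEN_RANGE = (1, 10000)
TITLE_LEN_RANGE = (1, 100)
CONTENT_LEN_RANGE = (1, 1048576)
MAX_TOKENS_RANGE = (500, 100000)
QAS_NUM_RANGE = (1, 10)


def _in_range(value: int, bounds: Tuple[int, int]) -> bool:
    """Return True if value lies within the inclusive bounds."""
    low, high = bounds
    return low <= value <= high


def check_qa_inputs(titles: List[str], contents: List[str],
                    max_tokens: int = 1000, qas_num: int = 5) -> None:
    """Hypothetical pre-check: raise ValueError on the first violated limit."""
    if not _in_range(len(titles), LIST_LEN_RANGE):
        raise ValueError("titles length must be in [1, 10000]")
    if not _in_range(len(contents), LIST_LEN_RANGE):
        raise ValueError("contents length must be in [1, 10000]")
    # Assumption: titles and contents are paired one-to-one.
    if len(titles) != len(contents):
        raise ValueError("titles and contents must have the same length")
    if not all(_in_range(len(t), TITLE_LEN_RANGE) for t in titles):
        raise ValueError("each title length must be in [1, 100]")
    if not all(_in_range(len(c), CONTENT_LEN_RANGE) for c in contents):
        raise ValueError("each content length must be in [1, 1048576]")
    if not _in_range(max_tokens, MAX_TOKENS_RANGE):
        raise ValueError("max_tokens must be in [500, 100000]")
    if not _in_range(qas_num, QAS_NUM_RANGE):
        raise ValueError("qas_num must be in [1, 10]")


# Valid input passes silently; invalid input raises ValueError.
check_qa_inputs(["Sample title"], ["Sample content"], qas_num=1)
```

Running such a check first surfaces an out-of-range argument with a specific message before any tokenizer or LLM is involved.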

Example

from paddle.base import libpaddle
from transformers import AutoTokenizer
from mx_rag.cache import QAGenerationConfig, QAGenerate
from mx_rag.llm import Text2TextLLM
from mx_rag.utils import ClientParam
llm = Text2TextLLM(base_url="https://ip:port/v1/chat/completions", model_name="llama3-chinese-8b-chat",
                   client_param=ClientParam(ca_file="/path/to/ca.crt"))
# Load the model's tokenizer from the local directory where the model is saved.
tokenizer = AutoTokenizer.from_pretrained("/home/model/Llama3-8B-Chinese-Chat/", local_files_only=True)
# titles and contents can be generated with MarkDownParser; they are hard-coded here for illustration.
titles = ["Composition test of the 2024 National College Entrance Examination"]
contents = ['Composition test of the 2024 National College Entrance Examination\nNew Course Standard (I)\nRead the following materials and write a composition. (60 points)\n'
            'With the popularization of the Internet and artificial intelligence, more and more questions can be quickly answered. So, will we have fewer problems?\n'
            'What do you think about the above materials? Please write a composition of no fewer than 800 words.\n'
            'Requirements: Select a proper angle and style to describe your opinions. Prepare your own title. Do not copy other articles, and do not disclose personal information.']
config = QAGenerationConfig(titles, contents, tokenizer, llm, qas_num=1)
qa_generate = QAGenerate(config)
qas = qa_generate.generate_qa()
print(qas)
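
The effect of max_tokens (content is truncated and the excess part is discarded) can be illustrated with a small sketch. This is not the mx_rag implementation: truncate_to_max_tokens, whitespace_encode, and whitespace_decode are hypothetical names, and a whitespace split stands in for the real tokenizer so that the example runs without a model. The limit of 3 is used only to keep the illustration short; the documented range for max_tokens is [500, 100000].

```python
from typing import Callable, List


def truncate_to_max_tokens(text: str, max_tokens: int,
                           encode: Callable[[str], List[str]],
                           decode: Callable[[List[str]], str]) -> str:
    """Keep at most max_tokens tokens of text and discard the rest."""
    tokens = encode(text)
    if len(tokens) <= max_tokens:
        return text  # Short enough: returned unchanged.
    return decode(tokens[:max_tokens])


def whitespace_encode(text: str) -> List[str]:
    """Toy stand-in for a tokenizer: split on whitespace."""
    return text.split()


def whitespace_decode(tokens: List[str]) -> str:
    """Toy stand-in for detokenization: join with single spaces."""
    return " ".join(tokens)


sample = "one two three four five"
print(truncate_to_max_tokens(sample, 3, whitespace_encode, whitespace_decode))
# prints "one two three"
```

With a real tokenizer, the token count of a text usually differs from its word count, so long contents may be cut earlier or later than a word-based estimate suggests.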