QAGenerationConfig

Function

Holds the configuration for QA pair generation. An instance is passed to QAGenerate, which generates the QA pairs.

Prototype

from mx_rag.cache import QAGenerationConfig
QAGenerationConfig(titles, contents, tokenizer, llm, max_tokens, qas_num)

Parameters

Parameter	Data Type	Required/Optional	Description
titles	List[str]	Required	Title list. Each title corresponds to one item in contents. The list length range is [1, 10000], and the length range of each string is [1, 100].
contents	List[str]	Required	Content list. Each content item corresponds to one title in titles. The list length range is [1, 10000], and the length range of each string is [1, 1048576].
tokenizer	transformers.PreTrainedTokenizerBase	Required	Tokenizer instance, loaded through AutoTokenizer.from_pretrained. Loading a model from an external source poses security risks, so set local_files_only to True.
llm	Text2TextLLM	Required	LLM object instance. For details, see Text2TextLLM.
max_tokens	Integer	Optional	Maximum number of tokens kept when the content is truncated. Tokens beyond this limit are discarded. The value range is [500, 100000], and the default value is 1000. The usable value of this parameter depends on the MindIE configuration. For details, see the description of maxSeqLen in "Core Concepts and Configurations" > "Configuration Parameters (Serving)" in MindIE LLM Development Guide.
qas_num	Integer	Optional	Number of generated QA pairs. The value range is [1, 10], and the default value is 5.
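
The ranges in the table above can be checked before a QAGenerationConfig is built. The following is a minimal sketch, not part of mx_rag: find_violations and the constant names are hypothetical, and the equal-length check is an assumption inferred from the one-to-one mapping between titles and contents.

```python
from typing import List, Tuple

# Limits copied from the parameter table above.
LIST_LEN = (1, 10000)
TITLE_CHARS = (1, 100)
CONTENT_CHARS = (1, 1048576)
MAX_TOKENS = (500, 100000)
QAS_NUM = (1, 10)


def _in_range(value: int, bounds: Tuple[int, int]) -> bool:
    low, high = bounds
    return low <= value <= high


def find_violations(titles: List[str], contents: List[str],
                    max_tokens: int = 1000, qas_num: int = 5) -> List[str]:
    """Hypothetical pre-check: one message per documented limit that is violated."""
    problems = []
    if not _in_range(len(titles), LIST_LEN):
        problems.append("titles: list length out of range")
    if not _in_range(len(contents), LIST_LEN):
        problems.append("contents: list length out of range")
    # Assumption: the one-to-one title/content mapping implies equal lengths.
    if len(titles) != len(contents):
        problems.append("titles and contents differ in length")
    if any(not _in_range(len(t), TITLE_CHARS) for t in titles):
        problems.append("titles: string length out of range")
    if any(not _in_range(len(c), CONTENT_CHARS) for c in contents):
        problems.append("contents: string length out of range")
    if not _in_range(max_tokens, MAX_TOKENS):
        problems.append("max_tokens: value out of range")
    if not _in_range(qas_num, QAS_NUM):
        problems.append("qas_num: value out of range")
    return problems
```

An empty return value means that every documented limit is met. The library may perform its own validation, which this sketch does not replace.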

Example

from paddle.base import libpaddle
from transformers import AutoTokenizer
from mx_rag.cache import QAGenerationConfig, QAGenerate
from mx_rag.llm import Text2TextLLM
from mx_rag.utils import ClientParam
llm = Text2TextLLM(base_url="https://ip:port/v1/chat/completions", model_name="llama3-chinese-8b-chat",
                   client_param=ClientParam(ca_file="/path/to/ca.crt"))
# Use a model tokenizer and pass the model saving path.
tokenizer = AutoTokenizer.from_pretrained("/home/model/Llama3-8B-Chinese-Chat/", local_files_only=True)
# titles and contents can be generated by MarkDownParser. They are hard-coded here for illustration.
titles = ["Composition test of the 2024 National College Entrance Examination"]
contents = ['Composition test of the 2024 National College Entrance Examination\nNew Course Standard (I)\nRead the following materials and write a composition. (60 points)\n'
            'With the popularization of the Internet and artificial intelligence, more and more questions can be quickly answered. So, will we have fewer problems?\n'
            'What do you think about the above materials? Please write a composition of no fewer than 800 words.\n'
            'Requirements: Select a proper angle and style to describe your opinions. Prepare your own title. Do not copy other articles, and do not disclose personal information.']
config = QAGenerationConfig(titles, contents, tokenizer, llm, qas_num=1)
qa_generate = QAGenerate(config)
qas = qa_generate.generate_qa()
print(qas)
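
The max_tokens parameter discards the part of the content that exceeds the token limit. How the library truncates internally is not described here, so the following is only a rough sketch for previewing the effect of a given limit. truncate_to_tokens and WordTokenizer are hypothetical names, and WordTokenizer is a whitespace stand-in for a real tokenizer.

```python
from typing import Callable, List


def truncate_to_tokens(text: str, max_tokens: int,
                       encode: Callable[[str], List[int]],
                       decode: Callable[[List[int]], str]) -> str:
    """Keep at most max_tokens tokens of text and drop the rest."""
    ids = encode(text)
    if len(ids) <= max_tokens:
        return text
    return decode(ids[:max_tokens])


class WordTokenizer:
    """Stand-in tokenizer (assumption): one token per whitespace-separated word."""

    def __init__(self) -> None:
        self.vocab: List[str] = []

    def encode(self, text: str) -> List[int]:
        ids = []
        for word in text.split():
            if word not in self.vocab:
                self.vocab.append(word)
            ids.append(self.vocab.index(word))
        return ids

    def decode(self, ids: List[int]) -> str:
        return " ".join(self.vocab[i] for i in ids)


toy = WordTokenizer()
preview = truncate_to_tokens("a b c d e", 3, toy.encode, toy.decode)
print(preview)
```

With a Hugging Face tokenizer, the same function could be called with lambda t: tokenizer.encode(t, add_special_tokens=False) as encode and tokenizer.decode as decode. The decoded text may differ slightly from the original characters.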
