Class Introduction

Function

Configures the parameters used when calling the LLM. The valid value range of each parameter depends on the model configuration.

Prototype

from mx_rag.llm import LLMParameterConfig
LLMParameterConfig(max_tokens, presence_penalty, frequency_penalty, temperature, top_p, seed, stream)

Parameters

Parameter	Data Type	Required/Optional	Description
max_tokens	Integer	Optional	Maximum number of tokens that can be generated for inference. The value range is [1, 100000], and the default value is 512. The value is passed by kwargs. The actual value depends on MindIE configurations. For details, see the description of maxSeqLen in "Core Concepts and Configurations" > "Configuration Parameters (Serving)" in MindIE LLM Development Guide.
presence_penalty	Float, Integer	Optional	Penalizes new tokens based on whether they have already appeared in the text. Positive values penalize tokens that have already been used, which increases the likelihood that the model moves on to new topics. The value range is [-2.0, 2.0]. The default value is 0.0.
frequency_penalty	Float, Integer	Optional	Penalizes new tokens based on how often they have already appeared in the text. Positive values penalize frequently used tokens, which reduces the likelihood that the model repeats the same wording. The value range is [-2.0, 2.0]. The default value is 0.0.
seed	Integer	Optional	Specifies the random seed for the inference process. The same seed value makes the inference result reproducible, and different seed values make the results more varied. The value range is [0, 2^31 - 1]. The default value is None, in which case the system generates a random seed.
temperature	Float, Integer	Optional	Controls the randomness of the output. A larger value produces more diverse output. The value range is [0.0, 2.0], and the default value is 1.0.
top_p	Float, Integer	Optional	Controls the vocabulary range considered during generation: candidate tokens are selected by cumulative probability until the total exceeds the given threshold. This parameter also controls the diversity of the generated results. The value range is (0.0, 1.0], and the default value is 1.0.
stream	Bool	Optional	Specifies whether to enable streaming answering. The default value is False. This parameter takes effect in ParallelText2TextChain, SingleText2TextChain, and GraphRagText2TextChain.
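
The effect of temperature and top_p described in the table can be illustrated with a small, self-contained sketch. It is not part of mx_rag or MindIE, and the function names apply_temperature and top_p_filter are hypothetical. It only shows the general technique (temperature-scaled softmax followed by nucleus filtering) and assumes a temperature greater than 0; how the serving backend treats a temperature of exactly 0 is not covered here.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature (assumes temperature > 0)."""
    if temperature <= 0:
        raise ValueError("this sketch only handles temperature > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their cumulative probability
    reaches top_p, then renormalize the kept probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]
    # A lower temperature concentrates probability on the top token.
    print(apply_temperature(logits, 0.5))
    print(apply_temperature(logits, 1.0))
    # With top_p=0.7 only the two most probable tokens remain candidates.
    print(top_p_filter([0.5, 0.3, 0.2], 0.7))
```

In this sketch, lowering either value narrows the set of likely tokens, which is why both parameters are described as controlling output diversity.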

Example

from mx_rag.llm import Text2TextLLM, LLMParameterConfig
from mx_rag.utils import ClientParam

# Replace {ip} and {port} with the address of the LLM service.
llm = Text2TextLLM(base_url="https://{ip}:{port}/v1/chat/completions",
                   model_name="qianwen-7b",
                   llm_config=LLMParameterConfig(max_tokens=512),
                   client_param=ClientParam(ca_file="/path/to/ca.crt"))

# Non-streaming call: print the answer once it is returned.
res = llm.chat("Please introduce Beijing.")
print(res)

# Streaming call: print each piece of the answer as it arrives.
for chunk in llm.chat_streamly("Please introduce Beijing."):
    print(chunk)
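
Before values are passed to LLMParameterConfig, they can be checked against the ranges documented in the parameter table. The following is a minimal sketch of such a check. The helper check_llm_params is hypothetical and not part of mx_rag; whether the library itself rejects out-of-range values is not stated here. The upper bound of seed is read as 2^31 - 1.

```python
def check_llm_params(max_tokens=512, presence_penalty=0.0, frequency_penalty=0.0,
                     temperature=1.0, top_p=1.0, seed=None, stream=False):
    """Return the names of arguments that fall outside the documented ranges.

    Hypothetical helper: the defaults and ranges mirror the parameter table.
    """

    def is_int(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)

    def is_number(value):
        return is_int(value) or isinstance(value, float)

    invalid = []
    if not (is_int(max_tokens) and 1 <= max_tokens <= 100000):
        invalid.append("max_tokens")
    for name, value in (("presence_penalty", presence_penalty),
                        ("frequency_penalty", frequency_penalty)):
        if not (is_number(value) and -2.0 <= value <= 2.0):
            invalid.append(name)
    if not (is_number(temperature) and 0.0 <= temperature <= 2.0):
        invalid.append("temperature")
    # top_p uses a half-open range: 0.0 itself is not allowed.
    if not (is_number(top_p) and 0.0 < top_p <= 1.0):
        invalid.append("top_p")
    # seed may be None, in which case a random seed is generated.
    if seed is not None and not (is_int(seed) and 0 <= seed <= 2**31 - 1):
        invalid.append("seed")
    if not isinstance(stream, bool):
        invalid.append("stream")
    return invalid


if __name__ == "__main__":
    print(check_llm_params(max_tokens=512, temperature=0.7, top_p=0.9))
    print(check_llm_params(max_tokens=0, top_p=0.0))
```

An empty list means every argument is within its documented range. The effective upper limit of max_tokens still depends on the MindIE configuration, so a value that passes this check may be limited further by the service.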
