Class Introduction

Function

Encapsulates the inference parameters passed to the LLM. The valid value of each parameter depends on the model configuration.

Prototype

from mx_rag.llm import LLMParameterConfig
LLMParameterConfig(max_tokens, presence_penalty, frequency_penalty, temperature, top_p, seed, stream)

Parameters

Parameter

Data Type

Required/Optional

Description

max_tokens

Integer

Optional

Maximum number of tokens that can be generated in one inference. The value range is [1, 100000], and the default value is 512. The value is passed as a keyword argument (kwargs). The effective maximum depends on the MindIE configuration. For details, see the description of maxSeqLen in "Core Concepts and Configurations" > "Configuration Parameters (Serving)" in the MindIE LLM Development Guide.

presence_penalty

Float, integer

Optional

Penalizes new tokens based on whether they have already appeared in the text. Positive values penalize tokens that have been used, which increases the likelihood that the model moves on to new topics.

The value range is [-2.0, 2.0]. The default value is 0.0.

frequency_penalty

Float, integer

Optional

Penalizes new tokens based on how frequently they have already appeared in the text. Positive values penalize frequently used tokens, which reduces the likelihood that the model repeats the same words.

The value range is [-2.0, 2.0]. The default value is 0.0.

seed

Integer

Optional

Random seed used during inference. Using the same seed value makes the inference result reproducible, while different seed values increase the randomness of the result. The value range is [0, 2 ** 31 - 1]. If this parameter is not passed, the system generates a random seed. The default value is None.

temperature

Float, integer

Optional

Controls the randomness of the output. Higher values produce more diverse output. The value range is [0.0, 2.0], and the default value is 1.0.

top_p

Float, integer

Optional

Controls the range of vocabulary considered during generation (nucleus sampling): candidate tokens are selected in descending order of probability until their cumulative probability reaches the threshold set by top_p. This parameter also affects the diversity of the generated results.

The value range is (0.0, 1.0], and the default value is 1.0.

stream

Bool

Optional

Specifies whether to enable streaming output. The default value is False. This parameter takes effect in ParallelText2TextChain, SingleText2TextChain, and GraphRagText2TextChain.
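The descriptions of presence_penalty, frequency_penalty, temperature, top_p, and seed can be illustrated with a single decoding step. The following is a minimal, self-contained sketch, not the MindIE implementation: sample_next_token is a hypothetical helper, the penalty formula (subtract count × frequency_penalty, plus presence_penalty if the token has appeared at all) is the common OpenAI-style convention and is assumed here, and treating temperature 0 as greedy decoding is an assumption of this sketch.

```python
import math
import random
from collections import Counter


def sample_next_token(logits, generated, temperature=1.0, top_p=1.0,
                      presence_penalty=0.0, frequency_penalty=0.0, seed=None):
    """Pick the next token id from raw logits (hypothetical helper).

    Assumption: penalties follow the OpenAI-style convention
    logit -= count * frequency_penalty + (count > 0) * presence_penalty.
    The serving backend may implement these parameters differently.
    """
    counts = Counter(generated)
    # 1. Penalize tokens that already appear in the generated text.
    adjusted = [
        logit - counts[tok] * frequency_penalty - (counts[tok] > 0) * presence_penalty
        for tok, logit in enumerate(logits)
    ]
    # 2. Assumption of this sketch: temperature == 0 means greedy decoding.
    if temperature == 0:
        return max(range(len(adjusted)), key=lambda t: adjusted[t])
    # 3. Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in adjusted]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # 4. Nucleus (top_p) filtering: keep the most likely tokens until their
    #    cumulative probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda t: probs[t], reverse=True)
    nucleus, cumulative = [], 0.0
    for tok in ranked:
        nucleus.append(tok)
        cumulative += probs[tok]
        if cumulative >= top_p:
            break
    # 5. Sample from the nucleus; a fixed seed makes the draw reproducible,
    #    while seed=None draws from system randomness.
    rng = random.Random(seed)
    weights = [probs[t] for t in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [1.0, 3.0, 2.0]
    print(sample_next_token(logits, generated=[], temperature=0))  # 1
    # Token 1 was already used, so presence_penalty pushes its logit down to 1.0.
    print(sample_next_token(logits, generated=[1], temperature=0,
                            presence_penalty=2.0))  # 2
```

In this sketch the penalties act on raw logits before temperature scaling, and top_p filtering runs after the softmax; this ordering is a common choice but is not necessarily what the serving backend does.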

Example

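from mx_rag.llm import Text2TextLLM, LLMParameterConfig
from mx_rag.utils import ClientParam
# Replace {ip} and {port} with the address of the LLM inference service,
# and ca_file with the path to the CA certificate used to verify the HTTPS connection.
llm = Text2TextLLM(base_url="https://{ip}:{port}/v1/chat/completions",
                   model_name="qianwen-7b",
                   llm_config=LLMParameterConfig(max_tokens=512),
                   client_param=ClientParam(ca_file="/path/to/ca.crt")
                   )
# Non-streaming call: returns the complete answer at once.
res = llm.chat("Please introduce Beijing.")
print(res)
# Streaming call: iterate over the partial results as they are returned.
for chunk in llm.chat_streamly("Please introduce Beijing."):
    print(chunk)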
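Because out-of-range values are only reported once they reach the library or the inference service, a caller may want to check them first against the ranges documented in the parameter table. The following is a hypothetical helper, check_llm_params, written for illustration; it is not part of mx_rag, and the library's own validation may differ. It also cannot know the effective max_tokens limit, which depends on the MindIE maxSeqLen setting.

```python
# Ranges copied from the LLMParameterConfig parameter table:
# name -> (low, high, low_inclusive, high_inclusive)
_RANGES = {
    "max_tokens": (1, 100000, True, True),
    "presence_penalty": (-2.0, 2.0, True, True),
    "frequency_penalty": (-2.0, 2.0, True, True),
    "seed": (0, 2 ** 31 - 1, True, True),
    "temperature": (0.0, 2.0, True, True),
    "top_p": (0.0, 1.0, False, True),  # (0.0, 1.0]: zero is excluded
}
# Parameters documented as Integer only; the others accept float or integer.
_INT_ONLY = {"max_tokens", "seed"}


def check_llm_params(**kwargs):
    """Return a list of problems; an empty list means every value is in range.

    Hypothetical helper. Note: the effective max_tokens limit also depends
    on the MindIE maxSeqLen setting, which this check does not know about.
    """
    problems = []
    for name, value in kwargs.items():
        if name == "stream":
            if not isinstance(value, bool):
                problems.append("stream must be a bool")
            continue
        if name not in _RANGES:
            problems.append(f"unknown parameter: {name}")
            continue
        if name == "seed" and value is None:
            continue  # None lets the system generate a random seed
        # bool is a subclass of int in Python, so reject it explicitly.
        allowed = (int,) if name in _INT_ONLY else (int, float)
        if isinstance(value, bool) or not isinstance(value, allowed):
            problems.append(f"{name} has unsupported type {type(value).__name__}")
            continue
        low, high, low_inc, high_inc = _RANGES[name]
        above_low = value >= low if low_inc else value > low
        below_high = value <= high if high_inc else value < high
        if not (above_low and below_high):
            problems.append(f"{name}={value} is out of range")
    return problems
```

For example, check_llm_params(top_p=0.0) returns ["top_p=0.0 is out of range"]. Returning a list instead of raising on the first error lets the caller report every invalid value at once.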