Function

Provides text (non-streaming) and streaming inference.

Interface Format

Method: POST

URL: https://{ip}:{port}/v1/chat/completions

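Request Parameters

Parameter	Required	Description	Value constraints
model	Required	Model name.	Consists of letters, digits, dots (.), hyphens (-), and underscores (_); must not start or end with a dot, hyphen, or underscore; length <= 256 characters.
messages	Required	Message structure of the inference request.	list type. 0 KB < number of characters in messages <= 512 KB; Chinese and English are supported. The number of tokens after tokenization must be <= min(maxSeqLen - maxIterTimes, max_position_embeddings); these parameters are read from the configuration file.
role	Required	Role of a message in messages.	system: system role; user: user role; assistant: assistant role; tool: tool role.
content	Required	Text of a message in messages.	Non-empty.
max_tokens	Optional	Maximum number of tokens that inference may generate. Bounded by the maxIterTimes parameter in the configuration file: the number of generated tokens is <= maxIterTimes.	uint_64 type, range (0, maxIterTimes]. Default: maxIterTimes.
presence_penalty	Optional	A value between -2.0 and 2.0 that penalizes new tokens based on whether they have already appeared in the text so far. Positive values penalize tokens that have already been used, making the model more likely to move on to new topics.	float type, range [-2.0, 2.0]. Default: 0.0.
frequency_penalty	Optional	A value between -2.0 and 2.0 that penalizes new tokens based on how often they already appear in the text. Positive values penalize frequently used tokens, making the model less likely to repeat the same line verbatim.	float type, range [-2.0, 2.0]. Default: 0.0.
seed	Optional	Random seed for the inference process. Using the same seed makes inference results reproducible; using different seeds increases the randomness of the results.	uint_64 type, range (0, 18446744073709551615]. If this parameter is omitted, the system generates a random seed.
temperature	Optional	Controls the randomness of generation; higher values produce more diverse output.	float type, range (0.0, 2.0]. Default: 1.0. Values greater than or equal to 0.001 are recommended; values below 0.001 may degrade text quality.
top_p	Optional	Controls the range of candidate tokens considered during generation: candidates are selected by cumulative probability until it exceeds the given threshold. This also controls the diversity of the output.	float type, range (0.0, 1.0]. Default: 1.0.
stream	Optional	Specifies whether the result is returned as a single text response (false) or as a stream (true).	bool type. Default: false.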
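
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service: the names validate_request and max_iter_times are hypothetical, the value of maxIterTimes must come from your own configuration file, and a missing or null optional field is treated as "not provided" (an assumption based on the "seed": null entry in the request samples). The 512 KB size limit and the token-count limit are not checked here.

```python
import re

# Letters, digits, '.', '-', '_'; must start and end with a letter or digit.
MODEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?")
ROLES = {"system", "user", "assistant", "tool"}
UINT64_MAX = 18446744073709551615


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_int(value):
    return isinstance(value, int) and not isinstance(value, bool)


def validate_request(body, max_iter_times):
    """Return the names of fields that violate the documented constraints."""
    bad = []

    model = body.get("model")
    if not (isinstance(model, str) and len(model) <= 256 and MODEL_RE.fullmatch(model)):
        bad.append("model")

    messages = body.get("messages")
    messages_ok = isinstance(messages, list) and len(messages) > 0 and all(
        isinstance(m, dict)
        and isinstance(m.get("role"), str)
        and m["role"] in ROLES
        and bool(m.get("content"))
        for m in messages
    )
    if not messages_ok:
        bad.append("messages")

    # Optional fields: a missing key or null is treated as "not provided".
    checks = {
        "max_tokens": lambda v: _is_int(v) and 0 < v <= max_iter_times,
        "presence_penalty": lambda v: _is_number(v) and -2.0 <= v <= 2.0,
        "frequency_penalty": lambda v: _is_number(v) and -2.0 <= v <= 2.0,
        "seed": lambda v: _is_int(v) and 0 < v <= UINT64_MAX,
        "temperature": lambda v: _is_number(v) and 0.0 < v <= 2.0,
        "top_p": lambda v: _is_number(v) and 0.0 < v <= 1.0,
        "stream": lambda v: isinstance(v, bool),
    }
    for name, check in checks.items():
        value = body.get(name)
        if value is not None and not check(value):
            bad.append(name)
    return bad
```

An empty list means no violation was found; otherwise the list names the offending fields in table order.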

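The transformers version in the runtime environment must not be lower than 4.34.0; tokenizers in earlier versions do not support the "chat_template" method.
The tokenizer_config.json file under the inference model's weights path must contain the "chat_template" field and its implementation.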
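
Both prerequisites can be verified before the service is started. The sketch below is illustrative only: the function names are hypothetical, the version string is passed in by the caller (for example, transformers.__version__), and the check only confirms that "chat_template" is present and non-empty, not that the template itself is valid.

```python
import json
import re
from pathlib import Path


def version_tuple(version):
    """Parse up to three leading numeric components, e.g. '4.34.0' -> (4, 34, 0)."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def has_chat_template(weights_dir):
    """Check that tokenizer_config.json under weights_dir defines chat_template."""
    config_path = Path(weights_dir) / "tokenizer_config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return bool(config.get("chat_template"))


def meets_requirements(transformers_version, weights_dir):
    """Return True if both documented prerequisites appear to be satisfied."""
    return (
        version_tuple(transformers_version) >= (4, 34, 0)
        and has_chat_template(weights_dir)
    )
```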

Usage Example

Request example:

POST https://{ip}:{port}/v1/chat/completions

Request body:

Single-turn conversation

{
    "model": "gpt-3.5-turbo",
    "messages": [{
        "role": "user",
        "content": "You are a helpful assistant."
    }],
    "max_tokens": 20,
    "presence_penalty": 1.03,
    "frequency_penalty": 1.0,
    "seed": null,
    "temperature": 0.5,
    "top_p": 0.95,
    "stream": false
}

Multi-turn conversation

{
    "model": "gpt-3.5-turbo",
    "messages": [{
        "role": "system",
        "content": "You are a student who is good at math."
        },
        {
        "role": "user",
        "content": "what is your hobby?"
        }
    ],
    "max_tokens": 20,
    "presence_penalty": 1.03,
    "frequency_penalty": 1.0,
    "seed": null,
    "temperature": 0.5,
    "top_p": 0.95,
    "stream": false
}
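
The request bodies above can be assembled programmatically. The following sketch uses only the Python standard library and builds the request without sending it. The helper name build_chat_request and the address 127.0.0.1:1025 are hypothetical, and the Content-Type header is an assumption, since this section does not specify request headers.

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages, **options):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    body = {"model": model, "messages": messages}
    # Optional fields are included only when the caller supplies a value.
    body.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


# Multi-turn conversation: all turns are passed together in "messages".
request = build_chat_request(
    "https://127.0.0.1:1025",  # hypothetical address, replace with {ip}:{port}
    "gpt-3.5-turbo",
    [
        {"role": "system", "content": "You are a student who is good at math."},
        {"role": "user", "content": "what is your hobby?"},
    ],
    max_tokens=20,
    temperature=0.5,
    top_p=0.95,
    stream=False,
)
```

The request can then be sent with urllib.request.urlopen(request); any TLS or authentication setup depends on the deployment and is outside this sketch.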

Response example:

The response format is the same for single-turn and multi-turn conversations.

Text inference ("stream"=false)

{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "model": "gpt-3.5-turbo-0613",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n\nHello there, how may I assist you today?"
            },
            "finish_reason": "eos_token"
        }
    ],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 12,
        "total_tokens": 21
    }
}
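
A client typically needs the generated text, whether the output was cut off, and the token counts. The sketch below shows one way to read these from a text inference response; the function name summarize_completion and the keys of the returned dictionary are hypothetical.

```python
import json


def summarize_completion(raw):
    """Extract the generated text and basic status from a text inference response."""
    response = json.loads(raw)
    choice = response["choices"][0]  # only index 0 is currently returned
    usage = response["usage"]
    return {
        "text": choice["message"]["content"],
        # "length" means generation stopped at a length limit, not at end of sequence.
        "truncated": choice["finish_reason"] == "length",
        # total_tokens is documented as request tokens plus generated tokens.
        "usage_consistent": usage["prompt_tokens"] + usage["completion_tokens"]
        == usage["total_tokens"],
    }
```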

Streaming inference ("stream"=true, returned in SSE format)

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":"am"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" a"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" French"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":"man"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" living"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" in"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" the"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" UK"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":"."},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" I"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" am"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" a"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" keen"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" photograph"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":"er"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" and"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" I"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" have"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{"role":"assistant","content":" been"},"finish_reason":null}]}

data: {"id":"1","object":"chat.completion.chunk","created":1707275960,"model":"baichuan","choices":[{"index":0,"delta":{},"finish_reason":"length"}]}

data: [DONE]
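
The stream above can be reassembled into the full text by concatenating the delta contents until the [DONE] marker. The following is a minimal sketch that assumes each event arrives as one already-decoded text line, as in the sample; the function name collect_stream is hypothetical.

```python
import json


def collect_stream(lines):
    """Join the delta contents of an SSE stream; return (text, finish_reason)."""
    pieces = []
    finish_reason = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip the blank lines that separate events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        choice = json.loads(payload)["choices"][0]
        # The final chunk carries an empty delta and the finish_reason.
        pieces.append(choice["delta"].get("content", ""))
        if choice["finish_reason"] is not None:
            finish_reason = choice["finish_reason"]
    return "".join(pieces), finish_reason
```

When reading from an HTTP response object, decode each line from bytes (for example, with UTF-8) before passing it in.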

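Output Description

Table 1 Text inference result fields
Parameter	Type	Description
id	string	Request ID.
object	string	Type of the returned result; currently always "chat.completion".
created	integer	Timestamp of the inference request, in seconds.
model	string	Inference model used.
choices	list	List of inference results.
index	integer	Index of the choices entry; currently always 0.
message	object	Inference message.
role	string	Role; currently always "assistant".
content	string	Inference text result.
finish_reason	string	Reason the request ended. eos_token: the request ended normally. stop_sequence: (1) the request was actively cancelled or stopped; the user is not aware of this and the response is discarded; (2) an error occurred while the request was executing; the response output is empty and err_msg is non-empty; (3) validation of the request input failed; the response output is empty and err_msg is non-empty. length: (1) the request ended because the maximum sequence length was reached; the response is the output of the last iteration; (2) the request ended because the maximum output length (at request or model granularity) was reached; the response is the output of the last iteration. invalid flag: invalid marker.
usage	object	Inference statistics.
prompt_tokens	int	Number of tokens in the prompt text entered by the user.
completion_tokens	int	Number of generated tokens.
total_tokens	int	Total number of tokens: request plus generated output.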

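Table 2 Streaming inference result fields
Parameter	Type	Description
data	object	Result returned by one inference step.
id	string	Request ID.
object	string	Currently always "chat.completion.chunk".
created	integer	Timestamp of the inference request, in seconds.
model	string	Inference model used.
choices	list	Streaming inference results.
finish_reason	string	Reason the request ended; returned only in the last inference result. eos_token: the request ended normally. stop_sequence: (1) the request was actively cancelled or stopped; the user is not aware of this and the response is discarded; (2) an error occurred while the request was executing; the response output is empty and err_msg is non-empty; (3) validation of the request input failed; the response output is empty and err_msg is non-empty. length: (1) the request ended because the maximum sequence length was reached; the response is the output of the last iteration; (2) the request ended because the maximum output length (at request or model granularity) was reached; the response is the output of the last iteration. invalid flag: invalid marker.
index	integer	Index of the choices entry; currently always 0.
delta	object	Inference result returned; empty in the last response.
role	string	Role; currently always "assistant".
content	string	Inference text result.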
