chat

Function

Interacts with vision-language model (VLM) services to obtain the inference result of a VLM.

Prototype

def chat(image_url, prompt, sys_messages, role, llm_config)

Parameters

Parameter

Data Type

Required/Optional

Description

image_url

Dict

Required

Dictionary containing the Base64 code of an image. The key is url, and the value is a string with image_base64 as the variable. In the example {"url": f"data:image/jpeg;base64,{image_base64}"}, image_base64 indicates the Base64 code of an image.

The length range is [1, 4 × 1024 × 1024].

sys_messages

List[dict]

Optional

System message. The list can contain up to 16 messages. Each dictionary within the list can have a maximum length of 16. The maximum length for dictionary key strings is 16, and the maximum length for value strings is 4 × 1024 × 1024. The default value is None.

role

String

Optional

Role of an inference request. The length range is [1, 16]. The default value is user.

llm_config

LLMParameterConfig

Optional

Parameters for calling an LLM. For details, see LLMParameterConfig.

Return Value

Data Type

Description

String

Summary of the VLM's description of the image content.