chat

Interacts with vision-language model (VLM) services to obtain the inference result of a VLM.

def chat(image_url, prompt, sys_messages, role, llm_config)

Parameter	Data Type	Required/Optional	Description
image_url	Dict	Required	Dictionary containing the Base64 code of an image. The key is url, and the value is a string with image_base64 as the variable. In the example {"url": f"data:image/jpeg;base64,{image_base64}"}, image_base64 indicates the Base64 code of an image. The length range is [1, 4 × 1024 × 1024].
sys_messages	List[dict]	Optional	System message. The list can contain up to 16 messages. Each dictionary within the list can have a maximum length of 16. The maximum length for dictionary key strings is 16, and the maximum length for value strings is 4 × 1024 × 1024. The default value is None.
role	String	Optional	Role of an inference request. The length range is [1, 16]. The default value is user.
llm_config	LLMParameterConfig	Optional	Parameters for calling an LLM. For details, see LLMParameterConfig.

Data Type	Description
String	Summary of the VLM's description of the image content.

Parent topic: Img2TextLLM