昇腾社区首页
中文
注册
开发者
下载

类功能

功能描述

建立客户端对接视觉大模型服务,提供大模型交互功能,当前只支持兼容OpenAI接口/openai/v1/chat/completions。该类继承实现了langchain.llms.base.LLM。

函数原型

from mx_rag.llm import Img2TextLLM
# 所有参数需通过关键字参数传递
Img2TextLLM(base_url, prompt, model_name, llm_config, client_param)

输入参数说明

参数名

数据类型

可选/必选

说明

base_url

str

必选

大模型服务地址。长度取值范围[1, 128]。

prompt

str

可选

提示词,用于指导视觉大模型生成结构化、详细且符合要求的图像描述,默认值为图像结构化描述提示,用户也可根据需求配置。长度范围[1, 1024 * 1024]

model_name

str

必选

LLM模型名称。长度取值范围[1, 128]。

llm_config

LLMParameterConfig

可选

通过langchain调用时生效,描述参见LLMParameterConfig类;非langchain方式调用通过chat方法传入参数,参见chat

client_param

ClientParam

可选

https客户端配置参数,默认值为“ClientParam()”,具体描述请参见ClientParam类

  • 图像结构化描述提示(IMG_TO_TEXT_PROMPT)
IMG_TO_TEXT_PROMPT = '''Given an image containing a table or figure, please provide a structured and detailed
description in chinese with two levels of granularity:

  Coarse-grained Description:
  - Summarize the overall content and purpose of the image.
  - Briefly state what type of data or information is presented (e.g., comparison, trend, distribution).
  - Mention the main topic or message conveyed by the table or figure.

  Fine-grained Description:
  - Describe the specific details present in the image.
  - For tables: List the column and row headers, units, and any notable values, patterns, or anomalies.
  - For figures (e.g., plots, charts): Explain the axes, data series, legends, and any significant trends, outliers,
  or data points.
  - Note any labels, captions, or annotations included in the image.
  - Highlight specific examples or noteworthy details.

  Deliver the description in a clear, organized, and reader-friendly manner, using bullet points or paragraphs
  as appropriate, answer in chinese'''

调用示例

from mx_rag.llm import Img2TextLLM, LLMParameterConfig
from mx_rag.utils import ClientParam
from PIL import Image
import io
import base64

vlm = Img2TextLLM(base_url="https://{ip}:{port}/openai/v1/chat/completions",
                   model_name="Qwen2.5-VL-7B-Instruct",
                   llm_config=LLMParameterConfig(max_tokens=512),
                   client_param=ClientParam(ca_file="/path/to/ca.crt")
                   )
# 生成图片base64编码
with Image.open("/path/to/image.jpeg") as img:
        buffer = io.BytesIO()
        img.save(buffer, format="JPEG")
        img_base64 = base64.b64encode(buffer.getvalue()).decode('utf-8')

image_url = {"url": f"data:image/jpeg;base64,{img_base64}"}
res = vlm.chat(image_url=image_url)
print(res)