Class Introduction
Function
This class establishes a client for interacting with vision model services and facilitating communication with them. Currently, it supports only the /openai/v1/chat/completions API of OpenAI. This class inherits and implements langchain.llms.base.LLM.
Prototype
from mx_rag.llm import Img2TextLLM # All parameters must be passed through keyword parameters. Img2TextLLM(base_url, prompt, model_name, llm_config, client_param)
Parameters
Parameter |
Data Type |
Required/Optional |
Description |
|---|---|---|---|
base_url |
String |
Required |
Model service address. The length range is [1, 128]. |
prompt |
String |
Optional |
Prompt, which is used to guide the vision foundation model to generate structured, detailed, and qualified image descriptions. The default value is IMG_TO_TEXT_PROMPT. You can also configure the value as required. The length range is [1, 1024 × 1024]. |
model_name |
String |
Required |
LLM name. The length range is [1, 128]. |
llm_config |
LLMParameterConfig |
Optional |
This parameter is valid when for LangChain-based call (LLMParameterConfig). For non-LangChain-based call, parameters are passed via chat (chat). |
client_param |
ClientParam |
Optional |
HTTPS client configuration parameter. The default value is ClientParam(). For details, see ClientParam. |
IMG_TO_TEXT_PROMPT = '''Given an image containing a table or figure, please provide a structured and detailed description in chinese with two levels of granularity: Coarse-grained Description: - Summarize the overall content and purpose of the image. - Briefly state what type of data or information is presented (e.g., comparison, trend, distribution). - Mention the main topic or message conveyed by the table or figure. Fine-grained Description: - Describe the specific details present in the image. - For tables: List the column and row headers, units, and any notable values, patterns, or anomalies. - For figures (e.g., plots, charts): Explain the axes, data series, legends, and any significant trends, outliers, or data points. - Note any labels, captions, or annotations included in the image. - Highlight specific examples or noteworthy details. Deliver the description in a clear, organized, and reader-friendly manner, using bullet points or paragraphs as appropriate, answer in chinese'''
Example
from mx_rag.llm import Img2TextLLM, LLMParameterConfig
from mx_rag.utils import ClientParam
from PIL import Image
import io
import base64
vlm = Img2TextLLM(base_url="https://{ip}:{port}/openai/v1/chat/completions",
model_name="Qwen2.5-VL-7B-Instruct",
llm_config=LLMParameterConfig(max_tokens=512),
client_param=ClientParam(ca_file="/path/to/ca.crt")
)
# Generate the Base64-encoded image.
with Image.open("/path/to/image.jpeg") as img:
buffer = io.BytesIO()
img.save(buffer, format="JPEG")
img_base64 = base64.b64encode(buffer.getvalue()).decode('utf-8')
image_url = {"url": f"data:image/jpeg;base64,{img_base64}"}
res = vlm.chat(image_url=image_url)
print(res)