Class Introduction

Function

Inherits from the langchain_core.document_loaders.base.BaseLoader and mx_rag.document.loader.BaseLoader classes and parses markdown documents (.md/.markdown files). The file size cannot exceed 100 MB. Image and table information in a file can be parsed, but a large vision model is required to parse images and summarize their content. MarkdownLoader requires the NLTK tokenizer the first time it runs. For security reasons, the tokenizer is not downloaded automatically. If an error is reported, download the NLTK tokenizer manually and place it in one of the directories listed in nltk.data.path.
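The tokenizer check described above can be sketched as follows. This is a minimal sketch, not part of mx_rag: the helper names nltk_search_paths and find_tokenizer are hypothetical, and the resource name "tokenizers/punkt" is an assumption, because this page does not say which NLTK resource MarkdownLoader needs. Use the resource named in the error message if it differs.

```python
def nltk_search_paths():
    """Return the directories NLTK searches for data (empty if NLTK is not installed)."""
    try:
        import nltk.data
    except ImportError:
        return []
    return list(nltk.data.path)


def find_tokenizer(resource="tokenizers/punkt"):
    """Return the location of an NLTK resource as a string, or None if it is missing.

    The default resource name is an assumption, not taken from the mx_rag documentation.
    """
    try:
        import nltk.data
    except ImportError:
        return None
    try:
        # nltk.data.find raises LookupError when the resource is not in any search path.
        return str(nltk.data.find(resource))
    except LookupError:
        return None


if __name__ == "__main__":
    print("NLTK search paths:", nltk_search_paths())
    location = find_tokenizer()
    if location is None:
        print("Tokenizer not found; place it under one of the search paths above.")
    else:
        print("Tokenizer found at:", location)
```

Running the check before constructing MarkdownLoader shows where a manually downloaded tokenizer has to be placed, without triggering any automatic download.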

Prototype

from mx_rag.document.loader import MarkdownLoader
MarkdownLoader(file_path, vlm, process_images_separately)

Parameters

| Parameter | Data Type | Required/Optional | Description |
| --- | --- | --- | --- |
| file_path | String | Required | Path to the markdown file. The path length must be in the range [1, 1024]. The path cannot be a soft link and cannot contain two consecutive dots (..). The file size must not exceed 100 MB. |
| vlm | Img2TextLLM | Optional | Large vision model object used to parse images in the document and generate image summaries. For details, see Img2TextLLM. |
| process_images_separately | Bool | Optional | Whether to parse image information separately. If True, each image is parsed on its own and produces a separate Document object. The default value is False, in which case image information is parsed together with the rest of the markdown content into a Document object. |
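The constraints on file_path can be checked before constructing the loader. The following is a minimal sketch using only the standard library. The function check_markdown_path and its constants are hypothetical and only approximate the documented rules; the validation that mx_rag performs internally may differ (for example, in how it treats soft links in parent directories or how it defines 100 MB).

```python
import os

MAX_PATH_LEN = 1024
MAX_FILE_SIZE = 100 * 1024 * 1024  # assumption: 100 MB is interpreted as 100 * 1024 * 1024 bytes
ALLOWED_SUFFIXES = (".md", ".markdown")


def check_markdown_path(file_path):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if not 1 <= len(file_path) <= MAX_PATH_LEN:
        problems.append("path length must be in the range [1, 1024]")
    if ".." in file_path:
        problems.append("path must not contain '..'")
    if not file_path.lower().endswith(ALLOWED_SUFFIXES):
        problems.append("file must have a .md or .markdown extension")
    if os.path.islink(file_path):
        # Only the final path component is checked here.
        problems.append("path must not be a soft link")
    elif os.path.isfile(file_path):
        if os.path.getsize(file_path) > MAX_FILE_SIZE:
            problems.append("file size must not exceed 100 MB")
    else:
        problems.append("file does not exist")
    return problems


if __name__ == "__main__":
    for issue in check_markdown_path("/path/to/document.md"):
        print("Problem:", issue)
```

Returning a list of problems instead of raising on the first failure makes it easier to report every violated constraint at once.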

Example

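from mx_rag.document.loader import MarkdownLoader
from mx_rag.llm import Img2TextLLM, LLMParameterConfig
from mx_rag.utils import ClientParam

# Vision model used to summarize images; replace {ip}, {port}, and the CA file path with real values.
vlm = Img2TextLLM(
    base_url="https://{ip}:{port}/openai/v1/chat/completions",
    model_name="Qwen2.5-VL-7B-Instruct",
    llm_config=LLMParameterConfig(max_tokens=512),
    client_param=ClientParam(ca_file="/path/to/ca.crt"),
)
# Parse images together with the rest of the markdown content (the default behavior).
loader = MarkdownLoader("/path/to/document.md", vlm=vlm, process_images_separately=False)
# lazy_load() returns an iterator of Document objects; list() consumes it.
docs = loader.lazy_load()
print(list(docs))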