Class Introduction
Function
Class that parses Markdown files and returns their titles and contents.
Prototype
from mx_rag.cache import MarkDownParser
MarkDownParser(file_path, max_file_num)
Parameters
| Parameter | Data Type | Required/Optional | Description |
|---|---|---|---|
| file_path | String | Required | Path of the folder that contains the Markdown files. The path length cannot exceed 1024 characters. When the parse function is called, the following checks are performed: the path cannot be a soft link or a relative path; the size of a Markdown file in the folder cannot exceed 10 MB, and the number of Markdown files cannot exceed max_file_num; the path cannot be in the path list ["/etc", "/usr/bin", "/usr/lib", "/usr/lib64", "/sys/", "/dev/", "/sbin", "/tmp"]. |
| max_file_num | Integer | Optional | Maximum number of Markdown files that can be parsed. The default value is 1000. The value range is [1, 10000]. |
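
The checks listed for file_path and max_file_num can be approximated before calling the parser. The following is a minimal sketch, not the mx_rag implementation: the helper name check_markdown_dir, its constants, and its blocked argument are hypothetical. It also assumes that the path list is matched by prefix, that the 10 MB limit applies to each file, that only files ending in .md count, and that only the last path component is tested for being a soft link.

```python
import os

# Hypothetical constants mirroring the limits in the parameter table.
MAX_PATH_LEN = 1024
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MB, assumed to apply per file
BLOCKED_PATHS = ("/etc", "/usr/bin", "/usr/lib", "/usr/lib64",
                 "/sys/", "/dev/", "/sbin", "/tmp")


def check_markdown_dir(dir_path, max_file_num=1000, blocked=BLOCKED_PATHS):
    """Return the sorted .md files in dir_path, or raise ValueError.

    Illustrative pre-check only; the real parser may differ in detail.
    """
    if not 1 <= max_file_num <= 10000:
        raise ValueError("max_file_num must be in [1, 10000]")
    if len(dir_path) > MAX_PATH_LEN:
        raise ValueError("path is longer than 1024 characters")
    if not os.path.isabs(dir_path):
        raise ValueError("path must be absolute")
    if os.path.islink(dir_path):
        raise ValueError("path must not be a soft link")

    # Assumption: a path is rejected if it equals or lies under a listed path.
    normalized = os.path.normpath(dir_path)
    for entry in blocked:
        base = entry.rstrip("/")
        if normalized == base or normalized.startswith(base + "/"):
            raise ValueError("path is in the blocked path list: " + entry)

    md_files = sorted(
        os.path.join(dir_path, name)
        for name in os.listdir(dir_path)
        if name.lower().endswith(".md")
    )
    if len(md_files) > max_file_num:
        raise ValueError("too many markdown files in the folder")
    for path in md_files:
        if os.path.getsize(path) > MAX_FILE_SIZE:
            raise ValueError("markdown file exceeds 10 MB: " + path)
    return md_files
```

Calling check_markdown_dir(dir_path) before constructing the parser surfaces a bad path as a ValueError with a specific message. The blocked argument exists only so that the sketch can be exercised against a temporary folder, which usually lives under /tmp.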
Return Value
| Data Type | Description |
|---|---|
| Tuple[List[str], List[str]] | Title list and content list of the parsed Markdown files. |
Example
from paddle.base import libpaddle
from mx_rag.cache import MarkDownParser
dir_path = "path of .md document"
parser = MarkDownParser(dir_path)
titles, contents = parser.parse()
print(titles)
print(contents)