generate_origin_document
Function
Parses and splits the original document provided by users for synthesizing fine-tuning data.
Prototype
def generate_origin_document(document_path: str, loader_mng: LoaderMng, filter_func: Callable[[List[str]], List[str]] = None)
Parameters
| Parameter | Data Type | Required/Optional | Description |
|---|---|---|---|
| document_path | String | Required | Path where the original document is stored. The path must be 1 to 1024 characters long and must not contain soft links or two consecutive dots (..). |
| loader_mng | LoaderMng | Required | Document parser and splitter. For details, see LoaderMng. |
| filter_func | Callable | Optional | Callback function that cleans the document chunks after parsing and splitting. It takes a List[str] and returns a List[str]. The default value is None. |
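The filter_func contract described above (a List[str] in, a List[str] out) can be illustrated with a minimal sketch. The function name clean_chunks and its cleaning rules (trimming whitespace, dropping empty and duplicate chunks) are assumptions chosen for illustration; they are not part of the library, and any callable with the same signature should work.

```python
from typing import List


def clean_chunks(chunks: List[str]) -> List[str]:
    """Hypothetical filter_func: trim whitespace, drop empty and duplicate chunks."""
    seen = set()
    cleaned: List[str] = []
    for chunk in chunks:
        text = chunk.strip()
        # Skip chunks that are empty after trimming or that were already kept.
        if not text or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    sample = ["  Introduction  ", "", "Introduction", "Chapter 1\n"]
    print(clean_chunks(sample))  # ['Introduction', 'Chapter 1']
```

Such a callback would be passed as filter_func=clean_chunks when calling generate_origin_document. Because the first occurrence of each chunk is kept, the original document order is preserved.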
Return Value
| Data Type | Description |
|---|---|
| list[str] | List of split document chunks. |