evaluate
Function
Evaluates a user-supplied evaluation dataset that is passed in dictionary format. To display the logs printed by RAGAS, set the environment variable DISABLE_RAGAS_LOGGING to 0.
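As a minimal sketch, the variable can be set from Python before the evaluation runs. Setting it this early is an assumption; the description above does not say at which point the variable is read.

```python
import os

# Assumption: the variable is read when the evaluation code starts up,
# so it is set before anything else runs.
os.environ["DISABLE_RAGAS_LOGGING"] = "0"  # "0" displays the RAGAS logs

# Simple check that the setting is in place for the current process.
ragas_logs_shown = os.environ["DISABLE_RAGAS_LOGGING"] == "0"
```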
Prototype
def evaluate(metrics, dataset, language, prompts_path, show_progress)
Parameters
| Parameter | Data Type | Required/Optional | Description |
|---|---|---|---|
| metrics | list[str] | Required | Set of evaluation metrics. For details, see Table 1. The number of metrics must be within (0, 14], and the length of each metric name must be within [1, 50] characters. Metric names must be unique. If the answer_similarity metric is used, its score is returned under the key semantic_similarity. |
| dataset | Dict[str, Any] | Required | User evaluation dataset, a dictionary whose length is within [1, 4]. The lists under user_input, response, and reference must have the same length as the outer list of retrieved_contexts. |
| language | String | Optional | Language used for evaluation. The value can be chinese or english. The default value is None, in which case the default prompt provided by RAGAS is used. |
| prompts_path | String | Optional | Path to the localized prompts. If this parameter is specified, the system searches the prompt_dir directory for the prompt file that matches the configured language. If the prompt file is found, the evaluation process can be accelerated. Each file in the directory cannot exceed 4 MB, the directory depth cannot exceed 64 levels, and the total number of files cannot exceed 512. The default value is None. The string length must be within [1, 255]. |
| show_progress | Bool | Optional | Whether to display the progress bar during evaluation. By default, the progress bar is not displayed. |
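
The constraints in the parameter table can be checked locally before calling evaluate. The sketch below is only illustrative: check_evaluate_args is a hypothetical helper that is not part of the documented API, the sample data is made up, and the dataset layout (four keys, with retrieved_contexts as a list of lists) is inferred from the description of dataset.

```python
from typing import Any, Dict, List


# Hypothetical helper, not part of the documented API: it mirrors the
# constraints listed in the parameter table so that bad input fails early.
def check_evaluate_args(metrics: List[str], dataset: Dict[str, Any]) -> None:
    if not 0 < len(metrics) <= 14:
        raise ValueError("metrics must contain 1 to 14 entries")
    if len(set(metrics)) != len(metrics):
        raise ValueError("metrics must be unique")
    for name in metrics:
        if not 1 <= len(name) <= 50:
            raise ValueError(f"metric name length out of range: {name!r}")
    if not 1 <= len(dataset) <= 4:
        raise ValueError("dataset must contain 1 to 4 keys")
    # Each per-sample list must match the outer list of retrieved_contexts.
    if "retrieved_contexts" in dataset:
        expected = len(dataset["retrieved_contexts"])
        for key in ("user_input", "response", "reference"):
            if key in dataset and len(dataset[key]) != expected:
                raise ValueError(
                    f"{key} has {len(dataset[key])} items, expected {expected}"
                )


# Illustrative data only; layout inferred from the dataset description.
dataset = {
    "user_input": ["What is the capital of France?"],
    "response": ["The capital of France is Paris."],
    "reference": ["Paris"],
    "retrieved_contexts": [["Paris is the capital and largest city of France."]],
}
metrics = ["answer_similarity"]

check_evaluate_args(metrics, dataset)  # raises ValueError on invalid input

# The call itself would then follow the prototype; how evaluate is imported
# or which object provides it is not covered in this section:
# scores = evaluate(metrics, dataset, "english", None, True)
```

Validating up front keeps a malformed dataset from surfacing as a harder-to-read error partway through an evaluation run.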
Return Value
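| Data Type | Description |
|---|---|
| Optional[Dict[str, List[float]]] | A dictionary is returned. Each key is a metric name and each value is the list of scores for that metric. As the Optional type indicates, None may be returned instead. |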
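
A caller has to handle both the dictionary and the None case allowed by the return type. The following sketch averages the scores per metric; summarize_scores is a hypothetical helper, the score values are made up, and treating each list as one score per sample is an assumption.

```python
from statistics import mean
from typing import Dict, List, Optional


# Hypothetical helper, not part of the documented API.
def summarize_scores(scores: Optional[Dict[str, List[float]]]) -> Dict[str, float]:
    """Average each metric's scores; a None result yields an empty dict."""
    if scores is None:
        return {}
    # Skip metrics with an empty score list, since mean() rejects empty input.
    return {name: mean(values) for name, values in scores.items() if values}


# Made-up values standing in for a real return value. Note the key:
# answer_similarity scores are returned under semantic_similarity.
example_scores = {"semantic_similarity": [0.5, 1.0]}
summary = summarize_scores(example_scores)
```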