Querying TGI Endpoint Information

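Function

Queries information about the TGI endpoint.

The response follows the TGI response format as closely as possible. Response fields that MindIE Server does not support are returned as null.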

Interface Format

Operation type: GET

URL: https://{ip}:{port}/info

Request Parameters

None

Example

Request example:

GET https://<ip>:<port>/info
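A minimal sketch of the request above using only the Python standard library. The function names info_url and get_info and the ca_file parameter are hypothetical; how the server certificate should be verified depends on the HTTPS configuration of your MindIE Server deployment, which this page does not describe.

```python
import json
import ssl
import urllib.request
from typing import Any, Dict, Optional


def info_url(ip: str, port: int) -> str:
    """Build the /info URL (hypothetical helper)."""
    # IPv6 literals must be enclosed in brackets inside a URL
    host = f"[{ip}]" if ":" in ip else ip
    return f"https://{host}:{port}/info"


def get_info(ip: str, port: int, ca_file: Optional[str] = None,
             timeout: float = 10.0) -> Dict[str, Any]:
    """Send GET /info and return the decoded JSON body (hypothetical helper)."""
    # Verify the server certificate against ca_file if given, else the system CA store
    context = ssl.create_default_context(cafile=ca_file)
    request = urllib.request.Request(info_url(ip, port), method="GET")
    # urlopen raises HTTPError for 4xx/5xx; also reject any other non-200 success code
    with urllib.request.urlopen(request, context=context, timeout=timeout) as response:
        if response.status != 200:
            raise RuntimeError(f"unexpected status {response.status}")
        return json.load(response)
```

For example, get_info("127.0.0.1", 8080, ca_file="ca.pem") would return the response body as a dict, with null fields decoded as None; the address, port, and certificate file name here are placeholders.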

Response example:

{
  "docker_label": null,
  "max_batch_total_tokens": 32000,
  "max_best_of": 1,
  "max_concurrent_requests": 300,
  "max_input_length": 1024, // maxSeqLen - maxIterTimes
  "max_stop_sequences": null,
  "max_waiting_tokens": null,
  "models": [
    {
      "model_device_type": "npu",
      "model_dtype": "torch.float16",
      "model_id": "bigscience/bloom-560m", // model name
      "model_pipeline_tag": "text-generation",
      "max_total_tokens": 2048, // value of maxSeqLen
      "model_sha": null
    },
    {
      "model_device_type": "npu",
      "model_dtype": "torch.float16",
      "model_id": "bigscience/bloom-560m",
      "model_pipeline_tag": "text-generation",
      "max_total_tokens": 2048, // value of maxSeqLen
      "model_sha": null
    }
  ],
  "sha": null,
  "validation_workers": null,
  "version": "{version}",
  "waiting_served_ratio": null
}
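The // annotations in the example above are explanations only and are not valid JSON, so they must be removed before the text can be parsed. The sketch below parses a copy of the example (one model entry, annotations removed) and shows how a client might list the fields MindIE Server returns as null and read each model's max_total_tokens. The function names are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple


def unsupported_fields(info: Dict[str, Any]) -> List[str]:
    """Top-level keys returned as null (decoded as None), i.e. not supported by the server."""
    return sorted(key for key, value in info.items() if value is None)


def model_token_limits(info: Dict[str, Any]) -> List[Tuple[str, int]]:
    """(model_id, max_total_tokens) for each entry in the "models" list."""
    return [(m["model_id"], m["max_total_tokens"]) for m in info.get("models", [])]


# Copy of the response example with one model entry and the // annotations removed
SAMPLE = """
{"docker_label": null, "max_batch_total_tokens": 32000, "max_best_of": 1,
 "max_concurrent_requests": 300, "max_input_length": 1024,
 "max_stop_sequences": null, "max_waiting_tokens": null,
 "models": [{"model_device_type": "npu", "model_dtype": "torch.float16",
             "model_id": "bigscience/bloom-560m",
             "model_pipeline_tag": "text-generation",
             "max_total_tokens": 2048, "model_sha": null}],
 "sha": null, "validation_workers": null, "version": "{version}",
 "waiting_served_ratio": null}
"""

info = json.loads(SAMPLE)
print(unsupported_fields(info))
print(model_token_limits(info))
```

Checking for None rather than for a missing key matches the documented behavior: unsupported fields are present in the response with a null value, not omitted.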

Response status code: 200

Response Description

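Parameter	Type	Description
docker_label	string	Not supported yet; null is returned by default.
max_batch_total_tokens	int	Recommended to take the value of maxPrefillTokens.
max_best_of	int	The best_of parameter is not supported yet; 1 is returned by default, meaning only one inference result is returned per request.
max_concurrent_requests	int	Maximum number of concurrent requests; takes the value of maxBatchSize.
max_input_length	int	Maximum input length; calculated as maxSeqLen - maxIterTimes.
max_stop_sequences	int	Not supported yet; null is returned by default.
max_waiting_tokens	int	Not supported yet; null is returned by default.
models	list	List of model configurations. Each element contains the fields model_device_type through model_sha below.
model_device_type	string	Device type on which the model runs; "npu" is returned by default.
model_dtype	string	Model data type, read from the torch_dtype field of config.json in the model weight directory.
model_id	string	Model name.
model_pipeline_tag	string	Model task type; "text-generation" is returned by default.
max_total_tokens	int	Maximum total number of tokens for inference; takes the value of maxSeqLen.
model_sha	string	Not supported yet; null is returned by default.
sha	string	Not supported yet; null is returned by default.
validation_workers	int	Not supported yet; null is returned by default.
version	string	Version number, returned as "{version}".
waiting_served_ratio	float	Not supported yet; null is returned by default.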
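Several fields in the table are derived from MindIE Server configuration parameters. The sketch below restates that arithmetic. The class and function names are hypothetical, and the values 2048, 1024, 300, and 32000 are chosen only to reproduce the response example, not taken from any default configuration. The table only recommends that max_batch_total_tokens equal maxPrefillTokens, so treat that mapping as an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ServerLimits:
    """Hypothetical container for the configuration values named in the table."""
    max_seq_len: int         # maxSeqLen
    max_iter_times: int      # maxIterTimes
    max_batch_size: int      # maxBatchSize
    max_prefill_tokens: int  # maxPrefillTokens


def derived_info_fields(cfg: ServerLimits) -> Dict[str, int]:
    """Compute the numeric /info fields from the configuration values."""
    return {
        # Assumption: follows the recommendation max_batch_total_tokens = maxPrefillTokens
        "max_batch_total_tokens": cfg.max_prefill_tokens,
        "max_concurrent_requests": cfg.max_batch_size,
        "max_input_length": cfg.max_seq_len - cfg.max_iter_times,
        # Reported per element of the "models" list
        "max_total_tokens": cfg.max_seq_len,
    }


example = ServerLimits(max_seq_len=2048, max_iter_times=1024,
                       max_batch_size=300, max_prefill_tokens=32000)
print(derived_info_fields(example))
```

With these inputs, max_input_length is 2048 - 1024 = 1024, matching the annotation maxSeqLen - maxIterTimes in the response example.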

Parent topic: TGI 0.9.4-Compatible Interfaces