Using the Self-Developed Interfaces
Text/streaming inference interface: set the stream parameter in the request body to false for text (non-streaming) inference, or to true for streaming inference:
curl -H "Accept: application/json" -H "Content-type: application/json" --cacert ca.pem --cert client.pem --key client.key.pem -X POST -d '{
    "inputs": "My name is Olivier and I",
    "stream": true,
    "parameters": {
        "temperature": 0.5,
        "top_k": 10,
        "top_p": 0.95,
        "do_sample": true,
        "seed": null,
        "repetition_penalty": 1.03,
        "details": true
    }
}' https://127.0.0.1:1025/infer
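For reference, the same text inference request (with stream set to false) can also be issued from Python. The following is a minimal sketch, assuming the endpoint, certificate files, and request body shown in the curl example above; the structure of the returned JSON is not described here.
import requests

payload = {
    "inputs": "My name is Olivier and I",
    "stream": False,          # false = text inference, true = streaming inference
    "parameters": {
        "temperature": 0.5,
        "top_k": 10,
        "top_p": 0.95,
        "do_sample": True,
        "seed": None,
        "repetition_penalty": 1.03,
        "details": True,
    },
}

resp = requests.post(
    "https://127.0.0.1:1025/infer",
    json=payload,
    cert=("client.pem", "client.key.pem"),  # client certificate and private key
    verify="ca.pem",                        # CA certificate used to verify the server
)
resp.raise_for_status()
print(resp.json())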
For other interfaces, see the Self-Developed Interfaces section.
For how to create a MindIE Client, see Using the Triton-Compatible Interfaces. You can then use the MindIE Client Python interface to terminate a request early.
import sys

from utils import create_client

if __name__ == "__main__":
    # Parse arguments and create the MindIE Client.
    mindie_client = create_client()
    # Construct the input prompt and sampling parameters.
    prompt = "My name is Olivier and I"
    model_name = "llama_65b"
    parameters = {
        "do_sample": True,
        "temperature": 0.5,
        "top_k": 10,
        "top_p": 0.9,
        "truncate": 5,
        "typical_p": 0.9,
        "seed": 1,
        "repetition_penalty": 1,
        "watermark": True,
        "details": True,
    }
    # Start streaming inference.
    results = mindie_client.generate_stream(
        model_name,
        prompt,
        request_id="1",
        parameters=parameters,
    )
    # Stop early: cancel the request after 10 partial results have been received.
    generated_text = ""
    index = 0
    for cur_res in results:
        index += 1
        if index == 10:
            flag = mindie_client.cancel(model_name, "1")
            if flag:
                print("Test cancel api succeed!")
                sys.exit(0)
            else:
                print("Test cancel api failed!")
                sys.exit(1)
        print("current result: %s" % cur_res)
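In this example, the same request_id ("1") is passed to both generate_stream and cancel, so the cancellation targets the streaming request started above.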
For other MindIE Client interfaces, see the class MindIEHTTPClient section.
Parent topic: Interface Usage Description