Taking the Qwen2-VL model as an example, the steps for running service-based inference are as follows:
docker exec -it {docker_name} bash
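# Configure the CANN environment (installed under /usr/local by default)
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# Configure the acceleration library environment
source /usr/local/Ascend/nnal/atb/set_env.sh
# Configure the MindIE environment variables
source /usr/local/Ascend/mindie/latest/mindie-service/set_env.sh
# Configure the model repository environment variables
source /usr/local/Ascend/atb-models/set_env.sh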
pip install -r /usr/local/Ascend/atb-models/requirements/models/requirements_{model}.txt
vim /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
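Review the configuration file carefully and adjust the parameters shown in the example below to match your environment. For the meaning of each parameter, see the configuration parameter description.
"httpsEnabled" specifies whether HTTPS is enabled. Setting it to "true" enables HTTPS, in which case mutual-authentication (two-way) certificates must be configured; setting it to "false" disables HTTPS. Enabling HTTPS is recommended. To configure the service certificate, private key, and other certificate files required for HTTPS communication, follow the section "MindIE Service Components > MindIE Service Tools > CertTools" in the MindIE Service Developer Guide.
An example config.json is shown below: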
{
    ...
    "ServerConfig" : {
        ...
        "port" : 1040,
        "managementPort" : 1041,
        ...
        "httpsEnabled" : true,
        ...
    },
    "BackendConfig": {
        ...
        "npuDeviceIds" : [[3]],
        ...
        "ModelDeployConfig": {
            ...
            "ModelConfig" : [
                {
                    "modelInstanceType": "Standard",
                    "modelName" : "qwen2-vl",
                    "modelWeightPath": "/data/datasets/qwen-2vl/",
                    "worldSize" : 1,
                    ...
                    "npuMemSize" : 8,
                    ...
                    "trustRemoteCode": true
                }
            ]
        },
        ...
    }
}
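Before starting the service, it can help to sanity-check the edited file. The following is a minimal Python sketch, not part of MindIE: the function name check_model_config is hypothetical, and the rule that "worldSize" should equal the number of NPU IDs in each "npuDeviceIds" group is an assumption inferred from the example above (one device, "worldSize" of 1). Consult the configuration parameter description for the authoritative constraints.

```python
import json
from typing import List


def check_model_config(config: dict) -> List[str]:
    """Return human-readable problems found in a parsed config.json (hypothetical helper)."""
    problems = []
    backend = config.get("BackendConfig", {})
    device_groups = backend.get("npuDeviceIds", [])
    models = backend.get("ModelDeployConfig", {}).get("ModelConfig", [])

    for model in models:
        name = model.get("modelName", "<unnamed>")

        # Assumption: worldSize should match the number of NPUs in each device group.
        for group in device_groups:
            if model.get("worldSize") != len(group):
                problems.append(
                    f"{name}: worldSize={model.get('worldSize')} "
                    f"but device group {group} has {len(group)} NPU(s)"
                )

        # The weight path must be filled in for the model to load.
        if not model.get("modelWeightPath"):
            problems.append(f"{name}: modelWeightPath is empty or missing")

    return problems


if __name__ == "__main__":
    # Values copied from the example above; elided ("...") fields are simply left out.
    example = {
        "BackendConfig": {
            "npuDeviceIds": [[3]],
            "ModelDeployConfig": {
                "ModelConfig": [
                    {
                        "modelName": "qwen2-vl",
                        "modelWeightPath": "/data/datasets/qwen-2vl/",
                        "worldSize": 1,
                    }
                ]
            },
        }
    }
    print(json.dumps(check_model_config(example)))
```

To check a real deployment, load the file with json.load and pass the resulting dictionary to the same function.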
cd /usr/local/Ascend/mindie/latest/mindie-service/bin
./mindieservice_daemon
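The daemon may take some time to load the model weights before it accepts requests. As a rough readiness check, the sketch below polls a TCP port until a connection succeeds. It is a generic technique and not a MindIE API: the helper name wait_for_port is hypothetical, and an open port only shows that something is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. connection refused) until the port is open.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False
```

With the example configuration above, a call such as wait_for_port("127.0.0.1", 1040) would wait for the port set in "port"; the host address here is an assumption and should be replaced with the address the service actually binds to.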
POST https://{ip_address}:{port}/v1/chat/completions
{
    "model": "{model_name}",
    "messages": [
        {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant"}]},
        {"role": "user", "content": [
            {"type": "image_url", "image_url": "demo.png"},
            {"type": "text", "text": "Generate the caption in English:"}
        ]}
    ],
    "max_tokens": 512,
    "presence_penalty": 1,
    "frequency_penalty": 1,
    "temperature": 1,
    "top_p": 1,
    "top_k": 0.0001
}
Set the values of the "model" and "image_url" parameters to match your environment.
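The request above can also be sent from a script. The following is a minimal sketch using only the Python standard library. The helper names (build_chat_request, make_mutual_tls_context, send_chat_request) and the certificate file arguments are hypothetical; the sampling parameters from the request above are omitted for brevity and can be added to the returned dictionary. The response is assumed to be a JSON document, and its exact fields are not described in this section.

```python
import json
import ssl
import urllib.request


def build_chat_request(model_name, image_url, prompt, max_tokens=512):
    """Build the JSON body for one image plus one text prompt, mirroring the request above."""
    return {
        "model": model_name,
        "messages": [
            {"role": "system",
             "content": [{"type": "text", "text": "You are a helpful assistant"}]},
            {"role": "user",
             "content": [
                 {"type": "image_url", "image_url": image_url},
                 {"type": "text", "text": prompt},
             ]},
        ],
        "max_tokens": max_tokens,
    }


def make_mutual_tls_context(ca_file, cert_file, key_file):
    """Create a client TLS context for two-way authentication (file paths are placeholders)."""
    context = ssl.create_default_context(cafile=ca_file)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def send_chat_request(base_url, body, ssl_context=None, timeout_s=120.0):
    """POST the body to {base_url}/v1/chat/completions and return the parsed JSON response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s, context=ssl_context) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical call would build the body with build_chat_request("qwen2-vl", "demo.png", "Generate the caption in English:") and pass it to send_chat_request together with the base URL https://{ip_address}:{port} and, when "httpsEnabled" is true, a context from make_mutual_tls_context.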