Model Inference
Run offline inference or start an online inference service, as needed.
Prerequisites
Set the required environment variables first. If the components are installed in their default paths, run the following commands to initialize each component's environment variables.
```shell
# Set up the CANN environment (installed under /usr/local by default)
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# Set up the acceleration library (ATB) environment
source /usr/local/Ascend/nnal/atb/set_env.sh
```
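A quick way to confirm the scripts took effect is to check that the variables they export are present in the current shell. A minimal sketch (the variable names `ASCEND_TOOLKIT_HOME` and `ATB_HOME_PATH` are assumptions; inspect each `set_env.sh` in your installation for the exact names it exports):

```python
import os

# Variables the set_env.sh scripts are expected to export.
# These names are assumptions -- adjust if your scripts differ.
EXPECTED_VARS = ["ASCEND_TOOLKIT_HOME", "ATB_HOME_PATH"]

def missing_env_vars(environ, expected=EXPECTED_VARS):
    """Return the expected variables that are absent or empty in `environ`."""
    return [name for name in expected if not environ.get(name)]

if __name__ == "__main__":
    missing = missing_env_vars(os.environ)
    if missing:
        print("Environment not initialized; missing:", ", ".join(missing))
    else:
        print("CANN / ATB environment variables look set.")
```

If any variable is reported missing, re-run the corresponding `source` command above.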
Offline Inference
- Using Qwen2.5-7B as an example, enter the container and create a Python script (for example, test.py) in the current directory:
```python
from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
# Create an LLM.
llm = LLM(model="/home/weight/qwen2.5-7b")
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
"/home/weight/" is the model weight path; change it to match your environment.
- Run the following command to start inference.

```shell
python test.py
```
- Check the output. A response like the following indicates that inference succeeded.

```
Prompt: 'Hello, my name is', Generated text: ' Daniel and I am an 8th grade student at York Middle School. I'
Prompt: 'The president of the United States is', Generated text: ' the commander-in-chief of the armed forces. When a state of war exists,'
Prompt: 'The capital of France is', Generated text: ' ____.\nA. Geneva\nB. Strasbourg\nC. Paris\nD'
Prompt: 'The future of AI is', Generated text: " now\n\nThe future of AI is now\n\nIf you're like most tech professionals"
```
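The script above samples with `temperature=0.8` and `top_p=0.95`. `top_p` enables nucleus sampling: at each decoding step, only the smallest set of highest-probability tokens whose cumulative probability reaches `top_p` is kept, and the next token is drawn from that renormalized set. A minimal, self-contained sketch of that filtering step on a toy distribution (illustrative only; this is not vLLM's internal implementation):

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p; return the renormalized distribution as a dict."""
    # Sort tokens by probability, highest first.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize the surviving probabilities so they sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

toy = {"Paris": 0.5, "Lyon": 0.25, "Geneva": 0.15, "Rome": 0.10}
print(top_p_filter(toy, top_p=0.75))  # only "Paris" and "Lyon" survive (0.5 + 0.25 >= 0.75)
```

A lower `top_p` (or a lower `temperature`) makes outputs more deterministic; `top_p=1.0` disables the filtering entirely.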
Starting the Online Inference Service
- Start the service as described in the vLLM online serving example documentation. A simple example with Qwen2.5-7B ("/home/weight/" is the model weight path; change it to match your environment):

```shell
vllm serve /home/weight/Qwen2.5-7B --dtype auto --port 8000
```
A response like the following indicates that the service started successfully:

```
INFO: Application startup complete.
```
- After the service starts, open a new terminal window and send a request with the following command:

```shell
# localhost:port is the IP and port the inference service listens on;
# the port must match the one used on the server side.
curl http://localhost:port/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "/home/weight/Qwen2.5-7B",
        "max_tokens": 25,
        "temperature": 0,
        "top_p": 0.9,
        "prompt": "The future of AI is"
    }'
```
- The port defaults to "8000".
- "/home/weight/" is the model weight path; change it to match your environment. The "model" field must match the path passed to vllm serve.
- Check the result. A response like the following indicates that the request succeeded.

```
"choices":[{"index":0,"text":" here. It’s not just a buzzword or a concept anymore. It’s a reality that’s transforming the way we live,","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":5,"total_tokens":30,"completion_tokens":25,"prompt_tokens_details":null}}
```
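The same request can be sent from Python using only the standard library, which is convenient for scripting against the service. A minimal sketch mirroring the curl command above (the URL, model path, and sampling values are the example's; adjust them to your deployment):

```python
import json
import urllib.request

def build_completion_request(base_url, model, prompt):
    """Build the URL and JSON body for a /v1/completions request,
    mirroring the curl example above."""
    payload = {
        "model": model,
        "max_tokens": 25,
        "temperature": 0,
        "top_p": 0.9,
        "prompt": prompt,
    }
    return base_url + "/v1/completions", json.dumps(payload).encode("utf-8")

def send_completion(base_url, model, prompt):
    """POST the request to a running service and return the generated text."""
    url, body = build_completion_request(base_url, model, prompt)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]

# With the service running:
#   print(send_completion("http://localhost:8000", "/home/weight/Qwen2.5-7B",
#                         "The future of AI is"))
```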
Parent topic: vLLM Text Generation Inference Quick Start