Perform offline inference or online service inference with the model, as required.
First, set the required environment variables. If the components are installed in their default paths, you can run the following commands to initialize the environment variables for each component.
# Set up the CANN environment (installed under /usr/local by default)
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# Set up the acceleration library (ATB) environment
source /usr/local/Ascend/nnal/atb/set_env.sh
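If you want to confirm that the variables took effect before moving on, a minimal check such as the following can help. It only assumes that the set_env.sh scripts prepend library paths under /usr/local/Ascend to LD_LIBRARY_PATH, which holds for the default installation paths:

import os

# Sanity check: with a default install, the set_env.sh scripts above prepend
# CANN/ATB library paths under /usr/local/Ascend to LD_LIBRARY_PATH.
ld_path = os.environ.get("LD_LIBRARY_PATH", "")
if "/usr/local/Ascend" in ld_path:
    print("Ascend environment variables appear to be set")
else:
    print("Ascend paths not found; re-run the set_env.sh scripts above")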
For offline inference, refer to the vLLM offline inference example documentation. The following script follows that example.
from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm = LLM(model="/home/weight/qwen2.5-7b")

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
"/home/weight/" is the path to the model weight files; change it to match your environment. Save the script as test.py and run it:
python test.py
The output is similar to the following:

Prompt: 'Hello, my name is', Generated text: ' Daniel and I am an 8th grade student at York Middle School. I'
Prompt: 'The president of the United States is', Generated text: ' the commander-in-chief of the armed forces. When a state of war exists,'
Prompt: 'The capital of France is', Generated text: ' ____.\nA. Geneva\nB. Strasbourg\nC. Paris\nD'
Prompt: 'The future of AI is', Generated text: " now\n\nThe future of AI is now\n\nIf you're like most tech professionals"
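A note on the sampling settings: temperature=0.8 with top_p=0.95 produces varied text across runs, so your generations will differ from the output above. For reproducible output you can switch to greedy decoding; the snippet below is a minimal sketch (the parameter values are illustrative, and llm refers to the object created in the script above):

from vllm import SamplingParams

# Greedy decoding: in vLLM, temperature=0 disables sampling, so repeated runs
# return identical completions; max_tokens caps each completion's length.
greedy_params = SamplingParams(temperature=0, max_tokens=25)

# Reuse the `llm` object from the script above:
# outputs = llm.generate(prompts, greedy_params)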
For online service inference, start the vLLM OpenAI-compatible server (the model path must match the weight path used above):

vllm serve /home/weight/qwen2.5-7b --dtype auto --port 8000
The following output indicates that the service has started successfully:
INFO: Application startup complete.
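Before sending requests, you can optionally wait for the server to become ready from a script. This is a minimal sketch assuming the server exposes vLLM's standard GET /health endpoint and listens on port 8000 as started above:

import time
import requests

# Poll the server's health endpoint until it responds (assumption: /health is
# available, as in vLLM's OpenAI-compatible server, on the port used above).
url = "http://localhost:8000/health"
for _ in range(60):
    try:
        if requests.get(url, timeout=1).status_code == 200:
            print("server is ready")
            break
    except requests.ConnectionError:
        pass
    time.sleep(2)
else:
    raise RuntimeError("server did not become ready in time")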
Send an inference request to the service, for example:

# Replace "port" with the IP address and port the inference service listens on;
# the port must match the server side (8000 in the example above).
curl http://localhost:port/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "/home/weight/qwen2.5-7b",
        "max_tokens": 25,
        "temperature": 0,
        "top_p": 0.9,
        "prompt": "The future of AI is"
    }'
1 | "choices":[{"index":0,"text":" here. It’s not just a buzzword or a concept anymore. It’s a reality that’s transforming the way we live,"logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":5,"total_tokens":30,"completion_tokens":25,"prompt_tokens_details":null}} |