Perform offline inference or online service inference with the model, as required.
First, set the required environment variables. If the components are installed in their default paths, you can run the following commands to initialize the environment variables for each component.
# Set up the CANN environment (installed under /usr/local by default)
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# Set up the acceleration library (ATB) environment
source /usr/local/Ascend/nnal/atb/set_env.sh
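If you want to confirm that the variables took effect before moving on, a minimal check such as the following can help. It only assumes that the set_env.sh scripts prepend library paths under /usr/local/Ascend to LD_LIBRARY_PATH, which holds for the default installation paths:

import os

# Sanity check: with a default install, the set_env.sh scripts above prepend
# CANN/ATB library paths under /usr/local/Ascend to LD_LIBRARY_PATH.
ld_path = os.environ.get("LD_LIBRARY_PATH", "")
if "/usr/local/Ascend" in ld_path:
    print("Ascend environment variables appear to be set")
else:
    print("Ascend paths not found; re-run the set_env.sh scripts above")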
For offline inference, refer to the vLLM offline inference example documentation. The following script follows that example.
from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm = LLM(model="/home/weight/qwen2.5-7b")

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
"/home/weight/" is the path to the model weight files; change it to match your environment. Save the script as test.py and run it:
python test.py
The output is similar to the following:

Prompt: 'Hello, my name is', Generated text: ' Daniel and I am an 8th grade student at York Middle School. I'
Prompt: 'The president of the United States is', Generated text: ' the commander-in-chief of the armed forces. When a state of war exists,'
Prompt: 'The capital of France is', Generated text: ' ____.\nA. Geneva\nB. Strasbourg\nC. Paris\nD'
Prompt: 'The future of AI is', Generated text: " now\n\nThe future of AI is now\n\nIf you're like most tech professionals"
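A note on the sampling settings: temperature=0.8 with top_p=0.95 produces varied text across runs, so your generations will differ from the output above. For reproducible output you can switch to greedy decoding; the snippet below is a minimal sketch (the parameter values are illustrative, and llm refers to the object created in the script above):

from vllm import SamplingParams

# Greedy decoding: in vLLM, temperature=0 disables sampling, so repeated runs
# return identical completions; max_tokens caps each completion's length.
greedy_params = SamplingParams(temperature=0, max_tokens=25)

# Reuse the `llm` object from the script above:
# outputs = llm.generate(prompts, greedy_params)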
For online service inference, start the vLLM OpenAI-compatible server (the model path must match the weight path used above):

vllm serve /home/weight/qwen2.5-7b --dtype auto --port 8000
The following output indicates that the service has started successfully:
INFO: Application startup complete.
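Before sending requests, you can optionally wait for the server to become ready from a script. This is a minimal sketch assuming the server exposes vLLM's standard GET /health endpoint and listens on port 8000 as started above:

import time
import requests

# Poll the server's health endpoint until it responds (assumption: /health is
# available, as in vLLM's OpenAI-compatible server, on the port used above).
url = "http://localhost:8000/health"
for _ in range(60):
    try:
        if requests.get(url, timeout=1).status_code == 200:
            print("server is ready")
            break
    except requests.ConnectionError:
        pass
    time.sleep(2)
else:
    raise RuntimeError("server did not become ready in time")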
Send an inference request to the service, for example:

# Replace "port" with the IP address and port the inference service listens on;
# the port must match the server side (8000 in the example above).
curl http://localhost:port/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "/home/weight/qwen2.5-7b",
        "max_tokens": 25,
        "temperature": 0,
        "top_p": 0.9,
        "prompt": "The future of AI is"
    }'
1 | "choices":[{"index":0,"text":" here. It’s not just a buzzword or a concept anymore. It’s a reality that’s transforming the way we live,"logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":5,"total_tokens":30,"completion_tokens":25,"prompt_tokens_details":null}} |