
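Environment Preparation

Install and deploy the environment as described in the installation documentation, then configure the parameters to suit your needs as described in the configuration documentation.

Procedure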

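  • The Server can deploy service applications that are compatible with the interfaces of the third-party frameworks Triton, OpenAI, TGI, and vLLM. Enabling HTTPS communication is recommended; configure the service certificate, private key, and other certificate files that HTTPS requires as described in the related documentation.

  • The Server starts on a default IP address and port. To change them, edit the "ipAddress" and "port" parameters in the config.json file.

  • The Server supports service status queries, model information queries, and text and streaming inference.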
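
The config.json edit described above can be scripted. The following is a minimal sketch, not the product's own tooling: it assumes only that "ipAddress" and "port" already exist somewhere in the JSON tree. The section name `ExampleSection`, the helper `set_existing_keys`, and all addresses and port numbers are hypothetical placeholders.

```python
import json


def set_existing_keys(node, updates):
    """Overwrite every key named in `updates` wherever it already exists.

    Walks nested dicts and lists, so the caller does not need to know
    where in the file the keys live. Returns how many values were set.
    """
    count = 0
    if isinstance(node, dict):
        for key in node:
            if key in updates:
                node[key] = updates[key]
                count += 1
            else:
                count += set_existing_keys(node[key], updates)
    elif isinstance(node, list):
        for item in node:
            count += set_existing_keys(item, updates)
    return count


# Hypothetical stand-in for the contents of config.json; the real file
# has its own layout and many more parameters.
config_text = '{"ExampleSection": {"ipAddress": "127.0.0.1", "port": 1025, "other": true}}'

config = json.loads(config_text)
changed = set_existing_keys(config, {"ipAddress": "192.0.2.10", "port": 8443})
print(changed)
print(json.dumps(config, indent=4))
```

Only keys that are already present are touched, so a misspelled parameter name changes nothing and returns 0 rather than silently adding a new key.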

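  1. Start the service in one of the following two ways.

    Start commands must be run from the {MindIE installation directory}, so confirm the installation path first.

    • Method 1 (recommended): start the service as a background process. A service started this way keeps running after the terminal window is closed.

      The service has started successfully once the startup success message appears in the file that captures standard output.

    • Method 2: start the service directly.

      The service has started successfully once the startup success message is echoed to the terminal.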
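  2. Send HTTPS requests with any HTTPS client (the Linux curl command, Postman, and so on). This procedure uses the Linux curl command as the example.

    Open a new terminal window and send a request from it, for example one that lists the current models.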
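
For readers who prefer a script to curl, the request step can be sketched with the Python standard library. This is an illustration under stated assumptions, not the documented command: the base URL `https://127.0.0.1:1025`, the path `/v1/models`, and the helper names `build_request` and `send` are all hypothetical, and the CA file passed to `send` must be the one that matches your HTTPS configuration.

```python
import json
import ssl
import urllib.request


def build_request(base_url, path, payload=None):
    """Build (but do not send) a request; GET without a body, POST with one."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request, ca_file, timeout=10.0):
    """Send the request over HTTPS, verifying the server against `ca_file`."""
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(request, timeout=timeout, context=context) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical address and path; substitute the values your Server uses.
list_models = build_request("https://127.0.0.1:1025", "/v1/models")
print(list_models.get_method(), list_models.full_url)
```

Building the request separately from sending it makes it possible to inspect the URL, method, and body before any network traffic happens.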

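This section uses the v1/chat streaming inference interface and the v1/completions streaming inference interface as examples of how to call the interfaces. For how to call other interfaces, see the corresponding chapter.

1. v1/chat streaming inference interface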

data: {"id":"endpoint_common_10","object":"chat.completion.chunk","created":1744038509,"model":"llama","choices":[{"index":0,"delta":{"role":"assistant","content":" are"},"logprobs":null,"finish_reason":null}]}

data: {"id":"endpoint_common_10","object":"chat.completion.chunk","created":1744038509,"model":"llama","choices":[{"index":1,"delta":{"role":"assistant","content":" are"},"logprobs":null,"finish_reason":null}]}

data: {"id":"endpoint_common_10","object":"chat.completion.chunk","created":1744038509,"model":"llama","choices":[{"index":0,"delta":{"role":"assistant","content":" a"},"logprobs":null,"finish_reason":null}]}

data: {"id":"endpoint_common_10","object":"chat.completion.chunk","created":1744038509,"model":"llama","choices":[{"index":1,"delta":{"role":"assistant","content":" a"},"logprobs":null,"finish_reason":null}]}

data: {"id":"endpoint_common_10","object":"chat.completion.chunk","created":1744038509,"model":"llama","choices":[{"index":0,"delta":{"role":"assistant","content":" helpful"},"logprobs":null,"finish_reason":null}]}

data: {"id":"endpoint_common_10","object":"chat.completion.chunk","created":1744038509,"model":"llama","choices":[{"index":1,"delta":{"role":"assistant","content":" helpful"},"logprobs":null,"finish_reason":null}]}

data: {"id":"endpoint_common_10","object":"chat.completion.chunk","created":1744038509,"model":"llama","usage":{"prompt_tokens":24,"prompt_tokens_details": {"cached_tokens": 0},"completion_tokens":5,"total_tokens":29,"batch_size":[1,1,1,1,1],"queue_wait_time":[5318,117,82,72,196]},"choices":[{"index":0,"delta":{"role":"assistant","content":" assistant"},"logprobs":null,"finish_reason":"length"}]}

data: {"id":"endpoint_common_10","object":"chat.completion.chunk","created":1744038509,"model":"llama","usage":{"prompt_tokens":24,"prompt_tokens_details": {"cached_tokens": 0},"completion_tokens":5,"total_tokens":29,"batch_size":[1,1,1,1,1],"queue_wait_time":[5318,117,82,72,196]},"choices":[{"index":1,"delta":{"role":"assistant","content":" assistant"},"logprobs":null,"finish_reason":"length"}]}

data: [DONE]
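
The stream above interleaves chunks for two choices (index 0 and index 1), so a client has to reassemble the text per choice. The following is a minimal sketch of that reassembly, assuming only the field names visible in the sample output. The helpers `make_line` and `collect_chat_stream` are illustrative names, and the sample chunks are trimmed to the fields the parser reads.

```python
import json
from collections import defaultdict


def collect_chat_stream(lines):
    """Concatenate delta content per choice index until the [DONE] marker."""
    texts = defaultdict(str)
    finish_reasons = {}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            index = choice["index"]
            texts[index] += choice.get("delta", {}).get("content") or ""
            if choice.get("finish_reason"):
                finish_reasons[index] = choice["finish_reason"]
    return dict(texts), finish_reasons


def make_line(index, content, finish_reason=None):
    """Build one trimmed sample line in the same shape as the output above."""
    choice = {
        "index": index,
        "delta": {"role": "assistant", "content": content},
        "finish_reason": finish_reason,
    }
    return "data: " + json.dumps({"choices": [choice]})


tokens = [" are", " a", " helpful", " assistant"]
sample = []
for token in tokens:
    for index in (0, 1):
        reason = "length" if token == " assistant" else None
        sample.append(make_line(index, token, reason))
        sample.append("")  # blank line between events
sample.append("data: [DONE]")

texts, finish_reasons = collect_chat_stream(sample)
print(texts)
print(finish_reasons)
```

Keying the accumulated text by `index` keeps the two interleaved completions apart; a client that ignores `index` would produce doubled text such as " are are a a".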

2. v1/completions streaming inference interface

data: [DONE]


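This section uses the text inference interface and the streaming inference interface as examples of how to call the interfaces. For how to call other interfaces, see the corresponding chapter.

1. Text inference interface

2. Streaming inference interface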

data: {"prefill_time":null,"decode_time":128.32,"token":{"id":[263],"text":" a"}}

data: {"prefill_time":null,"decode_time":18.17,"token":{"id":[5176],"text":" French"}}

data: {"prefill_time":null,"decode_time":16.80,"token":{"id":[17739],"text":" photograph"}}

data: {"prefill_time":null,"decode_time":16.80,"token":{"id":[261],"text":"er"}}

data: {"prefill_time":null,"decode_time":16.80,"token":{"id":[2729],"text":" based"}}

data: {"prefill_time":null,"decode_time":16.80,"token":{"id":[297],"text":" in"}}

data: {"prefill_time":null,"decode_time":16.80,"token":{"id":[3681],"text":" Paris"}}

data: {"prefill_time":null,"decode_time":16.80,"token":{"id":[29889],"text":"."}}

data: {"prefill_time":null,"decode_time":16.80,"token":{"id":[13],"text":"\n"}}

data: {"prefill_time":null,"decode_time":16.80,"token":{"id":[29902],"text":"I"}}

data: {"prefill_time":null,"decode_time":16.80,"token":{"id":[505],"text":" have"}}

data: {"prefill_time":null,"decode_time":16.80,"token":{"id":[1063],"text":" been"}}

data: {"prefill_time":null,"decode_time":16.80,"token":{"id":[27904],"text":" shooting"}}

data: {"prefill_time":null,"decode_time":16.80,"token":{"id":[1951],"text":" since"}}

data: {"prefill_time":null,"decode_time":16.80,"token":{"id":[306],"text":" I"}}

data: {"prefill_time":null,"decode_time":16.80,"token":{"id":[471],"text":" was"}}

data: {"prefill_time":null,"decode_time":16.80,"token":{"id":[29871],"text":" "}}

data: {"prefill_time":null,"decode_time":16.80,"token":{"id":[29896],"text":"1"}}

data: {"prefill_time":null,"decode_time":16.80,"generated_text":"am a French photographer based in Paris.\nI have been shooting since I was 15","details":{"finish_reason":"length","generated_tokens":20,"seed":846930886},"token":{"id":[29945],"text":null}}
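
Each event in the stream above carries one token plus a `decode_time`, and the last event adds `generated_text`. The following is a minimal sketch of summarizing such a stream, assuming only the field names visible in the sample. The helper name `summarize_token_stream` is illustrative, and the sample reuses three events copied from the output above.

```python
import json


def summarize_token_stream(lines):
    """Join streamed token text and average the reported decode_time values."""
    pieces = []
    decode_times = []
    generated_text = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        event = json.loads(line[len("data:"):].strip())
        if event.get("decode_time") is not None:
            decode_times.append(event["decode_time"])
        text = event.get("token", {}).get("text")
        if text is not None:
            pieces.append(text)
        if "generated_text" in event:
            generated_text = event["generated_text"]
    mean = sum(decode_times) / len(decode_times) if decode_times else None
    return {
        "streamed_text": "".join(pieces),
        "generated_text": generated_text,
        "mean_decode_time": mean,
    }


# Raw strings keep the JSON escape \n intact instead of turning it into
# a literal newline, which would make the JSON invalid.
sample = [
    r'data: {"prefill_time":null,"decode_time":128.32,"token":{"id":[263],"text":" a"}}',
    r'data: {"prefill_time":null,"decode_time":18.17,"token":{"id":[5176],"text":" French"}}',
    r'data: {"prefill_time":null,"decode_time":16.80,"generated_text":"am a French photographer based in Paris.\nI have been shooting since I was 15","details":{"finish_reason":"length","generated_tokens":20,"seed":846930886},"token":{"id":[29945],"text":null}}',
]

summary = summarize_token_stream(sample)
print(summary["streamed_text"])
print(summary["generated_text"])
```

The final event reports `"text":null` for its token, so the parser skips null text rather than appending it.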


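MindIE currently supports accuracy testing with the AISBench tool. An example follows; for detailed usage, see the tool's documentation.

Procedure

  1. Download and install the AISBench tool.

  2. Prepare the dataset.

    Taking gsm8k as the example, download the dataset, extract it, and place the resulting gsm8k folder in the ais_bench/datasets folder under the tool's root path.

  3. Configure the ais_bench/benchmark/configs/models/vllm_api/vllm_api_stream_chat.py file.

  4. Start the service-based accuracy test.

    The echoed output indicates whether the run succeeded.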

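MindIE currently supports performance testing with the AISBench tool. An example follows; for detailed usage, see the tool's documentation.

Procedure

  1. Download and install the AISBench tool.

  2. Prepare the dataset.

    Taking gsm8k as the example, download the dataset, extract it, and place the resulting gsm8k folder in the ais_bench/datasets folder under the tool's root path.

  3. Configure the ais_bench/benchmark/configs/models/vllm_api/vllm_api_stream_chat.py file.

  4. Start the service-based performance test.

    The echoed output indicates whether the run succeeded.

    When reading the performance results, focus on the TTFT, TPOT, Request Throughput, and Output Token Throughput output metrics. For details on each metric, see the tool's metric reference.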
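
To make the four metrics concrete, the following sketch computes them from per-request timestamps using their commonly used definitions: TTFT as the delay from sending a request to its first token, TPOT as the average gap between subsequent tokens, and the two throughputs as requests and output tokens per second of wall-clock time. These definitions are an assumption; AISBench may compute or aggregate the values differently. The `RequestTrace` type and all timestamps are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RequestTrace:
    sent_at: float            # time the request was sent, in seconds
    token_times: List[float]  # arrival time of each output token, in seconds


def ttft(trace: RequestTrace) -> float:
    """Time to first token: first token arrival minus send time."""
    return trace.token_times[0] - trace.sent_at


def tpot(trace: RequestTrace) -> Optional[float]:
    """Time per output token after the first; None if only one token."""
    n = len(trace.token_times)
    if n < 2:
        return None
    return (trace.token_times[-1] - trace.token_times[0]) / (n - 1)


def throughput(traces: List[RequestTrace]) -> Tuple[float, float]:
    """Return (requests per second, output tokens per second) over wall time."""
    start = min(t.sent_at for t in traces)
    end = max(t.token_times[-1] for t in traces)
    wall = end - start
    total_tokens = sum(len(t.token_times) for t in traces)
    return len(traces) / wall, total_tokens / wall


# Hypothetical timings for two requests sent at the same moment.
traces = [
    RequestTrace(sent_at=0.0, token_times=[0.5, 0.75, 1.0, 1.25]),
    RequestTrace(sent_at=0.0, token_times=[1.0, 1.5, 2.0]),
]

for trace in traces:
    print(ttft(trace), tpot(trace))
print(throughput(traces))
```

TTFT and TPOT describe the latency a single request experiences, while the two throughput figures describe the capacity of the service as a whole, so it is worth reading them together.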

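Log in to the installation node as the installation user, then stop the Server service in one of the following two ways.

  • Method 1 (recommended): if the service was started as a background process, stop it in either of these ways:

    • Use the kill command to stop the process.

    • Or use the pkill command to stop the process.

  • Method 2: if the service was started directly, press Ctrl+C to stop it.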
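
The background start and kill-style stop can be sketched from a Python supervisor script. This is a generic POSIX illustration, not the documented commands: it assumes the process stops on SIGTERM, which is the signal the kill command sends by default, and the helpers `start_background` and `stop` are hypothetical names. The demo launches a sleeping Python process as a stand-in for the real service.

```python
import os
import signal
import subprocess
import sys


def start_background(cmd, log_path):
    """Start `cmd` detached from the terminal, appending its output to a log."""
    with open(log_path, "ab") as log:
        # start_new_session puts the child in its own session so that
        # closing the launching terminal does not take it down.
        return subprocess.Popen(
            cmd, stdout=log, stderr=subprocess.STDOUT, start_new_session=True
        )


def stop(proc, timeout=10.0):
    """Ask the process to exit with SIGTERM; force-kill it if it does not."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        return proc.wait()


if __name__ == "__main__":
    # Stand-in for the service: a process that would otherwise run for 60 s.
    stand_in = [sys.executable, "-c", "import time; time.sleep(60)"]
    proc = start_background(stand_in, os.devnull)
    print("running:", proc.poll() is None)
    print("exit code:", stop(proc))
```

Trying SIGTERM first gives the process a chance to shut down cleanly; the forced kill is only a fallback for a process that does not respond within the timeout.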