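Performance Testing

MindIE currently supports performance testing with AISBench and with MindIE's built-in MindIE Benchmark tool. Examples of both are shown below.

The MindIE Benchmark tool will be sunset in 2026, so prefer AISBench. For detailed usage, see AISBench Tool.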

AISBench

Use the following commands to download and install AISBench.
git clone https://gitee.com/aisbench/benchmark.git
cd benchmark/
pip3 install -e ./ --use-pep517
pip3 install -r requirements/api.txt
pip3 install -r requirements/extra.txt
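Installing with pip is intended for cases where you need the latest AISBench features (this does not apply when MindIE is installed from the image). AISBench is preinstalled in the MindIE image; use the following command to find its installation path in the image.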
pip show ais_bench_benchmark
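Prepare the dataset.
Taking gsm8k as an example, download the dataset from the gsm8k dataset link, then place the extracted gsm8k/ folder under ais_bench/datasets in the tool's root directory.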

Configure the file ais_bench/benchmark/configs/models/vllm_api/vllm_api_stream_chat.py, for example:

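from ais_bench.benchmark.models import VLLMCustomAPIChatStream
models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChatStream,
        abbr='vllm-api-stream-chat',
        path="",                    # Absolute path to the model's serialized vocabulary files, i.e. the model weights directory
        model="DeepSeek-R1",        # Name of the model loaded by the server; set it to the model actually served by the VLLM inference service (an empty string makes the tool fetch it automatically)
        request_rate = 0,           # Request send rate: one request is sent every 1/request_rate seconds; a value below 0.1 sends all requests at once
        retry = 2,
        host_ip = "localhost",      # IP address of the inference service
        host_port = 8080,           # Port of the inference service
        max_out_len = 512,          # Maximum number of tokens the inference service may output
        batch_size=1,               # Maximum number of concurrent requests
        generation_kwargs = dict(
            temperature = 0.5,
            top_k = 10,
            top_p = 0.95,
            seed = None,
            repetition_penalty = 1.03,
            ignore_eos = True,      # The inference service ignores eos (the output length will reach max_out_len)
        )
    )
]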
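
The request_rate comment in the configuration describes a simple pacing rule. A minimal sketch of that rule, assuming it behaves exactly as the comment states; send_interval is a hypothetical helper written for illustration and is not part of AISBench:

```python
def send_interval(request_rate: float) -> float:
    """Return the delay in seconds between consecutive requests.

    Hypothetical helper mirroring the config comment: below 0.1 all
    requests are sent at once, otherwise one request is sent every
    1/request_rate seconds.
    """
    if request_rate < 0.1:
        return 0.0  # burst mode: no pacing between requests
    return 1.0 / request_rate


for rate in (0, 0.5, 4):
    print(f"request_rate={rate}: wait {send_interval(rate):.2f} s between requests")
```

With the sample configuration (request_rate = 0), this rule means all requests are issued immediately, so the effective load is bounded by batch_size rather than by pacing.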

Run the following command to start the serving performance test.

ais_bench --models vllm_api_stream_chat --datasets demo_gsm8k_gen_4_shot_cot_chat_prompt --mode perf --debug

Output like the following indicates that the run succeeded:

╒════════════════════════╤═══════╤══════════════╤══════════════╤════════════╤══════════════╤════════════╤══════════════╤══════════════╤═══╕
│ Performance Parameters │ Stage │ Average      │ Min          │ Max        │ Median       │ P75        │ P90          │ P99          │ N │
│ E2EL                   │ total │ 2048.2945 ms │ 1729.7498 ms │ 3450.96 ms │ 2491.8789 ms │ 2750.85 ms │ 3184.9186 ms │ 3424.4354 ms │ 8 │
│ TTFT                   │ total │ 50.332 ms    │ 50.6244 ms   │ 52.0585 ms │ 50.3237 ms   │ 50.5872 ms │ 50.7566 ms   │ 50.0551 ms   │ 8 │
│ TPOT                   │ total │ 10.6965 ms   │ 10.061 ms    │ 10.8805 ms │ 10.7495 ms   │ 10.7818 ms │ 10.808 ms    │ 10.8582 ms   │ 8 │
│ ITL                    │ total │ 10.6965 ms   │ 7.3583 ms    │ 13.7707 ms │ 10.7513 ms   │ 10.8009 ms │ 10.8358 ms   │ 10.9322 ms   │ 8 │
│ InputTokens            │ total │ 1512.5       │ 1481.0       │ 1566.0     │ 1511.5       │ 1520.25    │ 1536.6       │ 1563.06      │ 8 │
│ OutputTokens           │ total │ 287.375      │ 200.0        │ 407.0      │ 280.0        │ 322.75     │ 374.8        │ 403.78       │ 8 │
│ OutputTokenThroughput  │ total │ 115.9216     │ 107.6555     │ 116.5352   │ 117.6448     │ 118.2426   │ 118.3765     │ 118.6388     │ 8 │
╘════════════════════════╧═══════╧══════════════╧══════════════╧════════════╧══════════════╧════════════╧══════════════╧══════════════╧═══╛
╒══════════════════════════╤═══════╤════════════════════╕
│ Common Metric            │ Stage │ Value              │
│ Benchmark Duration       │ total │ 19897.8505 ms      │
│ Total Requests           │ total │ 8                  │
│ Failed Requests          │ total │ 0                  │
│ Success Requests         │ total │ 8                  │
│ Concurrency              │ total │ 0.9972             │
│ Max Concurrency          │ total │ 1                  │
│ Request Throughput       │ total │ 0.4021 req/s       │
│ Total Input Tokens       │ total │ 12100              │
│ Prefill Token Throughput │ total │ 17014.3123 token/s │
│ Total generated tokens   │ total │ 2299               │
│ Input Token Throughput   │ total │ 608.7438 token/s   │
│ Output Token Throughput  │ total │ 115.7835 token/s   │
│ Total Token Throughput   │ total │ 723.5273 token/s   │
╘══════════════════════════╧═══════╧════════════════════╛

When reading the results, focus mainly on the TTFT, TPOT, Request Throughput, and Output Token Throughput output parameters. For details on each parameter, see Table 2.
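
To make the latency metrics concrete, the sketch below derives them for a single streamed request from token arrival timestamps. It assumes the commonly used definitions (E2EL: time from sending the request to the last token; TTFT: time from sending the request to the first token; TPOT: mean time per output token after the first). The exact formulas AISBench applies are the ones given in Table 2 and may differ in detail. The function name and the sample timestamps are made up for illustration.

```python
def latency_metrics(send_time: float, token_times: list[float]) -> dict[str, float]:
    """Compute E2EL, TTFT and TPOT (all in seconds) for one streamed request.

    send_time   -- moment the request was sent
    token_times -- arrival time of each output token, in order
    """
    if not token_times:
        raise ValueError("at least one output token is required")
    e2el = token_times[-1] - send_time   # end-to-end latency
    ttft = token_times[0] - send_time    # time to first token
    decode_tokens = len(token_times) - 1 # tokens generated after the first
    tpot = (e2el - ttft) / decode_tokens if decode_tokens else 0.0
    return {"E2EL": e2el, "TTFT": ttft, "TPOT": tpot}


# Illustrative timestamps only: first token after 50 ms, then one token every 10 ms.
metrics = latency_metrics(0.0, [0.05, 0.06, 0.07, 0.08])
for name, value in metrics.items():
    print(f"{name}: {value * 1000:.1f} ms")
```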

The artifacts of a run are written to the default output path, which is reported in the log printed during the run:

08/28 15:13:26 - AISBench - INFO - Current exp folder: outputs/default/20250828_151326
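
When the test is driven from a script, the output folder can be picked out of that log line. A minimal sketch, assuming the log keeps the "Current exp folder:" wording shown above; find_exp_folder is a hypothetical helper, not an AISBench API:

```python
import re
from typing import Optional

# Matches the folder path that follows "Current exp folder:" in a log line.
EXP_FOLDER_RE = re.compile(r"Current exp folder:\s*(\S+)")


def find_exp_folder(line: str) -> Optional[str]:
    """Return the experiment folder named in a log line, or None if absent."""
    match = EXP_FOLDER_RE.search(line)
    return match.group(1) if match else None


LOG_LINE = "08/28 15:13:26 - AISBench - INFO - Current exp folder: outputs/default/20250828_151326"
print(find_exp_folder(LOG_LINE))
```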

After the command finishes, outputs/default/20250828_151326 contains the following:

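20250828_151326           # Unique directory generated from the timestamp of each experiment
├── configs               # Automatically saved dumps of all configuration files
├── logs                  # Runtime logs; with --debug in the command, no logs are written to disk (they are printed directly)
│   └── performance/      # Log files of the inference stage
└── performance           # Performance evaluation results
    └── vllm-api-stream-chat/          # Name of the "serving model configuration", matching the abbr parameter of models in the model task configuration file
        ├── gsm8kdataset.csv          # Per-request performance output (CSV), matching the Performance Parameters table in the printed results
        ├── gsm8kdataset.json         # End-to-end performance output (JSON), matching the Common Metric table in the printed results
        ├── gsm8kdataset_details.json # Full instrumentation log (JSON)
        └── gsm8kdataset_plot.html    # Request concurrency visualization report (HTML)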
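
The CSV and JSON results can be loaded for further analysis with the Python standard library. The sketch below only relies on the directory layout shown above; it assumes the CSV has a header row and the JSON file holds a single document, and it makes no assumption about the column or key names. load_perf_results is a hypothetical helper, not part of AISBench.

```python
import csv
import json
from pathlib import Path


def load_perf_results(exp_dir, abbr="vllm-api-stream-chat", dataset="gsm8kdataset"):
    """Load the CSV table and the JSON summary of one experiment folder."""
    result_dir = Path(exp_dir) / "performance" / abbr
    # CSV: one dict per row, keyed by the header row (assumed to exist).
    with open(result_dir / f"{dataset}.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    # JSON: parsed as-is, whatever structure the tool wrote.
    with open(result_dir / f"{dataset}.json", encoding="utf-8") as f:
        summary = json.load(f)
    return rows, summary


if __name__ == "__main__":
    exp = Path("outputs/default/20250828_151326")  # folder from the sample log
    if exp.exists():
        rows, summary = load_perf_results(exp)
        print(len(rows), "CSV rows")
        print(summary)
```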

MindIE Benchmark

A sample performance test command is shown below. For a detailed explanation of the parameters, see Input Parameters.

benchmark \
--DatasetPath "/{dataset_path}/GSM8K" \
--DatasetType "gsm8k" \
--ModelName "llama3-70b" \
--ModelPath "/{model_path}/llama3-70b" \
--TestType client \
--Http https://{ipAddress}:{port} \
--ManagementHttp https://{managementIpAddress}:{managementPort} \
--Concurrency 1000 \
--MaxOutputLen 512

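When reading the results, focus mainly on token generation latency metrics such as FirstTokenTime and DecodeTime, and on throughput metrics such as lpct (latency per complete token, the average per-token latency in the Prefill stage) and Throughput.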
