Thinking Analysis
Some LLMs include the thinking process in their outputs. This feature is designed to structurally parse the output, separating the model's thinking process (think) from the final output (content) and storing them in the reasoning_content and content fields, respectively.
- reasoning_content: stores the model's internal reasoning, analysis, and logic judgment before generating the final answer.
- content: stores the model's final output answer or decision.
Constraints
- This feature is supported by the Atlas 800I A2 inference server, Atlas 800I A3 SuperPoD Server, and Atlas 300I Duo inference card.
- Currently, only the Qwen3-32B, Qwen3-235B-A22B, Qwen3-30B-A3B, DeepSeek-R1, and DeepSeek-V3.1 models support this feature.
- To enable the thinking analysis feature for DeepSeek-V3.1, add "chat_template_kwargs": {"enable_thinking": <bool>} to the request or add "enable_thinking": <bool> to the tokenizer_config.json file.
- Currently, only the OpenAI inference API is supported.
Parameters
Table 1 lists the parameters required for enabling the thinking analysis feature.
Parameter |
Value Type |
Value Range |
Description |
|---|---|---|---|
enable_reasoning |
Bool |
|
Specifies whether to enable model thinking analysis, separating the output into two fields: reasoning_content and content.
Mandatory. The default value is false. |
Running Inference
- Open the config.json file of the Server.
cd {MindIE installation directory}/latest/mindie-service/ vi conf/config.json - Set serving parameters. Add the enable_reasoning field to the config.json file of the Server by referring to Table 1. For details about the serving parameters, see Configuration Parameters (Service-Specific). The following is a parameter configuration example.
The following uses Qwen3-32B as an example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
"ModelDeployConfig" : { "maxSeqLen" : 2560, "maxInputTokenLen" : 2048, "truncation" : false, "ModelConfig" : [ { "modelInstanceType" : "Standard", "modelName" : "Qwen3-32B", "modelWeightPath" : "/data/weight/Qwen3-32B", "worldSize" : 1, "cpuMemSize" : 0, "npuMemSize" : -1, "backendType" : "atb", "trustRemoteCode" : false, "async_scheduler_wait_time": 120, "kv_trans_timeout": 10, "kv_link_timeout": 1080, "models": { "qwen3": {"enable_reasoning": true} } } ] },
- Qwen3-30B-A3B: Change qwen3 to qwen3_moe.
- DeepSeek-R1: Change qwen3 to deepseekv2 and change model_type in the DeepSeek-R1 weight file to deepseek_v3.
- Start the service. For details, see "Quick Start" > "Service Startup" in MindIE Motor Development Guide.
- Send a request. For details about the parameters, see Inference API.
Parent topic: Interaction Features