Thinking Analysis

Some LLMs include the thinking process in their outputs. This feature is designed to structurally parse the output, separating the model's thinking process (think) from the final output (content) and storing them in the reasoning_content and content fields, respectively.

reasoning_content: stores the model's internal reasoning, analysis, and logic judgment before generating the final answer.
content: stores the model's final output answer or decision.

Constraints

This feature is supported by the Atlas 800I A2 inference server, Atlas 800I A3 SuperPoD Server, and Atlas 300I Duo inference card.
Currently, only the Qwen3-32B, Qwen3-235B-A22B, Qwen3-30B-A3B, DeepSeek-R1, and DeepSeek-V3.1 models support this feature.
To enable the thinking analysis feature for DeepSeek-V3.1, add "chat_template_kwargs": {"enable_thinking": <bool>} to the request or add "enable_thinking": <bool> to the tokenizer_config.json file.
Currently, only the OpenAI inference API is supported.

Parameters

Table 1 lists the parameters required for enabling the thinking analysis feature.

**Table 1** Supplementary parameters of the thinking analysis feature: models in ModelConfig
Parameter	Value Type	Value Range	Description
enable_reasoning	Bool	true false	Specifies whether to enable model thinking analysis, separating the output into two fields: reasoning_content and content. false: disable true: enable Mandatory. The default value is false.

Running Inference

Open the config.json file of the Server.

cd {MindIE installation directory}/latest/mindie-service/
vi conf/config.json

Set serving parameters. Add the enable_reasoning field to the config.json file of the Server by referring to Table 1. For details about the serving parameters, see Configuration Parameters (Service-Specific). The following is a parameter configuration example.

The following uses Qwen3-32B as an example:

 "ModelDeployConfig" :
        {
            "maxSeqLen" : 2560,
            "maxInputTokenLen" : 2048,
            "truncation" : false,
            "ModelConfig" : [
                {
                    "modelInstanceType" : "Standard",
                    "modelName" : "Qwen3-32B",
                    "modelWeightPath" : "/data/weight/Qwen3-32B",
                    "worldSize" : 1,
                    "cpuMemSize" : 0,
                    "npuMemSize" : -1,
                    "backendType" : "atb",
                    "trustRemoteCode" : false,
                    "async_scheduler_wait_time": 120,
                    "kv_trans_timeout": 10,
                    "kv_link_timeout": 1080,
                    "models": {
                            "qwen3": {"enable_reasoning": true}
                    }
                }
            ]
        },

Qwen3-30B-A3B: Change qwen3 to qwen3_moe.
DeepSeek-R1: Change qwen3 to deepseekv2 and change model_type in the DeepSeek-R1 weight file to deepseek_v3.

Start the service. For details, see "Quick Start" > "Service Startup" in MindIE Motor Development Guide.
Send a request. For details about the parameters, see Inference API.

Parent topic: Interaction Features