MindIE Overview
Mind Inference Engine (MindIE) is an inference acceleration suite from Huawei Ascend for a range of AI scenarios. It opens up AI capabilities layer by layer to meet diverse AI service requirements, and it uses the computing power of Ascend hardware to accelerate a wide range of models. MindIE is compatible with multiple mainstream AI frameworks and works with different types of Ascend AI Processors. Its multi-layer programming interfaces help users quickly build inference services on the Ascend platform.
Overall Architecture
MindIE provides inference solutions tailored to multiple AI scenarios and helps users quickly migrate and customize services. Figure 1 shows the MindIE architecture, and Table 1 describes the main components.
| Component | Description |
|---|---|
| MindIE Motor | MindIE Motor is a request scheduling framework for LLM inference with prefill-decode disaggregation. It provides inference service capabilities through an open, scalable serving platform architecture and connects to MindIE LLM to meet the high-performance inference requirements of LLMs. |
| MindIE SD | MindIE SD is an Ascend-affinity multimodal acceleration suite that works with industry model suites (such as diffusers) to improve the efficiency of multimodal inference on Ascend. |
| MindIE LLM | MindIE LLM is an inference component for large language models (LLMs). It provides common LLM inference capabilities on Ascend hardware, schedules multiple concurrent requests, and supports acceleration features such as Continuous Batching, PagedAttention, and FlashDecoding to deliver high-performance inference. |
| MindIE Turbo | MindIE Turbo is a general-purpose Ascend hardware acceleration suite for inference engines. It optimizes memory, communication, encoding, and decoding to increase throughput and reduce latency. Currently, it supports accelerating vLLM. |
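MindIE LLM lists Continuous Batching among its acceleration features. As a rough illustration of the general technique only (not MindIE's implementation or API), the toy simulation below admits waiting requests into free batch slots at every decode iteration and retires finished requests immediately, instead of holding a fixed batch until its longest request completes. `Request`, `continuous_batching`, and `static_batching_steps` are hypothetical names, and the assumption that every running request emits exactly one token per step is a simplification.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record: id plus how many tokens it must generate (>= 1).
    rid: int
    tokens_to_generate: int
    generated: int = 0


def continuous_batching(requests, max_batch_size):
    """Iteration-level scheduling; returns (step, rid) pairs in finish order."""
    waiting = deque(requests)
    running = []
    finished = []
    step = 0
    while waiting or running:
        # Fill any free batch slots before each decode iteration.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        # One decode iteration: every running request produces one token.
        step += 1
        for req in running:
            req.generated += 1
        # Retire finished requests right away so their slots free up next step.
        still_running = []
        for req in running:
            if req.generated >= req.tokens_to_generate:
                finished.append((step, req.rid))
            else:
                still_running.append(req)
        running = still_running
    return finished


def static_batching_steps(lengths, max_batch_size):
    """Baseline: each fixed batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), max_batch_size):
        steps += max(lengths[i:i + max_batch_size])
    return steps


if __name__ == "__main__":
    reqs = [Request(0, 3), Request(1, 1), Request(2, 2)]
    print(continuous_batching(reqs, max_batch_size=2))  # [(1, 1), (3, 0), (3, 2)]
    print(static_batching_steps([3, 1, 2], max_batch_size=2))  # 5
```

With a batch size of 2, request 2 takes over the slot freed by request 1 after the first step, so all three requests finish in 3 decode steps instead of the 5 that fixed batches would need in this toy model.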
Key Features
- Serving deployment
Provides capabilities such as request scheduling, reliability, availability, and serviceability for inference services that use prefill-decode disaggregation. For details, see MindIE Motor Development Guide.
- Multimodal generation
Supports migrating inference jobs of multimodal models, so that Stable Diffusion (SD) applications, including scenario-specific ones, can be deployed efficiently while meeting customers' accuracy and performance requirements. For details, see MindIE SD Development Guide.
- LLM inference
Provides LLM inference capabilities, covers the end-to-end (E2E) service process, and opens up capabilities layer by layer to meet diverse customer requirements. For details, see MindIE LLM Development Guide.
- Inference engine acceleration plugin library
An acceleration plugin library for LLM inference engines, developed for Ascend hardware. It contains Huawei-developed LLM optimization algorithms and optimizations for inference engine frameworks, and it offers a set of modular plugin interfaces for seamlessly integrating and accelerating third-party inference engines. For details, see MindIE Turbo Development Guide.
