Overview

MindStudio Kernel Performance Prediction (msKPP) is a performance modeling tool. Before an operator is developed, you use the msKPP domain-specific language (DSL) to write operator expressions that describe the mathematical logic of a given operator implementation solution, and msKPP returns the performance modeling result for that solution. Because the prediction only needs to estimate the execution time of the algorithm from its input and output sizes, and does not perform the actual computation, modeling results are available within seconds.
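The idea that a prediction needs only sizes, not the actual computation, can be illustrated with a minimal Python sketch. It does not use the msKPP DSL or reflect how msKPP is implemented. The class HypotheticalChip, the function estimate_matmul_seconds, and both throughput figures are assumptions made up for this illustration, and taking the maximum of compute time and transfer time assumes the two overlap perfectly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HypotheticalChip:
    # Illustrative numbers only; they do not describe any real processor.
    flops_per_second: float = 100e12  # assumed peak compute throughput
    bytes_per_second: float = 1e12    # assumed peak memory bandwidth


def estimate_matmul_seconds(m, k, n, bytes_per_element=2, chip=HypotheticalChip()):
    """Estimate the time of an (m x k) @ (k x n) matmul from shapes alone."""
    # One multiply and one add per inner-product term.
    flops = 2 * m * k * n
    # Read both inputs once and write the output once.
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    compute_s = flops / chip.flops_per_second
    transfer_s = bytes_moved / chip.bytes_per_second
    # Assumption: compute and transfer overlap perfectly, so the slower one dominates.
    return max(compute_s, transfer_s)


if __name__ == "__main__":
    print(estimate_matmul_seconds(1024, 1024, 1024))
```

No matrix is ever allocated or multiplied here, so the estimate costs a few arithmetic operations regardless of the operator size.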

Technical Overview

To estimate theoretical performance, msKPP models the computation and transfer instructions of the actual processor under the assumptions listed in Table 1.

Table 1 Performance assumptions for msKPP modeling

Performance Assumption

Description

The internal memory (local memory) is unlimited. However, users can still manage memory within its lifetime.

Under this assumption, the model ignores the memory capacity limits of the actual processor, so users and developers can allocate and use memory without worrying about running out of it. Physical memory is limited in practice, but ignoring the limit simplifies the model and lets users and developers focus on other performance-related factors.

The instruction capability evaluated by statistics represents the theoretical performance.

This assumption posits that a processor's theoretical performance can be inferred from statistical analysis of executed instructions, and that the average performance measured during instruction execution indicates the processor's maximum performance potential. Statistical model predictions made on this basis help improve processor performance during design and optimization.

There is no bottleneck in delivery.

Under this assumption, delivering data or instructions to the chip's execution units never encounters a bottleneck or limit. That is, data transfer and instruction scheduling proceed without any performance degradation caused by hardware or software limitations.
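The three assumptions in Table 1 can be made concrete with a small timeline sketch in Python. It is an illustration under stated assumptions, not the msKPP implementation or its DSL. The names THROUGHPUT, Instr, and simulate, the two pipes, and all numbers are hypothetical. The sketch performs no local-memory capacity check (first assumption), takes each pipe's speed from a fixed table standing in for statistically measured instruction capability (second assumption), and lets an instruction start the moment its inputs are ready and its pipe is free, with zero delivery delay (third assumption).

```python
from dataclasses import dataclass

# Hypothetical per-pipe throughput, standing in for statistically measured
# instruction capability: bytes per microsecond and operations per microsecond.
THROUGHPUT = {"transfer": 100.0, "compute": 400.0}


@dataclass(frozen=True)
class Instr:
    name: str
    pipe: str          # which execution pipe runs the instruction
    work: float        # bytes for "transfer", operations for "compute"
    deps: tuple = ()   # names of instructions that must finish first


def simulate(instrs):
    """Return the total time in microseconds for instructions in program order."""
    finish = {}      # instruction name -> finish time
    pipe_free = {}   # pipe name -> time at which the pipe becomes idle
    for ins in instrs:
        # Duration comes only from the work size and the throughput table.
        duration = ins.work / THROUGHPUT[ins.pipe]
        # No delivery bottleneck: start as soon as inputs are ready and the pipe is idle.
        ready = max((finish[d] for d in ins.deps), default=0.0)
        start = max(ready, pipe_free.get(ins.pipe, 0.0))
        finish[ins.name] = start + duration
        pipe_free[ins.pipe] = finish[ins.name]
        # No local-memory capacity check is made anywhere (unlimited memory).
    return max(finish.values(), default=0.0)


program = [
    Instr("load_a", "transfer", 1000),
    Instr("mul_a", "compute", 2000, deps=("load_a",)),
    Instr("load_b", "transfer", 1000),
    Instr("mul_b", "compute", 2000, deps=("load_b",)),
]

if __name__ == "__main__":
    print(simulate(program))  # 25.0
```

In this example the second load runs while the first multiplication executes, so the total is 25 microseconds instead of the 30 microseconds a fully serial schedule would take. Real hardware would add memory pressure and dispatch latency, which is exactly what the assumptions leave out.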