Modeling Operator Features
msKPP supports tensor splitting, debug mode, comparison of theoretical pipeline values with the values measured by msprof, and modeling of operator features (transfer channels, channel-based format conversion, and cache hit ratio). Select the functions you need to carry out Analyzing Operator Computing and Transfer Specifications, Analyzing Extreme Performance, and Preliminary Design of Operator Tiling.
Replace Ascendxxxyy in this document with the actual processor type.
- Channel conversion modeling
In the Cube unit of the Ascend AI Processor, data used in calculations must be in a special proprietary format called NZ. Data in the GM, however, is generally in ND format, so it must be converted before Cube calculation. In the , the transfer channel from the GM to the Cube-related storage units can convert data from ND to NZ format.
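To make the ND-to-NZ conversion concrete, the following is a minimal sketch in plain Python. It assumes the commonly documented FRACTAL_NZ layout for FP16 data, in which an (M, N) matrix is cut into 16 x 16 fractals and stored with shape (N/16, M/16, 16, 16): fractals are ordered column by column ("N" order), and elements inside each fractal row by row ("Z" order). The helper `nd_to_nz` is hypothetical and is not part of msKPP; it only illustrates the data rearrangement the transfer channel performs.

```python
def nd_to_nz(matrix, m0=16, n0=16):
    """Rearrange a row-major (ND) matrix into an NZ-style 4-D nested list.

    Hypothetical helper for illustration only (not an msKPP API).
    The result is indexed as nz[n1][m1][i][j], where
    nz[nb][mb][i][j] == matrix[mb * m0 + i][nb * n0 + j].
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows % m0 or cols % n0:
        raise ValueError("matrix shape must be a multiple of the fractal size")
    m1, n1 = rows // m0, cols // n0
    return [
        [
            # One m0 x n0 fractal, stored row by row ("Z" order inside the block)
            [[matrix[mb * m0 + i][nb * n0 + j] for j in range(n0)] for i in range(m0)]
            for mb in range(m1)  # fractals down one column of blocks
        ]
        for nb in range(n1)  # columns of blocks ("N" order across blocks)
    ]


# Usage: a 32 x 32 matrix whose element values encode their (row, col) position
nd = [[r * 32 + c for c in range(32)] for r in range(32)]
nz = nd_to_nz(nd)
print(nz[1][0][0][0])  # first element of the block covering rows 0-15, cols 16-31
```

Real hardware also pads shapes that are not multiples of 16; the sketch simply rejects them to stay short.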
In msKPP, channel-based format conversion is enabled, and the corresponding empirical data is used, when either of the following holds: for a GM-to-L1 transfer, the user-defined GM tensor is in ND format and the L1 tensor is in NZ format; for an L0C-to-GM transfer, the user-defined L0C tensor is in NZ format and the GM tensor is in ND format.
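```python
from mskpp import Tensor

# Source tensor in GM, stored in ND format
in_x = Tensor("GM", "FP16", [128, 256], format="ND")
# Destination tensors in L1, stored in NZ format (GM ND -> L1 NZ enables channel conversion)
l1_x1 = Tensor("L1", format="NZ")
l1_x2 = Tensor("L1", format="NZ")
# Load the two 128-column halves of in_x into separate L1 tensors
l1_x1.load(in_x[128, 0:128])
l1_x2.load(in_x[128, 128:])
```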
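The conversion rule stated above can be expressed as a simple predicate. This is a minimal sketch of the decision only, assuming storage positions and formats are given as the same strings used in the example ("GM", "L1", "L0C", "ND", "NZ"); the function `uses_channel_conversion` is hypothetical and is not how msKPP is implemented internally.

```python
def uses_channel_conversion(src_pos, src_fmt, dst_pos, dst_fmt):
    """Return True if a transfer matches one of the channel-conversion cases.

    Hypothetical helper for illustration only (not an msKPP API).
    """
    # GM -> L1: an ND-format GM tensor loaded into an NZ-format L1 tensor
    if (src_pos, dst_pos) == ("GM", "L1"):
        return src_fmt == "ND" and dst_fmt == "NZ"
    # L0C -> GM: an NZ-format L0C tensor written back to an ND-format GM tensor
    if (src_pos, dst_pos) == ("L0C", "GM"):
        return src_fmt == "NZ" and dst_fmt == "ND"
    # Any other channel or format combination: no channel-based conversion
    return False


# Usage: the GM -> L1 loads in the example above match the first case
print(uses_channel_conversion("GM", "ND", "L1", "NZ"))
```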
- Cache hit ratio
The L2 cache provides a high-bandwidth transfer path between part of the GM space and the Vector and Cube cores. The bandwidth at an L2 cache hit ratio close to 100% can be more than twice the bandwidth at a hit ratio near 0%. msKPP currently lets you set the L2 cache hit ratio manually.
```python
from mskpp import Chip

with Chip("Ascendxxxyy") as chip:
    # Model an L2 cache hit ratio of 60%
    config = {"cache_hit_ratio": 0.6}
    chip.set_cache_hit_ratio(config)
```
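To see why the hit ratio matters, here is a minimal sketch of one possible model: a hypothetical blend in which the hit fraction of the data moves at L2 bandwidth and the rest at GM bandwidth. This is an assumption for illustration, not msKPP's internal model, and the bandwidth values in the usage line are made-up placeholders rather than figures for any real processor.

```python
def transfer_time_us(num_bytes, hit_ratio, l2_gbps, gm_gbps):
    """Estimate transfer time in microseconds under a hypothetical hit/miss blend.

    Illustration only: bytes that hit in L2 move at l2_gbps, the remainder
    at gm_gbps. This is not msKPP's internal model.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be in [0, 1]")
    hit_bytes = num_bytes * hit_ratio
    miss_bytes = num_bytes - hit_bytes
    # 1 GB/s equals 1e3 bytes per microsecond
    return hit_bytes / (l2_gbps * 1e3) + miss_bytes / (gm_gbps * 1e3)


# Usage with placeholder bandwidths, where L2 is assumed to be twice as fast as GM
for ratio in (0.0, 0.6, 1.0):
    print(ratio, transfer_time_us(1_000_000, ratio, l2_gbps=200, gm_gbps=100))
```

Under this assumed model, moving the hit ratio from 0 to 1 halves the transfer time when L2 is twice as fast, which matches the "more than twofold" effect described above only when the real bandwidth gap is at least that large.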
- Tensor splitting
- Debug mode
- Comparison between theoretical values of pipeline information and values measured by msprof
Taking an Ascend C operator as an example, run msprof in --application mode to generate the PipeUtilization.csv file in the OPPROF_{timestamp}_XXX directory, and then enable the comparison in the msKPP script.
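```python
from mskpp import Chip

with Chip("Ascendxxxyy") as chip:
    # Enable output of pipeline statistics
    chip.enable_metrics()
    # Point msKPP at the PipeUtilization.csv file generated by msprof
    chip.set_prof_summary_path("/home/xx/OPPROF_{timestamp}_XXX/PipeUtilization.csv")
```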
The generated Pipe_statistic.csv file includes two columns for the comparison: ProfDuration(us)_0 and ProfRatio_0. The values in ProfDuration(us)_0 are the measured durations, identical to those in the PipeUtilization.csv file. ProfRatio_0 is the ratio of the measured value to the theoretical value; the larger the ratio, the more room there is for optimization.
Figure 1 Pipe_statistic.csv file
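The ProfRatio computation described above is a per-pipe division of measured by theoretical duration. The following minimal sketch reproduces it from two dictionaries and ranks pipes by ratio so the largest optimization opportunity comes first. The function `prof_ratios` and the pipe names in the usage example are hypothetical; they do not reflect the exact column layout or row names of Pipe_statistic.csv.

```python
def prof_ratios(theoretical_us, measured_us):
    """Return {pipe: measured / theoretical}, sorted from largest to smallest ratio.

    Hypothetical helper for illustration only. Pipes that have no measurement
    or a non-positive theoretical duration are skipped.
    """
    ratios = {}
    for pipe, theory in theoretical_us.items():
        if pipe not in measured_us or theory <= 0:
            continue
        ratios[pipe] = measured_us[pipe] / theory
    # Largest ratio first: these pipes have the most room for optimization
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))


# Usage with made-up durations in microseconds
theory = {"MTE2": 10.0, "VECTOR": 4.0}
measured = {"MTE2": 25.0, "VECTOR": 5.0}
print(prof_ratios(theory, measured))
```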