Optimization Suggestion Overview
| Category | Description | Optimization Suggestions |
|---|---|---|
| | Provides tiling-related optimization suggestions to help you select a proper tiling strategy. | |
| | Provides optimization suggestions for reducing the header and tailer overheads (latency incurred before and after the operator performs compute). | Preventing TPipe from Being Created and Initialized Inside the Object |
| | Improves hardware resource utilization and achieves higher throughput through task parallelization and asynchronous scheduling. | Enabling the Asynchronous Iterate or IterateAll API to Avoid AIC/AIV Synchronization Dependency |
| | Maximizes transfer efficiency by controlling the size of the data block to be transferred and the GM address. Reduces memory usage and improves computing efficiency by sharing and reusing buffers, compressing and simplifying data, using dedicated storage space, and optimizing memory access scheduling. | Using Shared Temporary Buffer for Operators and High-Level APIs<br>Reducing the Tensor ShapeInfo Dimensions to Optimize the Stack Space |
| | Provides optimization suggestions related to vector compute. | Selecting Low-Latency Instructions to Optimize Reduction Operation Performance |
| | Provides optimization suggestions related to Cube compute. | Efficient Quantization by Storing Quantization Parameters in the FP Buffer<br>Efficient Matrix Multiplication Accumulation by Using L0C Buffer<br>Smaller Matrices Residing on L1 Buffer, Only Larger Matrices Transferred in Batches |
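
As a rough illustration of the "Efficient Matrix Multiplication Accumulation by Using L0C Buffer" entry, the sketch below models the general idea in plain Python. It is not Ascend C code and does not use any Ascend C API. It assumes that the suggestion amounts to keeping the partial sums of each K-tile in one local accumulator (standing in for the L0C buffer) and writing the result out only once. The names `matmul_k_tiled`, `k_tile`, and `writebacks` are hypothetical and exist only for this example.

```python
def matmul_k_tiled(a, b, k_tile):
    """Multiply a (M x K) by b (K x N), splitting the K axis into tiles.

    Partial products of every K-tile are added into a single local
    accumulator, which here stands in for the L0C buffer (assumption:
    the whole M x N result fits in that buffer).
    """
    m, k, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]  # local accumulator, zero-initialized
    writebacks = 0                     # counts copies out of the accumulator

    for k0 in range(0, k, k_tile):
        k1 = min(k0 + k_tile, k)       # the last tile may be shorter
        for i in range(m):
            for j in range(n):
                partial = 0
                for kk in range(k0, k1):
                    partial += a[i][kk] * b[kk][j]
                # Accumulate in place instead of writing each partial
                # result out and summing it again later.
                acc[i][j] += partial

    # One write-back after all K-tiles have been accumulated.
    out = [row[:] for row in acc]
    writebacks += 1
    return out, writebacks
```

For example, `matmul_k_tiled([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]], k_tile=2)` processes the K axis in two tiles but still performs a single write-back.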