Setting ShapeInfo Dimensions to 0 by Using K_MAX_SHAPE_DIM

[Priority] Medium

[Description] GlobalTensor and LocalTensor each hold a member variable of the ShapeInfo type that stores shape information, which can be set and obtained by using SetShapeInfo and GetShapeInfo. These members are generally used to store and pass shape information inside an operator. The default number of shape dimensions is 8. If the operator does not need this information, set the ShapeInfo dimensions to 0 by defining the K_MAX_SHAPE_DIM macro as 0. According to test results, a smaller K_MAX_SHAPE_DIM value reduces stack usage, the number of scalar instructions, and the probability of cache misses, which improves operator performance.

...
#ifndef K_MAX_SHAPE_DIM
#define K_MAX_SHAPE_DIM 8
#endif
...
struct ShapeInfo {
public:
    ...
    uint32_t shape[K_MAX_SHAPE_DIM];
    uint32_t originalShape[K_MAX_SHAPE_DIM];
};

template <typename T> class GlobalTensor {
...
private:
    ShapeInfo shapeInfo_;
};
template <typename T> class LocalTensor {
...
private:
    ShapeInfo shapeInfo_;
};
...

[Negative Example]

The operator does not use ShapeInfo, but K_MAX_SHAPE_DIM is left at its default value of 8. As a result, K_MAX_SHAPE_DIM * sizeof(uint32_t) * 2 * 4 bytes of stack space are wasted, which is 8 * 4 * 2 * 4 = 256 bytes with the default setting. The factor 2 accounts for the two arrays, shape and originalShape. The factor 4 accounts for the four tensors (two GlobalTensor and two LocalTensor objects) used in this sample.
...
#include "kernel_operator.h"
...
extern "C" __global__ __aicore__ void add_custom(GM_ADDR x, GM_ADDR y, GM_ADDR z, GM_ADDR workspace, GM_ADDR tiling)
{
    ...
    GlobalTensor<T> dataIn;
    GlobalTensor<T> dataOut;
    LocalTensor<T> vecIn;
    LocalTensor<T> vecOut;
    ...
}
...
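
The stack cost quoted for the negative example can be checked with a short standalone sketch. It is not Ascend C code: MockShapeInfo8 and ShapeArrayBytes are hypothetical names introduced here for illustration, and the mock models only the two arrays shown in the description, not the elided members of the real ShapeInfo.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for ShapeInfo at the default dimension count of 8.
// Only the two arrays from the description are modeled (assumption).
struct MockShapeInfo8 {
    uint32_t shape[8];
    uint32_t originalShape[8];
};

// Bytes taken by the shape and originalShape arrays across tensorCount tensors,
// following the formula K_MAX_SHAPE_DIM * sizeof(uint32_t) * 2 * tensorCount.
constexpr std::size_t ShapeArrayBytes(std::size_t maxShapeDim, std::size_t tensorCount)
{
    return maxShapeDim * sizeof(uint32_t) * 2 * tensorCount;
}

// Compile-time checks of the arithmetic.
static_assert(sizeof(MockShapeInfo8) == 64, "two arrays of eight uint32_t each");
static_assert(ShapeArrayBytes(8, 4) == 256, "default setting, four tensors");
static_assert(ShapeArrayBytes(0, 4) == 0, "dimension count 0, four tensors");
```

Under these assumptions, the four tensors in the sample carry 256 bytes of shape arrays at the default setting and none when the dimension count is 0.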

[Positive Example]

The operator does not use ShapeInfo. Define K_MAX_SHAPE_DIM as 0 to reclaim the K_MAX_SHAPE_DIM * sizeof(uint32_t) * 2 * 4 bytes of stack space that the default setting (K_MAX_SHAPE_DIM = 8) would otherwise occupy.
#define K_MAX_SHAPE_DIM 0
...
#include "kernel_operator.h" // K_MAX_SHAPE_DIM must be defined before any Ascend C header file is included.
...
extern "C" __global__ __aicore__ void add_custom(GM_ADDR x, GM_ADDR y, GM_ADDR z, GM_ADDR workspace, GM_ADDR tiling)
{
    ...
    GlobalTensor<T> dataIn;
    GlobalTensor<T> dataOut;
    LocalTensor<T> vecIn;
    LocalTensor<T> vecOut;
    ...
}
...
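
The ordering requirement follows from the #ifndef guard shown in the description: the default of 8 is applied only when the macro is still undefined at the point the header is processed. The sketch below reproduces that guard inline instead of including the real header, so it is a stand-in for illustration; EffectiveMaxShapeDim is a hypothetical helper, not part of Ascend C.

```cpp
// User code: the override comes first.
#define K_MAX_SHAPE_DIM 0

// Stand-in for the guarded default from the description (in real code this
// would come from the Ascend C header included at this point).
#ifndef K_MAX_SHAPE_DIM
#define K_MAX_SHAPE_DIM 8
#endif

// Hypothetical helper that exposes the value the rest of the code would see.
constexpr int EffectiveMaxShapeDim()
{
    return K_MAX_SHAPE_DIM;
}

// The earlier definition wins because the guard skips the default.
static_assert(EffectiveMaxShapeDim() == 0, "override defined before the guard takes effect");
```

If the #define were placed after the guard instead, the guard would already have set the value to 8, and a later #define with a different value would be a macro redefinition rather than a clean override.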