Shape and Memory Size Calculation of Matrices A, B, and C

If the transpose flag of matrices A, B, and C is set to ACL_TRANS_N or ACL_TRANS_T, the size of the memory allocated to store the matrix data must match the actual data size. The formulas for calculating the shape and memory size are as follows:
- Matrix A: Shape = (m, k); Memory size = m * k * sizeof(dataTypeA)
- Matrix B: Shape = (k, n); Memory size = k * n * sizeof(dataTypeB)
- Matrix C: Shape = (m, n); Memory size = m * n * sizeof(dataTypeC)
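
The formulas above can be sketched in C as follows. This is a minimal illustration, not AscendCL code: dense_matrix_bytes is a hypothetical helper, and the element size is passed in explicitly instead of being derived from an AscendCL data type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of AscendCL): number of bytes needed for a
 * dense rows x cols matrix whose elements are elem_size bytes each. */
static size_t dense_matrix_bytes(size_t rows, size_t cols, size_t elem_size)
{
    assert(rows > 0 && cols > 0 && elem_size > 0);
    return rows * cols * elem_size;
}
```

For example, assuming m = 100, k = 50, n = 30 and a 2-byte element type for all three matrices, dense_matrix_bytes(100, 50, 2) gives 10000 bytes for A, dense_matrix_bytes(50, 30, 2) gives 3000 bytes for B, and dense_matrix_bytes(100, 30, 2) gives 6000 bytes for C.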
If the transpose flag of matrices A, B, and C is set to ACL_TRANS_NZ, the internal data format is used and the matrices are 4D. The formulas for calculating the shape and memory size are as follows, where m, k, and n are the original (unpadded) axes:
- When matrix A and matrix B are of type aclFloat16, m, k, and n are each rounded up to the nearest multiple of 16.
  - Matrix A: Shape = (⌈k/16⌉, ⌈m/16⌉, 16, 16); Memory size = ⌈m/16⌉ * 16 * ⌈k/16⌉ * 16 * sizeof(dataTypeA)
  - Matrix B: Shape = (⌈n/16⌉, ⌈k/16⌉, 16, 16); Memory size = ⌈k/16⌉ * 16 * ⌈n/16⌉ * 16 * sizeof(dataTypeB)
  - Matrix C: Shape = (⌈n/16⌉, ⌈m/16⌉, 16, 16); Memory size = ⌈m/16⌉ * 16 * ⌈n/16⌉ * 16 * sizeof(dataTypeC)
- When matrix A and matrix B are of type int8_t, the reduction axis (k) is rounded up to the nearest multiple of 32, and the non-reduction axes (m and n) are each rounded up to the nearest multiple of 16.
  - Matrix A: Shape = (⌈k/32⌉, ⌈m/16⌉, 16, 32); Memory size = ⌈m/16⌉ * 16 * ⌈k/32⌉ * 32 * sizeof(dataTypeA)
  - Matrix B: Shape = (⌈k/32⌉, ⌈n/16⌉, 32, 16); Memory size = ⌈k/32⌉ * 32 * ⌈n/16⌉ * 16 * sizeof(dataTypeB)
  - Matrix C: Shape = (⌈n/16⌉, ⌈m/16⌉, 16, 16); Memory size = ⌈m/16⌉ * 16 * ⌈n/16⌉ * 16 * sizeof(dataTypeC)
⌈x⌉ denotes x rounded up to the nearest integer.
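
The padded memory sizes for ACL_TRANS_NZ can be sketched in C as follows. This is a minimal illustration under stated assumptions: round_up and nz_matrix_bytes are hypothetical helpers, not AscendCL functions, and the element size is passed in explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the nearest multiple of `multiple`,
 * i.e. ceil(x / multiple) * multiple. */
static size_t round_up(size_t x, size_t multiple)
{
    assert(multiple > 0);
    return (x + multiple - 1) / multiple * multiple;
}

/* Hypothetical helper (not part of AscendCL): bytes for a matrix in the
 * internal format, given its two original axes and the multiple each axis
 * is padded to (16 or 32, per the rules above). */
static size_t nz_matrix_bytes(size_t axis0, size_t pad0,
                              size_t axis1, size_t pad1,
                              size_t elem_size)
{
    return round_up(axis0, pad0) * round_up(axis1, pad1) * elem_size;
}
```

For example, assuming m = 100, k = 50, n = 30: with 2-byte aclFloat16 inputs, A needs nz_matrix_bytes(100, 16, 50, 16, 2) = 112 * 64 * 2 = 14336 bytes and B needs nz_matrix_bytes(50, 16, 30, 16, 2) = 64 * 32 * 2 = 4096 bytes. With 1-byte int8_t inputs, A needs nz_matrix_bytes(100, 16, 50, 32, 1) = 7168 bytes and B needs nz_matrix_bytes(50, 32, 30, 16, 1) = 2048 bytes. In both cases C is padded to 112 x 32 elements, so its size is 3584 * sizeof(dataTypeC).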
