Specifications
This section describes the operator information defined based on the Ascend IR. Before using an operator, read the following description.
- Format: describes the formats listed in the operator specifications.
- TensorType: describes the tensor types listed in the operator specifications.
- Type Promotion: describes the rules for data type promotion. If the input tensor data types of some operators (such as Add and Mul) are different, the data type is automatically promoted during operator computation.
- Deterministic Computing: lists the operators involved in and supported by deterministic computing.
Format
- ND: any format, applicable to operators that take singular inputs, such as Square and Tanh.
- NC1HWC0: self-developed 5D data format. C0 is closely related to the micro-architecture, and the value is equal to the Cube Unit size, for example, 16. C1 is obtained by dividing the C dimension by C0, that is, C1 = C/C0. When the division is not exact, the last data segment is padded to C0.
- FRACTAL_Z: a format of the convolution weight.
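The relationship between C, C0, and C1 in NC1HWC0 can be sketched as follows. This is a minimal illustration, not the runtime's own conversion routine: the helper names `CeilDiv` and `NchwToNc1hwc0` are hypothetical, the source layout is assumed to be NCHW, and the padding value is assumed to be zero (the description above only states that the last segment is padded to C0).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of C0-sized segments needed to hold c channels: C1 = ceil(C / C0).
inline int64_t CeilDiv(int64_t c, int64_t c0) {
  assert(c >= 0 && c0 > 0);
  return (c + c0 - 1) / c0;
}

// Hypothetical helper: repack an NCHW buffer into NC1HWC0.
// Assumption: padded positions in the last channel segment are filled with 0.
inline std::vector<float> NchwToNc1hwc0(const std::vector<float> &src,
                                        int64_t n, int64_t c, int64_t h,
                                        int64_t w, int64_t c0) {
  assert(static_cast<int64_t>(src.size()) == n * c * h * w);
  const int64_t c1 = CeilDiv(c, c0);
  std::vector<float> dst(static_cast<std::size_t>(n * c1 * h * w * c0), 0.0f);
  for (int64_t in = 0; in < n; ++in) {
    for (int64_t ic = 0; ic < c; ++ic) {
      for (int64_t ih = 0; ih < h; ++ih) {
        for (int64_t iw = 0; iw < w; ++iw) {
          const int64_t src_idx = ((in * c + ic) * h + ih) * w + iw;
          // Channel ic lands in segment ic / c0 at offset ic % c0.
          const int64_t dst_idx =
              (((in * c1 + ic / c0) * h + ih) * w + iw) * c0 + ic % c0;
          dst[static_cast<std::size_t>(dst_idx)] =
              src[static_cast<std::size_t>(src_idx)];
        }
      }
    }
  }
  return dst;
}
```

With C0 = 16, a tensor with C = 17 needs `CeilDiv(17, 16)`, that is, two segments, and the second segment carries one real channel plus 15 padded positions.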
TensorType
Currently, only the StridedSlice, StridedSliceGrad, and AsStrided operators support the DT_COMPLEX32 data type in OrdinaryType, BasicType, NumberType, ComplexDataType, and UnaryDataType.

    struct TensorType {
      explicit TensorType(DataType dt);

      TensorType(const std::initializer_list<DataType> &initial_types);

      static TensorType ALL() {
        return TensorType{DT_BOOL, DT_COMPLEX128, DT_COMPLEX64, DT_DOUBLE, DT_FLOAT, DT_FLOAT16,
                          DT_INT16, DT_INT32, DT_INT64, DT_INT8, DT_QINT16, DT_QINT32, DT_QINT8,
                          DT_QUINT16, DT_QUINT8, DT_RESOURCE, DT_STRING, DT_UINT16, DT_UINT32,
                          DT_UINT64, DT_UINT8, DT_BF16, DT_COMPLEX32};
      }

      static TensorType QuantifiedType() {
        return TensorType{DT_QINT16, DT_QINT32, DT_QINT8, DT_QUINT16, DT_QUINT8};
      }

      static TensorType OrdinaryType() {
        return TensorType{DT_BOOL, DT_COMPLEX128, DT_COMPLEX64, DT_DOUBLE, DT_FLOAT, DT_FLOAT16,
                          DT_INT16, DT_INT32, DT_INT64, DT_INT8, DT_UINT16, DT_UINT32, DT_UINT64,
                          DT_UINT8, DT_BF16, DT_COMPLEX32};
      }

      static TensorType BasicType() {
        return TensorType{DT_COMPLEX128, DT_COMPLEX64, DT_DOUBLE, DT_FLOAT, DT_FLOAT16, DT_INT16,
                          DT_INT32, DT_INT64, DT_INT8, DT_QINT16, DT_QINT32, DT_QINT8, DT_QUINT16,
                          DT_QUINT8, DT_UINT16, DT_UINT32, DT_UINT64, DT_UINT8, DT_BF16,
                          DT_COMPLEX32};
      }

      static TensorType NumberType() {
        return TensorType{DT_COMPLEX128, DT_COMPLEX64, DT_DOUBLE, DT_FLOAT, DT_FLOAT16, DT_INT16,
                          DT_INT32, DT_INT64, DT_INT8, DT_QINT32, DT_QINT8, DT_QUINT8, DT_UINT16,
                          DT_UINT32, DT_UINT64, DT_UINT8, DT_BF16, DT_COMPLEX32};
      }

      static TensorType RealNumberType() {
        return TensorType{DT_DOUBLE, DT_FLOAT, DT_FLOAT16, DT_INT16, DT_INT32, DT_INT64, DT_INT8,
                          DT_UINT16, DT_UINT32, DT_UINT64, DT_UINT8, DT_BF16};
      }

      static TensorType ComplexDataType() {
        return TensorType{DT_COMPLEX128, DT_COMPLEX64, DT_COMPLEX32};
      }

      static TensorType IntegerDataType() {
        return TensorType{DT_INT16, DT_INT32, DT_INT64, DT_INT8,
                          DT_UINT16, DT_UINT32, DT_UINT64, DT_UINT8};
      }

      static TensorType SignedDataType() {
        return TensorType{DT_INT16, DT_INT32, DT_INT64, DT_INT8};
      }

      static TensorType UnsignedDataType() {
        return TensorType{DT_UINT16, DT_UINT32, DT_UINT64, DT_UINT8};
      }

      static TensorType FloatingDataType() {
        return TensorType{DT_DOUBLE, DT_FLOAT, DT_FLOAT16};
      }

      static TensorType IndexNumberType() {
        return TensorType{DT_INT32, DT_INT64};
      }

      static TensorType UnaryDataType() {
        return TensorType{DT_COMPLEX128, DT_COMPLEX64, DT_DOUBLE, DT_FLOAT, DT_FLOAT16, DT_BF16,
                          DT_COMPLEX32};
      }

      static TensorType FLOAT() {
        return TensorType{DT_FLOAT, DT_FLOAT16, DT_BF16};
      }

      std::shared_ptr<TensorTypeImpl> tensor_type_impl_;
    };

Type Promotion
If the input tensor data types of some operators (such as Add and Mul) are different, the data type is automatically promoted during operator computation. The following table lists the rules for data type promotion.
| Data Type | f32 | f16 | bf16 | s8 | u8 | s16 | u16 | s32 | u32 | s64 | u64 | bool | c32 | c64 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| f32 | f32 | f32 | f32 | f32 | f32 | f32 | × | f32 | × | f32 | × | f32 | c64 | c64 |
| f16 | f32 | f16 | f32 | f16 | f16 | f16 | × | f16 | × | f16 | × | f16 | c32 | c64 |
| bf16 | f32 | f32 | bf16 | bf16 | bf16 | bf16 | × | bf16 | × | bf16 | × | bf16 | c32 | c64 |
| s8 | f32 | f16 | bf16 | s8 | s16 | s16 | × | s32 | × | s64 | × | s8 | c32 | c64 |
| u8 | f32 | f16 | bf16 | s16 | u8 | s16 | × | s32 | × | s64 | × | u8 | c32 | c64 |
| s16 | f32 | f16 | bf16 | s16 | s16 | s16 | × | s32 | × | s64 | × | s16 | c32 | c64 |
| u16 | × | × | × | × | × | × | u16 | × | × | × | × | × | × | × |
| s32 | f32 | f16 | bf16 | s32 | s32 | s32 | × | s32 | × | s64 | × | s32 | c32 | c64 |
| u32 | × | × | × | × | × | × | × | × | u32 | × | × | × | × | × |
| s64 | f32 | f16 | bf16 | s64 | s64 | s64 | × | s64 | × | s64 | × | s64 | c32 | c64 |
| u64 | × | × | × | × | × | × | × | × | × | × | u64 | × | × | × |
| bool | f32 | f16 | bf16 | s8 | u8 | s16 | × | s32 | × | s64 | × | bool | c32 | c64 |
| c32 | c64 | c32 | c32 | c32 | c32 | c32 | × | c32 | × | c32 | × | c32 | c32 | c64 |
| c64 | c64 | c64 | c64 | c64 | c64 | c64 | × | c64 | × | c64 | × | c64 | c64 | c64 |
- For ease of description, the data types used in the table are abbreviated: DT_FLOAT (f32), DT_FLOAT16 (f16), DT_BF16 (bf16), DT_INT8 (s8), DT_UINT8 (u8), DT_INT16 (s16), DT_UINT16 (u16), DT_INT32 (s32), DT_UINT32 (u32), DT_INT64 (s64), DT_UINT64 (u64), DT_BOOL (bool), DT_COMPLEX32 (c32), and DT_COMPLEX64 (c64).
- Currently, the AI Core engine does not support the DT_DOUBLE and DT_COMPLEX128 types for operator precision promotion. For example, if the data types of the input parameters of the Mul operator are float32 and double, the float32 data type cannot be promoted to double for the AI Core engine, and the input will be allocated to the AI CPU engine.
- The table header row and the leftmost column give the two input data types. The cell at their intersection gives the data type that the inputs are promoted to.
- × indicates that promotion between the two data types is not supported.
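The promotion rules lend themselves to a simple table lookup. The sketch below transcribes the table into a C++ array keyed by the abbreviations from the notes above. It is an illustration only and not part of the Ascend IR API: the names `kTypes`, `kPromote`, `IndexOf`, and `Promote` are hypothetical, and unsupported combinations (× in the table) are represented by `nullptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Abbreviations in table order (shared by the header row and leftmost column).
const std::size_t kNumTypes = 14;
const char *const kTypes[kNumTypes] = {
    "f32", "f16", "bf16", "s8", "u8", "s16", "u16",
    "s32", "u32", "s64", "u64", "bool", "c32", "c64"};

// NA stands for the "×" cells: the pair cannot be promoted.
constexpr const char *NA = nullptr;

// kPromote[row][col]: row is the leftmost-column type, col the header type.
const char *const kPromote[kNumTypes][kNumTypes] = {
    /* f32  */ {"f32", "f32", "f32", "f32", "f32", "f32", NA, "f32", NA, "f32", NA, "f32", "c64", "c64"},
    /* f16  */ {"f32", "f16", "f32", "f16", "f16", "f16", NA, "f16", NA, "f16", NA, "f16", "c32", "c64"},
    /* bf16 */ {"f32", "f32", "bf16", "bf16", "bf16", "bf16", NA, "bf16", NA, "bf16", NA, "bf16", "c32", "c64"},
    /* s8   */ {"f32", "f16", "bf16", "s8", "s16", "s16", NA, "s32", NA, "s64", NA, "s8", "c32", "c64"},
    /* u8   */ {"f32", "f16", "bf16", "s16", "u8", "s16", NA, "s32", NA, "s64", NA, "u8", "c32", "c64"},
    /* s16  */ {"f32", "f16", "bf16", "s16", "s16", "s16", NA, "s32", NA, "s64", NA, "s16", "c32", "c64"},
    /* u16  */ {NA, NA, NA, NA, NA, NA, "u16", NA, NA, NA, NA, NA, NA, NA},
    /* s32  */ {"f32", "f16", "bf16", "s32", "s32", "s32", NA, "s32", NA, "s64", NA, "s32", "c32", "c64"},
    /* u32  */ {NA, NA, NA, NA, NA, NA, NA, NA, "u32", NA, NA, NA, NA, NA},
    /* s64  */ {"f32", "f16", "bf16", "s64", "s64", "s64", NA, "s64", NA, "s64", NA, "s64", "c32", "c64"},
    /* u64  */ {NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "u64", NA, NA, NA},
    /* bool */ {"f32", "f16", "bf16", "s8", "u8", "s16", NA, "s32", NA, "s64", NA, "bool", "c32", "c64"},
    /* c32  */ {"c64", "c32", "c32", "c32", "c32", "c32", NA, "c32", NA, "c32", NA, "c32", "c32", "c64"},
    /* c64  */ {"c64", "c64", "c64", "c64", "c64", "c64", NA, "c64", NA, "c64", NA, "c64", "c64", "c64"},
};

// Returns the position of an abbreviation in kTypes, or kNumTypes if unknown.
inline std::size_t IndexOf(const char *abbr) {
  for (std::size_t i = 0; i < kNumTypes; ++i) {
    if (std::strcmp(kTypes[i], abbr) == 0) return i;
  }
  return kNumTypes;
}

// Returns the promoted type abbreviation, or nullptr when the pair is
// unsupported or either abbreviation is not in the table.
inline const char *Promote(const char *lhs, const char *rhs) {
  assert(lhs != nullptr && rhs != nullptr);
  const std::size_t row = IndexOf(lhs);
  const std::size_t col = IndexOf(rhs);
  if (row == kNumTypes || col == kNumTypes) return nullptr;
  return kPromote[row][col];
}
```

For instance, `Promote("s8", "u8")` yields `"s16"`, and `Promote("u16", "s32")` yields `nullptr` because the table marks that pair with ×.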
Deterministic Computing
Asynchronous multi-threaded execution within an operator implementation can change the order in which floating-point numbers are accumulated. As a result, repeated executions of an operator on the same hardware with the same input may produce different results. When deterministic computing is enabled, repeated executions of an operator on the same hardware with the same input produce the same output.
If an operator in the following lists is not in the corresponding operator specification list, the current processor version does not support that operator.
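The sensitivity to accumulation order comes from floating-point addition not being associative. The sketch below is a host-side illustration, not operator code: the hypothetical function `AccumulateInOrder` sums the same values in two different sequences, standing in for threads that finish in a different order, and assumes IEEE 754 single-precision arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Accumulates `values` in the sequence given by `order`, a list of indices
// into `values`. Different orders model different thread completion orders.
inline float AccumulateInOrder(const std::vector<float> &values,
                               const std::vector<std::size_t> &order) {
  assert(values.size() == order.size());
  float acc = 0.0f;
  for (std::size_t idx : order) {
    acc += values.at(idx);  // each partial sum is rounded to float
  }
  return acc;
}
```

With the values `{1e8f, 1.0f, -1e8f}`, the order `{0, 1, 2}` gives 0 because `1e8f + 1.0f` rounds back to `1e8f`, while the order `{0, 2, 1}` gives 1. Fixing the accumulation order is what makes repeated runs agree.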
- The following operators involve but do not support deterministic computing:
- resizegradD
- WeightQuantBatchMatmulV2
- The following operators involve and support deterministic computing:
- AvgPool3DGrad
- BatchMatMul
- BatchMatMulV2
- BiasAddGrad
- BinaryCrossEntropy
- BN3DTrainingReduce
- BN3DTrainingUpdateGrad
- BNTrainingReduce
- BNTrainingUpdateGrad
- Conv2DBackpropFilter: This operator is supported only by Atlas training products, Atlas A3 training products/Atlas A3 inference products, and Atlas A2 training products/Atlas A2 inference products.
- Conv3DBackpropFilter: This operator is supported only by Atlas training products, Atlas A3 training products/Atlas A3 inference products, and Atlas A2 training products/Atlas A2 inference products.
- EmbeddingDenseGrad
- FullyConnection
- GroupNormGrad
- Histogram
- InplaceIndexAdd
- KLDiv
- LpNormReduceV2
- LpNormV2
- MseLoss
- MatMul
- MatMulV2
- NLLLoss
- ReduceMean
- ReduceSum
- ScatterAdd
- ScatterElements
- ScatterNd
- ScatterNdAdd
- UnsortedSegmentSum