fp16_t

The APIs described in this section are reserved and may be changed or deprecated in the future. You do not need to pay attention to them.

Table 1 API list

API Definition

Description

tagFp16(void)

Default constructor of fp16_t. It takes no parameters.

tagFp16(const T &value)

Constructor of fp16_t that takes a parameter of any data type (template parameter T).

tagFp16(const bfloat16& value)

Constructor of fp16_t that takes a parameter of the bfloat16 type.

tagFp16(const uint16_t &uiVal)

Constructor of fp16_t that takes a parameter of the uint16_t type.

tagFp16(const tagFp16 &fp)

Copy constructor of fp16_t. It takes a parameter of the fp16_t type.

float()

Overloads the conversion operator to convert fp16_t to float (fp32).

bfloat16()

Overloads the conversion operator to convert fp16_t to bfloat16.

double()

Overloads the conversion operator to convert fp16_t to double (fp64).

int8_t()

Overloads the conversion operator to convert fp16_t to int8_t.

uint8_t()

Overloads the conversion operator to convert fp16_t to uint8_t.

int16_t()

Overloads the conversion operator to convert fp16_t to int16_t.

uint16_t()

Overloads the conversion operator to convert fp16_t to uint16_t.

int32_t()

Overloads the conversion operator to convert fp16_t to int32_t.

uint32_t()

Overloads the conversion operator to convert fp16_t to uint32_t.

int64_t()

Overloads the conversion operator to convert fp16_t to int64_t.

uint64_t()

Overloads the conversion operator to convert fp16_t to uint64_t.

bool()

Overloads the conversion operator to convert fp16_t to bool.

IsInf()

Checks whether the fp16_t value is infinite. Returns 1 for positive infinity, -1 for negative infinity, and 0 in all other cases.

toFloat()

Converts fp16_t to float (fp32).

toDouble()

Converts fp16_t to double (fp64).

toInt8()

Converts fp16_t to int8_t.

toUInt8()

Converts fp16_t to uint8_t.

toInt16()

Converts fp16_t to int16_t.

toUInt16()

Converts fp16_t to uint16_t.

toInt32()

Converts fp16_t to int32_t.

toUInt32()

Converts fp16_t to uint32_t.

ExtractFP16(const uint16_t &val, uint16_t *s, int16_t *e, uint16_t *m)

Extracts the sign (s), exponent (e), and mantissa (m) from the 16-bit value val of an fp16_t object.

ReverseMan(bool negative, T *man)

Calculates the two's complement of the mantissa man when the sign is negative (negative is true).

MinMan(const int16_t &ea, T *ma, const int16_t &eb, T *mb)

Right-shifts the mantissa whose exponent is the smaller of the two.

RightShift(T man, int16_t shift)

Shifts the mantissa man right by shift bits.

GetManSum(int16_t ea, const T &ma, int16_t eb, const T &mb)

Obtains the mantissa sum of two fp16_t numbers. The supported types (T) are uint16_t, uint32_t, and uint64_t.

ManRoundToNearest(bool bit0, bool bit1, bool bitLeft, T man, uint16_t shift = 0)

Rounds the mantissa of fp16_t or float to the nearest value.

GetManBitLength(T man)

Obtains the bit length of the mantissa of a floating-point number.

isnan(op::fp16_t value)

Checks whether the value is not a number (NaN).
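
The helper behavior described in the table can be illustrated with a minimal sketch. It assumes that fp16_t stores its value in the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits), which this section does not state explicitly. Every name in the fp16_sketch namespace is hypothetical and chosen for illustration only. None of this is the library's implementation, and the real functions may differ, for example in how they handle the hidden bit or subnormal values.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: hypothetical names, assuming the IEEE 754 binary16 layout.
namespace fp16_sketch {

const uint16_t kSignMask = 0x8000;  // bit 15
const uint16_t kExpMask  = 0x7C00;  // bits 14..10
const uint16_t kManMask  = 0x03FF;  // bits 9..0
const int16_t  kExpAllOnes = 0x1F;  // exponent value reserved for Inf and NaN

struct Fields {
  uint16_t sign;  // 0 or 1
  int16_t exp;    // biased exponent, 0..31
  uint16_t man;   // raw 10-bit mantissa, without the hidden bit
};

// Splits a raw 16-bit pattern into sign, exponent, and mantissa fields.
inline Fields Extract(uint16_t bits) {
  Fields f;
  f.sign = static_cast<uint16_t>((bits & kSignMask) >> 15);
  f.exp = static_cast<int16_t>((bits & kExpMask) >> 10);
  f.man = static_cast<uint16_t>(bits & kManMask);
  return f;
}

// Returns 1 for +Inf, -1 for -Inf, and 0 otherwise.
inline int IsInf(uint16_t bits) {
  const Fields f = Extract(bits);
  if (f.exp == kExpAllOnes && f.man == 0) {
    return f.sign ? -1 : 1;
  }
  return 0;
}

// NaN: exponent bits all ones and a nonzero mantissa.
inline bool IsNan(uint16_t bits) {
  const Fields f = Extract(bits);
  return f.exp == kExpAllOnes && f.man != 0;
}

// Two's complement of the mantissa, applied only when negative is true.
template <typename T>
T TwosComplement(bool negative, T man) {
  return negative ? static_cast<T>(~man + 1) : man;
}

// Right shift that yields 0 when the shift covers the whole width of T.
template <typename T>
T ShiftRight(T man, int16_t shift) {
  if (shift <= 0) return man;
  if (shift >= static_cast<int16_t>(sizeof(T) * 8)) return 0;
  return static_cast<T>(man >> shift);
}

// Shifts the mantissa with the smaller exponent right by the exponent gap.
template <typename T>
void AlignMantissas(int16_t ea, T *ma, int16_t eb, T *mb) {
  if (ea > eb) {
    *mb = ShiftRight(*mb, static_cast<int16_t>(ea - eb));
  } else if (eb > ea) {
    *ma = ShiftRight(*ma, static_cast<int16_t>(eb - ea));
  }
}

// Number of bits needed to represent man (0 for man == 0).
template <typename T>
int BitLength(T man) {
  int len = 0;
  while (man != 0) {
    ++len;
    man = static_cast<T>(man >> 1);
  }
  return len;
}

}  // namespace fp16_sketch
```

Under these assumptions, the bit pattern 0x7C00 is classified as positive infinity, 0xFC00 as negative infinity, and 0x7E00 as NaN. Aligning two mantissas before adding them is the usual reason for shifting the operand with the smaller exponent, which is presumably how the mantissa helpers in the table are combined.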