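Built-in Data Types
SIMD and SIMT hybrid programming provides a set of data types for use on the Device side, including scalars, short vectors, and the dim3 structure.
Scalar Types
The scalar data types cover Boolean (bool), integer (uint8/int8 through uint64/int64), and floating-point (float8_e4m3, float8_e5m2, hifloat8, half, bfloat16, float) types.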
| Category | Data type | Description | Size (bits) | Value range |
|---|---|---|---|---|
| Boolean | bool | All-zero bits represent false; any other value represents true. | 8 | true, false |
| Integer | uint8_t | unsigned char | 8 | [0, 255] |
| | int8_t | signed char | 8 | [-128, 127] |
| | uint16_t | unsigned short | 16 | [0, 65535] |
| | int16_t | signed short | 16 | [-32768, 32767] |
| | uint32_t | unsigned int | 32 | [0, 4294967295] |
| | int32_t | signed int | 32 | [-2147483648, 2147483647] |
| | uint64_t | unsigned long | 64 | [0, 18446744073709551615] |
| | int64_t | signed long | 64 | [-9223372036854775808, 9223372036854775807] |
| Floating point | float8_e4m3_t | 1 sign bit, 4 exponent bits, 3 mantissa bits | 8 | $[2^{6} - 2^{9},\ 2^{9} - 2^{6}]$ |
| | float8_e5m2_t | 1 sign bit, 5 exponent bits, 2 mantissa bits | 8 | $[2^{13} - 2^{16},\ 2^{16} - 2^{13}]$ |
| | hifloat8_t | 1 sign bit, 2 dot-field bits; the exponent and mantissa widths are determined by the dot-field encoding | 8 | The dot-field encoding determines the precision and value range |
| | half | 1 sign bit, 5 exponent bits, 10 mantissa bits | 16 | $[2^{5} - 2^{16},\ 2^{16} - 2^{5}]$ |
| | bfloat16_t | 1 sign bit, 8 exponent bits, 7 mantissa bits | 16 | $[2^{120} - 2^{128},\ 2^{128} - 2^{120}]$ |
| | float | 1 sign bit, 8 exponent bits, 23 mantissa bits | 32 | $[2^{104} - 2^{128},\ 2^{128} - 2^{104}]$ |
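The floating-point ranges are written as differences of powers of two, with each lower bound being the negation of the upper bound. As a quick sanity check, the following host-side sketch in standard C++ evaluates several upper bounds and compares the float entry with `FLT_MAX`. It is not Device-side code, and `range_max` and `check_ranges` are helper names invented for this example.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical helper: evaluates 2^hi - 2^lo, the form used for the
// upper bounds of the floating-point ranges. Exact in double for these inputs.
double range_max(int hi, int lo)
{
    return std::ldexp(1.0, hi) - std::ldexp(1.0, lo);
}

// Evaluates the upper bounds and compares them with their decimal values.
void check_ranges()
{
    assert(range_max(9, 6) == 448.0);       // float8_e4m3_t: 2^9 - 2^6
    assert(range_max(16, 13) == 57344.0);   // float8_e5m2_t: 2^16 - 2^13
    assert(range_max(16, 5) == 65504.0);    // half: 2^16 - 2^5
    // float: 2^128 - 2^104 is the largest finite IEEE 754 binary32 value.
    assert(range_max(128, 104) == static_cast<double>(FLT_MAX));
}
```

Calling `check_ranges()` from a host program passes silently when every bound matches.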
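Short Vector Types
Short vector types are fixed-length vector types provided by the SIMD and SIMT hybrid programming model to simplify the representation and manipulation of vector data. They suit data with multiple components, such as coordinates, colors, and the operands of vector arithmetic.
- Memory characteristics:
  - Alignment: follows the natural-alignment rule, which improves access efficiency.
  - Cross-thread sharing: can be stored in the Unified Buffer and shared within a thread block.
  - Direct memory access: can be loaded from and stored to Global Memory directly.
- Application scenarios:
  - Vector arithmetic: vector calculations in physics simulation and graphics rendering.
  - Data packing: packing several related values so that they are processed together.
  - Memory access optimization: improving memory bandwidth utilization through vectorization.
- Short vector variable access:
The currently supported short vector data types are as follows: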
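| Element data type | Vector X2 | Vector X4 |
|---|---|---|
| unsigned char | uchar2 | uchar4 |
| signed char | char2 | char4 |
| unsigned short (16-bit) | ushort2 | ushort4 |
| signed short (16-bit) | short2 | short4 |
| unsigned int | uint2 | uint4 |
| signed int | int2 | int4 |
| unsigned long long (64-bit) | ulonglong2 | ulonglong4 |
| signed long long (64-bit) | longlong2 | longlong4 |
| unsigned long (32-bit) | ulong2 | ulong4 |
| signed long (32-bit) | long2 | long4 |
| Floating point: 1 sign bit, 2 exponent bits, 1 mantissa bit | float4_e2m1x2_t | - |
| Floating point: 1 sign bit, 1 exponent bit, 2 mantissa bits | float4_e1m2x2_t | - |
| Floating point: 1 sign bit, 4 exponent bits, 3 mantissa bits | float8_e4m3x2_t | - |
| Floating point: 1 sign bit, 5 exponent bits, 2 mantissa bits | float8_e5m2x2_t | - |
| Floating point: hif8 | hifloat8x2_t | - |
| Floating point: 1 sign bit, 5 exponent bits, 10 mantissa bits | half2 | - |
| Floating point: 1 sign bit, 8 exponent bits, 7 mantissa bits | bfloat16x2_t | - |
| Floating point: 1 sign bit, 8 exponent bits, 23 mantissa bits | float2 | float4 |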
The memory size and address alignment of each short vector type are as follows:
| Data type | Memory size (bytes) | Address alignment (bytes) |
|---|---|---|
| char2, uchar2 | 2 | 2 |
| char4, uchar4 | 4 | 4 |
| short2, ushort2 | 4 | 4 |
| short4, ushort4 | 8 | 8 |
| int2, uint2 | 8 | 8 |
| int4, uint4 | 16 | 16 |
| long2, ulong2 | 8 | 8 |
| long4, ulong4 | 16 | 16 |
| longlong2, ulonglong2 | 16 | 16 |
| longlong4, ulonglong4 | 32 | 32 |
| float2 | 8 | 8 |
| float4 | 16 | 16 |
| float4_e2m1x2_t, float4_e1m2x2_t | 1 | 1 |
| float8_e4m3x2_t, float8_e5m2x2_t, hifloat8x2_t | 2 | 2 |
| half2, bfloat16x2_t | 4 | 4 |
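For every short vector type, the address alignment equals the memory size, that is, the element size multiplied by the element count. This can be mirrored on the host with `alignas`. The structs below are stand-ins written purely for illustration: the `_sketch` names, the `x`/`y`/`z`/`w` member names, and the helper `is_aligned` are assumptions, not the Device-side definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-side stand-ins; NOT the real Device-side types.
struct alignas(2)  char2_sketch     { std::int8_t x, y; };
struct alignas(8)  short4_sketch    { std::int16_t x, y, z, w; };
struct alignas(8)  int2_sketch      { std::int32_t x, y; };
struct alignas(16) int4_sketch      { std::int32_t x, y, z, w; };
struct alignas(32) longlong4_sketch { std::int64_t x, y, z, w; };
struct alignas(16) float4_sketch    { float x, y, z, w; };

// Size and alignment both equal element size * element count.
static_assert(sizeof(char2_sketch) == 2 && alignof(char2_sketch) == 2, "char2");
static_assert(sizeof(short4_sketch) == 8 && alignof(short4_sketch) == 8, "short4");
static_assert(sizeof(int2_sketch) == 8 && alignof(int2_sketch) == 8, "int2");
static_assert(sizeof(int4_sketch) == 16 && alignof(int4_sketch) == 16, "int4");
static_assert(sizeof(longlong4_sketch) == 32 && alignof(longlong4_sketch) == 32, "longlong4");
static_assert(sizeof(float4_sketch) == 16 && alignof(float4_sketch) == 16, "float4");

// Returns true if the address p is a multiple of `alignment` bytes.
bool is_aligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Runtime check that a local stand-in object honours its alignment.
void check_alignment()
{
    int4_sketch v{1, 2, 3, 4};
    assert(is_aligned(&v, alignof(int4_sketch)));
}
```

With full-size alignment, a whole vector can in principle be moved in one aligned access instead of several scalar accesses, which is the motivation behind the memory access optimization scenario.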
SIMD and SIMT hybrid programming provides functions for constructing short vectors; they are listed in Table 4. Each constructor combines a fixed number of scalar values of the same type into a short vector. For example, make_int2 takes two int scalars as input and returns them combined as an int2 short vector.
| Function | Prototype |
|---|---|
| make_int2 | inline int2 make_int2(int x, int y) |
| make_int4 | inline int4 make_int4(int x, int y, int z, int w) |
| make_uint2 | inline uint2 make_uint2(unsigned int x, unsigned int y) |
| make_uint4 | inline uint4 make_uint4(unsigned int x, unsigned int y, unsigned int z, unsigned int w) |
| make_ulonglong2 | inline ulonglong2 make_ulonglong2(unsigned long long int x, unsigned long long int y) |
| make_ulonglong4 | inline ulonglong4 make_ulonglong4(unsigned long long int x, unsigned long long int y, unsigned long long int z, unsigned long long int w) |
| make_longlong2 | inline longlong2 make_longlong2(long long int x, long long int y) |
| make_longlong4 | inline longlong4 make_longlong4(long long int x, long long int y, long long int z, long long int w) |
| make_ulong2 | inline ulong2 make_ulong2(unsigned long int x, unsigned long int y) |
| make_ulong4 | inline ulong4 make_ulong4(unsigned long int x, unsigned long int y, unsigned long int z, unsigned long int w) |
| make_long2 | inline long2 make_long2(long int x, long int y) |
| make_long4 | inline long4 make_long4(long int x, long int y, long int z, long int w) |
| make_float2 | inline float2 make_float2(float x, float y) |
| make_float4 | inline float4 make_float4(float x, float y, float z, float w) |
| make_short2 | inline short2 make_short2(short x, short y) |
| make_short4 | inline short4 make_short4(short x, short y, short z, short w) |
| make_ushort2 | inline ushort2 make_ushort2(unsigned short x, unsigned short y) |
| make_ushort4 | inline ushort4 make_ushort4(unsigned short x, unsigned short y, unsigned short z, unsigned short w) |
| make_uchar2 | inline uchar2 make_uchar2(unsigned char x, unsigned char y) |
| make_uchar4 | inline uchar4 make_uchar4(unsigned char x, unsigned char y, unsigned char z, unsigned char w) |
| make_char2 | inline char2 make_char2(signed char x, signed char y) |
| make_char4 | inline char4 make_char4(signed char x, signed char y, signed char z, signed char w) |
| make_half2 | inline half2 make_half2(half x, half y) |
| make_bfloat162 | inline bfloat16x2_t make_bfloat162(bfloat16_t x, bfloat16_t y) |
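To use the short vector constructors, include simt_api/vector_functions.h. A call example follows:

```cpp
#include "simt_api/vector_functions.h"

__simt_vf__ __launch_bounds__(1024) inline void kernel_make_int2(__gm__ int2* dst, __gm__ int* x, __gm__ int* y)
{
    // Global index of the current thread across all thread blocks.
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    // Combine the two scalars at this index into one int2 and store it.
    dst[idx] = make_int2(x[idx], y[idx]);
}
```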
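To make the indexing in the kernel_make_int2 example concrete, here is a host-side emulation in standard C++. It is only a sketch: `Int2Sketch`, `make_int2_sketch`, and `emulate_kernel_make_int2` are names invented for this example, and the nested loops stand in for the threads that the Device would run in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Device-side int2 type.
struct Int2Sketch { int x; int y; };

// Hypothetical stand-in for make_int2: packs two scalars into one value.
inline Int2Sketch make_int2_sketch(int x, int y) { return Int2Sketch{x, y}; }

// Emulates a 1D launch of grid_dim blocks with block_dim threads each.
std::vector<Int2Sketch> emulate_kernel_make_int2(const std::vector<int>& x,
                                                 const std::vector<int>& y,
                                                 int grid_dim, int block_dim)
{
    assert(x.size() == y.size());
    assert(x.size() == static_cast<std::size_t>(grid_dim) * block_dim);
    std::vector<Int2Sketch> dst(x.size());
    for (int block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            // Same formula as threadIdx.x + blockIdx.x * blockDim.x.
            int idx = thread_idx + block_idx * block_dim;
            dst[idx] = make_int2_sketch(x[idx], y[idx]);
        }
    }
    return dst;
}
```

With 2 blocks of 3 threads, for instance, thread 1 of block 1 gets `idx = 1 + 1 * 3 = 4` and writes the pair `(x[4], y[4])`.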
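dim3
A built-in structure used to specify and obtain the sizes of the thread grid (Grid) and thread block (Thread Block) along the x, y, and z dimensions.

```cpp
dim3(x);        // Creates a 1D structure; dimy and dimz default to 1
dim3(x, y);     // Creates a 2D structure; dimz defaults to 1
dim3(x, y, z);  // Creates a 3D structure
```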
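The defaulting behaviour of the three constructor forms can be pictured with a small host-side mock. `Dim3Sketch`, its member names `x`, `y`, `z`, and the helpers `total_count` and `check_dim3_defaults` are assumptions made for this sketch; the real structure's layout is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side mock of a dim3-like structure.
struct Dim3Sketch {
    std::uint32_t x, y, z;
    // Dimensions that are not given default to 1.
    constexpr explicit Dim3Sketch(std::uint32_t x_, std::uint32_t y_ = 1, std::uint32_t z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

// Number of elements (blocks or threads) described by the three dimensions.
constexpr std::uint64_t total_count(const Dim3Sketch& d)
{
    return static_cast<std::uint64_t>(d.x) * d.y * d.z;
}

// Exercises the 1D, 2D, and 3D forms.
void check_dim3_defaults()
{
    assert(total_count(Dim3Sketch(8)) == 8);         // y = z = 1
    assert(total_count(Dim3Sketch(8, 4)) == 32);     // z = 1
    assert(total_count(Dim3Sketch(8, 4, 2)) == 64);  // all three given
}
```

Defaulting the unspecified dimensions to 1 rather than 0 keeps the product of the three dimensions meaningful for 1D and 2D shapes.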