Built-in Data Types

SIMD and SIMT mixed programming provides a set of data types for use on the Device side: scalars, short vectors, and the dim3 structure.

Scalar Types

Scalar data types cover Boolean (bool), integer (uint8_t/int8_t through uint64_t/int64_t), and floating-point (float8_e4m3_t, float8_e5m2_t, hifloat8_t, half, bfloat16_t, float) types.

Table 1 Scalar data types
Category	Data type	Description	Size (bits)	Value range
Boolean	bool	All-zero bits represent false; any other value represents true.	8	true, false
Integer	uint8_t	unsigned char	8	[0, 255]
	int8_t	signed char	8	[-128, 127]
	uint16_t	unsigned short	16	[0, 65535]
	int16_t	signed short	16	[-32768, 32767]
	uint32_t	unsigned int	32	[0, 4294967295]
	int32_t	signed int	32	[-2147483648, 2147483647]
	uint64_t	unsigned long	64	[0, 18446744073709551615]
	int64_t	signed long	64	[-9223372036854775808, 9223372036854775807]
Floating point	float8_e4m3_t	1 sign bit, 4 exponent bits, 3 mantissa bits	8	[2⁶ - 2⁹, 2⁹ - 2⁶]
	float8_e5m2_t	1 sign bit, 5 exponent bits, 2 mantissa bits	8	[2¹³ - 2¹⁶, 2¹⁶ - 2¹³]
	hifloat8_t	1 sign bit, 2-bit dot field; the exponent and mantissa widths are determined by the dot-field encoding	8	Precision and value range are determined by the dot-field encoding
	half	1 sign bit, 5 exponent bits, 10 mantissa bits	16	[2⁵ - 2¹⁶, 2¹⁶ - 2⁵]
	bfloat16_t	1 sign bit, 8 exponent bits, 7 mantissa bits	16	[2¹²⁰ - 2¹²⁸, 2¹²⁸ - 2¹²⁰]
	float	1 sign bit, 8 exponent bits, 23 mantissa bits	32	[2¹⁰⁴ - 2¹²⁸, 2¹²⁸ - 2¹⁰⁴]
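
The closed forms in the value-range column can be checked with ordinary host-side C++. The sketch below is not device code and uses only the standard library; the names pow2_diff, float_row_matches_host, and check_table1_ranges are hypothetical helpers introduced for illustration.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstdint>
#include <limits>

// Integer rows of Table 1, checked against the host's fixed-width types.
static_assert(sizeof(uint8_t) == 1 && sizeof(int64_t) == 8, "8-bit and 64-bit sizes");
static_assert(std::numeric_limits<int8_t>::min() == -128, "int8_t lower bound");
static_assert(std::numeric_limits<uint16_t>::max() == 65535, "uint16_t upper bound");
static_assert(std::numeric_limits<int32_t>::max() == 2147483647, "int32_t upper bound");
static_assert(std::numeric_limits<uint64_t>::max() == 18446744073709551615ULL,
              "uint64_t upper bound");

// Hypothetical helper: evaluates 2^hi - 2^lo, the form used in the range column.
inline double pow2_diff(int hi, int lo) {
    return std::ldexp(1.0, hi) - std::ldexp(1.0, lo);
}

// The float row: 2^128 - 2^104 is exactly the largest finite IEEE 754 float.
inline bool float_row_matches_host() {
    return pow2_diff(128, 104) == static_cast<double>(FLT_MAX);
}

// Upper bounds of the 8-bit and 16-bit floating-point rows as decimal values.
inline void check_table1_ranges() {
    assert(pow2_diff(9, 6) == 448.0);      // float8_e4m3_t
    assert(pow2_diff(16, 13) == 57344.0);  // float8_e5m2_t
    assert(pow2_diff(16, 5) == 65504.0);   // half
    assert(float_row_matches_host());      // float
}
```

Each lower bound in the table is simply the negated upper bound, so the same helper covers both ends of a range.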

Short Vector Types

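A short vector type is a fixed-length vector type provided by the SIMD and SIMT mixed programming model to simplify the representation and manipulation of vector data. It suits data with multiple components, such as coordinates, colors, and the operands of vector arithmetic.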

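Memory characteristics:
Compact storage: the components of a short vector are stored contiguously in memory, with no padding.

Alignment: short vectors follow natural alignment, which improves access efficiency.

Cross-thread sharing: short vectors can be stored in the Unified Buffer and shared within a thread block.

Direct memory access: short vectors can be loaded from and stored to Global Memory directly.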

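Use cases:
Color processing: operations on RGB/RGBA color values.

Vector arithmetic: vector computations in physics simulation and graphics rendering.

Data packing: handling several related values as one packed unit.

Memory access optimization: vectorized accesses improve memory bandwidth utilization.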

Accessing short vector components:
The components of a short vector variable are accessed through .x, .y, .z, and .w.
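
As an illustration of component access in the color-processing use case, here is a minimal host-side sketch. HostFloat4 and lerp_rgba are hypothetical names for a stand-in struct and helper; the real float4 type comes from the device toolchain and is assumed here only to expose the same .x, .y, .z, .w members.

```cpp
#include <cassert>

// Hypothetical host-side stand-in for a four-component short vector such as float4.
struct HostFloat4 {
    float x, y, z, w;
};

// Blends two RGBA colors component by component; t = 0 yields a, t = 1 yields b.
inline HostFloat4 lerp_rgba(const HostFloat4& a, const HostFloat4& b, float t) {
    HostFloat4 out;
    out.x = a.x + (b.x - a.x) * t;  // red
    out.y = a.y + (b.y - a.y) * t;  // green
    out.z = a.z + (b.z - a.z) * t;  // blue
    out.w = a.w + (b.w - a.w) * t;  // alpha
    return out;
}
```

Keeping the four channels in one value lets a single variable travel through the computation instead of four separate scalars.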

The currently supported short vector data types are as follows:

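Table 2 Short vector data types
Element data type	Vector X2	Vector X4
unsigned char	uchar2	uchar4
signed char	char2	char4
unsigned short (16-bit)	ushort2	ushort4
signed short (16-bit)	short2	short4
unsigned int	uint2	uint4
signed int	int2	int4
unsigned long long (64-bit)	ulonglong2	ulonglong4
signed long long (64-bit)	longlong2	longlong4
unsigned long (32-bit)	ulong2	ulong4
signed long (32-bit)	long2	long4
Floating point: 1 sign bit, 2 exponent bits, 1 mantissa bit	float4_e2m1x2_t	-
Floating point: 1 sign bit, 1 exponent bit, 2 mantissa bits	float4_e1m2x2_t	-
Floating point: 1 sign bit, 4 exponent bits, 3 mantissa bits	float8_e4m3x2_t	-
Floating point: 1 sign bit, 5 exponent bits, 2 mantissa bits	float8_e5m2x2_t	-
Floating point: hif8	hifloat8x2_t	-
Floating point: 1 sign bit, 5 exponent bits, 10 mantissa bits	half2	-
Floating point: 1 sign bit, 8 exponent bits, 7 mantissa bits	bfloat16x2_t	-
Floating point: 1 sign bit, 8 exponent bits, 23 mantissa bits	float2	float4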

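The memory size and address alignment of each short vector type are as follows:

Table 3 Memory size and address alignment of short vector data types
Data type	Memory size (bytes)	Address alignment (bytes)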
char2, uchar2	2	2
char4, uchar4	4	4
short2, ushort2	4	4
short4, ushort4	8	8
int2, uint2	8	8
int4, uint4	16	16
long2, ulong2	8	8
long4, ulong4	16	16
longlong2, ulonglong2	16	16
longlong4, ulonglong4	32	32
float2	8	8
float4	16	16
float4_e2m1x2_t, float4_e1m2x2_t	1	1
float8_e4m3x2_t, float8_e5m2x2_t, hifloat8x2_t	2	2
half2, bfloat16x2_t	4	4
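
The size and alignment figures above can be mirrored on the host to see what "no padding" and "natural alignment" imply in practice. This is a sketch under assumptions: the Host* structs and is_aligned_for are hypothetical stand-ins whose alignas values are copied from the table, not the toolchain's own definitions, and float is assumed to be 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side stand-ins; alignas values are taken from the table above.
struct alignas(2)  HostChar2     { int8_t x, y; };
struct alignas(8)  HostShort4    { int16_t x, y, z, w; };
struct alignas(16) HostInt4      { int32_t x, y, z, w; };
struct alignas(32) HostLongLong4 { int64_t x, y, z, w; };
struct alignas(16) HostFloat4    { float x, y, z, w; };

// Size equals the sum of the components (no padding) and alignment equals size.
static_assert(sizeof(HostChar2) == 2 && alignof(HostChar2) == 2, "char2");
static_assert(sizeof(HostShort4) == 8 && alignof(HostShort4) == 8, "short4");
static_assert(sizeof(HostInt4) == 16 && alignof(HostInt4) == 16, "int4");
static_assert(sizeof(HostLongLong4) == 32 && alignof(HostLongLong4) == 32, "longlong4");
static_assert(sizeof(HostFloat4) == 16 && alignof(HostFloat4) == 16, "float4");

// Returns true when ptr satisfies the alignment requirement of T.
template <typename T>
inline bool is_aligned_for(const void* ptr) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignof(T) == 0;
}
```

A check like is_aligned_for is useful before reinterpreting a raw byte buffer as a vector type, because an address that is valid for the element type is not necessarily valid for the vector type.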

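SIMD and SIMT mixed programming provides functions for constructing short vectors, listed in Table 4. Each constructor combines a fixed number of scalar values of the same type into one short vector. For example, make_int2 takes two int scalars as input and returns them combined as an int2 short vector.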

Table 4 Vector type constructors
Function name	Function prototype
make_int2	inline int2 make_int2(int x, int y)
make_int4	inline int4 make_int4(int x, int y, int z, int w)
make_uint2	inline uint2 make_uint2(unsigned int x, unsigned int y)
make_uint4	inline uint4 make_uint4(unsigned int x, unsigned int y, unsigned int z, unsigned int w)
make_ulonglong2	inline ulonglong2 make_ulonglong2(unsigned long long int x, unsigned long long int y)
make_ulonglong4	inline ulonglong4 make_ulonglong4(unsigned long long int x, unsigned long long int y, unsigned long long int z, unsigned long long int w)
make_longlong2	inline longlong2 make_longlong2(long long int x, long long int y)
make_longlong4	inline longlong4 make_longlong4(long long int x, long long int y, long long int z, long long int w)
make_ulong2	inline ulong2 make_ulong2(unsigned long int x, unsigned long int y)
make_ulong4	inline ulong4 make_ulong4(unsigned long int x, unsigned long int y, unsigned long int z, unsigned long int w)
make_long2	inline long2 make_long2(long int x, long int y)
make_long4	inline long4 make_long4(long int x, long int y, long int z, long int w)
make_float2	inline float2 make_float2(float x, float y)
make_float4	inline float4 make_float4(float x, float y, float z, float w)
make_short2	inline short2 make_short2(short x, short y)
make_short4	inline short4 make_short4(short x, short y, short z, short w)
make_ushort2	inline ushort2 make_ushort2(unsigned short x, unsigned short y)
make_ushort4	inline ushort4 make_ushort4(unsigned short x, unsigned short y, unsigned short z, unsigned short w)
make_uchar2	inline uchar2 make_uchar2(unsigned char x, unsigned char y)
make_uchar4	inline uchar4 make_uchar4(unsigned char x, unsigned char y, unsigned char z, unsigned char w)
make_char2	inline char2 make_char2(signed char x, signed char y)
make_char4	inline char4 make_char4(signed char x, signed char y, signed char z, signed char w)
make_half2	inline half2 make_half2(half x, half y)
make_bfloat162	inline bfloat16x2_t make_bfloat162(bfloat16_t x, bfloat16_t y)

To use the short vector constructors, include simt_api/vector_functions.h. A calling example follows:

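#include "simt_api/vector_functions.h"

// Each thread packs one pair (x[idx], y[idx]) into an int2 and stores it to dst[idx].
__simt_vf__ __launch_bounds__(1024) inline void kernel_make_int2(__gm__ int2* dst, __gm__ int* x, __gm__ int* y)
{
    // Global index of this thread along the x dimension.
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    dst[idx] = make_int2(x[idx], y[idx]);
}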
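
To make the per-thread work in the kernel above concrete, the following host-side sketch replays the same index arithmetic with plain loops. It is an emulation under stated assumptions, not device code: HostInt2, host_make_int2, and emulate_kernel_make_int2 are hypothetical names, and the launch is assumed to be one-dimensional with grid_dim * block_dim equal to the number of elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for int2.
struct HostInt2 {
    int x, y;
};

// Mirrors what make_int2 is described to do: combine two ints into one vector.
inline HostInt2 host_make_int2(int x, int y) {
    HostInt2 v;
    v.x = x;
    v.y = y;
    return v;
}

// Replays the kernel body once for every (block, thread) pair of a 1-D launch.
inline std::vector<HostInt2> emulate_kernel_make_int2(const std::vector<int>& x,
                                                      const std::vector<int>& y,
                                                      unsigned grid_dim,
                                                      unsigned block_dim) {
    assert(x.size() == y.size());
    assert(x.size() == static_cast<std::size_t>(grid_dim) * block_dim);
    std::vector<HostInt2> dst(x.size());
    for (unsigned block = 0; block < grid_dim; ++block) {
        for (unsigned thread = 0; thread < block_dim; ++thread) {
            // Same formula as threadIdx.x + blockIdx.x * blockDim.x.
            std::size_t idx = thread + static_cast<std::size_t>(block) * block_dim;
            dst[idx] = host_make_int2(x[idx], y[idx]);
        }
    }
    return dst;
}
```

With two blocks of three threads, thread 1 of block 1 maps to index 4, so every element is written exactly once.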

dim3

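dim3 is a built-in structure used to specify and query the sizes of the thread grid (Grid) and the thread block (Thread Block) in the x, y, and z dimensions.

dim3 consists of three unsigned integers and is defined as {dimx, dimy, dimz}, which give the sizes of the three dimensions; the total count across the three dimensions is dimx * dimy * dimz. A dim3 structure can be created in the following ways.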

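dim3(x); // Creates a one-dimensional structure; dimy and dimz default to 1
dim3(x, y); // Creates a two-dimensional structure; dimz defaults to 1
dim3(x, y, z); // Creates a three-dimensional structure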
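
The defaulting rule and the total-count formula can be sketched on the host as follows. HostDim3 is a hypothetical stand-in that borrows the field names from the text; the use of unsigned int for the fields is an assumption, since the text only says "unsigned integers".

```cpp
#include <cassert>

// Hypothetical host-side stand-in for dim3.
struct HostDim3 {
    unsigned int dimx, dimy, dimz;

    // Unspecified dimensions default to 1, as in dim3(x) and dim3(x, y).
    explicit HostDim3(unsigned int x, unsigned int y = 1, unsigned int z = 1)
        : dimx(x), dimy(y), dimz(z) {}

    // Total count dimx * dimy * dimz, widened so the product cannot wrap at 32 bits.
    unsigned long long total() const {
        return static_cast<unsigned long long>(dimx) * dimy * dimz;
    }
};
```

Because the missing dimensions default to 1 rather than 0, a one-dimensional HostDim3(8) still has a total count of 8.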
