昇腾社区首页
中文
注册
开发者
下载

循环合并

循环合并是一种重要的循环变换技术,其核心作用是通过重构循环结构,减少内存访问次数、降低控制开销、提升数据局部性,并为后续优化铺路,最终在不改变计算结果的前提下提升程序执行效率。

例如,假设有如下两层循环:

1
2
3
for i in range(N):
    for j in range(M):
        C[i, j] = A[i, j] + B[i, j]

Add是纯elewise操作,所以可以合并成一个线性循环:

1
2
3
4
for fused in range(N * M):
    i = fused // M
    j = fused % M
    C[i, j] = A[i, j] + B[i, j]

对应到AscIR表达如下:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Load load0("load_0");
load0.x = data0.y;
load0.attr.sched.axis = {z0.id, z1.id};
load0.y.axis = {z0.id, z1.id};
load0.y.repeats = {N, M};
load0.y.strides = {M, 1};
load0.attr.hint.compute_type = ComputeType::COMPUTE_LOAD;

Load load1("load_1");
load1.x = data1.y;
load1.attr.sched.axis = {z0.id, z1.id};
load1.y.axis = {z0.id, z1.id};
load1.y.repeats = {N, M};
load1.y.strides = {M, 1};
load1.attr.hint.compute_type = ComputeType::COMPUTE_LOAD;

Add add("add");
add.x1 = load0.y;
add.x2 = load1.y;
add.attr.sched.axis = {z0.id, z1.id};
add.y.axis = {z0.id, z1.id};
add.y.repeats = {N, M};
add.y.strides = {M, 1};
add.attr.hint.compute_type = ComputeType::COMPUTE_ELEWISE;

循环合并后:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Load load0("load_0");
load0.x = data0.y;
load0.attr.sched.axis = {z0z1.id};
load0.y.axis = {z0z1.id};
load0.y.repeats = {N * M};
load0.y.strides = {1};
load0.attr.hint.compute_type = ComputeType::COMPUTE_LOAD;

Load load1("load_1");
load1.x = data1.y;
load1.attr.sched.axis = {z0z1.id};
load1.y.axis = {z0z1.id};
load1.y.repeats = {N * M};
load1.y.strides = {1};
load1.attr.hint.compute_type = ComputeType::COMPUTE_LOAD;

Add add("add");
add.x1 = load0.y;
add.x2 = load1.y;
add.attr.sched.axis = {z0z1.id};
add.y.axis = {z0z1.id};
add.y.repeats = {N * M};
add.y.strides = {1};
add.attr.hint.compute_type = ComputeType::COMPUTE_ELEWISE;