循环合并
循环合并是一种重要的循环变换技术,其核心作用是通过重构循环结构,减少内存访问次数、降低控制开销、提升数据局部性,并为后续优化铺路,最终在不改变计算结果的前提下提升程序执行效率。
例如,假设有如下两层循环:
1 2 3 | for i in range(N): for j in range(M): C[i, j] = A[i, j] + B[i, j] |
Add是纯elewise操作,所以可以合并成一个线性循环:
1 2 3 4 | for fused in range(N * M): i = fused // M j = fused % M C[i, j] = A[i, j] + B[i, j] |
对应到AscIR表达如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | Load load0("load_0"); load0.x = data0.y; load0.attr.sched.axis = {z0.id, z1.id}; load0.y.axis = {z0.id, z1.id}; load0.y.repeats = {N, M}; load0.y.strides = {M, 1}; load0.attr.hint.compute_type = ComputeType::COMPUTE_LOAD; Load load1("load_1"); load1.x = data1.y; load1.attr.sched.axis = {z0.id, z1.id}; load1.y.axis = {z0.id, z1.id}; load1.y.repeats = {N, M}; load1.y.strides = {M, 1}; load1.attr.hint.compute_type = ComputeType::COMPUTE_LOAD; Add add("add"); add.x1 = load0.y; add.x2 = load1.y; add.attr.sched.axis = {z0.id, z1.id}; add.y.axis = {z0.id, z1.id}; add.y.repeats = {N, M}; add.y.strides = {M, 1}; add.attr.hint.compute_type = ComputeType::COMPUTE_ELEWISE; |
循环合并后:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | Load load0("load_0"); load0.x = data0.y; load0.attr.sched.axis = {z0z1.id}; load0.y.axis = {z0z1.id}; load0.y.repeats = {N * M}; load0.y.strides = {1}; load0.attr.hint.compute_type = ComputeType::COMPUTE_LOAD; Load load1("load_1"); load1.x = data1.y; load1.attr.sched.axis = {z0z1.id}; load1.y.axis = {z0z1.id}; load1.y.repeats = {N * M}; load1.y.strides = {1}; load1.attr.hint.compute_type = ComputeType::COMPUTE_LOAD; Add add("add"); add.x1 = load0.y; add.x2 = load1.y; add.attr.sched.axis = {z0z1.id}; add.y.axis = {z0z1.id}; add.y.repeats = {N * M}; add.y.strides = {1}; add.attr.hint.compute_type = ComputeType::COMPUTE_ELEWISE; |
父主题: Schedule