Schedule Synthesis for Halide Pipelines through Reuse Analysis

Efficient code generation for image processing applications remains challenging in a domain where high performance is often required to meet real-time constraints. The key reasons are the inherently complex structure of most image processing pipelines, the large number of transformations that can be applied to optimize an implementation, and the interaction of these optimizations with locality, redundant computation and parallelism. Recent domain-specific languages (DSLs), such as the Halide DSL and compiler, aim to encourage high-level design-space exploration in order to facilitate the optimization process. We propose a novel optimization strategy that aims to maximize producer-consumer locality by exploiting reuse in image processing pipelines. We implement our analysis as a tool that can be used alongside the Halide DSL to automatically generate schedules for pipelines written in Halide, and we evaluate it on a variety of benchmarks. Experimental results on three different multi-core architectures show an average performance improvement of 40% over the Halide Auto-Scheduler and 75% over a state-of-the-art approach that targets the PolyMage DSL.
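The trade-off between locality, redundant computation and storage that the abstract refers to can be illustrated with a small example. The following is a minimal C++ sketch under stated assumptions. It is not the authors' reuse analysis and not Halide code. It evaluates a hypothetical two-stage 1-D pipeline (a producer `bx` feeding a consumer `out`; all names are illustrative) in three ways that loosely mirror Halide's `compute_root` (materialize the whole intermediate), `compute_inline` (recompute the producer at every use), and a sliding-window schedule that reuses producer values while they are still live.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pipeline (illustrative only):
//   bx(j)  = in[j] + in[j+1] + in[j+2]        (producer)
//   out(i) = bx(i) + bx(i+1) + bx(i+2)        (consumer)
// Integer sums instead of averages keep results exactly comparable.
// `in` must hold n + 4 elements to produce n outputs.

struct Result {
  std::vector<long> out;
  std::size_t producer_evals = 0;  // number of bx values computed
};

static long producer(const std::vector<long>& in, std::size_t j) {
  return in[j] + in[j + 1] + in[j + 2];
}

// Breadth-first (like compute_root): no recomputation,
// but the whole (n + 2)-element intermediate buffer is stored.
Result breadth_first(const std::vector<long>& in, std::size_t n) {
  Result r;
  std::vector<long> bx(n + 2);
  for (std::size_t j = 0; j < n + 2; ++j) {
    bx[j] = producer(in, j);
    ++r.producer_evals;
  }
  r.out.resize(n);
  for (std::size_t i = 0; i < n; ++i) r.out[i] = bx[i] + bx[i + 1] + bx[i + 2];
  return r;
}

// Fully inlined (like compute_inline): no intermediate storage,
// but every producer value is recomputed for each consumer that reads it.
Result inlined(const std::vector<long>& in, std::size_t n) {
  Result r;
  r.out.resize(n);
  for (std::size_t i = 0; i < n; ++i) {
    long s = 0;
    for (std::size_t k = 0; k < 3; ++k) {
      s += producer(in, i + k);
      ++r.producer_evals;
    }
    r.out[i] = s;
  }
  return r;
}

// Sliding window: each producer value is computed once and kept in a
// 3-entry ring buffer for exactly as long as consumers still reuse it.
Result sliding_window(const std::vector<long>& in, std::size_t n) {
  Result r;
  r.out.resize(n);
  long win[3] = {0, 0, 0};
  for (std::size_t j = 0; j < 2; ++j) {  // warm up with bx(0), bx(1)
    win[j] = producer(in, j);
    ++r.producer_evals;
  }
  for (std::size_t i = 0; i < n; ++i) {
    win[(i + 2) % 3] = producer(in, i + 2);  // overwrite the oldest value
    ++r.producer_evals;
    r.out[i] = win[0] + win[1] + win[2];
  }
  return r;
}
```

All three variants compute identical outputs. For `n` outputs, the breadth-first and sliding-window versions each evaluate the producer `n + 2` times, while the inlined version evaluates it `3n` times. Only the sliding window achieves this without storing the full intermediate buffer, which is the kind of reuse-driven, locality-preserving choice that an automatic scheduler has to weigh against parallelism.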