Schedule Synthesis for Halide Pipelines on GPU

The Halide DSL and compiler have enabled high-performance code generation for image processing pipelines targeting heterogeneous architectures through the separation of algorithmic description and optimization schedule. However, automatic schedule generation is currently only possible for multi-core CPU architectures. As a result, expert knowledge is still required when optimizing for platforms with GPU capabilities. In this work, we extend the current Halide Autoscheduler with novel optimization passes in order to efficiently generate schedules for CUDA-based GPU architectures. We evaluate our proposed method across a variety of applications and show that it achieves performance competitive with, and in many cases better than, that of manually tuned Halide schedules. Experimental results show that our schedules are on average 10% faster than manual schedules and over 2x faster than previous autoscheduling attempts.