We are exploring the use of cuTile, an emerging DSL from NVIDIA, to implement high-performance linear attention kernels. cuTile provides a tile-based programming model in Python that is close to the ...
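To make concrete what a tile-based linear attention kernel has to compute, here is a minimal reference sketch in plain NumPy. It does not use cuTile or its API. It assumes causal, unnormalized linear attention, with the feature map already applied to `q` and `k`, so the target is $O = (QK^\top \odot M)V$ with a lower-triangular mask $M$. The chunked version uses the common chunkwise formulation: each tile of rows attends to earlier tiles through a running state $S = \sum_s k_s v_s^\top$ and to itself through a small masked product. The function names and the `chunk` parameter are hypothetical and are chosen for illustration only.

```python
import numpy as np


def linear_attention_naive(q, k, v):
    """Quadratic reference for causal linear attention: O = (Q K^T * M) V.

    Assumes the feature map has already been applied to q and k and that
    no normalization term is used.
    """
    t = q.shape[0]
    mask = np.tril(np.ones((t, t), dtype=q.dtype))  # keep key s only if s <= query t
    return ((q @ k.T) * mask) @ v


def linear_attention_chunked(q, k, v, chunk=4):
    """Same result, computed one tile of rows at a time with a running state."""
    t, d_k = q.shape
    d_v = v.shape[1]
    out = np.zeros((t, d_v), dtype=q.dtype)
    state = np.zeros((d_k, d_v), dtype=q.dtype)  # sum of k_s v_s^T over earlier tiles
    for start in range(0, t, chunk):
        end = min(start + chunk, t)
        qc, kc, vc = q[start:end], k[start:end], v[start:end]
        n = end - start
        local_mask = np.tril(np.ones((n, n), dtype=q.dtype))
        inter = qc @ state                        # contribution of all previous tiles
        intra = ((qc @ kc.T) * local_mask) @ vc   # causal attention inside this tile
        out[start:end] = inter + intra
        state += kc.T @ vc                        # fold this tile into the state
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, DK, DV = 10, 8, 6
    q = rng.standard_normal((T, DK))
    k = rng.standard_normal((T, DK))
    v = rng.standard_normal((T, DV))
    ref = linear_attention_naive(q, k, v)
    tiled = linear_attention_chunked(q, k, v, chunk=4)
    print(np.allclose(ref, tiled))  # True: both forms compute the same output
```

The chunk size sets the trade-off: larger tiles do more quadratic work inside each tile but need fewer sequential state updates. In a GPU kernel, each tile would naturally map to a block of rows processed together, although how that mapping is expressed in cuTile is not shown here.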