Understanding Flash Attention: Writing the Algorithm from Scratch in Triton | Towards Data Science

Find out how Flash Attention works. Afterward, we’ll refine our understanding by writing a GPU kernel of the algorithm in Triton.

Source: Towards Data Science
