Speed Up PyTorch With Custom Kernels. But It Gets Progressively Darker | Towards Data Science

We’ll begin with torch.compile, move on to writing a custom Triton kernel, and finally dive into designing a CUDA kernel from scratch.

Source: Towards Data Science