Multi-head Attention is a Fancy Addition Machine | Towards Data Science
“Attention Is All You Need” presented multi-head attention as a sequence of multiplicative and concatenation operations, but… what if I told you it is additive?

Source: Towards Data Science
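The claim rests on a standard linear-algebra identity. Concatenating the h head outputs and multiplying by the output projection W^O gives the same result as splitting W^O into h row blocks (one per head) and summing each head's output times its own block. The minimal pure-Python sketch below checks this numerically. It assumes small random matrices as stand-ins for real per-head attention outputs, so it tests only the concat-then-project step. The helper names (`matmul`, `concat_cols`, `add`, `rand_matrix`) are hypothetical and chosen for illustration.

```python
import random

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); matrices are lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def concat_cols(mats):
    """Concatenate matrices that share a row count along the column axis."""
    return [sum((m[i] for m in mats), []) for i in range(len(mats[0]))]

def add(a, b):
    """Element-wise sum of two matrices with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def rand_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1] (illustrative stand-in data)."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
seq_len, num_heads, d_head, d_model = 3, 4, 2, 8

# Hypothetical per-head attention outputs, each of shape (seq_len, d_head).
heads = [rand_matrix(seq_len, d_head, rng) for _ in range(num_heads)]
# Output projection W_O with shape (num_heads * d_head, d_model).
w_o = rand_matrix(num_heads * d_head, d_model, rng)

# Concatenation form: Concat(head_1, ..., head_h) W_O
concat_form = matmul(concat_cols(heads), w_o)

# Additive form: sum over i of head_i W_O^i, where W_O^i is the i-th block of d_head rows.
additive_form = [[0.0] * d_model for _ in range(seq_len)]
for i, head in enumerate(heads):
    block = w_o[i * d_head:(i + 1) * d_head]
    additive_form = add(additive_form, matmul(head, block))

# The two forms differ only by floating-point rounding from the changed summation order.
max_diff = max(abs(x - y)
               for rc, ra in zip(concat_form, additive_form)
               for x, y in zip(rc, ra))
print(f"max abs difference: {max_diff:.2e}")
```

In the additive form, no concatenated matrix is ever built: each head contributes its own d_model-wide term to a shared running sum, which is the sense in which the concatenation can be read as an addition.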