Transformer (deep learning)

Transformers use multi-head attention for contextualizing tokens

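In a transformer, each token is contextualized through multi-head attention. Several attention heads run in parallel, each computing its own set of weights over the tokens in the input sequence, and their outputs are combined into an updated representation of the token. Because each head can focus on different parts of the sequence, the resulting representation reflects several aspects of the token's context at once.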
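As a reference point, the mechanism described above is commonly written as follows. The notation is an assumption, since the text does not fix any symbols: X is the matrix of token representations, h is the number of heads, d_k is the per-head key dimension, and the W matrices are learned projections.

```latex
\begin{aligned}
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \\
\mathrm{head}_i &= \mathrm{Attention}\!\left(X W_i^{Q},\; X W_i^{K},\; X W_i^{V}\right), \quad i = 1, \dots, h \\
\mathrm{MultiHead}(X) &= \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O}
\end{aligned}
```

Each row of the softmax output is one token's set of weights over the sequence, so every head produces a different weighted mix of the value vectors for that token.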

Example

For a sentence like "The cat sat on the mat," each token (e.g., "cat," "sat," "mat") is contextualized by attending to the other words in the sentence. The representation of "sat," for instance, can draw on both "cat" (what sat) and "mat" (where it sat).
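The example sentence can be run through a minimal NumPy sketch of multi-head self-attention. This is an illustration under stated assumptions, not a trained model: the embeddings and projection matrices are random stand-ins, the function name `multi_head_self_attention` and the sizes `d_model = 8` and `num_heads = 2` are hypothetical, and positional encodings and masking are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (seq_len, d_model); every weight matrix: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Scaled dot-product scores: one (seq_len, seq_len) matrix per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)

    # Each head mixes the value vectors using its own weights.
    heads = weights @ v  # (num_heads, seq_len, d_head)

    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights

tokens = ["The", "cat", "sat", "on", "the", "mat"]
rng = np.random.default_rng(0)
d_model, num_heads = 8, 2  # hypothetical toy sizes

# Random stand-ins for learned embeddings and projections (assumption).
x = rng.normal(size=(len(tokens), d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

contextualized, weights = multi_head_self_attention(
    x, w_q, w_k, w_v, w_o, num_heads
)
print(contextualized.shape)  # (6, 8): one updated vector per token
print(weights.shape)         # (2, 6, 6): per head, token-to-token weights
```

Row `weights[h, 2]` holds the weights that head `h` assigns from "sat" to every token in the sentence. With random parameters these weights carry no linguistic meaning; in a trained model they are what lets "sat" draw on "cat" and "mat".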

Because every token can attend to every other token in a single step, and the heads are computed in parallel, this mechanism is central to how transformers model language efficiently.
