Self-Attention
In the sentence "I ate an apple," how should the word "ate" focus on the other words?
Self-Attention is a mechanism in which the words of a sentence compare themselves with one another to decide which words to "attend to."
It expresses these relationships numerically, computing how strongly each word should focus on every other word.
For example, the attention scores that "ate" assigns to the words in the sentence might look like this (the scores sum to 1):
I → 0.05
an → 0.05
apple → 0.80
ate → 0.10
The word "ate" focuses on "apple" because identifying "what was eaten" is important to its meaning.
Through this process, a transformer can capture the relationships between the words in a sentence and understand its overall context.
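To make this concrete, here is a minimal NumPy sketch of scaled dot-product attention for this sentence. The 4-dimensional word vectors are invented for illustration, and the sketch reuses the embeddings as both queries and keys (a real model learns its embeddings and separate query/key/value projections), so the exact weights differ from the toy numbers above, but "apple" still receives the largest share of "ate"'s attention.

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Toy 4-dimensional embeddings for "I ate an apple".
# These vectors are made up for illustration; a real model learns them.
embeddings = {
    "I":     np.array([1.0, 0.0, 0.0, 0.0]),
    "ate":   np.array([0.0, 1.0, 0.0, 0.0]),
    "an":    np.array([0.0, 0.0, 1.0, 0.0]),
    "apple": np.array([0.1, 1.4, 0.0, 0.2]),
}

words = ["I", "ate", "an", "apple"]
X = np.stack([embeddings[w] for w in words])  # shape (4, 4)

# A full transformer projects X into separate query (Q), key (K), and
# value (V) matrices; to keep the sketch minimal we reuse X for Q and K.
Q, K = X, X
d_k = K.shape[-1]

# How strongly "ate" (row 1) attends to every word in the sentence.
scores = Q[1] @ K.T / np.sqrt(d_k)  # one score per word
weights = softmax(scores)

for word, w in zip(words, weights):
    print(f"{word:>5}: {w:.2f}")
# "apple" gets the largest weight because its vector points in a
# similar direction to the query vector for "ate".
```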
| Word | Word to Attend To | Reason |
|------|-------------------|--------|
| I | None, or "apple" | Subject, but no strong link |
| apple | ate | Object-verb relationship |
| ate | apple | Answers "what was eaten?" |
Traditional RNNs process words sequentially, which makes it difficult to capture relationships between words that are far apart. Self-Attention, by contrast, compares every pair of words at once, taking the entire sentence's context into account.
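Continuing the sketch above (same toy vectors and the same simplification of reusing the embeddings as queries and keys), a single matrix multiplication scores every word pair simultaneously, which is what lets Self-Attention consider the whole sentence without a sequential pass:

```python
# One matrix multiplication scores every (query, key) pair at once --
# no word-by-word loop over the sentence is needed.
all_scores = Q @ K.T / np.sqrt(d_k)  # shape (4, 4)

# Row-wise softmax: row i holds how much word i attends to each word,
# and each row sums to 1.
e = np.exp(all_scores - all_scores.max(axis=1, keepdims=True))
all_weights = e / e.sum(axis=1, keepdims=True)

print(np.round(all_weights, 2))
```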
In the next lesson, we will explore the Multi-Head Attention structure, which applies this Self-Attention mechanism multiple times in parallel.