Self-Attention
In the sentence "I ate an apple," how should the word "ate" focus on the other words?
Self-Attention is a mechanism in which the words of a sentence compare themselves with one another to decide which words to "attend to."
It expresses these relationships numerically, computing how strongly each word should focus on every other word.
For example, the attention scores that "ate" assigns to the words in the sentence might look like this (the scores sum to 1):
I → 0.05
an → 0.05
apple → 0.80
ate → 0.10
The word "ate" focuses on "apple" because identifying "what was eaten" is important to its meaning.
Through this process, a transformer can capture the relationships between the words in a sentence and understand its overall context.
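To make this concrete, here is a minimal NumPy sketch of scaled dot-product attention for this sentence. The 4-dimensional word vectors are invented for illustration, and the sketch reuses the embeddings as both queries and keys (a real model learns its embeddings and separate query/key/value projections), so the exact weights differ from the toy numbers above, but "apple" still receives the largest share of "ate"'s attention.

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Toy 4-dimensional embeddings for "I ate an apple".
# These vectors are made up for illustration; a real model learns them.
embeddings = {
    "I":     np.array([1.0, 0.0, 0.0, 0.0]),
    "ate":   np.array([0.0, 1.0, 0.0, 0.0]),
    "an":    np.array([0.0, 0.0, 1.0, 0.0]),
    "apple": np.array([0.1, 1.4, 0.0, 0.2]),
}

words = ["I", "ate", "an", "apple"]
X = np.stack([embeddings[w] for w in words])  # shape (4, 4)

# A full transformer projects X into separate query (Q), key (K), and
# value (V) matrices; to keep the sketch minimal we reuse X for Q and K.
Q, K = X, X
d_k = K.shape[-1]

# How strongly "ate" (row 1) attends to every word in the sentence.
scores = Q[1] @ K.T / np.sqrt(d_k)  # one score per word
weights = softmax(scores)

for word, w in zip(words, weights):
    print(f"{word:>5}: {w:.2f}")
# "apple" gets the largest weight because its vector points in a
# similar direction to the query vector for "ate".
```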
| Word | Word to Attend To | Reason |
|------|-------------------|--------|
| I | None, or "apple" | Subject, but no strong link |
| apple | ate | Object-verb relationship |
| ate | apple | Answers "what was eaten?" |
Traditional RNNs process words sequentially, which makes it difficult to capture relationships between words that are far apart. Self-Attention, by contrast, compares every pair of words at once, taking the entire sentence's context into account.
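Continuing the sketch above (same toy vectors and the same simplification of reusing the embeddings as queries and keys), a single matrix multiplication scores every word pair simultaneously, which is what lets Self-Attention consider the whole sentence without a sequential pass:

```python
# One matrix multiplication scores every (query, key) pair at once --
# no word-by-word loop over the sentence is needed.
all_scores = Q @ K.T / np.sqrt(d_k)  # shape (4, 4)

# Row-wise softmax: row i holds how much word i attends to each word,
# and each row sums to 1.
e = np.exp(all_scores - all_scores.max(axis=1, keepdims=True))
all_weights = e / e.sum(axis=1, keepdims=True)

print(np.round(all_weights, 2))
```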
In the next lesson, we will explore the Multi-Head Attention structure, which applies this Self-Attention mechanism multiple times in parallel.