
Self-Attention

In the sentence "I ate an apple," how much should the word "ate" focus on each of the other words?

Self-Attention is a mechanism in which the words of a sentence compare themselves with one another to decide which words to 'attend to'.

It expresses these relationships numerically, computing how much each word should focus on every other word.

For example, the attention scores that "ate" assigns to each word might look like this:

Self-Attention Example
I     → 0.1
ate   → 0.1
an    → 0.1
apple → 0.8

The word "ate" is focused on "apple" because identifying "what was eaten" is important.

Through this process, transformers capture the relationships between the words of a sentence and thereby understand its context.

| Word  | Word to Attend To | Reason                       |
|-------|-------------------|------------------------------|
| I     | None or apple     | Subject but no strong link   |
| apple | ate               | Object-verb relationship     |
| ate   | apple             | Indicates "what was eaten?"  |

Traditional RNNs processed words sequentially, which made it difficult to capture relationships between words that are far apart in a sentence.

Self-Attention, by contrast, compares every pair of words at once, taking the context of the entire sentence into account, as the sketch below shows.
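
To make "every pair at once" concrete, here is a minimal sketch of scaled dot-product self-attention in NumPy. Everything in it is illustrative: the embeddings `X` and the projection matrices `W_q`, `W_k`, `W_v` are random placeholders standing in for values a real model would learn.

```python
import numpy as np

def softmax(x, axis=-1):
    # Row-wise softmax with max-subtraction for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_words, d = 4, 8                  # 4 tokens: "I ate an apple"
X = rng.normal(size=(n_words, d))  # placeholder word embeddings

# In a real model these projection matrices are learned.
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# One matrix product scores every word pair at once: shape (4, 4).
scores = Q @ K.T / np.sqrt(d)

# Softmax turns each row into attention weights that sum to 1.
weights = softmax(scores)

# Each word's new representation is a weighted mix of all words.
output = weights @ V
print(weights.shape, output.shape)  # (4, 4) (4, 8)
```

Row *i* of `weights` plays the same role as the toy distribution for "ate" above: it records how strongly word *i* attends to every word in the sentence, including itself.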

In the next lesson, we will explore Multi-Head Attention, a structure that runs this Self-Attention mechanism several times in parallel.
