
Understanding Sentences from Multiple Perspectives with Multi-Head Attention

Multi-Head Attention is a structure that performs Self-Attention multiple times simultaneously to better understand the various relationships within a sentence.

Even though a single Self-Attention mechanism can identify crucial relationships between words, capturing complex contexts with just one perspective can be challenging.

Multi-Head Attention therefore runs several Self-Attention operations in parallel, allowing the model to view the sentence from different angles.


Example of Multi-Head Attention

Let's consider the sentence "The student is sitting at the desk reading a book."

Multi-Head Attention understands this sentence from various perspectives, such as:

  • Attention 1: Focusing on the relationship between student and sitting

  • Attention 2: Focusing on the relationship between book and reading

By applying multiple perspectives simultaneously, Multi-Head Attention enriches the understanding of the sentence's meaning.
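To see what "multiple perspectives" looks like in code, here is a small, hypothetical PyTorch sketch that runs Self-Attention with two heads over a toy sequence and prints each head's attention weights separately. The token list, embedding size, and random embeddings are illustrative assumptions; since the model here is untrained, the weights are essentially random, but in a trained Transformer each head tends to emphasize different relationships, like the ones described above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A simplified version of the example sentence, with illustrative sizes.
tokens = ["student", "is", "sitting", "reading", "a", "book"]
embed_dim, num_heads = 8, 2

# Random vectors stand in for real learned token embeddings.
x = torch.randn(1, len(tokens), embed_dim)  # (batch, sequence, embedding)

mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# Self-Attention: the same sequence serves as query, key, and value.
# average_attn_weights=False keeps a separate weight matrix for each head.
_, weights = mha(x, x, x, need_weights=True, average_attn_weights=False)

print(weights.shape)  # (1, 2, 6, 6) -> (batch, heads, query word, key word)
for h in range(num_heads):
    print(f"Head {h}, attention from 'student':", weights[0, h, 0])
```

Each row of a head's weight matrix shows how strongly one word attends to every other word, so comparing the rows across heads is a direct way to observe the different "perspectives".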


How Does Multi-Head Attention Work?

  1. The input sentence is duplicated and passed into multiple Self-Attention structures.

  2. Each structure independently uses different weights to calculate relationships between the words.

  3. The outputs from all the structures are then combined.

  4. Finally, this consolidated information is used to represent the sentence's meaning.

Through this process, the model considers diverse information and relationships simultaneously, enabling it to create more accurate representations.
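The four steps above can be written out directly. Below is a minimal NumPy sketch with made-up sizes (6 tokens, embedding size 8, 2 heads) and random weights; in a real model these weights are learned during training, and the projections are usually implemented as single matrix operations rather than a Python loop.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    """A toy Multi-Head Attention forward pass following the four steps above."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    head_outputs = []

    # Steps 1-2: every head receives the same input but uses its own Q, K, V weights.
    for _ in range(num_heads):
        W_q = rng.normal(size=(d_model, d_head))
        W_k = rng.normal(size=(d_model, d_head))
        W_v = rng.normal(size=(d_model, d_head))

        Q, K, V = x @ W_q, x @ W_k, x @ W_v
        scores = Q @ K.T / np.sqrt(d_head)      # word-to-word relationship scores
        weights = softmax(scores, axis=-1)      # attention weights for this head
        head_outputs.append(weights @ V)        # this head's view of the sentence

    # Step 3: combine (concatenate) the outputs of all heads.
    combined = np.concatenate(head_outputs, axis=-1)  # (seq_len, d_model)

    # Step 4: a final linear layer mixes the heads into one representation.
    W_o = rng.normal(size=(d_model, d_model))
    return combined @ W_o

rng = np.random.default_rng(42)
x = rng.normal(size=(6, 8))    # 6 tokens, embedding size 8 (illustrative values)
out = multi_head_attention(x, num_heads=2, rng=rng)
print(out.shape)               # (6, 8): one enriched vector per word
```

Note that each head works in a smaller space (embedding size divided by the number of heads), so adding heads changes how the computation is split rather than how much is computed overall.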


Multi-Head Attention is a crucial component that helps Transformer models understand sentences more precisely.

In the next lesson, we'll apply what we've learned so far to solve a simple quiz.
