How Generative AI Understands Prompts
As introduced earlier, AI functions as a mathematical function.
But unlike simple equations such as f(x) = 3x + 2, AI models operate as extremely complex functions with numerous variables and a vast range of possible outputs.
Just as human intelligence arises from the brain, AI's intelligence comes from models composed of intricate neural networks.
The neural networks in modern generative AI are built on a model known as the Transformer.
Transformers analyze prompts by breaking them down into smaller components, such as words or tokens, and then predicting the next most probable word to generate coherent responses. The process by which AI understands a prompt consists of four main stages.
1. Tokenization
A token is the smallest unit into which AI breaks down input text, such as a word, punctuation mark, or number. For example, when given the sentence "The cat climbed the tree.", AI tokenizes it into:
The / cat / climbed / the / tree / .
Each token helps AI understand the context and meaning of the input. Although different AI models may define tokens differently, the general process remains consistent.
Tokenizing English Text
English tokenization primarily relies on spaces and punctuation to separate words.
For instance, the sentence "The quick brown fox jumps over the lazy dog." is tokenized into:
- The
- quick
- brown
- fox
- jumps
- over
- the
- lazy
- dog
- .
Here, each word and punctuation mark becomes a token.
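This word-and-punctuation splitting can be sketched with a simple regular expression. This is only an illustration: real models use learned subword vocabularies, so their actual splits can differ.

```python
import re

def simple_tokenize(text):
    # Match runs of word characters, or any single non-space symbol
    # (so punctuation marks become their own tokens).
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_tokenize("The quick brown fox jumps over the lazy dog."))
# ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', '.']
```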
Even a single word can be split into several tokens based on prefixes, common letter patterns, and suffixes. For example, the word "unconscious" might be split into pieces such as un (a prefix indicating negation), con, sc, and ious, so that one word is recognized as several tokens. The exact split depends on the vocabulary each tokenizer has learned from its training data.
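Subword splitting can be sketched with a greedy longest-match strategy over a vocabulary, similar in spirit to WordPiece-style tokenizers. The tiny vocabulary here is invented for illustration; real tokenizers learn tens of thousands of subwords from data.

```python
def subword_tokenize(word, vocab):
    """Greedy longest-match-first subword split (a simplified sketch)."""
    tokens = []
    i = 0
    while i < len(word):
        # Find the longest vocabulary entry that matches at position i.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No vocabulary match: emit the single character as a token.
            tokens.append(word[i])
            i += 1
    return tokens

# Toy vocabulary, invented for this example.
vocab = {"un", "con", "sc", "ious", "happy", "ness"}
print(subword_tokenize("unconscious", vocab))
# ['un', 'con', 'sc', 'ious']
```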
Tokenizing Other Languages
Tokenizing languages like Korean can be more complex. Because Korean attaches many postpositions and verb endings to words, it is often tokenized into morphemes (the smallest units of meaning in a language) rather than into whole words.
Example: "I was reading a book at the library."
This sentence can be tokenized as follows:
- I (pronoun)
- was (verb)
- reading (verb)
- a (article)
- book (noun)
- at (preposition)
- the (article)
- library (noun)
Usually, even with the same number of words, a sentence in a language like Korean produces more tokens than its English equivalent.
The way tokens are counted also varies with the kind of characters being processed. ChatGPT typically allocates roughly one token for every one to four English characters, while handling languages like Korean at the morpheme level.
Note: AI models like ChatGPT process text based on tokens, and usage costs are often calculated per token. Efficient tokenization can reduce unnecessary computational expenses.
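Since costs are billed per token, a rough estimate can be made from character counts. The four-characters-per-token ratio is a common rule of thumb for English text only, and the price used below is a hypothetical value; exact counts require the model's own tokenizer.

```python
def estimate_tokens(text, chars_per_token=4):
    # Rough rule of thumb for English text; real counts require the
    # model's actual tokenizer.
    return max(1, round(len(text) / chars_per_token))

def estimate_cost(text, price_per_1k_tokens=0.002):
    # price_per_1k_tokens is a hypothetical example figure.
    return estimate_tokens(text) / 1000 * price_per_1k_tokens

prompt = "The quick brown fox jumps over the lazy dog."
print(estimate_tokens(prompt))  # 11
```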
2. Embedding
Once text has been tokenized, each token is converted into a numeric vector. For example, the word "cat" might be represented as:
[0.11, 0.34, 0.56, ...]
Words with similar meanings are mapped to similar vectors. For instance, the vector for dog might look like:
[0.12, 0.35, 0.55, ...]
Words with similar vector values are closely positioned in the vector space.
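This closeness is usually measured with cosine similarity. The short 3-dimensional vectors below are toy values for illustration; real embeddings have hundreds or thousands of dimensions with learned values.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings, invented for illustration.
cat = [0.11, 0.34, 0.56]
dog = [0.12, 0.35, 0.55]
car = [0.90, 0.10, 0.05]

print(cosine_similarity(cat, dog))  # close to 1.0: similar meanings
print(cosine_similarity(cat, car))  # much lower: unrelated meanings
```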
3. Context Understanding
AI understands the context of a sentence by using the vector values of the tokenized words. For instance, when the words cat and tree appear together, AI identifies the kind of relationship these two words have.
This involves using an Attention Mechanism, which calculates how strongly each word in a sentence is connected to the others, giving higher weight to the more important words.
The Transformer model, a type of AI neural network, utilizes the attention mechanism to identify relationships between all words simultaneously. For example, it understands how cat and tree interact and identifies cat as the subject performing the action of climbing.
ChatGPT, based on the Transformer model, uses the attention mechanism to grasp the context of the prompt.
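The core of the attention mechanism can be sketched as scaled dot-product attention: each word's query is scored against every word's key, the scores are normalized with softmax into weights, and the output is a weighted average of the value vectors. The 2-dimensional word vectors below are invented for illustration; real models use separate learned query, key, and value projections.

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    # Score each key against the query, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)  # how much attention each word receives
    # The output is the attention-weighted average of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return output, weights

# Toy 2-D vectors for the words in "cat climbed tree" (invented values).
words = ["cat", "climbed", "tree"]
vectors = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]

# How much does "cat" attend to each word in the sentence?
_, weights = attention(vectors[0], vectors, vectors)
for word, w in zip(words, weights):
    print(f"{word}: {w:.2f}")
```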
4. Generating Responses
Once AI understands the context, it begins predicting and generating words one at a time. It selects the most appropriate next word based on the previous words and its pre-trained language model.
For example, given the phrase "The cat", AI predicts the most probable next word. If "is" is determined to be the most likely word, AI selects it.
- When predicting the next word after "The cat", AI selects "is"
- When predicting the next word after "The cat is", AI selects "a"
AI continues to create words until a coherent sentence is constructed in line with the prompt.
The cat is a small animal often kept as a pet at home.
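This word-by-word prediction loop can be sketched with a toy bigram table and greedy decoding (always picking the most probable next word). The probabilities are invented for illustration; a real model scores every word in its vocabulary using the full context, not just the previous word.

```python
# Toy bigram model: for each word, the probability of possible next words.
# All probabilities here are invented for illustration.
bigrams = {
    "The":   {"cat": 0.6, "dog": 0.4},
    "cat":   {"is": 0.7, "climbed": 0.3},
    "is":    {"a": 0.8, "small": 0.2},
    "a":     {"small": 0.9, "pet": 0.1},
    "small": {"animal": 1.0},
}

def generate(start, max_words=6):
    words = [start]
    while words[-1] in bigrams and len(words) < max_words:
        candidates = bigrams[words[-1]]
        # Greedy decoding: always pick the most probable next word.
        next_word = max(candidates, key=candidates.get)
        words.append(next_word)
    return " ".join(words)

print(generate("The"))  # The cat is a small animal
```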
Try It Out
Experiment with different prompts and observe how AI processes and generates responses based on context and structure.