How Generative AI Understands Prompts
As introduced earlier, AI can be thought of as a function. However, unlike the simple functions we learned in math, such as f(x) = 3x + 2, it is an extremely complex function, something like f(countless variables) = a wide range of possible outputs, that is difficult for ordinary people to grasp.
Similar to how human intelligence emerges from the brain, AI's intelligence comes from models composed of complex functions.
AI models learn from data to form artificial neurons (modeled on the nerve cells of the brain) and use them to solve the problems they are given.
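To make this concrete, here is a minimal sketch of a single artificial neuron in Python: it multiplies its inputs by weights, adds a bias, and squashes the sum with an activation function. All of the numbers are made-up illustrative values, not learned ones.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: a weighted sum of inputs plus a bias,
    passed through a sigmoid activation function."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))  # squash the result into the range (0, 1)

# Illustrative values only; real models learn millions of such weights from data.
print(neuron(inputs=[0.5, 0.8], weights=[0.4, -0.6], bias=0.1))
```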
In recently released generative AI, these neurons are organized into a model called the Transformer. A Transformer breaks the given prompt down into smaller units such as words and tokens, analyzes them, and generates a sentence by probabilistically predicting the next word.
The process of how AI understands a prompt can be divided into four major stages.
1. Tokenization
A token is a small unit obtained by splitting a sentence into words, punctuation marks, numbers, and so on. When AI receives the prompt "The cat climbed the tree," it tokenizes the sentence:
The / cat / climbed / the / tree
Each token helps the AI find meaning from the data it was trained on and grasp the context of the sentence. Tokens can be defined slightly differently depending on the AI model, but they are generally formed as follows.
English Tokenization
English tokenization primarily uses spaces or punctuation to separate words.
Example: "The quick brown fox jumps over the lazy dog."
Tokenizing this sentence breaks it into 10 tokens as follows:
- The
- quick
- brown
- fox
- jumps
- over
- the
- lazy
- dog
- .
Here, each word and punctuation mark becomes a token.
Even a single word can be split into several tokens based on prefixes, common letter patterns, and suffixes. For example, the word "unconscious" may be split into sub-elements such as un (a prefix indicating negation), con (a letter pattern common in English words), and scious (the remaining word piece), and recognized as three tokens.
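To see how a real tokenizer splits text, the sketch below uses OpenAI's open-source tiktoken library (this assumes tiktoken is installed, for example with pip install tiktoken); the exact splits depend on the encoding and may differ from the example above.

```python
import tiktoken

# cl100k_base is the encoding used by several recent OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

text = "The quick brown fox jumps over the lazy dog."
token_ids = enc.encode(text)

print(len(token_ids), "tokens")
# Show the piece of text each token id maps back to.
print([enc.decode([tid]) for tid in token_ids])
```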
Tokenizing Other Languages
Tokenizing languages like Korean can be more complex. Because Korean uses many postpositions and verb endings, it is often tokenized by morphemes (the smallest units of meaning in a language) rather than by simple word boundaries.
Example: "I was reading a book at the library."
This sentence can be tokenized into 11 tokens as follows:
- I (pronoun)
- was (verb)
- reading (verb)
- a (article)
- book (noun)
- at (preposition)
- the (article)
- library (noun)
Even when the number of words is the same, a sentence in a language like Korean usually produces more tokens than its English counterpart. The way tokens are assigned also depends on the kind of characters the AI model is handling: ChatGPT typically maps about 1-4 English letters to a single token, while it handles languages like Korean at roughly the morpheme level.
Note: Most text generation AIs, like ChatGPT, calculate cost based on the number of input and output tokens. Thus, reducing unnecessary tokens is crucial.
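Because billing is per token, it can help to count tokens before sending a prompt. The sketch below, again assuming the tiktoken library is available, compares the token counts of an English sentence and a Korean one; the per-token price is a placeholder, not a real rate.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

english = "I was reading a book at the library."
korean = "나는 도서관에서 책을 읽고 있었다."

for label, text in [("English", english), ("Korean", korean)]:
    n_tokens = len(enc.encode(text))
    # Placeholder rate purely for illustration; check your provider's pricing.
    estimated_cost = n_tokens * 0.000001
    print(f"{label}: {n_tokens} tokens, ~${estimated_cost:.6f}")
```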
2. Embedding
The tokenized words are converted into numeric vectors. For example, the word cat might be converted into a vector like:
[0.11, 0.34, 0.56, ...]
Words with similar meanings have similar vector values. For instance, the word dog might be converted into a vector like:
[0.12, 0.84, 0.32, ...]
Words with similar vector values are closely positioned in the vector space.
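The sketch below uses made-up toy vectors, not real embeddings, to show how closeness in the vector space is usually measured: by the cosine of the angle between two vectors.

```python
import numpy as np

# Toy 4-dimensional "embeddings"; real models use hundreds or thousands of dimensions.
cat = np.array([0.11, 0.34, 0.56, 0.20])
dog = np.array([0.12, 0.30, 0.52, 0.18])
tree = np.array([0.80, 0.05, 0.10, 0.60])

def cosine_similarity(a, b):
    """Cosine similarity: close to 1.0 means the vectors point in similar directions."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("cat vs dog :", cosine_similarity(cat, dog))   # high: similar meanings
print("cat vs tree:", cosine_similarity(cat, tree))  # lower: less related
```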
3. Context Understanding
AI understands the context of a sentence by using the vector values of the tokenized words. For instance, when the words cat and tree appear together, AI identifies the kind of relationship these two words have.
This involves using an Attention Mechanism, which calculates how each word in a sentence is connected to the others, giving higher weight to more important words.
The Transformer model, a type of neural network, utilizes the attention mechanism to identify the relationships between all words simultaneously. For example, it understands how cat and tree interact and identifies cat as the subject performing the action of climbing.
ChatGPT, based on the Transformer model, uses the attention mechanism to grasp the context of the prompt.
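The heart of the attention mechanism can be written in a few lines. The sketch below implements scaled dot-product attention, the calculation used inside Transformers, on random made-up vectors; it only illustrates the weighting idea, not a full trained model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention as used inside Transformer models."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # how strongly each word relates to every other
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V, weights

# Made-up 3-dimensional vectors standing in for the tokens "The", "cat", "climbed", "the", "tree".
np.random.seed(0)
Q = K = V = np.random.rand(5, 3)

output, attention_weights = scaled_dot_product_attention(Q, K, V)
print(np.round(attention_weights, 2))  # row i: how much token i attends to each token
```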
4. Generating Responses
AI predicts the first word of the response from the input vectors. In this step, it uses its pre-trained language model to select the most appropriate word for the given context; for example, cat might be the first word included in the response.
Once the first word is predicted, AI adds it to the context and predicts the next word. This process repeats, with AI predicting each subsequent word based on all the words generated so far, until the response is complete.
- When predicting the next word after "The cat", AI selects "is".
- When predicting the next word after "The cat is", AI selects "a".
AI continues to create words until a coherent sentence is constructed in line with the prompt.
The cat is a small animal often kept as a pet at home.
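The generation loop can be imagined as repeatedly asking the model for next-word probabilities and picking one. The sketch below replaces the real model with a small hand-written probability table (purely an assumption for illustration); an actual system would compute these probabilities from the whole context using the Transformer described above.

```python
import random

# Hand-written next-word probabilities standing in for a real language model.
fake_model = {
    "The": {"cat": 0.7, "dog": 0.3},
    "cat": {"is": 0.8, "climbed": 0.2},
    "is": {"a": 0.9, "small": 0.1},
    "a": {"small": 0.6, "pet": 0.4},
    "small": {"animal": 1.0},
}

def generate(prompt_word, max_words=5):
    """Repeatedly sample the next word from the table until no continuation is known."""
    words = [prompt_word]
    for _ in range(max_words):
        options = fake_model.get(words[-1])
        if not options:  # no known continuation: stop generating
            break
        candidates, probs = zip(*options.items())
        words.append(random.choices(candidates, weights=probs)[0])
    return " ".join(words)

print(generate("The"))  # e.g. "The cat is a small animal"
```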
Try It Out
Send an example prompt and compare the AI's responses.