
The Meaning and Evolution of GPT

GPT stands for Generative Pre-trained Transformer, referring to a generative AI model that is pre-trained on large datasets and built on the Transformer architecture introduced by a Google research team in 2017 (in the paper "Attention Is All You Need").

Transformer: A neural network architecture that learns how important each word in a sentence is relative to the others and weights them accordingly; unlike earlier sequence models, it is designed for parallel computation.


What does the name GPT signify?

  • Generative: The AI model can generate (or create) text (see the sketch after this list).

  • Pre-trained: It has learned from a vast amount of data in advance.

  • Transformer: Built on the Transformer architecture described above.
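To make "Generative" concrete, here is a minimal, purely illustrative Python sketch of next-token generation. The hand-written toy_model table stands in for the billions of relationships a real GPT learns from data, and this toy looks only at the previous word, whereas GPT conditions on the entire preceding context; only the loop structure (generating one token at a time) reflects how GPT actually produces text.

import random

# A toy "language model": for each word, the words that may follow it
# and their relative weights. These entries are hand-written for
# illustration; a real GPT learns such relationships from data.
toy_model = {
    "<start>": {"the": 3, "a": 1},
    "the": {"cat": 2, "dog": 1},
    "a": {"cat": 1, "dog": 2},
    "cat": {"sleeps": 2, "runs": 1},
    "dog": {"runs": 2, "sleeps": 1},
    "sleeps": {"<end>": 1},
    "runs": {"<end>": 1},
}

def generate(model, max_tokens=10):
    # Generate text one token at a time, each choice conditioned
    # on the token that came before it.
    token = "<start>"
    output = []
    for _ in range(max_tokens):
        nexts = model[token]
        token = random.choices(list(nexts), weights=list(nexts.values()))[0]
        if token == "<end>":
            break
        output.append(token)
    return " ".join(output)

print(generate(toy_model))   # e.g. "the cat sleeps"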


Background of GPT's Inception

Before the Transformer architecture was introduced in 2017, AI systems relied primarily on rule-based approaches or earlier forms of deep learning.


Rule-Based Approach

The rule-based approach involves defining specific rules beforehand and processing data or drawing conclusions based on those rules.

Given a particular input, it produces a predictable output.

Example of Rule-Based Approach
Identifying the subject and verb in a sentence

Rules:
- In English, the noun phrase at the start of a sentence (for example, an article plus a noun) is likely to be the subject.
- The word immediately following the subject is likely to be the verb.

Input sentence: "The cat sleeps."

Applied rules:
- Identified "The cat" as the subject
- Identified "sleeps" as the verb

The rule-based approach failed on inputs that fell outside its predefined patterns and struggled to cope with the endless variability of natural language.


Deep Learning

Deep learning refers to using artificial neural networks to learn patterns from data and to make predictions about new data based on what was learned.

Key terms related to deep learning are as follows:


Neural Networks

Neural networks are computational models loosely inspired by the human brain, structured to take in input data and produce an output.

These networks are composed of multiple layers, each of which transforms its input to extract progressively higher-level information (a minimal sketch follows the list below).

Components of these layers include:

  • Input Layer: The layer that receives data

  • Hidden Layers: Multiple intermediate layers that process data and learn patterns

  • Output Layer: The layer that delivers the final output
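The following is a minimal NumPy sketch of this layered structure (the layer sizes and random weights are arbitrary choices for illustration; in practice the weights are set by training, described next):

import numpy as np

rng = np.random.default_rng(0)

# Arbitrary sizes for illustration: 4 inputs -> 8 hidden units -> 2 outputs.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # input layer -> hidden layer
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)   # hidden layer -> output layer

def forward(x):
    # Each layer transforms its input into a higher-level representation.
    hidden = np.maximum(0, x @ W1 + b1)   # hidden layer with ReLU activation
    return hidden @ W2 + b2               # output layer: one raw score per class

x = rng.normal(size=4)    # a single input example with 4 features
print(forward(x))         # two output scores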

Training

Training is the process by which a neural network adjusts its internal weights as it processes input data, gradually learning the patterns in that data.

For example, when shown numerous labeled images of cats and dogs, the network learns to distinguish between the two in new images (a minimal sketch follows the list of terms below).

Key terms include:

  • Dataset: A collection of data used for training (e.g., thousands of images of cats and dogs)

  • Label: Information indicating what each piece of data represents (e.g., cat images labeled as 'cat', dog images labeled as 'dog')
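To make these terms concrete, here is a minimal training sketch: a single-layer model fitted to a synthetic four-example dataset (the feature values and labels are invented for illustration). The loop (predict, measure the error, adjust the weights) is the same process that trains deep networks, just at toy scale:

import numpy as np

# A tiny synthetic dataset: each row holds two made-up image features,
# and each label marks what the example represents (0 = 'cat', 1 = 'dog').
X = np.array([[0.2, 0.9],
              [0.1, 0.8],
              [0.9, 0.2],
              [0.8, 0.1]])
y = np.array([0, 0, 1, 1])

w, b = np.zeros(2), 0.0   # model parameters, to be learned from the data
lr = 0.5                  # learning rate: the size of each adjustment step

for step in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # predicted probability of 'dog'
    grad_w = X.T @ (p - y) / len(y)      # gradient of the cross-entropy loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w                     # adjust parameters to reduce the error
    b -= lr * grad_b

print((p > 0.5).astype(int))   # [0 0 1 1]: the model now reproduces the labels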

Although deep learning is widely used across many fields, earlier sequence models (such as recurrent neural networks) processed text one token at a time, which made natural language processing slow and inefficient.

For instance, much like a reader who forgets the beginning of a book by the time they reach the end, such a model tends to lose track of important information from earlier parts of a long sentence.


Emergence of the Transformer Model

The Transformer model was designed to process all of its input in parallel, sharply reducing processing time, and to understand context by modeling the relationships between every part of the input, a mechanism known as self-attention.
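Below is a minimal NumPy sketch of this idea, the self-attention computation at the heart of the Transformer. It omits the learned query/key/value projections and the multiple attention heads of a real Transformer, keeping only the core step: every word is compared with every other word at once, so the whole sequence is processed in parallel.

import numpy as np

def self_attention(X):
    # X holds one vector per word in the sequence.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)    # relevance of every word to every other word
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ X               # each output mixes all words by relevance

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 4))       # a 3-word sequence of 4-dimensional word vectors
print(self_attention(X).shape)    # (3, 4): one context-aware vector per word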

Thanks to large-scale pre-training, GPT demonstrated outstanding performance in natural language processing and has evolved rapidly through successive versions such as GPT-2, GPT-3, and GPT-4.
