GPT: The Language Model That Changed the World

Many people remember November 30, 2022, the day ChatGPT was first released. One million people signed up within the first five days, and the service passed 100 million monthly active users within two months. At the time, it was the fastest-growing consumer service in internet history. For the first time, people had natural conversations with AI, and the way we write and code, long considered exclusively human domains, began to change.

But ChatGPT did not appear out of nowhere. It was the result of more than four years of research, false starts, and improvement since OpenAI first released GPT in 2018. In this chapter, we will look at how GPT works and what journey brought it to its current form.

AI Before GPT: RNN

To properly understand GPT, we first need to know how AI handled language before GPT existed.

Through the mid-2010s, the dominant architecture in natural language processing was the RNN (Recurrent Neural Network).

Simply put, an RNN is "a neural network that reads a sentence one word at a time, carrying what it has read so far in a 'memo' (state) and passing it to the next word." The word "recurrent" refers to the fact that the input does not pass through just once. The memo created at the previous step is combined back into the input of the next step and used again.

For example, when processing "I went to school today," an RNN proceeds in the order I → went → to → school → today, updating its memo at every step with "what the flow of the sentence has been so far."
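The "memo" idea can be sketched in a few lines of Python. This is a minimal toy, not a trained model: the weights `W_MEMO` and `W_INPUT`, the two-value memo, and the one-hot word encoding are all assumptions chosen only to make the recurrence visible.

```python
import math

def rnn_step(memo, word_vector, w_memo, w_input):
    """Combine the previous memo with the current word to produce the next memo."""
    next_memo = []
    for i in range(len(memo)):
        total = sum(w_memo[i][j] * memo[j] for j in range(len(memo)))
        total += sum(w_input[i][k] * word_vector[k] for k in range(len(word_vector)))
        next_memo.append(math.tanh(total))  # squash each value into (-1, 1)
    return next_memo

def read_sentence(words, vocabulary, w_memo, w_input):
    """Read words left to right and return the memo after the last word."""
    memo = [0.0] * len(w_memo)  # empty memo before the first word
    for word in words:
        one_hot = [1.0 if word == entry else 0.0 for entry in vocabulary]
        memo = rnn_step(memo, one_hot, w_memo, w_input)
    return memo

# Hypothetical fixed weights; a real RNN learns these during training.
vocabulary = ["I", "went", "to", "school", "today"]
W_MEMO = [[0.5, 0.1], [-0.2, 0.4]]
W_INPUT = [[0.1, 0.2, 0.3, 0.4, 0.5], [0.5, 0.4, 0.3, 0.2, 0.1]]

forward = read_sentence(vocabulary, vocabulary, W_MEMO, W_INPUT)
backward = read_sentence(list(reversed(vocabulary)), vocabulary, W_MEMO, W_INPUT)
print(forward, backward)
```

Reading the same five words in reverse order produces a different final memo, which is the sense in which an RNN is sensitive to word order.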

RNNs were once the standard choice for data where order matters, such as sentences and speech, because their structure naturally reflects the idea that what came before influences what comes after.

But RNNs had a fundamental limitation. As sentences grew longer, earlier information faded, and it became difficult to capture relationships between words far apart. For example, in the sentence "Yesterday my friend took the train all the way from Paris to Marseille, so what food does that friend like?" by the time an RNN reached the end of the sentence, it had almost lost the earlier information about "friend."
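A rough way to see why earlier information fades: if each step multiplies the existing memo by a factor smaller than 1, the first word's contribution shrinks geometrically. The sketch below assumes a deliberately simplified scalar memo and a hypothetical recurrent weight of 0.5; real RNNs use matrices and nonlinearities, but the same kind of repeated shrinking is at work.

```python
def influence_of_first_word(steps_since, recurrent_weight=0.5):
    """Fraction of the first word's signal left in the memo after more steps."""
    influence = 1.0
    for _ in range(steps_since):
        influence *= recurrent_weight  # every new word shrinks the old signal
    return influence

for steps in (1, 5, 20):
    print(steps, influence_of_first_word(steps))
```

Under this assumption, about 3% of the original signal remains after five more words, and less than one millionth after twenty.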

GPT set out from an entirely different architecture to overcome this limitation.

The Transformer: The Engine That Drives GPT

The core of GPT is the Transformer architecture introduced earlier. First proposed in the 2017 Google research paper "Attention Is All You Need," this architecture changed the paradigm of language AI.

The key idea of the Transformer is the attention mechanism. Instead of processing a sentence one word at a time in order, it views the entire sentence at once and simultaneously calculates how related each word is to every other word.

An analogy helps here. If an RNN is a reader who reads a book from beginning to end in sequence, a Transformer is a reader who spreads the entire book open at once and draws lines to the important parts with a highlighter. It grasps the relationships ("this word connects to that word") across the whole sentence at the same time.

As a result, in the sentence "Yesterday my friend took the train all the way from Paris to Marseille, so what food does that friend like?" the Transformer directly connects the "friend" at the end of the sentence with the "friend" from earlier. Distance does not cause it to lose track of relationships or context.
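The "draw lines between related words" idea can be sketched as scaled dot-product self-attention. This is a simplified illustration, not GPT's actual implementation: the two-dimensional `toy_vectors` are made up, and the learned query/key/value projections, multiple heads, and the masking of future positions used in GPT-style models are all omitted.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Each word scores every word, then takes a weighted average of all of them."""
    dim = len(vectors[0])
    all_weights, outputs = [], []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        row = softmax(scores)
        all_weights.append(row)
        outputs.append([sum(w * v[d] for w, v in zip(row, vectors))
                        for d in range(dim)])
    return all_weights, outputs

# Hypothetical word vectors: both "friend" tokens share the same vector.
tokens = ["friend", "train", "Paris", "friend"]
toy_vectors = [[2.0, 0.0], [0.0, 2.0], [0.0, 1.5], [2.0, 0.0]]

weights, outputs = self_attention(toy_vectors)
for token, weight in zip(tokens, weights[-1]):
    print(f"last 'friend' -> {token}: {weight:.3f}")
```

In this toy setup the last "friend" puts most of its weight on the two "friend" positions, no matter how many words sit between them, because the score depends on how similar the vectors are and not on distance.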

GPT is built on this Transformer architecture, and the name itself, Generative Pre-trained Transformer, describes the structure: a Transformer that is pre-trained on text and generates text.

How Has GPT Evolved?

GPT was not completed in a single release. In its early form, it was close to "a model that plausibly extends long text." As versions progressed, it grew better at following instructions precisely, holding longer context more stably, and completing tasks by connecting to tools. Below, we trace the progression from GPT-1 onward, focusing on what changed from the user's perspective.

GPT-1: Proof of Concept (2018)

GPT-1, released in 2018, had 117 million parameters. Parameters are the internal numerical values a model adjusts through training, and the parameter count is how many of them there are. The count is loosely analogous to the number of synaptic connections in a brain: a larger number generally means more patterns can be remembered and processed.
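To make "parameter count" concrete, here is a minimal sketch that counts the adjustable values in a tiny fully connected network. The layer sizes are hypothetical, and a Transformer has other kinds of layers as well, but the counting principle (every weight and every bias is one parameter) is the same.

```python
def dense_layer_parameters(inputs, outputs):
    """One weight per input-output pair, plus one bias per output."""
    return inputs * outputs + outputs

def count_parameters(layer_sizes):
    """Total parameters for a stack of fully connected layers."""
    return sum(dense_layer_parameters(a, b)
               for a, b in zip(layer_sizes, layer_sizes[1:]))

# A toy network: 4 inputs -> 8 hidden units -> 2 outputs.
total = count_parameters([4, 8, 2])
print(total)  # (4*8 + 8) + (8*2 + 2) = 58
```

GPT-1's 117 million is the same kind of total, just summed over far larger layers.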

GPT-1's achievement lay more in its approach than in the technology itself. It demonstrated that first learning patterns from vast amounts of text (pre-training), then adapting the model to a specific task (fine-tuning), was effective across multiple language processing tasks. This approach, which we now take for granted, was first applied in earnest to language models by GPT-1.

GPT-2: The Model They Hesitated to Release (2019)

GPT-2 had 1.5 billion parameters, roughly thirteen times as many as GPT-1. Performance improved substantially, but this model attracted attention for another reason: OpenAI initially decided not to release the full model.

The reason was that GPT-2 could generate fake text that was far too convincing. There were concerns it could be misused to fabricate news articles or mimic the speaking style of specific individuals. It was eventually released in stages over several months and became a catalyst for industry discussion about AI safety.

GPT-3: More Powerful (2020)

GPT-3 had 175 billion parameters, more than 100 times as many as GPT-2. But the change went beyond sheer scale.

Without any separate fine-tuning, GPT-3 could perform new tasks when shown just a few examples. This is called few-shot learning. For example, given a couple of examples like "translate this sentence into French: Hello → Bonjour, Thank you → Merci, How are you → ?", it could figure out the translation pattern and respond, with no additional training. From around this point, one model became capable of writing code, drafting text, summarizing, and translating.
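In practice, few-shot learning only means placing the examples in the prompt text itself. The sketch below builds such a prompt as a plain string; the function name `build_few_shot_prompt`, the `->` separator, and the example pairs are illustrative assumptions, and no model or API is called.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Put worked examples before the real question so the pattern is visible."""
    lines = [instruction]
    for source, target in examples:
        lines.append(f"{source} -> {target}")
    lines.append(f"{query} ->")  # left open for the model to complete
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate this sentence into French:",
    [("Hello", "Bonjour"), ("Thank you", "Merci")],
    "How are you",
)
print(prompt)
```

The model's weights are never updated here. The examples exist only in the input, which is what distinguishes few-shot prompting from fine-tuning.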

GPT-3.5: The Birth of ChatGPT (2022)

GPT-3.5 was an improved version of GPT-3, specialized for natural conversation. The GPT-3.5 model was released alongside the ChatGPT service in November 2022, opening an era in which the general public could have natural conversations with AI and use it for work.

GPT-3.5 better understood conversational context, followed instructions more precisely, and could sustain much longer conversations more stably than previous versions.

GPT-4: Multimodal and Enhanced Reasoning (2023)

GPT-4, announced in 2023, differed from its predecessors in two key ways. OpenAI did not disclose the parameter count, but the direction of change was clear.

The first was multimodal input. It became capable of accepting not only text but also images. For the first time, requests like "find what's strange about this photo" or "explain what this graph is showing" became possible.

The second was enhanced reasoning. Where GPT-3 excelled at quickly producing "answers that sound correct," GPT-4 showed a marked improvement on problems requiring multi-step logical reasoning. It scored noticeably higher than the previous version on the bar exam, the medical licensing exam, and math problems at the level of college entrance exams.

GPT-4o: AI That Sees, Hears, and Speaks (2024)

GPT-4o was released in May 2024; the "o" stands for "omni." Where previous versions handled text and images, GPT-4o integrates text, images, and audio within a single model.

The most visible change was voice conversation. GPT-4o can hear what a person says, grasp the context, and respond immediately in a natural voice. Where prior voice features converted speech to text and back again, GPT-4o processes sound itself, so it can detect changes in intonation and emotion to some degree. Responses also became significantly faster than in earlier versions, and performance in non-English languages improved.

GPT-5: Reasoning and Speed in One (2025)

Released in August 2025, GPT-5 is a model that unifies the previously separate GPT series and reasoning-dedicated models (o1, o3) into one.

Previously, users had to choose for themselves: GPT-4o for fast answers, o1 or o3 for working through complex problems in depth. GPT-5 handles this selection automatically inside the model. It responds quickly to simple questions; when it determines that complex reasoning is needed, it thinks longer before producing an answer. If a user requests "please think about this slowly and thoroughly," the model operates accordingly.
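The routing idea can be illustrated with a toy rule. This is purely a sketch of the concept: the keyword list, the word-count threshold, and the function `choose_mode` are hypothetical, and nothing here reflects how GPT-5 actually makes the decision.

```python
# Hypothetical signals that a user wants slower, more careful reasoning.
REASONING_HINTS = ("think", "step by step", "thoroughly", "prove")

def choose_mode(question, word_limit=30):
    """Return "deep" for questions that look complex, otherwise "fast"."""
    text = question.lower()
    if any(hint in text for hint in REASONING_HINTS):
        return "deep"  # the user explicitly asked for careful reasoning
    if len(text.split()) > word_limit:
        return "deep"  # long questions are treated as complex in this toy rule
    return "fast"

print(choose_mode("What is the capital of France?"))
print(choose_mode("Please think about this slowly and thoroughly."))
```

A hand-written rule like this would be far too crude for a real product. It only shows the shape of the decision: one entry point, two processing paths, and a selection step the user no longer has to make.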

Performance also improved substantially over previous versions. It recorded top-tier scores on benchmarks in mathematics, coding, science, and medicine. In particular, hallucinations decreased by approximately 45% compared to GPT-4o. OpenAI CEO Sam Altman described GPT-5 as "a model that provides doctoral-level capabilities across a broad range of tasks."

Figure: GPT-5's integrated router system, which automatically judges question complexity and selects between a fast response mode and a deep reasoning mode.

GPT-5 is available to all ChatGPT users for free, but paid subscribers can access higher usage limits along with more powerful reasoning modes.

| Version | Released | Parameters | Key Changes |
| --- | --- | --- | --- |
| GPT-1 | 2018 | 117M | Pre-training + fine-tuning approach introduced |
| GPT-2 | 2019 | 1.5B | Convincing text generation, staged release |
| GPT-3 | 2020 | 175B | Few-shot learning, general-purpose language model |
| GPT-3.5 | 2022 | Undisclosed | Conversation-specialized, released with ChatGPT |
| GPT-4 | 2023 | Undisclosed | Multimodal, enhanced reasoning |
| GPT-4o | 2024 | Undisclosed | Text/image/audio integration, real-time voice |
| GPT-5 | 2025 | Undisclosed | Unified reasoning/non-reasoning, auto-routing, fewer hallucinations |

Models After GPT-5.0

Recently, updates within the "GPT-5" umbrella have continued in sub-version increments. Coding-related tasks have branched into the Codex lineup, and development is advancing toward stronger agentic (agent-based, task-executing) capabilities.

| Version | Release | Key Changes |
| --- | --- | --- |
| GPT-5.1 | 2025-11 | Enhanced conversational tone and personalization; easier to adjust to preferred styles |
| GPT-5.2 | 2025-12 | Stronger long-context understanding and tool-call-based execution |
| GPT-5.2-Codex | 2025-12 | Targets more stable agentic execution for coding tasks (editing, refactoring, testing) |
| GPT-5.3-Codex | 2026-02 | Optimized to handle long-context coding tasks more smoothly |

ChatGPT Changed How People Work

When GPT-3 was released, the tech industry took notice. But it was not until November 2022, when ChatGPT launched, that this technology entered the everyday lives of ordinary people.

At launch, ChatGPT was a conversational interface built on GPT-3.5. Technically it was not a completely new model; it added conversation-specialized training on top of the existing model. The key was RLHF (reinforcement learning from human feedback): human evaluators reviewed model responses and chose which ones were more natural and helpful, and those choices were used to refine the model's conversational ability.
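One common way to turn "evaluators chose the better response" into a training signal is a pairwise preference loss on a reward model. The sketch below assumes that formulation. The text above does not specify the exact loss, and the reward scores here are made-up numbers standing in for a reward model's outputs.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = reward_chosen - reward_rejected
    return math.log(1.0 + math.exp(-margin))

# Hypothetical reward scores for two responses to the same prompt.
agrees_with_human = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees_with_human = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(agrees_with_human, disagrees_with_human)
```

The loss is small when the reward model already scores the human-preferred response higher and large when it scores it lower, so minimizing it pushes the reward model toward the evaluators' choices. The trained reward model can then guide further tuning of the language model.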

Through this process, the GPT model transformed from "an AI model for research purposes" into "a general-purpose service anyone could converse with in a browser."
