In AI, All Information Is Represented as Numbers

To AI, an image is not a "photo." It is an array of numbers. Imagine looking at a photo of a cat. A human sees fur, eyes, and ears. A computer, however, sees that image as nothing more than a collection of tiny dots. Each one of those dots is called a pixel.

[Figure: Information as numbers]

What Is a Pixel?

A pixel is the smallest unit of an image. A digital screen is made up of countless pixels. For example, an image that is 1,000 pixels wide and 1,000 pixels tall consists of one million tiny dots forming a single picture.

Each pixel is more than a dot; it carries color information expressed as numbers.

For a grayscale image, each pixel is typically represented by a single number between 0 and 255.

0 → pure black
255 → pure white
128 → medium brightness

In other words, a grayscale image can be expressed as a table of numbers like this:

Example: 5x5 grayscale image

| 0   | 0   | 0   | 0   | 0   |
| --- | --- | --- | --- | --- |
| 0   | 255 | 255 | 255 | 0   |
| 0   | 255 | 0   | 255 | 0   |
| 0   | 255 | 255 | 255 | 0   |
| 0   | 0   | 0   | 0   | 0   |

The table above shows a simple shape: a single black (0) pixel in the center, surrounded by a ring of white (255) pixels, with a black border along all four edges.

[Figure: 5x5 grayscale pixel image]
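
As a minimal sketch of the table above, the same 5x5 image can be written as a nested Python list, one inner list per row. The names `image` and `render` are illustrative choices, not part of any library.

```python
# The 5x5 grayscale image from the table: 0 is black, 255 is white.
image = [
    [0,   0,   0,   0, 0],
    [0, 255, 255, 255, 0],
    [0, 255,   0, 255, 0],
    [0, 255, 255, 255, 0],
    [0,   0,   0,   0, 0],
]

def render(img):
    """Draw each pixel as a character: '#' for bright, '.' for dark."""
    return ["".join("#" if value >= 128 else "." for value in row) for row in img]

height = len(image)
width = len(image[0])
white_pixels = sum(value == 255 for row in image for value in row)

for line in render(image):
    print(line)
print(width * height, "pixels,", white_pixels, "of them white")
```

Printing the rendered rows shows the white ring as `#` characters around a dark center, which is the same shape the table describes.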

How Are Color Images Represented?

Color images are one step more complex. Each pixel holds three values, R (red), G (green), and B (blue), each a number between 0 and 255. This is called the RGB system. Each value indicates how intense that color component is: 0 means none of it, and 255 means full intensity.

For example:

(255, 0, 0) → red
(0, 255, 0) → green
(0, 0, 255) → blue
(255, 255, 255) → white
(0, 0, 0) → black
(255, 255, 0) → yellow
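
The yellow entry is a useful case: it is simply red and green at full intensity with no blue. Here is a small sketch, assuming a hypothetical helper `add_colors` that adds two colors channel by channel and caps each result at 255:

```python
RED = (255, 0, 0)
GREEN = (0, 255, 0)
BLUE = (0, 0, 255)

def add_colors(a, b):
    """Add two RGB colors channel by channel, capping each channel at 255."""
    return tuple(min(x + y, 255) for x, y in zip(a, b))

yellow = add_colors(RED, GREEN)
white = add_colors(yellow, BLUE)
print(yellow)  # (255, 255, 0)
print(white)   # (255, 255, 255)
```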

A 5x5 color image can be expressed like this:

Example: 5x5 color image

| (255, 0, 0)     | (0, 255, 0)   | (0, 0, 255)   | (255, 255, 255) | (0, 0, 0)       |
| --------------- | ------------- | ------------- | --------------- | --------------- |
| (255, 255, 0)   | (0, 255, 255) | (255, 0, 255) | (128, 128, 128) | (64, 64, 64)    |
| (192, 192, 192) | (128, 0, 0)   | (0, 128, 0)   | (0, 0, 128)     | (128, 128, 0)   |
| (0, 128, 128)   | (128, 0, 128) | (64, 64, 64)  | (192, 192, 192) | (255, 255, 255) |
| (0, 0, 0)       | (255, 0, 0)   | (0, 255, 0)   | (0, 0, 255)     | (255, 255, 0)   |

The pixel table above represents an image with a variety of mixed colors, as shown below.

[Figure: 5x5 color pixel image]
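
To make the "image as numbers" idea concrete, the sketch below stores the first row of the color table as a list of (R, G, B) tuples and flattens it into one plain list of numbers. The variable names are illustrative.

```python
# First row of the 5x5 color table, one (R, G, B) tuple per pixel.
first_row = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255), (0, 0, 0)]

# Flatten the tuples into a single list of channel values.
flat = [channel for pixel in first_row for channel in pixel]

print(flat)
print(len(flat), "numbers for", len(first_row), "pixels")
```

Five pixels become fifteen numbers, because every pixel contributes three values.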

This is a very small image. Real photos are often 1,024×1,024 pixels or larger, and a 1,024×1,024 image alone contains 1,048,576 pixels, roughly one million. Since each pixel has three color values, AI ends up handling more than three million numbers for a single photo of that size.
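
The arithmetic behind that count is short enough to check directly. The helper name `count_values` is hypothetical:

```python
def count_values(width, height, channels=3):
    """Return how many numbers are needed to store an image."""
    return width * height * channels

print(count_values(5, 5))        # 75
print(count_values(1024, 1024))  # 3145728
```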

[Figure: A cat image represented by millions of pixels]

A photo of a cat like this is, to AI, ultimately just a matrix of millions of numbers.

How Does AI Use These Numbers?

AI does not make judgments by understanding meaning, such as "this animal is a cat." Instead, it learns the numerical patterns that appear repeatedly in cat images.

For example, suppose AI is given hundreds of thousands of cat photos. It discovers the following recurring patterns in the data:

Certain brightness changes appear at regular intervals.
Two dark dots frequently appear near a rounded shape.
A triangular brightness pattern at the top repeats consistently.

In other words, AI does not recognize the concept of "ears" or "eyes" the way humans do. Instead, it learns the recurring structure of pixel-based numerical arrays that appear in cat images. The more strongly these patterns show up in a new image, the higher the probability it assigns to that image being a cat.
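
A trained model learns its patterns from data, which is beyond a short example. As a rough illustration of what a "brightness change" looks like in numbers, the sketch below uses a hand-written rule rather than a learned one: it subtracts each pixel from its right-hand neighbor in one row of the earlier 5x5 grayscale image. The function name `brightness_changes` and the threshold of 100 are assumptions made for this example.

```python
def brightness_changes(row):
    """Return the difference between each pixel and the one to its right."""
    return [right - left for left, right in zip(row, row[1:])]

middle_row = [0, 255, 0, 255, 0]  # third row of the 5x5 grayscale example
changes = brightness_changes(middle_row)
print(changes)  # [255, -255, 255, -255]

# Large jumps in either direction mark the edges of a shape.
edges = sum(abs(change) > 100 for change in changes)
print(edges)  # 4
```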

Sound Is Also Represented as Numbers

To AI, sound is ultimately a stream of numbers.

Sound is vibration traveling through the air. A microphone measures this vibration at very short, regular time intervals and records how far the air pressure has moved from its resting level at each moment as a number. This process, called sampling, converts sound into a long array of numbers.
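
As a sketch of this measuring step, the code below stands in for the microphone with a pure 440 Hz tone and measures it 8,000 times per second. Both figures, and the `sample_tone` name, are assumptions chosen for illustration.

```python
import math

def sample_tone(frequency_hz, sample_rate_hz, count):
    """Measure a sine wave at evenly spaced moments and return the values."""
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(count)
    ]

samples = sample_tone(440, 8000, 8)
print([round(s, 2) for s in samples])
```

Each value in `samples` is one measurement; a one-second recording at this rate would hold 8,000 of them.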

Example: Partial sound waveform

[0.02, 0.15, 0.30, 0.10, -0.05, -0.20, -0.10, 0.05 ...]

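These numbers show how the vibration changed over time. Values can be positive or negative, so what matters for loudness is how far they swing from zero: large swings mean a loud sound, and values that stay close to zero mean a quiet one.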
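
Using the eight values from the example waveform, two simple ways to put a number on loudness are the largest distance from zero (the peak) and the root mean square (RMS). This is only a sketch, and the variable names are illustrative.

```python
import math

waveform = [0.02, 0.15, 0.30, 0.10, -0.05, -0.20, -0.10, 0.05]

# Peak: the largest distance from zero, ignoring sign.
peak = max(abs(value) for value in waveform)

# RMS: square root of the average squared value, a common loudness measure.
rms = math.sqrt(sum(value * value for value in waveform) / len(waveform))

print(peak)  # 0.3
print(round(rms, 3))
```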

A human hears this and understands "hello." AI, however, sees the patterns of change in these numbers.

For example, the sound "hel-" tends to produce a similar numerical flow each time it is spoken. The sound "-lo" does the same. By processing vast amounts of audio, AI learns which number patterns frequently appear alongside which words.

AI does not "listen" to sound. It analyzes the numerical patterns of sound that change over time and predicts the corresponding word.

Just as with images, sound is ultimately represented as numbers, and AI works by finding recurring structures within those numbers.

How Does Text Become Numbers?

Just as images and sounds are represented as numbers, text must also be converted to numbers before AI can process it. Computers cannot directly understand the "meaning" of letters. First, the text is broken into small pieces, and each piece is assigned a number.

Step 1: Breaking Text into Pieces (Tokenization)

Take the following sentence as an example:

"The weather feels nice today."

AI does not process this sentence as-is. It first breaks it into word or character units. These pieces are called tokens.

Example:

The
weather
feels
nice
today

Tokens can also be smaller than whole words, such as word fragments or individual characters.

This process of splitting a sentence into small units is called tokenization.
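
A minimal sketch of word-level and character-level splitting follows. It uses a simple regular expression as a stand-in; production tokenizers typically rely on learned subword vocabularies, which this example does not attempt to reproduce.

```python
import re

sentence = "The weather feels nice today."

# Word-level tokens: runs of letters, with punctuation dropped.
word_tokens = re.findall(r"[A-Za-z]+", sentence)
print(word_tokens)  # ['The', 'weather', 'feels', 'nice', 'today']

# Character-level tokens for a single word.
char_tokens = list("weather")
print(char_tokens)  # ['w', 'e', 'a', 't', 'h', 'e', 'r']
```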

Step 2: Assigning Numbers to Tokens

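Next, each token is assigned a unique number. For example, taking three of the tokens in the order they appear in the sentence (the specific IDs are arbitrary):

weather → 4581
nice → 9002
today → 1023

With only these three tokens, the sentence then becomes:

Sentence Token ID Example

[4581, 9002, 1023]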

The sentence is now a number array.
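
The lookup itself can be sketched with a dictionary. The vocabulary below reuses the three example IDs; `UNKNOWN_ID` is an assumed placeholder for tokens that are not in the vocabulary, and `encode` is a hypothetical helper name.

```python
# Illustrative vocabulary using the IDs from the example above.
vocab = {"today": 1023, "weather": 4581, "nice": 9002}
UNKNOWN_ID = 0  # assumed placeholder for tokens missing from the vocabulary

def encode(tokens):
    """Replace each token with its ID, keeping the tokens' order."""
    return [vocab.get(token, UNKNOWN_ID) for token in tokens]

print(encode(["today", "weather", "nice"]))  # [1023, 4581, 9002]
print(encode(["weather", "nice", "today"]))  # [4581, 9002, 1023]
print(encode(["weather", "feels", "nice"]))  # [4581, 0, 9002]
```

The same three IDs appear in a different order when the tokens do, so the array records word order as well as word identity.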

Step 3: Converting Numbers into Vectors

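But these ID numbers alone are not enough. An ID is only a label: the fact that 4581 is larger than 1023 says nothing about what the words mean. So AI represents each token using a larger collection of numbers, called a vector. This representation is called an embedding.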

For example, the word "cat" might be represented as a group of numbers like this:

Word Embedding Vector Example

[0.12, -0.44, 0.88, 0.03, ...]

This group of numbers captures the characteristics of the word. Interestingly, words with similar meanings or in the same category end up with similar number arrays.

For example, since "cat" and "dog" belong to the same category, they might be represented in AI's learned number space like this:

cat → [0.12, -0.44, 0.88, ...]
dog → [0.10, -0.40, 0.85, ...]

But "car" belongs to a completely different category, so its number array looks quite different:

car → [-0.60, 0.81, -0.12, ...]
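
One common way to measure how similar two such vectors are is cosine similarity. The sketch below uses only the first three components shown above, since the remaining values are elided in the examples; real embeddings have far more dimensions, so treat the results as illustrative only.

```python
import math

# Only the first three components shown in the text; the rest are elided there.
cat = [0.12, -0.44, 0.88]
dog = [0.10, -0.40, 0.85]
car = [-0.60, 0.81, -0.12]

def cosine_similarity(a, b):
    """Return a value near 1 for similar directions, lower for dissimilar ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(round(cosine_similarity(cat, dog), 2))  # close to 1
print(round(cosine_similarity(cat, car), 2))  # negative
```

With these example values, "cat" and "dog" point in nearly the same direction, while "cat" and "car" point in clearly different ones.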

How Does AI Use These Numbers for Text?

A human reads a sentence and understands its meaning. AI, however, processes it like this:

Split the sentence into small pieces.
Convert each piece into numbers.
Analyze the number patterns to predict the next word.

AI does not "read" text. It predicts the next token, itself represented as a number, based on patterns in the numbers that stand for the preceding words.
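
The idea of predicting the next word from patterns can be sketched with simple counting. This is a toy stand-in, not how modern language models are built: those use neural networks trained on vastly more text. The corpus and the `predict_next` helper are made up for this example.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real model trains on vastly more text.
corpus = [
    "the weather feels nice today",
    "the weather feels cold today",
    "the weather is nice",
]

# Count which token follows which.
following = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1

def predict_next(token):
    """Return the token seen most often after `token`, or None if unseen."""
    if token not in following:
        return None
    return following[token].most_common(1)[0][0]

print(predict_next("weather"))  # feels
print(predict_next("the"))      # weather
```

Because "feels" follows "weather" twice and "is" follows it only once, the sketch picks "feels" as the more likely continuation.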

Key Takeaway

Images, sounds, and text look like completely different kinds of information on the surface, but inside AI they are all data converted into numbers. What AI does is ultimately find the recurring patterns and relationships among those numbers.

| What we see      | What AI processes                                  |
| ---------------- | -------------------------------------------------- |
| A photo of a cat | A number array of millions of pixels               |
| A human voice    | A continuous series of vibration values over time  |
| A sentence       | A number vector converted from tokens              |

AI does not directly feel or understand meaning. Instead, it calculates the relationships between numbers and uses those relationships to predict the next result. But when this calculation becomes sufficiently refined, we begin to feel as though AI has "understood" the situation.

So How Does AI Actually Work?

AI does not feel emotions or have self-awareness. And yet, we often feel that AI is quite intelligent. The reason is that much of what we call "intelligent activity" in daily life is actually about organizing information, recognizing recurring structures, and predicting the next outcome. In other words, human intellectual activity overlaps considerably with the way AI operates.

Take summarization, for example. It is not simply about shortening a text. It involves reading a long piece, judging what is essential, and rearranging the important content into a coherent flow, all while identifying the structure of the text, discarding less important parts, and preserving the central idea. Translation is not mechanical word substitution either. It requires understanding an expression and its context in one language, then rendering it naturally in the structure of another. Recommendation systems work similarly, not just recording "this person likes this" but analyzing past choice patterns to predict what they are likely to choose next.

On the surface, summarization, translation, and recommendations look like completely different activities. But all of them involve finding recurring structures in vast amounts of information and using those structures to produce the next result.

This is where AI shows a distinct advantage over humans. AI calculates patterns from massive datasets and predicts with great precision the likelihood that those patterns will appear again.

AI does not think the same way humans do, but through this probabilistic pattern analysis, it has reached a level where, on some tasks, it can produce results that match or even exceed what humans produce.
