In AI, All Information Is Represented as Numbers
To AI, an image is not a "photo." It is an array of numbers. Imagine looking at a photo of a cat. A human sees fur, eyes, and ears. A computer, however, sees that image as nothing more than a collection of tiny dots. Each one of those dots is called a pixel.

What Is a Pixel?
A pixel is the smallest unit of an image. A digital screen is made up of countless pixels. For example, an image that is 1,000 pixels wide and 1,000 pixels tall consists of one million tiny dots forming a single picture.
Each pixel is more than a dot; it carries color information expressed as numbers.
For a grayscale image, each pixel is typically represented by a single number between 0 and 255.
- 0 → pure black
- 255 → pure white
- 128 → medium brightness
In other words, a grayscale image can be expressed as a table of numbers like this:
Example: 5x5 grayscale image
| 0 | 0 | 0 | 0 | 0 |
| --- | --- | --- | --- | --- |
| 0 | 255 | 255 | 255 | 0 |
| 0 | 255 | 0 | 255 | 0 |
| 0 | 255 | 255 | 255 | 0 |
| 0 | 0 | 0 | 0 | 0 |
The table above shows a simple shape: a single black (0) pixel in the center, surrounded by a ring of white (255) pixels, with a black border along all four edges.
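As a minimal sketch, the same 5x5 table can be written as a list of rows in Python. The variable names here are illustrative choices, not part of any particular library.

```python
# The 5x5 grayscale example as a list of rows (0 = black, 255 = white).
image = [
    [0,   0,   0,   0, 0],
    [0, 255, 255, 255, 0],
    [0, 255,   0, 255, 0],
    [0, 255, 255, 255, 0],
    [0,   0,   0,   0, 0],
]

height = len(image)        # number of rows
width = len(image[0])      # number of columns
center = image[2][2]       # row 2, column 2, counting from 0
white_pixels = sum(row.count(255) for row in image)

print(height, width)   # 5 5
print(center)          # 0
print(white_pixels)    # 8
```

Reading a pixel is just looking up a number by its row and column, which is all the "image" is from the computer's point of view.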
How Are Color Images Represented?
Color images are one step more complex. Each pixel stores three color values, R (red), G (green), and B (blue), each as a number between 0 and 255. This is called the RGB system. The three values indicate the intensity of red, green, and blue respectively, and mixing them produces the final color.
For example:
- (255, 0, 0) → red
- (0, 255, 0) → green
- (0, 0, 255) → blue
- (255, 255, 255) → white
- (0, 0, 0) → black
- (255, 255, 0) → yellow
A 5x5 color image can be expressed like this:
Example: 5x5 color image
| (255, 0, 0) | (0, 255, 0) | (0, 0, 255) | (255, 255, 255) | (0, 0, 0) |
| --------------- | ------------- | ------------- | --------------- | --------------- |
| (255, 255, 0) | (0, 255, 255) | (255, 0, 255) | (128, 128, 128) | (64, 64, 64) |
| (192, 192, 192) | (128, 0, 0) | (0, 128, 0) | (0, 0, 128) | (128, 128, 0) |
| (0, 128, 128) | (128, 0, 128) | (64, 64, 64) | (192, 192, 192) | (255, 255, 255) |
| (0, 0, 0) | (255, 0, 0) | (0, 255, 0) | (0, 0, 255) | (255, 255, 0) |
The pixel table above represents an image with a variety of mixed colors.
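A rough sketch of the same color table in Python, with each pixel stored as an (R, G, B) tuple. Splitting out the red values is one illustrative way to show that a color image is really three number grids stacked together; the variable names are assumptions made for this example.

```python
# The 5x5 color example: each pixel is an (R, G, B) tuple.
color_image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255), (0, 0, 0)],
    [(255, 255, 0), (0, 255, 255), (255, 0, 255), (128, 128, 128), (64, 64, 64)],
    [(192, 192, 192), (128, 0, 0), (0, 128, 0), (0, 0, 128), (128, 128, 0)],
    [(0, 128, 128), (128, 0, 128), (64, 64, 64), (192, 192, 192), (255, 255, 255)],
    [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)],
]

# Unpack the top-left pixel into its three color values.
r, g, b = color_image[0][0]

# Keep only the red value of every pixel: a 5x5 grid of plain numbers.
red_channel = [[pixel[0] for pixel in row] for row in color_image]

print(r, g, b)          # 255 0 0
print(red_channel[0])   # [255, 0, 0, 255, 0]
```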
This is a very small image. Real photos are far larger, for example 1,024×1,024 pixels, which is roughly one million pixels. Since each pixel has three color values, AI ends up handling about three million numbers for a single photo of that size.
To AI, a photo of a cat is ultimately just an array of millions of numbers.
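The arithmetic can be checked with a few lines of Python. The helper name `count_values` is made up for this sketch.

```python
def count_values(width, height, channels=3):
    """Return how many numbers are needed to store an image."""
    return width * height * channels

print(count_values(5, 5))          # 75
print(1024 * 1024)                 # 1048576 pixels
print(count_values(1024, 1024))    # 3145728 numbers
```

So a 1,024×1,024 color photo has a little over one million pixels and a little over three million individual numbers.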
How Does AI Use These Numbers?
AI does not make judgments by understanding meaning, such as "this animal is a cat." Instead, it learns the numerical patterns that appear repeatedly in cat images.
For example, suppose AI is given hundreds of thousands of cat photos. It discovers the following recurring patterns in the data:
- Certain brightness changes appear at regular intervals.
- Two dark dots frequently appear near a rounded shape.
- A triangular brightness pattern at the top repeats consistently.
In other words, AI does not recognize the concept of "ears" or "eyes" the way humans do. Instead, it learns the recurring structure of the pixel-based numerical arrays that appear in cat images. The more of these learned patterns it finds in a new image, the higher the probability it assigns to "cat."
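To make "finding a numerical pattern" concrete, here is a deliberately simplified sketch. It slides a small hand-written pattern over the 5x5 grayscale example from earlier and counts matching pixels. This is only an illustration: real AI systems learn their patterns from data rather than using a fixed template, and the function names below are hypothetical.

```python
image = [
    [0,   0,   0,   0, 0],
    [0, 255, 255, 255, 0],
    [0, 255,   0, 255, 0],
    [0, 255, 255, 255, 0],
    [0,   0,   0,   0, 0],
]

# A 3x3 pattern: a white ring around one dark dot.
template = [
    [255, 255, 255],
    [255,   0, 255],
    [255, 255, 255],
]

def match_score(image, template, top, left):
    """Count template pixels that equal the image pixels beneath them."""
    score = 0
    for r, template_row in enumerate(template):
        for c, value in enumerate(template_row):
            if image[top + r][left + c] == value:
                score += 1
    return score

def best_match(image, template):
    """Try every position and return (score, top, left) of the best one."""
    t_height, t_width = len(template), len(template[0])
    best = None
    for top in range(len(image) - t_height + 1):
        for left in range(len(image[0]) - t_width + 1):
            score = match_score(image, template, top, left)
            if best is None or score > best[0]:
                best = (score, top, left)
    return best

print(best_match(image, template))   # (9, 1, 1)
```

The pattern matches all 9 pixels when placed at row 1, column 1. No notion of "ring" or "dot" is involved, only comparisons between numbers.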
Sound Is Also Represented as Numbers
To AI, sound is ultimately a stream of numbers.
Sound is a vibration that travels through the air. A microphone measures this vibration at very short, regular time intervals and records the strength of the vibration at each moment as a number. This converts sound into a long array of numbers.
Example: Partial sound waveform
[0.02, 0.15, 0.30, 0.10, -0.05, -0.20, -0.10, 0.05 ...]
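These numbers show how the vibration changes over time. Values swing above and below zero, so what matters for loudness is how far a value is from zero: large swings (positive or negative) mean a loud sound, and values close to zero mean a quiet one.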
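The following sketch imitates what a microphone does by sampling a pure tone at regular intervals, then measures how far the samples swing from zero. The tone frequency, sample rate, and function names are assumptions chosen for illustration only.

```python
import math

def sample_tone(frequency_hz, sample_rate_hz, num_samples, amplitude=1.0):
    """Measure a sine-wave 'vibration' at evenly spaced moments in time."""
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]

def peak(samples):
    """Largest distance from zero, a simple stand-in for loudness."""
    return max(abs(s) for s in samples)

# The partial waveform from the example above (without the elided rest).
waveform = [0.02, 0.15, 0.30, 0.10, -0.05, -0.20, -0.10, 0.05]
print(peak(waveform))   # 0.3

# Two simulated tones: same pitch, different loudness.
loud = sample_tone(440, 8000, 80, amplitude=0.8)
quiet = sample_tone(440, 8000, 80, amplitude=0.1)
print(peak(loud) > peak(quiet))   # True
```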
A human hears this and understands "hello." AI, however, sees the patterns of change in these numbers.
For example, the sound "hel-" tends to produce a similar numerical flow each time it is spoken, even across different speakers. The sound "-lo" does the same. By processing vast amounts of audio, AI learns which number patterns frequently appear alongside which words.
AI does not "listen" to sound. It analyzes the numerical patterns of sound that change over time and predicts the corresponding word.
Just as with images, sound is ultimately represented as numbers, and AI works by finding recurring structures within those numbers.
How Does Text Become Numbers?
Just as images and sounds are represented as numbers, text must also be converted to numbers before AI can process it. Computers cannot directly understand the "meaning" of letters. First, the text is broken into small pieces, and each piece is assigned a number.
Step 1: Breaking Text into Pieces (Tokenization)
Take the following sentence as an example:
"The weather feels nice today."
AI does not process this sentence as-is. It first breaks it into word or character units. These pieces are called tokens.
Example:
- The
- weather
- feels
- nice
- today
Or broken down even further:
- T
- h
- e
- ...
This process of splitting a sentence into small units is called tokenization.
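A minimal sketch of both splitting strategies in Python. The function names are made up for this example, and the handling of the final period is a simplification; real tokenizers are more elaborate, and many use learned units somewhere between whole words and single characters.

```python
sentence = "The weather feels nice today."

def word_tokens(text):
    """Split on spaces after dropping the trailing period."""
    return text.rstrip(".").split()

def char_tokens(text):
    """Split into individual characters."""
    return list(text)

print(word_tokens(sentence))       # ['The', 'weather', 'feels', 'nice', 'today']
print(char_tokens(sentence)[:3])   # ['T', 'h', 'e']
```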
Step 2: Assigning Numbers to Tokens
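Next, each token is assigned a unique number. For example (the specific numbers are arbitrary):
- The → 17
- weather → 4581
- feels → 305
- nice → 9002
- today → 1023
Replacing each token with its number, in sentence order, the sentence becomes:
[17, 4581, 305, 9002, 1023]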
The sentence is now a number array.
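This lookup can be sketched with a plain dictionary. The IDs for today, weather, and nice follow the example; the IDs for The and feels, the unknown-token ID 0, and the function names are made up for illustration.

```python
# A tiny vocabulary: token -> ID. Real vocabularies hold many thousands of entries.
vocab = {"The": 17, "weather": 4581, "feels": 305, "nice": 9002, "today": 1023}
UNKNOWN_ID = 0  # hypothetical ID for tokens missing from the vocabulary

def encode(tokens):
    """Replace each token with its ID."""
    return [vocab.get(token, UNKNOWN_ID) for token in tokens]

# Reverse table: ID -> token.
id_to_token = {token_id: token for token, token_id in vocab.items()}

def decode(ids):
    """Turn IDs back into tokens."""
    return [id_to_token.get(token_id, "<unk>") for token_id in ids]

tokens = ["The", "weather", "feels", "nice", "today"]
ids = encode(tokens)
print(ids)                  # [17, 4581, 305, 9002, 1023]
print(decode(ids))          # ['The', 'weather', 'feels', 'nice', 'today']
print(encode(["sunny"]))    # [0]
```

The IDs carry no meaning by themselves; they are only labels that let the text be stored and looked up as numbers.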
Step 3: Converting Numbers into Vectors
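But a single ID number is not enough. It is only a label, and it says nothing about what the word means or how it relates to other words. So AI represents each token using a longer list of numbers, called a vector. This vector is called an embedding.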
For example, the word "cat" might be represented as a group of numbers like this:
[0.12, -0.44, 0.88, 0.03, ...]
This group of numbers captures the characteristics of the word. Interestingly, words with similar meanings or in the same category end up with similar number arrays.
For example, since "cat" and "dog" belong to the same category, they might be represented in AI's learned number space like this:
- cat → [0.12, -0.44, 0.88, ...]
- dog → [0.10, -0.40, 0.85, ...]
But "car" belongs to a completely different category, so its number array looks quite different:
- car → [-0.60, 0.81, -0.12, ...]
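One common way to measure how "similar" two number arrays are is cosine similarity, which is close to 1 when two vectors point in the same direction. The sketch below uses only the three values shown for each word and ignores the elided remainder, so the results are purely illustrative.

```python
import math

# Only the first three values from the example; the rest are not shown.
cat = [0.12, -0.44, 0.88]
dog = [0.10, -0.40, 0.85]
car = [-0.60, 0.81, -0.12]

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)

cat_dog = cosine_similarity(cat, dog)
cat_car = cosine_similarity(cat, car)

print(round(cat_dog, 2))   # close to 1: very similar
print(round(cat_car, 2))   # negative: pointing in a different direction
```

With these example values, "cat" and "dog" score far higher than "cat" and "car", which matches the intuition that they belong to the same category.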
How Does AI Use These Numbers for Text?
A human reads a sentence and understands its meaning. AI, however, processes it like this:
- Split the sentence into small pieces.
- Convert each piece into numbers.
- Analyze the number patterns to predict the next word.
AI does not "read" text. It predicts the number of the next token based on patterns in the number-converted words.
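As a toy illustration of predicting the next word from patterns, the sketch below simply counts which word most often follows another in a tiny invented corpus. Modern AI models use far richer calculations over vectors rather than raw counts, so this shows only the general idea; the corpus and function name are made up.

```python
from collections import Counter, defaultdict

# A tiny invented corpus, already lowercased and free of punctuation.
corpus = [
    "the weather feels nice today",
    "the weather feels cold today",
    "the weather feels nice now",
]

# For each word, count the words that come right after it.
follow = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for current, following in zip(tokens, tokens[1:]):
        follow[current][following] += 1

def predict_next(token):
    """Return the most frequent follower of a token, or None if unseen."""
    counts = follow.get(token)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

print(predict_next("weather"))   # feels
print(predict_next("feels"))     # nice
print(predict_next("cat"))       # None
```

Here "nice" wins after "feels" only because it appeared twice and "cold" once. The prediction comes from counting, not from knowing what the weather is like.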
Key Takeaway
Images, sounds, and text look like completely different kinds of information on the surface, but inside AI they are all data converted into numbers. What AI does is ultimately find the recurring patterns and relationships among those numbers.
| What we see | What AI processes |
|---|---|
| A photo of a cat | A number array of millions of pixels |
| A human voice | A continuous series of vibration values over time |
| A sentence | A sequence of number vectors converted from tokens |
AI does not directly feel or understand meaning. Instead, it calculates the relationships between numbers and uses those relationships to predict the next result. But when this calculation becomes sufficiently refined, we begin to feel as though AI has "understood" the situation.