How Is Image-Generating AI Created?

Many people use AI for simple graphic design tasks or to create blog post header images.

AI models like OpenAI's DALL-E and Stability AI's Stable Diffusion generate images from text inputs.

So, how are these image-generating AI models developed?

While the specifics can vary between models, they generally follow the overarching process outlined below.


1. Data Collection

First, a large number of images are gathered, each paired with a text description (a label, also called a caption).

For instance, you might find a picture of "a dog playing in the park" and label it with that text description.
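As a rough illustration of what such a collection might look like, the sketch below stores each image alongside its label. The file paths and the second caption are hypothetical, and real datasets contain millions of pairs rather than two.

```python
# A tiny, made-up dataset of image-text pairs.
# The file paths are hypothetical and only show the shape of the data.
dataset = [
    {"image_path": "images/dog_park.jpg", "caption": "a dog playing in the park"},
    {"image_path": "images/cat_sofa.jpg", "caption": "a cat sleeping on a sofa"},
]

# Each entry links one image to the text that describes it.
for example in dataset:
    print(example["image_path"], "->", example["caption"])
```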


2. Preprocessing

The collected image and text data are converted into a numerical form that a computer can process.


Image Data

Each collected image is broken down into pixels (px), the smallest units that make up a digital image, and each pixel is represented by three RGB (Red, Green, Blue) values ranging from 0 to 255.

For example, the color red is expressed as (255, 0, 0), and black is expressed as (0, 0, 0).
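To make this concrete, here is a minimal sketch of a 2x2 image stored as rows of RGB tuples. The scaling of each value to the range 0.0 to 1.0 is an assumption added for illustration; it is a common preprocessing step, but the exact steps differ between models.

```python
RED = (255, 0, 0)
BLACK = (0, 0, 0)

# A tiny 2x2 image: a list of rows, each row a list of (R, G, B) pixels.
image = [
    [RED, BLACK],
    [BLACK, RED],
]

height = len(image)
width = len(image[0])

# Assumed preprocessing step: scale every channel from 0-255 down to 0.0-1.0.
normalized = [
    [tuple(channel / 255 for channel in pixel) for pixel in row]
    for row in image
]

print(width, height)     # 2 2
print(normalized[0][0])  # (1.0, 0.0, 0.0)
```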


Text Data

The text is split into smaller units called tokens, which are typically words or pieces of words.

For instance, the text "a dog playing in the park" is broken down into "a", "dog", "playing", "in", "the", "park".
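The split above can be sketched in a few lines. The `tokenize` function is a hypothetical helper that only splits on spaces; the tokenizers used by real models also handle punctuation and break rare words into smaller pieces.

```python
def tokenize(text):
    # Lowercase the text and split it on whitespace.
    # This is a simplification of what production tokenizers do.
    return text.lower().split()

tokens = tokenize("a dog playing in the park")
print(tokens)  # ['a', 'dog', 'playing', 'in', 'the', 'park']
```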

Each word is then converted into a list of numbers called a vector (a numerical representation of a word or sentence), a process known as embedding. For example, the word "cat" might be represented by a vector like the following:

Vector value of 'cat'
[0.11, 0.34, 0.56, ...]
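One simple way to picture embedding is as a lookup table from words to vectors. In the sketch below, the table, the `embed` helper, and every number are made up for illustration, and the vectors are cut to three values; in a real model the vectors are much longer and their values are learned during training.

```python
# Hypothetical lookup table with made-up, three-value vectors.
embedding_table = {
    "cat": [0.11, 0.34, 0.56],
    "dog": [0.12, 0.30, 0.60],
    "park": [0.80, 0.05, 0.20],
}

# Fallback vector for words that are not in the table.
UNKNOWN = [0.0, 0.0, 0.0]

def embed(tokens):
    # Replace each token with its vector, or with UNKNOWN if it is missing.
    return [embedding_table.get(token, UNKNOWN) for token in tokens]

vectors = embed(["dog", "park", "zebra"])
for vector in vectors:
    print(vector)
```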

3. AI Model Training

In simple terms, an AI model is like a function from math class: it takes an input and produces an output.

Image-generating AI models learn the association between input text and images by repeatedly adjusting their parameters, the numbers inside the function that determine what output it produces for a given input.

Parameters play a role similar to seasonings in cooking: changing their amounts changes the result, and training searches for the amounts that make the generated images match their text descriptions.

Through this process, the model settles on parameter values that work well, and it can then generate images from new text descriptions.
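The idea of adjusting parameters can be shown on a much smaller problem. The sketch below is not an image model: it fits a function with only two parameters, `w` and `b`, to data produced by y = 2x + 1, using gradient descent. The data, the learning rate, and the number of steps are all assumptions chosen for illustration. Image-generating models apply the same principle to millions or billions of parameters.

```python
# Training data generated from the "true" function y = 2x + 1.
data = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

# The model is y = w * x + b. Both parameters start at arbitrary values.
w, b = 0.0, 0.0
learning_rate = 0.05

for step in range(2000):
    grad_w = 0.0
    grad_b = 0.0
    for x, y in data:
        # How far the current prediction is from the correct answer.
        error = (w * x + b) - y
        # Gradient of the mean squared error with respect to w and b.
        grad_w += 2 * error * x / len(data)
        grad_b += 2 * error / len(data)
    # Nudge each parameter in the direction that reduces the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0
```

After training, `w` and `b` end up near 2 and 1, the values that generated the data, even though the loop was never told them directly.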
