Skip to main content
Practice

Practical AI Image Analysis in the Workplace

AI now possesses vision.

Thanks to multimodal technology, which processes various types of data such as images, videos, and audio, it has become significantly easier to analyze and process images using AI.

Multimodal refers to technology that processes multiple types of data simultaneously.


In 2023, OpenAI launched GPT Vision, which specializes in image analysis, demonstrating that AI can perform detailed image analysis. This capability has since been integrated into the GPT-4o model.


How AI Analyzes Images

The method by which AI analyzes images can be broadly divided into three stages.


Image Recognition

The input image is divided into small segments, and each segment is analyzed to determine what it represents.

Feature Extraction

The image is analyzed to find specific patterns and key elements.

These characteristic elements within an image are referred to as landmarks. Examples of landmarks include facial features such as eyes, nose, mouth, and ears.

Content Interpretation

Finally, the identified features are combined to interpret what the entire image represents.

For example, segments including trees, the sky, and a person are combined to interpret the image as "a person walking in the park."


How Can AI Be Used for Image Analysis?

There is a wide variety of ways to use AI for image analysis. Here are some representative use cases.


Extracting Text Data from Images

AI can be used to extract text from images, for example, extracting phone numbers from a business card image or amounts from a receipt image.

This process of extracting text from images is known as OCR (Optical Character Recognition).

Automating Image Classification

When classifying thousands or even tens of thousands of images, using AI can make the process much faster and more accurate.

Data Analysis

AI can analyze images of graphs, charts, and tables to extract data or visualize data by analyzing images.

For instance, it can analyze stock chart images to extract stock prices or analyze map images to visualize population density.


Prompt Engineering Methods Specialized for Image Analysis

When crafting prompts for image analysis, employing the following methods can yield more accurate results.


1. Specifying Image Context and Output

Providing background information or related context of the image can lead to better accurate results.

Prompt Example

  • This image is a photograph taken in nature. Identify 3 main objects.

  • This photo is a business card. Extract the name, job title, and contact information.

  • The following graph represents book sales revenue for the second half of 2023. Extract the sales amount and book categories from the graph and organize them in a table.

2. Highlighting Specific Details

Instruct the AI to analyze specific parts, text, or objects in the image.

Prompt Example

  • Extract the text located in the top right corner of this image.

  • Describe the individual in the center of this photo.

  • Extract the sales figure for July 2023 in this graph.

3. Specifying Answer Output Format

It's beneficial to explicitly specify the output format in the prompt, such as CSV (comma-separated values used in spreadsheets), Table, List, Sentence, etc.

Prompt Example

  • Organize the extracted values from the graph in CSV format for use in Excel.

  • Organize the extracted name, job title, and contact information from the business card into a list.


Practicing Image Analysis Prompts

By applying the methods above, you can draft an image analysis prompt as follows:

Example for Extracting Text from a Business Card
The provided image is a business card.

Please extract the name, job title, contact information, and email from the card.

Organize the extracted information in CSV format.
  • Image Context: Business card

  • Extraction Details: Name, job title, contact information, email

  • Answer Output Format: CSV format


Practice

Send a prompt example and compare the AI's responses.