Practical AI Image Analysis in the Workplace
AI now possesses vision
.
Thanks to multimodal technology, which processes various types of data such as images, videos, and audio, it has become significantly easier to analyze and process images using AI.
Multimodal
refers to technology that processes multiple types of data simultaneously.
In 2023, OpenAI launched GPT Vision
, which specializes in image analysis, demonstrating that AI can perform detailed image analysis. This capability has since been integrated into the GPT-4o
model.
How AI Analyzes Images
The method by which AI analyzes images can be broadly divided into three stages.
Image Recognition
The input image is divided into small segments, and each segment is analyzed to determine what it represents.
Feature Extraction
The image is analyzed to find specific patterns and key elements.
These characteristic elements within an image are referred to as landmarks
. Examples of landmarks include facial features such as eyes, nose, mouth, and ears.
Content Interpretation
Finally, the identified features are combined to interpret what the entire image represents.
For example, segments including trees, the sky, and a person are combined to interpret the image as "a person walking in the park."
How Can AI Be Used for Image Analysis?
There is a wide variety of ways to use AI for image analysis. Here are some representative use cases.
Extracting Text Data from Images
AI can be used to extract text from images, for example, extracting phone numbers from a business card image or amounts from a receipt image.
This process of extracting text from images is known as OCR (Optical Character Recognition)
.
Automating Image Classification
When classifying thousands or even tens of thousands of images, using AI can make the process much faster and more accurate.
Data Analysis
AI can analyze images of graphs, charts, and tables to extract data or visualize data by analyzing images.
For instance, it can analyze stock chart images to extract stock prices or analyze map images to visualize population density.
Prompt Engineering Methods Specialized for Image Analysis
When crafting prompts for image analysis, employing the following methods can yield more accurate results.
1. Specifying Image Context and Output
Providing background information or related context of the image can lead to better accurate results.
Prompt Example
-
This image is a photograph taken in nature. Identify 3 main objects.
-
This photo is a business card. Extract the name, job title, and contact information.
-
The following graph represents book sales revenue for the second half of 2023. Extract the sales amount and book categories from the graph and organize them in a table.
2. Highlighting Specific Details
Instruct the AI to analyze specific parts, text, or objects in the image.
Prompt Example
-
Extract the text located in the top right corner of this image.
-
Describe the individual in the center of this photo.
-
Extract the sales figure for July 2023 in this graph.
3. Specifying Answer Output Format
It's beneficial to explicitly specify the output format
in the prompt, such as CSV (comma-separated values used in spreadsheets), Table, List, Sentence, etc.
Prompt Example
-
Organize the extracted values from the graph in CSV format for use in Excel.
-
Organize the extracted name, job title, and contact information from the business card into a list.
Practicing Image Analysis Prompts
By applying the methods above, you can draft an image analysis prompt as follows:
The provided image is a business card.
Please extract the name, job title, contact information, and email from the card.
Organize the extracted information in CSV format.
-
Image Context: Business card
-
Extraction Details: Name, job title, contact information, email
-
Answer Output Format: CSV format
Practice
Send a prompt example and compare the AI's responses.