Generating Data in the Desired Format
At work, you sometimes need to generate mock data for practice or testing. For example, you might need mock customer data to test and simulate new software, or sample data for training an AI model.
When creating data tailored to specific situations, utilizing AI can help you generate data efficiently.
What should you be aware of when generating data using AI? First, it's essential to clearly define the desired format of the data. Without a clear format, the data generated by AI may not match what you need.
Let's now explore three data formats commonly used in the IT industry.
JSON
JSON, short for JavaScript Object Notation, is a data exchange format commonly used in web/mobile applications.
JSON consists of:
- Objects: Data enclosed in curly braces { }, made up of key-value pairs
  - Keys are enclosed in double quotes
  - Values can be strings, numbers, booleans, null, arrays, or objects
- Arrays: Data enclosed in square brackets [ ], representing an ordered list of values
  - Elements in an array are separated by commas
  - Elements can be strings, numbers, booleans, null, objects, or arrays
JSON is easy for both humans and machines to read. The example below is an array of two objects, each made up of key-value pairs. Standard JSON has no comment syntax, so the data contains no annotations.
[
  {
    "name": "John Doe",
    "math": 85,
    "english": 90
  },
  {
    "name": "Jane Smith",
    "math": 88,
    "english": 80
  }
]
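As a minimal sketch of how a program might read this format, the Python snippet below parses the student array with the standard `json` module. The variable names and the average calculation are illustrative assumptions, not part of the lesson. Strict JSON parsers reject `//` comments, so the text passed to the parser contains data only.

```python
import json

# Student data from the example above, as plain JSON text
# (no comments, because strict JSON parsers reject them).
raw = """
[
  {"name": "John Doe", "math": 85, "english": 90},
  {"name": "Jane Smith", "math": 88, "english": 80}
]
"""

students = json.loads(raw)  # JSON array -> Python list of dicts

for student in students:
    # The average is an illustrative calculation, not part of the data.
    average = (student["math"] + student["english"]) / 2
    print(f'{student["name"]}: {average}')
```

Each JSON object becomes a Python dictionary, so the keys in the data can be used directly to look up values.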
XML
XML, or 'eXtensible Markup Language', is primarily used for representing hierarchical data structures.
XML consists of:
- Tags: Names enclosed in < > that mark the structure of the data
  - Consist of start tags and end tags
  - Start tag: <tagname>, End tag: </tagname>
- Attributes: Used to express additional information within tags
  - Attributes can be added using <tagname attributename="attributevalue">
  - For example, <student gender="male"> adds a gender attribute to the student tag
Using the student data from the JSON example, here's how it appears in XML:
<studentList>
  <student>
    <name>John Doe</name>
    <math>85</math>
    <english>90</english>
  </student>
  <student>
    <name>Jane Smith</name>
    <math>88</math>
    <english>80</english>
  </student>
</studentList>
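To illustrate how the hierarchy is navigated in code, here is a small sketch that reads the XML above with Python's standard `xml.etree.ElementTree` module. The variable names are assumptions made for this example. Element text is always a string in XML, so the scores are converted to integers explicitly.

```python
import xml.etree.ElementTree as ET

xml_text = """
<studentList>
  <student>
    <name>John Doe</name>
    <math>85</math>
    <english>90</english>
  </student>
  <student>
    <name>Jane Smith</name>
    <math>88</math>
    <english>80</english>
  </student>
</studentList>
""".strip()

root = ET.fromstring(xml_text)  # root is the <studentList> element

students = [
    {
        "name": s.findtext("name"),
        "math": int(s.findtext("math")),        # element text is a string
        "english": int(s.findtext("english")),
    }
    for s in root.findall("student")  # direct <student> children
]

print(students)
```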
CSV
CSV, or 'Comma-Separated Values', is a text format that separates values with commas (,). Here is the same student data in CSV, with the column names on the first line:
name,math,english
John Doe,85,90
Jane Smith,88,80
For a detailed explanation of CSV, please refer to the previous lesson.
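For comparison with the other two formats, the sketch below reads the same CSV text with Python's standard `csv` module. The variable names are illustrative assumptions. CSV carries no type information, so every value arrives as a string and the scores are converted by hand.

```python
import csv
import io

csv_text = """name,math,english
John Doe,85,90
Jane Smith,88,80
"""

# DictReader uses the first line as the column names.
reader = csv.DictReader(io.StringIO(csv_text))

rows = [
    {
        "name": row["name"],
        "math": int(row["math"]),        # CSV values are read as strings
        "english": int(row["english"]),
    }
    for row in reader
]

print(rows)
```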
Crafting a Data Generation Prompt
When generating data, it's advisable to specify the data format and provide a few data examples using few-shot prompting.
Let's create a prompt for generating student data in JSON format.
### Instructions: Refer to the JSON example below and generate student data. There should be 4 student data objects in the JSON array.
### JSON Example
[
  {
    "name": "John Doe",
    "math": 85,
    "english": 90
  },
  {
    "name": "Jane Smith",
    "math": 88,
    "english": 80
  }
]
In the prompt above, ### delimiters are used to structure the prompt, and two example student data entries are provided. The AI is instructed to generate four student data objects.
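Because an AI may not always follow the requested format, it can help to check the response before using it. The sketch below is one possible check, assuming the response comes back as plain JSON text. The function name `validate_students`, the `REQUIRED_KEYS` set, and the sample response (written by hand, not produced by any model) are all hypothetical and introduced only for illustration.

```python
import json

# Keys taken from the JSON example in the prompt.
REQUIRED_KEYS = {"name", "math", "english"}


def validate_students(response_text, expected_count=4):
    """Return the parsed list if it matches the requested format, else raise ValueError."""
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Response is not valid JSON: {exc}") from exc

    if not isinstance(data, list):
        raise ValueError("Expected a JSON array at the top level")
    if len(data) != expected_count:
        raise ValueError(f"Expected {expected_count} objects, got {len(data)}")

    for i, item in enumerate(data):
        if not isinstance(item, dict) or set(item) != REQUIRED_KEYS:
            raise ValueError(f"Object {i} does not have exactly the keys {sorted(REQUIRED_KEYS)}")
    return data


# Hypothetical response text, written by hand for this sketch.
sample_response = """
[
  {"name": "Alex Kim", "math": 92, "english": 78},
  {"name": "Maria Garcia", "math": 75, "english": 88},
  {"name": "Sam Lee", "math": 81, "english": 95},
  {"name": "Nina Patel", "math": 67, "english": 84}
]
"""

students = validate_students(sample_response)
print(len(students))
```

If the check fails, the error message says which requirement was missed, which makes it easier to adjust the prompt and try again.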
By specifying the data format according to your requirements and crafting a prompt for data generation, you can effectively generate test/practice data.
Practice
Send the prompt example and compare the AI's response.