
Generating Data in the Desired Format

At work, you will sometimes need to generate synthetic data for practice or testing. For example, you might need fictional customer data to test and simulate new software, or sample data for training an AI model.

When creating data tailored to specific situations, utilizing AI can help you generate data efficiently.

What should you be aware of when generating data using AI? First, it's essential to clearly define the desired format of the data. Without a clear format, the data generated by AI may not match what you need.

Let's now explore three data formats commonly used in the IT industry.


JSON

JSON, short for JavaScript Object Notation, is a data exchange format commonly used in web/mobile applications.

JSON consists of:

  1. Objects: Data enclosed in curly braces { }, made up of key-value pairs

    • Keys are strings enclosed in double quotes
    • Values can be strings, numbers, booleans (true/false), null, arrays, or objects
  2. Arrays: Data enclosed in square brackets [ ], representing an ordered list of values

    • Elements in an array are separated by commas
    • Elements can be any JSON value, including objects and other arrays

JSON is easy for humans to read and for machines to parse.


JSON Example
The array below contains two objects, each made up of key-value pairs. Standard JSON does not allow comments, so this description sits outside the data.
[
  {
    "name": "John Doe",
    "math": 85,
    "english": 90
  },
  {
    "name": "Jane Smith",
    "math": 88,
    "english": 80
  }
]
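
To show how a program reads this structure, here is a minimal Python sketch that parses the same student data with the standard json module. The variable names (raw, students, averages) are chosen for illustration only, and the averaging step is just one example of what you might do with the parsed values.

```python
import json

# The student data from the example, held as a JSON string.
raw = """
[
  {"name": "John Doe", "math": 85, "english": 90},
  {"name": "Jane Smith", "math": 88, "english": 80}
]
"""

# A JSON array becomes a Python list; each JSON object becomes a dict.
students = json.loads(raw)

# Average the two scores for each student.
averages = {s["name"]: (s["math"] + s["english"]) / 2 for s in students}
print(averages)  # {'John Doe': 87.5, 'Jane Smith': 84.0}
```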

XML

XML, or 'eXtensible Markup Language', is primarily used for representing hierarchical data structures.

XML consists of:

  1. Tags: Names enclosed in angle brackets < > that mark the structure of the data

    • Each element is wrapped in a start tag and an end tag
    • Start tag: <tagname>, End tag: </tagname>
  2. Attributes: Used to express additional information within tags

    • Attributes can be added using <tagname attributename="attributevalue">
    • For example, <student gender="male"> adds a gender attribute to the student tag

Using the student data from the JSON example, here's how it appears in XML:

XML Example
<studentList>
  <student>
    <name>John Doe</name>
    <math>85</math>
    <english>90</english>
  </student>
  <student>
    <name>Jane Smith</name>
    <math>88</math>
    <english>80</english>
  </student>
</studentList>
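
As a rough illustration of how tags and attributes are read in code, the sketch below parses the same XML with Python's standard xml.etree.ElementTree module. The variable names are illustrative. Note that XML text content is always a string, so the scores are converted with int() here.

```python
import xml.etree.ElementTree as ET

raw = """<studentList>
  <student>
    <name>John Doe</name>
    <math>85</math>
    <english>90</english>
  </student>
  <student>
    <name>Jane Smith</name>
    <math>88</math>
    <english>80</english>
  </student>
</studentList>"""

root = ET.fromstring(raw)  # root is the <studentList> element

# Each <student> child holds its data as nested tags.
students = [
    {
        "name": s.findtext("name"),
        "math": int(s.findtext("math")),
        "english": int(s.findtext("english")),
    }
    for s in root.findall("student")
]
print(students)

# Attributes are read with .get(); this example has none, so each is None.
genders = [s.get("gender") for s in root.findall("student")]
print(genders)  # [None, None]
```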

CSV

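CSV, or 'Comma-Separated Values', is a plain-text format for tabular data: each line is one record, and the fields within a record are separated by commas (,). The first line often serves as a header naming each column.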

CSV Example
name,math,english
John Doe,85,90
Jane Smith,88,80

For a detailed explanation of CSV, please refer to the previous lesson.
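
For comparison with the other two formats, here is a small Python sketch that reads the CSV example with the standard csv module. It assumes the first line is a header row, as in the example. Every CSV field is read as a string, so the scores are converted explicitly.

```python
import csv
import io

raw = "name,math,english\nJohn Doe,85,90\nJane Smith,88,80\n"

# DictReader uses the header line as the keys for each row.
reader = csv.DictReader(io.StringIO(raw))

students = [
    {"name": row["name"], "math": int(row["math"]), "english": int(row["english"])}
    for row in reader
]
print(students)
```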


Crafting a Data Generation Prompt

When generating data, it's advisable to specify the data format and provide a few data examples using few-shot prompting.

Let's create a prompt for generating student data in JSON format.

JSON Format Student Data Generation Prompt Example
### Instructions: Refer to the JSON example below and generate student data. There should be 4 student data objects in the JSON array.

### JSON Example
[
  {
    "name": "John Doe",
    "math": 85,
    "english": 90
  },
  {
    "name": "Jane Smith",
    "math": 88,
    "english": 80
  }
]

In the prompt above, ### delimiters are used to structure the prompt, and two example student data entries are provided. The AI is instructed to generate four student data objects.
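
If you build prompts like this one in code, a minimal sketch might look like the following. The build_prompt helper is hypothetical, written only for this illustration; it assembles the same instructions and few-shot examples as a string and does not call any AI service.

```python
import json


def build_prompt(examples, count):
    """Assemble a few-shot data generation prompt (hypothetical helper)."""
    return (
        "### Instructions: Refer to the JSON example below and generate student data. "
        f"There should be {count} student data objects in the JSON array.\n\n"
        "### JSON Example\n"
        f"{json.dumps(examples, indent=2)}"
    )


# The two few-shot examples from the prompt above.
examples = [
    {"name": "John Doe", "math": 85, "english": 90},
    {"name": "Jane Smith", "math": 88, "english": 80},
]

prompt = build_prompt(examples, 4)
print(prompt)
```

Keeping the examples as Python data and serializing them with json.dumps means the example block in the prompt is always valid JSON.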

By specifying the data format according to your requirements and crafting a prompt for data generation, you can effectively generate test/practice data.
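
Because a response may not match the requested format exactly, it can help to check it before using the data. The sketch below is one possible check, not a definitive one: validate_students and REQUIRED_KEYS are hypothetical names, and sample_response is a stand-in string (it simply reuses the two example entries) rather than real AI output.

```python
import json

# Keys every student object is expected to have (taken from the example format).
REQUIRED_KEYS = {"name", "math", "english"}


def validate_students(response_text, expected_count):
    """Return a list of problems; an empty list means the checks passed."""
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, list):
        return ["top-level value is not an array"]

    problems = []
    if len(data) != expected_count:
        problems.append(f"expected {expected_count} objects, got {len(data)}")
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            problems.append(f"item {i} is not an object")
            continue
        missing = REQUIRED_KEYS - set(item)
        if missing:
            problems.append(f"item {i} is missing keys: {sorted(missing)}")
    return problems


# Stand-in response for illustration: only two objects, although four were requested.
sample_response = """
[
  {"name": "John Doe", "math": 85, "english": 90},
  {"name": "Jane Smith", "math": 88, "english": 80}
]
"""

print(validate_students(sample_response, 4))  # ['expected 4 objects, got 2']
```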


Practice

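Send the example prompt to the AI and compare its response with the JSON example. Check that the response contains four student objects in the same format.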
