Skip to main content
Crowdfunding
Python + AI for Geeks
Practice

Data Formats Used for Training AI

To train AI models, data must be transformed into formats that AI systems can understand.

In this lesson, we'll explore the main data file formats used for training AI: CSV, JSON, and XML.


CSV

CSV stands for Comma-Separated Values and is used to store and transfer table-like data.

Each row represents a single data entry, while each column represents a specific attribute of the data.

Values in each column are separated by commas (,).

For example, a CSV file storing math and English scores of students by name can be represented as follows:

CSV Example
Name,Math,English
John Doe,85,90
Jane Smith,88,80

CSV files are stored as text files with the .csv file extension and can be easily opened and edited in various data management programs like Microsoft Excel, Google Sheets, and database programs.


JSON

JSON (JavaScript Object Notation) is commonly used for data storage and exchange in web and mobile applications.

JSON consists of objects and arrays, with objects wrapped in curly braces { } and arrays in square brackets [ ].


JSON Example
// Array in square brackets
[
// Object in curly braces
{
"Name": "John Doe",
"Math": 85,
"English": 90
},
{
"Name": "Jane Smith",
"Math": 88,
"English": 80
}
]

A data file format where multiple JSON objects are listed, one per line, is called JSONL (JSON Lines).

JSONL Example
{"Name": "John Doe", "Math": 85, "English": 90}
{"Name": "Jane Smith", "Math": 88, "English": 80}

When training OpenAI's AI models or general-purpose machine learning models, data files in JSONL format are often used.


XML

XML (eXtensible Markup Language) is primarily used to represent hierarchical data structures.

The key elements of XML include:

  1. Tags: Data enclosed in angle brackets < > represent the data's structure.

    • Tags consist of opening and closing tags.
    • An opening tag is <tagname>, and a closing tag is </tagname>.
  2. Attributes: Used to provide additional information within a tag.

    • To add attributes to a tag, use <tagname attributename="attributevalue">.
    • Example: <Student gender="Male"> adds a gender attribute to the Student tag.

Below is how the JSON example is represented in XML.

XML Example
<StudentList>
<Student>
<Name>John Doe</Name>
<Math>85</Math>
<English>90</English>
</Student>
<Student>
<Name>Jane Smith</Name>
<Math>88</Math>
<English>80</English>
</Student>
</StudentList>

When training image-related AI models, image file formats like .jpg and .png are used.

Image files are comprised of pixel values, and AI models interpret these pixel values to recognize and classify images.

The data file formats used for training AI models vary, and the correct format should be chosen according to the model's design strategy.

Want to learn more?

Join CodeFriends Plus membership or enroll in a course to start your journey.