
Data Formats Used for Training AI

To train an AI model, the training data must first be organized into a structured format that the training program can read and process.

In this lesson, we'll explore the key data formats used to train AI, including CSV, JSON, and XML.


CSV

CSV, which stands for Comma-Separated Values, is a plain-text format used to store and exchange data in tabular form.

Each row represents an individual data entry, and each column represents a specific attribute of that entry. Within a row, values are separated by commas, and the first row usually holds the column names (the header).

For example, a CSV file that stores students' math and English grades by name could be represented as follows:

CSV Example
Name,Math,English
John Doe,85,90
Jane Smith,88,80

CSV files are saved with the .csv file extension and can be easily opened and edited with various data management programs like Microsoft Excel, Google Sheets, or database software.
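To show how a program reads this table structure, here is a minimal Python sketch using the standard-library `csv` module. It assumes the first row is a header, as in the example above. The file name `grades.csv` in the comment is hypothetical; the data is embedded as a string so the sketch runs on its own.

```python
import csv
import io

# The CSV example from above, stored as a string so the sketch is self-contained.
# With a real file you would use open("grades.csv", newline="") (hypothetical file name).
CSV_TEXT = """Name,Math,English
John Doe,85,90
Jane Smith,88,80
"""

def load_grades(text):
    """Parse CSV text into a list of dicts, converting the grade columns to int."""
    reader = csv.DictReader(io.StringIO(text))  # the first row becomes the dict keys
    rows = []
    for row in reader:
        rows.append({
            "Name": row["Name"],
            "Math": int(row["Math"]),        # csv yields every value as a string
            "English": int(row["English"]),
        })
    return rows

students = load_grades(CSV_TEXT)
for s in students:
    print(s["Name"], s["Math"], s["English"])
# John Doe 85 90
# Jane Smith 88 80
```

Because CSV stores only text, numeric columns such as grades have to be converted explicitly before they can be used as model inputs.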


JSON

JSON (JavaScript Object Notation) is primarily used for storing and exchanging data in web and mobile applications.

JSON data is built from objects and arrays: an object is a set of key-value pairs enclosed in curly braces { }, and an array is an ordered list of values enclosed in square brackets [ ].

For more details, see the next lesson.


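In the example below, the outer square brackets form an array, and each pair of curly braces inside it is an object describing one student. Note that standard JSON does not support comments, so explanatory notes like these must be kept outside the data itself.

JSON Example
[
  {
    "Name": "John Doe",
    "Math": 85,
    "English": 90
  },
  {
    "Name": "Jane Smith",
    "Math": 88,
    "English": 80
  }
]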
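As a sketch of how this data is consumed, Python's standard `json` module turns a JSON array into a list and each JSON object into a dictionary. The string below contains only the data, because `json.loads` rejects comments; the last lines demonstrate that error.

```python
import json

# The student records from the JSON example, as a string (data only, no comments).
JSON_TEXT = """
[
  {"Name": "John Doe", "Math": 85, "English": 90},
  {"Name": "Jane Smith", "Math": 88, "English": 80}
]
"""

students = json.loads(JSON_TEXT)  # a JSON array becomes a Python list
first = students[0]               # each JSON object becomes a Python dict

print(type(students).__name__, type(first).__name__)  # list dict
print(first["Name"], first["Math"])                    # John Doe 85

# Numbers keep their numeric type, so they can be used directly
average_math = sum(s["Math"] for s in students) / len(students)
print(average_math)  # 86.5

# Serialize one record back to JSON text, e.g. to save or send it
print(json.dumps(students[1]))  # {"Name": "Jane Smith", "Math": 88, "English": 80}

# Comments are not valid JSON, so parsing them fails
try:
    json.loads("// a comment\n[1, 2]")
except json.JSONDecodeError as err:
    print("Invalid JSON:", err.msg)  # Invalid JSON: Expecting value
```

Unlike CSV, JSON preserves the difference between strings and numbers, so no manual type conversion is needed here.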

XML

XML (eXtensible Markup Language) is mainly used to represent data with a hierarchical (nested) structure.

The key elements of XML are as follows:

  1. Tags: Names enclosed in < > that mark where a piece of data begins and ends. Nesting tags inside one another expresses the hierarchical structure.

    • Tags are divided into start tags and end tags.
    • A start tag is written as <tagname>, and an end tag as </tagname>; the data sits between them, as in <Name>John Doe</Name>.
  2. Attributes: Used to provide additional information inside a start tag.

    • To add an attribute to a tag, use the format <tagname attributename="attributevalue">.
    • For example, <Student gender="male"> adds a gender attribute to a Student tag.
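The difference between an attribute and a child element can be seen by parsing a small snippet with Python's standard-library `xml.etree.ElementTree`. This is a minimal sketch; the `<Name>` child inside the Student element is an assumption added for illustration, since the attribute example above shows only the start tag.

```python
import xml.etree.ElementTree as ET

# A Student element with a gender attribute and a Name child element (illustrative data).
student = ET.fromstring('<Student gender="male"><Name>John Doe</Name></Student>')

print(student.tag)                # Student
print(student.get("gender"))      # male  (value of the attribute)
print(student.attrib)             # {'gender': 'male'}
print(student.find("Name").text)  # John Doe  (text between the Name start and end tags)
```

Attributes are read from the element itself with `get`, while nested data is reached by finding child elements.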

Below is the same student data from the JSON example, represented in XML.

XML Example
<StudentList>
  <Student>
    <Name>John Doe</Name>
    <Math>85</Math>
    <English>90</English>
  </Student>
  <Student>
    <Name>Jane Smith</Name>
    <Math>88</Math>
    <English>80</English>
  </Student>
</StudentList>
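To confirm that the XML and JSON versions hold the same information, this sketch parses the XML example with `xml.etree.ElementTree` and rebuilds the list of student records. Element text is always a string, so the grade fields are converted to integers explicitly.

```python
import xml.etree.ElementTree as ET

# The XML example from above, as a string.
XML_TEXT = """
<StudentList>
  <Student>
    <Name>John Doe</Name>
    <Math>85</Math>
    <English>90</English>
  </Student>
  <Student>
    <Name>Jane Smith</Name>
    <Math>88</Math>
    <English>80</English>
  </Student>
</StudentList>
"""

root = ET.fromstring(XML_TEXT.strip())  # root is the <StudentList> element

students = []
for student in root.findall("Student"):  # direct <Student> children of the root
    students.append({
        "Name": student.findtext("Name"),
        # findtext returns a string, so numeric fields are converted explicitly
        "Math": int(student.findtext("Math")),
        "English": int(student.findtext("English")),
    })

print(students)
# [{'Name': 'John Doe', 'Math': 85, 'English': 90}, {'Name': 'Jane Smith', 'Math': 88, 'English': 80}]
```

The resulting list matches what `json.loads` produces from the JSON example, which shows that the two formats differ in notation rather than in content.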

Beyond these structured formats, other file types are also used directly as training data: image files are used to train image-related AI models, and plain text files (.txt) are often used to train natural language processing models.
