What Data Formats Do Different AI Models Use?
So far, we have explored the dataset format for fine-tuning on the OpenAI platform.
But what data formats do other AI models use?
Text processing AI models can use different forms of JSONL datasets, while models that take other input types, such as image processing models, often have their own data formats.
Text Processing AI Models
For example, a JSONL dataset can consist of one JSON object per line, where each object has a prompt field representing the user input and a completion field representing the output the AI model should generate.
{"prompt": "What is the capital of France?", "completion": "The capital of France is Paris."}
{"prompt": "What is the smallest state in the US?", "completion": "The smallest state in the US is Rhode Island."}
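Before training, it can help to confirm that every line of such a file is valid JSON and carries both fields. The following is a minimal sketch using only Python's standard library. The function names `parse_jsonl` and `to_jsonl` are hypothetical helpers made up for this illustration, and the required keys are assumed from the sample above rather than from any particular platform's specification.

```python
import json

# Field names assumed from the sample records above.
REQUIRED_KEYS = {"prompt", "completion"}


def to_jsonl(records):
    """Serialize a list of dicts as JSONL text: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


def parse_jsonl(text):
    """Parse JSONL text and check that each record has the required keys."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)  # raises json.JSONDecodeError on bad JSON
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"line {line_number}: missing {sorted(missing)}")
        records.append(record)
    return records


sample_text = to_jsonl([
    {"prompt": "What is the capital of France?",
     "completion": "The capital of France is Paris."},
    {"prompt": "What is the smallest state in the US?",
     "completion": "The smallest state in the US is Rhode Island."},
])
records = parse_jsonl(sample_text)
print(len(records), "valid records")
```

Reading a real file works the same way: pass the result of `open(path, encoding="utf-8").read()` to `parse_jsonl`.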
Image Processing AI Models
When training or fine-tuning image processing models, a common choice is a CSV (Comma-Separated Values) file that lists the path to each image file alongside the image's label.
imagePath,label
"/path/to/image1.jpg","cat"
"/path/to/image2.jpg","dog"
Depending on the AI model, the image paths and labels can also be stored in other formats, such as JSON or XML. For instance, another image processing AI model might use a JSON format dataset as shown below.
{
  "images": [
    {"path": "/path/to/image1.jpg", "label": "cat"},
    {"path": "/path/to/image2.jpg", "label": "dog"}
  ]
}
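Because both layouts hold the same information, converting between them is mostly a matter of renaming fields. Here is a minimal sketch that turns CSV text with `imagePath` and `label` columns into the JSON structure above. The function name `csv_to_json` is a hypothetical helper, and the key names are taken from the samples in this lesson, not from any specific model's requirements.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text (imagePath,label) into the {"images": [...]} layout."""
    reader = csv.DictReader(io.StringIO(csv_text))
    images = [
        {"path": row["imagePath"], "label": row["label"]}
        for row in reader
    ]
    return json.dumps({"images": images}, indent=2)


csv_text = 'imagePath,label\n"/path/to/image1.jpg","cat"\n"/path/to/image2.jpg","dog"\n'
json_text = csv_to_json(csv_text)
print(json_text)
```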
As you can see, various data formats can be used depending on the AI model, and you should structure your dataset according to the model's requirements.