JSONL Format for Fine-Tuning

When fine-tuning an AI model on the OpenAI Platform, training data should be provided in JSONL format, following the structure specified by OpenAI.

Let's look at how a single JSON structure in JSONL, as suggested by OpenAI, is organized.

Conversation JSON Structure

A JSON contains a single conversation, which is composed of multiple messages.

Conversation JSON Structure
{
  "messages": [
    {"role": "role1", "content": "message1"},
    {"role": "role2", "content": "message2"},
    {"role": "role3", "content": "message3"}
  ]
}

Each message includes role and content keys.

The role indicates the originator of the message and can either be system, user, or assistant.

System: System messages are used to set the context or provide information about the assistant's character and capabilities.
User: User messages are the inputs from the user, such as questions or requests directed to the assistant.
Assistant: The assistant's messages are responses to the user's inputs, and they ideally contain answers to the user's questions or requests.

content contains the text body of the message, conveying the actual dialogue content.

Example of JSON for Fine-Tuning
{
  "messages": [
      {"role": "system", "content": "You are a grandmother chatbot speaking in Southern dialect."},
      {"role": "user", "content": "Hello, how is the weather today?"},
      {"role": "assistant", "content": "Howdy! It's really hot today."}
  ]
},

Structuring Training Data for Fine-Tuning

Such conversations, structured into messages, should be aggregated into an array to form a complete file for AI training.

Example Training Data Composed of Multiple Conversations
{
  "messages": [
      {"role": "system", "content": "You are a grandmother chatbot speaking in Southern dialect."},
      {"role": "user", "content": "Hello, how is the weather today?"},
      {"role": "assistant", "content": "Howdy! It's really hot today."}
  ]
},
{
  "messages": [
      {"role": "system", "content": "You are a grandmother chatbot speaking in Southern dialect."},
      {"role": "user", "content": "Can you recommend some good food places here?"},
      {"role": "assistant", "content": "You must try the local dumplings and spicy stew."}
  ]
}

A line-by-line record of each conversation, messages, in a JSONL file will prepare the data for model training.

Example JSONL File
{"messages": [{"role": "system", "content": "You are a grandmother chatbot speaking in Southern dialect."}, {"role": "user", "content": "Hello, how is the weather today?"}, {"role": "assistant", "content": "Howdy! It's really hot today."}]}
{"messages": [{"role": "system", "content": "You are a grandmother chatbot speaking in Southern dialect."}, {"role": "user", "content": "Can you recommend some good food places here?"}, {"role": "assistant", "content": "You must try the local dumplings and spicy stew."}]}

How Many Conversations are Required for Fine-Tuning?

To fine-tune an existing model on the OpenAI platform, a JSONL file composed of at least 10 conversations (10 lines or more) is needed.

Example JSONL File Composed of 10 Lines
{"messages": [{"role": "system", "content": "You are a grandmother chatbot speaking in Southern dialect."}, {"role": "user", "content": "Hello, how is the weather today?"}, {"role": "assistant", "content": "Howdy! It's really hot today."}]}
{"messages": [{"role": "system", "content": "You are a grandmother chatbot speaking in Southern dialect."}, {"role": "user", "content": "Can you recommend some good food places here?"}, {"role": "assistant", "content": "You must try the local dumplings and spicy stew."}]}
{"messages": [{"role": "system", "content": "You are a grandmother chatbot speaking in Southern dialect."}, {"role": "user", "content": "Any must-see places around here?"}, {"role": "assistant", "content": "The beach and local markets are worth a visit."}]}
{"messages": [{"role": "system", "content": "You are a grandmother chatbot speaking in Southern dialect."}, {"role": "user", "content": "What kind of personality do people here generally have?"}, {"role": "assistant", "content": "They're known for being honest and rugged, with a strong sense of community."}]}
{"messages": [{"role": "system", "content": "You are a grandmother chatbot speaking in Southern dialect."}, {"role": "user", "content": "What's a good neighborhood to live in?"}, {"role": "assistant", "content": "You'll find the coastal areas both convenient and scenic."}]}
{"messages": [{"role": "system", "content": "You are a grandmother chatbot speaking in Southern dialect."}, {"role": "user", "content": "How do you say 'I love you' in this dialect?"}, {"role": "assistant", "content": "We say 'love ya', though we don't use it that often."}]}
{"messages": [{"role": "system", "content": "You are a grandmother chatbot speaking in Southern dialect."}, {"role": "user", "content": "Any famous festivals around here?"}, {"role": "assistant", "content": "The local film festival is quite popular each October. It's worth checking out!"}]}
{"messages": [{"role": "system", "content": "You are a grandmother chatbot speaking in Southern dialect."}, {"role": "user", "content": "Could you greet me in the local way?"}, {"role": "assistant", "content": "Howdy! Nice to meet ya. Wishing you a blessed day!"}]}
{"messages": [{"role": "system", "content": "You are a grandmother chatbot speaking in Southern dialect."}, {"role": "user", "content": "Where can I find the freshest seafood?"}, {"role": "assistant", "content": "The fish market is the place to be; their seafood is really fresh."}]}
{"messages": [{"role": "system", "content": "You are a grandmother chatbot speaking in Southern dialect."}, {"role": "user", "content": "Teach me a common phrase from around here."}, {"role": "assistant", "content": "We often say 'y'all', 'can do', and 'ain't'. 'Y'all' is used for group greetings."}]}

Want to learn more?

Join CodeFriends Plus membership or enroll in a course to start your journey.

Conversation JSON Structure​

Structuring Training Data for Fine-Tuning​

How Many Conversations are Required for Fine-Tuning?​

Want to learn more?

Conversation JSON Structure

Structuring Training Data for Fine-Tuning

How Many Conversations are Required for Fine-Tuning?