Fine-Tuning Data Formatter — JSONL Export

Convert CSV, JSON, or manual input into JSONL format for OpenAI and Anthropic fine-tuning. Token count and cost estimates included.

About Fine-Tuning Data Formatter

Fine-tuning AI models requires training data in specific formats. OpenAI uses JSONL with system/user/assistant message arrays, while Anthropic has its own format requirements.

This tool helps you convert your training data from CSV, JSON, or manual input into properly formatted JSONL files. It validates the format, shows a preview, and allows you to edit individual entries before downloading.

OpenAI fine-tuning requires JSONL format where each line is a JSON object with a 'messages' array containing system, user, and assistant roles. Anthropic uses a similar but distinct format. A single malformed line causes the entire training file to be rejected, so validating before upload is essential. This formatter catches formatting errors, missing fields, and encoding issues.
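The OpenAI message format described above can be sketched in a few lines (the example content is invented for illustration):

```python
import json

# One training example in OpenAI's fine-tuning format: each JSONL line is
# a JSON object with a 'messages' array of role/content objects.
example = {
    "messages": [
        {"role": "system", "content": "You are a concise support agent."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Open Settings, choose Account, then Reset Password."},
    ]
}

# json.dumps produces a single line with no embedded newlines,
# which is exactly one JSONL record.
line = json.dumps(example)
assert "\n" not in line
```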

Training data quality matters more than quantity. OpenAI recommends 50-100 examples for meaningful improvement, but each example must demonstrate the exact behavior you want the model to learn. Include diverse inputs covering edge cases, different phrasings, and various difficulty levels. Consistent formatting across all examples helps the model generalize better.

The formatter shows estimated token counts and training costs per example. Fine-tuning GPT-4o Mini costs approximately $3 per million training tokens, making a 100-example dataset cost under $1 to train. The tool also flags common issues like duplicate examples, empty responses, and system messages that are inconsistent across entries.
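The checks for duplicates, empty responses, and inconsistent system messages can be sketched as follows (a minimal illustration; the function name and error strings are assumptions, not the tool's actual implementation):

```python
import json

def flag_issues(jsonl_text: str) -> list[str]:
    """Flag duplicate examples, empty assistant responses, and
    system messages that differ across entries (illustrative sketch)."""
    issues, seen, system_texts = [], set(), set()
    for n, line in enumerate(jsonl_text.splitlines(), start=1):
        if line in seen:
            issues.append(f"line {n}: duplicate example")
        seen.add(line)
        for msg in json.loads(line)["messages"]:
            if msg["role"] == "assistant" and not msg["content"].strip():
                issues.append(f"line {n}: empty assistant response")
            if msg["role"] == "system":
                system_texts.add(msg["content"])
    if len(system_texts) > 1:
        issues.append("system messages are inconsistent across entries")
    return issues
```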

How the Fine-Tuning Formatter Works

  1. Upload or paste your training data (CSV, JSON, or plain text)
  2. Map your columns to the required fields (system, user, assistant)
  3. The tool converts each row into the correct JSONL format for your target model
  4. Download the formatted .jsonl file ready for upload to OpenAI, Anthropic, or other providers
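The mapping and conversion steps above can be sketched in Python (the column names 'system', 'user', and 'assistant' are assumptions; map whatever headers your spreadsheet actually uses):

```python
import csv
import io
import json

# Hypothetical CSV input with 'system', 'user', and 'assistant' columns.
csv_text = """system,user,assistant
You are a concise assistant.,What is JSONL?,One JSON object per line.
You are a concise assistant.,Why validate first?,A malformed line fails the whole upload.
"""

def csv_to_jsonl(text: str) -> str:
    """Convert each CSV row into one OpenAI-style JSONL record."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        record = {"messages": [
            {"role": "system", "content": row["system"]},
            {"role": "user", "content": row["user"]},
            {"role": "assistant", "content": row["assistant"]},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)

jsonl = csv_to_jsonl(csv_text)
```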

Preparing Data for LLM Fine-Tuning

Fine-tuning data quality matters more than quantity: 50 high-quality examples often outperform 500 mediocre ones. Each example should represent the exact input-output pattern you want the model to learn. Include diverse edge cases to prevent the model from overfitting to a narrow pattern. OpenAI requires at least 10 examples but recommends 50-100 for noticeable quality improvements. Always validate your JSONL file before uploading: a single malformed line will cause the entire batch to be rejected.
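Because JSONL is just one JSON object per line, a basic validation pass is straightforward to sketch (this mirrors, rather than reproduces, the checks the tool performs):

```python
import json

def validate_jsonl(text: str) -> list[str]:
    """Return error messages with line numbers; an empty list means valid."""
    errors = []
    for n, line in enumerate(text.splitlines(), start=1):
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {n}: invalid JSON ({exc.msg})")
            continue
        messages = obj.get("messages")
        if not isinstance(messages, list) or not messages:
            errors.append(f"line {n}: missing or empty 'messages' array")
        elif any("role" not in m or "content" not in m for m in messages):
            errors.append(f"line {n}: message missing 'role' or 'content'")
    return errors
```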

When to Use the Fine-Tuning Data Formatter

Use this tool when preparing training data for fine-tuning AI models like GPT-4o Mini or Claude. It converts your existing data from CSV, JSON, or manual input into the exact JSONL format required by each provider. It is especially useful when you have training data in spreadsheets and need to convert it into the correct message format without manual JSON editing.

Common Use Cases

  • Converting spreadsheet data into OpenAI-compatible JSONL training files
  • Preparing customer support conversation logs for fine-tuning a support chatbot
  • Formatting code completion examples for specialized coding model training
  • Validating JSONL files before upload to catch formatting errors that would cause the batch to be rejected

Expert Tips

  • Include diverse edge cases in your training data — the model only learns patterns it sees in examples.
  • Keep system messages identical across all training examples for consistent model behavior.
  • Review the token count estimate before uploading — unexpectedly high counts may indicate formatting issues or overly verbose examples.

Frequently Asked Questions

How many training examples do I need for fine-tuning?
OpenAI requires a minimum of 10 examples but recommends 50-100 for noticeable quality improvements. For specialized tasks, 200-500 examples typically produce strong results. Quality matters more than quantity — 50 carefully crafted examples often outperform 500 sloppy ones. Each example should demonstrate the exact behavior you want the model to learn.
What is the difference between OpenAI and Anthropic fine-tuning formats?
OpenAI uses JSONL where each line contains a 'messages' array with objects having 'role' (system/user/assistant) and 'content' fields. Anthropic's format uses 'Human' and 'Assistant' turn markers. This tool handles the conversion automatically — select your target platform and the correct format is generated.
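As a rough illustration of the difference, the sketch below rewrites one OpenAI-style record into the Human/Assistant turn-marker layout described above; this is an assumption-laden example, so check your provider's current documentation for the exact required format:

```python
import json

def messages_to_turns(line: str) -> str:
    """Illustrative conversion of one OpenAI-style JSONL record
    into Human/Assistant turn-marker text (layout is an assumption)."""
    parts = []
    for msg in json.loads(line)["messages"]:
        if msg["role"] == "user":
            parts.append(f"\n\nHuman: {msg['content']}")
        elif msg["role"] == "assistant":
            parts.append(f"\n\nAssistant: {msg['content']}")
    return "".join(parts)
```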
How much does fine-tuning cost?
OpenAI charges per training token: GPT-4o Mini costs approximately $3 per million training tokens. A 100-example dataset with average 200 tokens per example costs about $0.06 to train. The tool displays estimated costs based on your actual token counts before you download the file.
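The arithmetic above is easy to verify (the pricing figure is taken from the text; confirm current rates before training):

```python
# Cost estimate for the scenario above: 100 examples at ~200 tokens each,
# priced at ~$3 per million training tokens (figure from the text above).
examples = 100
tokens_per_example = 200
price_per_million_tokens = 3.00

total_tokens = examples * tokens_per_example
cost = total_tokens / 1_000_000 * price_per_million_tokens
print(f"{total_tokens} tokens -> ${cost:.2f}")  # prints "20000 tokens -> $0.06"
```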
Should every example include a system message?
Not necessarily, but consistency matters. If you include a system message in some examples, include it in all of them with the same text. The system message sets the behavior context — inconsistent system messages confuse the model during training and produce unreliable outputs.
