Fine-Tuning LLMs

When and how to fine-tune a language model on Turkish domain data for production use.

When to fine-tune

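Fine-tune when any of the following holds: (1) prompt engineering plateaus below 85% accuracy on your evaluation set, (2) you need a consistent output format that prompting alone does not reliably produce, or (3) latency matters and you can self-host the model.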

Dataset preparation

Prepare at least 500 examples in instruction-tuning format. For Turkish, ensure that at least 90% of the examples are in Turkish, include domain-specific terminology, and balance the label distribution where the task has labels.

{"instruction": "Metni özetle:", "input": "...", "output": "..."}

Choose base model

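For Turkish, use Llama-3.1-8B-Instruct or Mistral-7B-v0.3 with LoRA (rank=16). Both handle Turkish text, but tokenisation efficiency varies by model, so check tokens per word on a sample of your own data before committing. Avoid models with less than 5% Turkish in their pretraining data, where that figure is published.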
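One way to compare candidate models on Turkish tokenisation is to measure how many tokens each tokenizer produces per word on a sample of domain text; lower is generally better for an agglutinative language like Turkish. The sketch below is a hypothetical helper (`tokens_per_word` is not a library function) that accepts any callable mapping a string to a list of tokens, so it makes no assumption about which tokenizer library is used.

```python
def tokens_per_word(texts, tokenize):
    """Average number of tokens per whitespace-separated word.

    texts:    iterable of sample strings from your own domain data
    tokenize: any callable str -> list of tokens (assumed interface)
    """
    n_words = 0
    n_tokens = 0
    for text in texts:
        n_words += len(text.split())
        n_tokens += len(tokenize(text))
    if n_words == 0:
        raise ValueError("sample contains no words")
    return n_tokens / n_words
```

With Hugging Face `transformers`, the `tokenize` method of a tokenizer loaded through `AutoTokenizer.from_pretrained` fits this interface. Running the same sample through each candidate's tokenizer gives a like-for-like comparison; what counts as an acceptable ratio is a judgment call for your latency and context-length budget.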

Training setup

Use QLoRA (4-bit NF4 quantisation) if you have less than 24 GB of VRAM. Train for 3 epochs and evaluate every 100 steps. Use a learning rate of 2e-4 with a cosine schedule.
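To see what these settings imply for a given dataset, the arithmetic can be sketched in plain Python. This is an illustration under stated assumptions, not a training script: the effective batch size of 16 is an assumed value the guide does not specify, and the cosine decay runs from the peak learning rate to zero with no warmup phase, which real training setups often add.

```python
import math


def total_steps(n_examples, epochs=3, batch_size=16):
    """Optimizer steps for the whole run (batch_size is an assumed value)."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return steps_per_epoch * epochs


def cosine_lr(step, n_steps, peak_lr=2e-4):
    """Learning rate at `step`: cosine decay from peak_lr to 0, no warmup."""
    progress = min(step, n_steps) / n_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def eval_steps(n_steps, every=100):
    """Steps at which an evaluation would run."""
    return list(range(every, n_steps + 1, every))
```

Under these assumptions, a 500-example dataset gives `total_steps(500)` of 96 steps, so an evaluation interval of 100 steps would never trigger during the run. For small datasets it may be worth lowering the interval so that at least a few evaluations happen per epoch.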

Evaluate & merge

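Compare the fine-tuned model against the base model on a held-out test set. If the improvement exceeds 5% on your primary metric, merge the LoRA adapters into the base model and push the result to your model registry. If not, review data quality before changing anything else.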
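The merge decision can be written down as a small gate. The sketch below assumes exact-match accuracy as the metric, which suits classification or fixed-format outputs but not free-text tasks such as summarisation, and it reads the 5% threshold as an absolute gain of 0.05 in accuracy; both are assumptions, and the helper names are hypothetical.

```python
def accuracy(predictions, references):
    """Exact-match accuracy after stripping surrounding whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not references:
        raise ValueError("empty test set")
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


def should_merge(base_acc, tuned_acc, min_gain=0.05):
    """True when the fine-tuned model beats the base by more than min_gain.

    min_gain is an absolute difference in accuracy (assumed reading of
    the 5% threshold).
    """
    return (tuned_acc - base_acc) > min_gain
```

Both models should be scored on the same held-out examples with the same prompts. If `should_merge` returns true and the adapters were trained with the `peft` library, `merge_and_unload()` on the adapter-wrapped model is the usual way to fold them into the base weights before pushing.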