When and how to fine-tune a language model on Turkish domain data for production use.
Fine-tune when: (1) prompt engineering plateaus at <85% accuracy, (2) you need a consistent output format, (3) latency matters and you can self-host.
Collect at least 500 examples in instruction-tuning format. For Turkish: make sure at least 90% of the examples are in Turkish, include domain-specific terminology, and balance the label distribution.
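A minimal sketch of how these requirements could be checked before training. It assumes records with `instruction`, `input`, and `output` keys and treats `output` as the class label, which only makes sense for classification-style data. The names `looks_turkish` and `check_dataset` and the `max_imbalance` threshold are hypothetical, and the letter-based language check is a crude stand-in for a real language-identification model.

```python
from collections import Counter

# Assumption: a text counts as Turkish if it contains a Turkish-specific letter.
# This is a crude placeholder; use a proper language-ID model in practice.
TURKISH_LETTERS = set("çğıöşüÇĞİÖŞÜ")


def looks_turkish(text):
    return any(ch in TURKISH_LETTERS for ch in text)


def check_dataset(records, min_examples=500, min_turkish_share=0.9, max_imbalance=3.0):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if len(records) < min_examples:
        problems.append(f"only {len(records)} examples, need at least {min_examples}")
    if not records:
        return problems

    # Share of examples whose input/output text looks Turkish.
    turkish = sum(looks_turkish(r["input"] + " " + r["output"]) for r in records)
    share = turkish / len(records)
    if share < min_turkish_share:
        problems.append(f"Turkish share {share:.0%} is below {min_turkish_share:.0%}")

    # Treat the output as the class label (classification-style data only).
    counts = Counter(r["output"] for r in records)
    imbalance = max(counts.values()) / min(counts.values())
    if imbalance > max_imbalance:  # max_imbalance is an assumed threshold
        problems.append(f"largest class is {imbalance:.1f}x the smallest")
    return problems
```

Running `check_dataset` on the training split and fixing every reported problem before launching a run is cheaper than discovering a skewed dataset after three epochs.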
Example record: {"instruction": "Metni özetle:", "input": "...", "output": "..."}
For Turkish: Llama-3.1-8B-Instruct or Mistral-7B-v0.3 with LoRA (rank=16). Both have strong Turkish tokenisation. Avoid models with <5% Turkish in pretraining.
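Tokenisation quality is worth verifying on your own corpus. One common proxy is fertility, the average number of tokens per whitespace-separated word; lower values mean the tokenizer splits Turkish words into fewer pieces. The sketch below accepts any `tokenize` callable. The function name `fertility` and the toy tokenizer are illustrative only and are not part of any library.

```python
def fertility(texts, tokenize):
    """Average number of tokens per whitespace-separated word across `texts`."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_words += len(text.split())
        total_tokens += len(tokenize(text))
    if total_words == 0:
        raise ValueError("no words to measure")
    return total_tokens / total_words


# Toy stand-in for a real subword tokenizer: cuts each word into 4-character chunks.
def toy_tokenize(text):
    return [w[i:i + 4] for w in text.split() for i in range(0, len(w), 4)]


sample = ["Sözleşmenin feshedilmesi durumunda tazminat ödenir."]
print(fertility(sample, toy_tokenize))
```

With Hugging Face `transformers`, the `tokenize` method of a tokenizer loaded via `AutoTokenizer.from_pretrained(...)` can be passed in place of `toy_tokenize` to compare candidate models on the same domain sample.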
Use QLoRA (4-bit NF4 quantisation) if you have less than 24 GB of VRAM. Train for 3 epochs and evaluate every 100 steps. Use a learning rate of 2e-4 with a cosine schedule.
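To make the schedule concrete, the sketch below computes the number of optimiser steps in a run and the learning rate at a given step under plain cosine decay from 2e-4. The batch size of 8, the absence of warmup, and the helper names are assumptions, not values taken from the text.

```python
import math


def total_steps(num_examples, epochs=3, batch_size=8):
    """Optimiser steps for the whole run (batch_size is an assumed value)."""
    return math.ceil(num_examples / batch_size) * epochs


def cosine_lr(step, n_steps, peak_lr=2e-4):
    """Learning rate at `step` under cosine decay to zero, without warmup."""
    progress = min(max(step / n_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


n_steps = total_steps(num_examples=500)
eval_steps = list(range(100, n_steps + 1, 100))  # evaluate every 100 steps

# Print the learning rate at the start, at each evaluation point, and at the end.
for step in [0, *eval_steps, n_steps]:
    print(step, f"{cosine_lr(step, n_steps):.2e}")
```

Under these assumptions a 500-example dataset gives only 189 steps, so an interval of 100 steps produces a single mid-run evaluation; it may be worth checking the step count before fixing the interval. In Hugging Face `transformers`, the same schedule shape is selected with `lr_scheduler_type="cosine"` in `TrainingArguments`.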
Compare the fine-tuned model against the base model on a held-out test set. If the improvement is >5%, merge the LoRA adapters and push to the registry. If not, review data quality first.
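The decision rule can be sketched as follows. It assumes exact-match accuracy as the metric and reads the 5% threshold as 5 percentage points of absolute gain; both are assumptions, as are the function names `accuracy` and `decide`.

```python
def accuracy(predictions, references):
    """Exact-match accuracy after stripping surrounding whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not references:
        raise ValueError("empty test set")
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


def decide(base_acc, tuned_acc, min_gain=0.05):
    """Return the next action and the absolute accuracy gain.

    Assumption: min_gain is measured in absolute accuracy (percentage points),
    not relative to the base model's score.
    """
    gain = tuned_acc - base_acc
    action = "merge_and_publish" if gain > min_gain else "review_data"
    return action, gain


# Both models must be scored on the same held-out examples.
references = ["olumlu", "olumsuz", "olumlu", "olumsuz"]
base_preds = ["olumlu", "olumlu", "olumlu", "olumlu"]
tuned_preds = ["olumlu", "olumsuz", "olumlu", "olumlu"]
print(decide(accuracy(base_preds, references), accuracy(tuned_preds, references)))
```

With PEFT, merging is typically done by calling `merge_and_unload()` on the adapter-wrapped model before saving or publishing the result.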