Script Gap Study Reveals Romanisation Reduces Triage Accuracy

Khullar et al. (2025) analyze maternal and newborn care chats across six Indian languages and English and find that romanised inputs reduce LLM triage F1 by 5–12 points. Models (GPT‑4o, Claude 4.5, LLaMA 4, Qwen, and others) often paraphrase romanised queries yet still label them "insufficient information"; automatically transliterating the input back to native script restores performance. The gap affects 56% of user messages in the study.
Key Points
- Romanised inputs cause 5–12 point F1 drops across LLMs; Kannada falls from 83.7 to 57.3.
- Orthographic noise and tokenisation instability, not semantic loss, drive misclassification of romanised text.
- The authors recommend automatic transliteration or normalization to native scripts to restore accuracy and fairness.
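The normalization step recommended above could be wired in front of a triage model roughly as follows. This is a minimal sketch and not the authors' pipeline: the function names `dominant_script` and `prepare_for_triage`, the `expected_script` argument, and the `transliterate` callable are all hypothetical, and the script table is an illustrative subset of Unicode blocks rather than the study's language list. It also assumes the caller already knows which native script the user's language uses, since Latin-script text alone cannot distinguish romanised Kannada from English.

```python
# Sketch: detect Latin-script input and route it through a transliterator
# before triage. All names here are hypothetical, not from the study.
from typing import Callable

# Unicode block ranges for a few Indic scripts (illustrative subset only).
INDIC_BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
}


def dominant_script(text: str) -> str:
    """Return the script with the most characters in `text`.

    Returns "Latin", one of the INDIC_BLOCKS names, or "Unknown" when the
    text contains no ASCII letters and no characters from the listed blocks.
    """
    counts: dict[str, int] = {}
    for ch in text:
        cp = ord(ch)
        name = None
        for script, (lo, hi) in INDIC_BLOCKS.items():
            if lo <= cp <= hi:
                name = script
                break
        if name is None and ch.isascii() and ch.isalpha():
            name = "Latin"
        if name is not None:
            counts[name] = counts.get(name, 0) + 1
    if not counts:
        return "Unknown"
    return max(counts, key=lambda script: counts[script])


def prepare_for_triage(
    text: str,
    expected_script: str,
    transliterate: Callable[[str, str], str],
) -> str:
    """Transliterate romanised input to `expected_script`; pass other text through.

    `transliterate(text, target_script)` is supplied by the caller (assumed to
    wrap whatever transliteration model or library is available).
    """
    if expected_script != "Latin" and dominant_script(text) == "Latin":
        return transliterate(text, expected_script)
    return text
```

Keeping the transliterator as an injected callable means the routing logic can be tested with a stub and the real transliteration backend can be swapped without touching the detection code.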
Scoring Rationale
Strong empirical finding with cross-model evidence and actionable normalization fix; limited to studied languages and settings.