Researchers Evaluate LLMs For Hinglish SRH Intent

In 2026, Emory University researchers evaluated proprietary, multilingual open-weight, and Indic LLMs in a zero-shot setting on 4,161 de-identified Hinglish sexual and reproductive health (SRH) questions collected in urban Mumbai. GPT-5 achieved the highest hierarchical F1 (hF1 = 0.784), and Sarvam-M came close (hF1 = 0.757). Across models, common failures involved fine-grained intent distinctions, euphemisms, and culturally situated expressions.
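The summary does not define hierarchical F1. One widely used set-based formulation expands each gold and predicted label with its ancestors in the label taxonomy, then computes micro-averaged precision and recall over those expanded sets, so a prediction in the right coarse category but the wrong fine-grained intent still earns partial credit. The sketch below assumes that formulation and a hypothetical two-level taxonomy (`TAXONOMY` and its labels are illustrative, not the study's actual intent scheme); the researchers may use a different hF1 variant.

```python
from typing import Dict, List, Optional, Set

# Hypothetical two-level intent taxonomy: child label -> parent (None = top level).
TAXONOMY: Dict[str, Optional[str]] = {
    "contraception": None,
    "emergency_contraception": "contraception",
    "condoms": "contraception",
    "menstruation": None,
}


def ancestors(label: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the label together with all of its ancestors in the taxonomy."""
    result: Set[str] = set()
    node: Optional[str] = label
    while node is not None:
        result.add(node)
        node = parent.get(node)  # top-level or unknown labels have no parent
    return result


def hierarchical_f1(gold: List[str], pred: List[str],
                    parent: Dict[str, Optional[str]]) -> float:
    """Micro-averaged hierarchical F1 over ancestor-augmented label sets."""
    overlap = pred_total = gold_total = 0
    for g, p in zip(gold, pred):
        g_set = ancestors(g, parent)
        p_set = ancestors(p, parent)
        overlap += len(g_set & p_set)
        pred_total += len(p_set)
        gold_total += len(g_set)
    if overlap == 0:
        return 0.0
    h_precision = overlap / pred_total
    h_recall = overlap / gold_total
    return 2 * h_precision * h_recall / (h_precision + h_recall)


# Example: the first query gets the coarse category right but the fine intent wrong.
gold = ["emergency_contraception", "menstruation"]
pred = ["condoms", "menstruation"]
print(round(hierarchical_f1(gold, pred, TAXONOMY), 4))  # 0.6667
```

Flat accuracy on this toy example would be 0.5, while hF1 is 2/3 because the first prediction shares the `contraception` ancestor with the gold label. This partial credit is why a metric of this kind is sensitive to the fine-grained intent errors the study reports.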
Scoring Rationale
Robust comparative evaluation on 4,161 Hinglish SRH queries; limited by the zero-shot-only setup and a single-city dataset.

