LLMs Exhibit Differential Accuracy Across User Profiles

Researchers at MIT evaluated GPT-4, Claude 3 Opus, and Llama 3 on honesty/truthfulness and scientific-question datasets, in experiments that varied users' stated education level, English proficiency, and country of origin. Accuracy declined significantly for users presenting lower education levels or non-native English, with Claude 3 Opus showing larger accuracy gaps and higher refusal rates for users identified as being from Iran (about 11% refusals versus 3.6% for controls). The study also documented higher rates of dismissive tone toward less-educated users.
Scoring Rationale
High-impact cross-model empirical evidence; strong methodology and official data, but limited to three models and specific demographics.