Research, llm, algorithmic bias, anthropic, openai
LLMs Exhibit Differential Accuracy Across User Profiles
Relevance Score: 9.5
Researchers at MIT evaluated GPT-4, Claude 3 Opus, and Llama 3 on honesty/truthfulness and scientific question datasets, in experiments that varied the user profile's education level, English proficiency, and country of origin. They found significant accuracy declines for users presented as less educated or as non-native English speakers. Claude 3 Opus showed larger accuracy gaps, along with higher refusal rates for users identified as being from Iran (about 11% refusals versus 3.6% in the control condition). The study also documented a dismissive tone more often in responses to less-educated users.



