LLMs Produce Unequal Answers For Different Users

A new study from the MIT Center for Constructive Communication evaluates GPT-4, Claude 3 Opus, and Llama 3-8B on a science question benchmark and on TruthfulQA. When user bios indicate lower education, non-native English proficiency, or certain national origins, the models answer less accurately, refuse more often, and occasionally adopt a patronizing tone. The paper shows that these effects compound when several traits are combined, and it warns that they raise the risk of misinformation for vulnerable users.
Scoring Rationale
Strong empirical evidence from MIT of demographic-dependent LLM failures; the scope is limited to three models and two benchmarks.