Research · llm · bias detection · prompting · evaluation benchmark
ICE-Guard Detects Spurious Feature Reliance Across Domains
Score: 9.2

Researchers introduce ICE-Guard (Mar 19, 2026), a method that uses intervention consistency testing to detect spurious feature reliance in LLMs. They evaluate 11 models on 3,000 vignettes spanning 10 high-stakes domains and find that authority bias (5.8%) and framing bias (5.0%) both exceed demographic bias (2.2%). The reported mitigations reduce decision flips by up to 100% and cut bias by a cumulative 78%.
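This summary does not spell out how a "flip" is scored. One plausible reading is that a flip occurs when a model's decision on a vignette changes after a feature that should be irrelevant (an authority cue, a framing change, a demographic attribute) is altered. The sketch below computes a per-intervention flip rate under that assumption. The `VignettePair` type, its field names, and the `flip_rates` function are hypothetical illustrations, not ICE-Guard's actual code or metric definition.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class VignettePair:
    """One baseline/intervened decision pair (hypothetical structure)."""
    intervention: str         # e.g. "authority", "framing", "demographic"
    baseline_decision: str    # model output on the unmodified vignette
    intervened_decision: str  # model output after the irrelevant feature changes


def flip_rates(pairs: Iterable[VignettePair]) -> Dict[str, float]:
    """Return, per intervention type, the fraction of pairs whose decision changed."""
    totals: Dict[str, int] = {}
    flips: Dict[str, int] = {}
    for pair in pairs:
        totals[pair.intervention] = totals.get(pair.intervention, 0) + 1
        if pair.baseline_decision != pair.intervened_decision:
            flips[pair.intervention] = flips.get(pair.intervention, 0) + 1
    return {name: flips.get(name, 0) / count for name, count in totals.items()}


# Toy data for illustration only; these are not results from the paper.
example_pairs = [
    VignettePair("authority", "approve", "deny"),
    VignettePair("authority", "approve", "approve"),
    VignettePair("demographic", "deny", "deny"),
]
example_rates = flip_rates(example_pairs)
```

Under this reading, a consistent model would score a flip rate of 0 for every intervention type, and a mitigation that "reduces flips by 100%" would bring a nonzero rate down to 0.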
Scoring Rationale
Strong methodology and practical mitigations, limited by being a single arXiv preprint without peer review.

