Firefox Improves Local Alt Text Dataset Quality
On Dec. 15, 2025, Mozilla's Firefox team documented dataset-quality work to improve on-device alt text generation, reporting a GPT-4o-driven transformation of the Flickr30k and COCO captions used to train the local alt-text model. The analysis uses CLIP, BERTScore, and bias detection to show major reductions in demographic mentions, while surfacing fair image–text alignment and severe class imbalance as targets for follow-up fixes.
Key Points
- Transformed 31,014 Flickr30k captions with GPT-4o to systematically remove demographic descriptors.
- Measured reductions: gender mentions dropped from 67% to 0%, race/ethnicity mentions from 27% to 1%, and nationality mentions were eliminated.
- Provided a reproducible quality tool that uses CLIP, BERTScore, the Gini coefficient, and entropy to guide dataset rebalancing.
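The Gini coefficient and entropy named in the key points are standard ways to quantify class imbalance. The following is a minimal sketch of how such metrics could be computed from per-class counts; it is not the Firefox team's tool, and the function names and sample counts are hypothetical, chosen only for illustration.

```python
import math


def gini(counts):
    """Gini coefficient of non-negative class counts.

    0.0 means every class has the same count; values near 1.0 mean
    a few classes hold almost all of the samples.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Closed form on ascending-sorted values:
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, with i starting at 1.
    weighted = sum(i * x for i, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def normalized_entropy(counts):
    """Shannon entropy of the class distribution, scaled to [0, 1].

    1.0 means a uniform distribution; 0.0 means one class has everything.
    """
    total = sum(counts)
    if len(counts) < 2 or total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(counts))


# Hypothetical object-class counts, for illustration only.
balanced = [10, 10, 10, 10]
skewed = [0, 0, 0, 40]

print(gini(balanced), normalized_entropy(balanced))
print(gini(skewed), normalized_entropy(skewed))
```

A high Gini value together with a low normalized entropy would point to the kind of severe class imbalance the report describes, and the classes with the smallest counts are the natural candidates for rebalancing.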
Scoring Rationale
Practical, well-documented dataset and tooling with official results, but limited novelty beyond applied curation.