Amazon Nova Forge Demonstrates 17% F1 Improvement

The AWS China Applied Science team evaluated Amazon Nova Forge by fine-tuning models on a 15,372-sample Voice of Customer (VOC) dataset labeled with a four-level taxonomy of 1,420 categories. Supervised fine-tuning with data mixing produced a 17% F1 improvement on the VOC task while keeping MMLU scores and instruction-following ability near baseline, which limits catastrophic forgetting.
Key Points
- Shows a 17% F1 improvement on in-domain VOC classification using Nova Forge data mixing
- Mitigates catastrophic forgetting by preserving near-baseline MMLU scores and instruction-following abilities
- Enables enterprises to fine-tune frontier models on proprietary data without losing general capabilities
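The data-mixing idea behind these points can be illustrated with a minimal Python sketch. It is not the Nova Forge API and not the team's actual pipeline: the function name `mix_datasets`, the `general_ratio` parameter, and the sample dictionaries are all hypothetical, and the report does not state what mixing ratio was used. The sketch only shows the general technique of blending domain samples with general-purpose samples before supervised fine-tuning so the model keeps seeing general data.

```python
import random


def mix_datasets(domain, general, general_ratio, seed=0):
    """Blend domain and general-purpose samples into one shuffled training set.

    `general_ratio` is the target share of general samples in the final mix.
    All names here are illustrative assumptions, not part of any AWS API.
    """
    if not 0.0 <= general_ratio < 1.0:
        raise ValueError("general_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Number of general samples needed to reach the target share,
    # capped by how many general samples are available.
    n_general = round(len(domain) * general_ratio / (1.0 - general_ratio))
    n_general = min(n_general, len(general))
    # Keep every domain sample; draw general samples without replacement.
    mixed = list(domain) + rng.sample(list(general), n_general)
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Placeholder records standing in for VOC and general instruction data.
    domain = [{"text": f"voc-{i}", "source": "domain"} for i in range(8)]
    general = [{"text": f"gen-{i}", "source": "general"} for i in range(100)]
    mixed = mix_datasets(domain, general, general_ratio=0.5, seed=1)
    print(len(mixed), sum(s["source"] == "general" for s in mixed))
```

In practice the ratio would be tuned by tracking both the in-domain metric (F1 here) and a general benchmark such as MMLU, since too little general data risks forgetting and too much can dilute the domain gain.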
Scoring Rationale
Strong official evaluation demonstrating domain gains and capability retention; limited novelty beyond service-specific comparative implementation.

